Digital Event Horizon
OpenAI's Whisper tool, touted as having achieved "human-level robustness" in audio transcription accuracy, has been found to regularly fabricate text, causing significant problems in high-risk domains such as healthcare. Despite OpenAI's warnings against using the tool in "high-risk domains," over 30,000 medical workers are using it to transcribe patient visits, raising concerns about patient safety and accuracy.
The tool's propensity to confabulate stems from its training data, which includes thousands of hours of captioned audio scraped from YouTube videos. That noisy data encourages overfitting, in which the model produces whatever its neural network predicts is the most likely output, even if it is incorrect. The incident highlights the need for regulation and certification of AI tools used in high-risk domains such as healthcare.
The medical and business worlds are increasingly relying on Artificial Intelligence (AI) transcription tools to convert spoken words into written text. OpenAI's Whisper tool, released in 2022, is one such example, touted as having achieved "human-level robustness" in audio transcription accuracy. However, an investigation by the Associated Press has revealed that the tool is not only error-prone but also regularly fabricates text, a phenomenon known as confabulation or hallucination.
OpenAI's Whisper tool is built on Transformer-based technology, which is designed to predict the next most likely token (a chunk of data) that should follow a sequence of tokens provided by a user. In its original model card, OpenAI researchers warned that the tool's predictions may include text that is not actually spoken in the audio input, a consequence of weakly supervised training on large-scale noisy data.
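As a concrete illustration of that prediction loop, here is a minimal transcription sketch assuming the open-source "openai-whisper" Python package; the checkpoint name and audio filename are placeholder assumptions, not details from the reporting.

```python
import whisper  # open-source "openai-whisper" package (pip install openai-whisper)

# Load a pretrained checkpoint; "base" is a small model chosen here only for illustration.
model = whisper.load_model("base")

# transcribe() extracts audio features and then decodes text autoregressively,
# emitting whichever token the network scores as most likely at each step.
# That same mechanism is what lets plausible but unspoken text slip into the output.
result = model.transcribe("patient_visit.wav")  # placeholder filename

print(result["text"])
```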
Recent studies have shown that Whisper regularly invents text, leading to problems in high-risk domains such as healthcare. According to an Associated Press report, over 30,000 medical workers now use Whisper-based tools to transcribe patient visits, despite OpenAI's warnings against use in "high-risk domains." The Mankato Clinic in Minnesota and Children's Hospital Los Angeles are among 40 health systems using a Whisper-powered AI copilot service from medical tech company Nabla.
One of the most concerning issues is that the tool may erase original audio recordings "for data safety reasons," leaving doctors unable to verify a transcript's accuracy against the source material. This is particularly problematic for deaf patients, who would have no way to know whether a transcript of audio they could not hear is accurate.
The potential problems with Whisper extend beyond healthcare. Researchers from Cornell University and the University of Virginia studied thousands of audio samples and found that Whisper added non-existent violent content and racial commentary to neutral speech in about 1 percent of cases; 38 percent of those fabrications included explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority.
In one notable case, when a speaker described "two other girls and one lady," Whisper added fictional text specifying that they "were Black." In another, audio of a speaker saying, "He, the boy, was going to, I'm not sure exactly, take the umbrella" was transcribed as, "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people."
The root cause of Whisper's confabulation lies in its training data. The model is trained on 680,000 hours of multilingual and multitask supervised data collected from the web, including thousands of hours of captioned audio scraped from YouTube videos. This has led to issues with overfitting, where the AI model produces what its neural network predicts is the most likely output, even if it is incorrect.
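Because fabricated passages are decoded by the same mechanism as faithful ones, one partial mitigation is to inspect the decoding statistics the open-source package reports for each segment and route suspicious segments to human review. The sketch below again assumes the "openai-whisper" package and a placeholder audio file; the thresholds are illustrative assumptions, not validated cutoffs.

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("patient_visit.wav")  # placeholder filename

# Each segment carries decoding statistics that can hint at unreliable output.
for seg in result["segments"]:
    suspicious = (
        seg["avg_logprob"] < -1.0          # tokens decoded with low confidence
        or seg["no_speech_prob"] > 0.5     # model suspects silence, yet text was produced
        or seg["compression_ratio"] > 2.4  # highly repetitive text, a common hallucination symptom
    )
    label = "REVIEW" if suspicious else "ok"
    print(f"[{seg['start']:7.1f}s-{seg['end']:7.1f}s] {label}: {seg['text'].strip()}")
```

Even a check like this only helps if the original audio is retained, since a flagged segment still has to be verified against the recording by a human reviewer.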
An OpenAI spokesperson acknowledged the researchers' findings and said the company actively studies how to reduce fabrications and incorporates feedback in updates to the model. Even so, the episode highlights the need for regulation and certification of AI tools used in high-risk domains such as healthcare.
The medical industry has already shown a willingness to adopt seemingly "good enough" AI tools despite the risks: Epic Systems has used GPT-4 for medical records, and UnitedHealth has used a flawed AI model for insurance decisions. People are likely already suffering negative outcomes due to AI mistakes, and addressing those harms will probably require some form of regulation and certification of AI tools used in the medical field.
In conclusion, hospitals' reliance on AI transcription tools such as Whisper poses significant risks to accuracy and patient safety. The use of these tools without proper regulation and oversight may lead to devastating consequences, particularly in high-risk domains like healthcare. It is essential for medical professionals and policymakers to be aware of these risks and take steps to mitigate them.
Related Information:
https://arstechnica.com/ai/2024/10/hospitals-adopt-error-prone-ai-transcription-tools-despite-warnings/
https://www.msn.com/en-us/news/technology/hospitals-ai-transcription-tool-invents-things-no-one-ever-said-researchers-say/ar-AA1t50nG
Published: Mon Oct 28 16:20:33 2024 by llama3.2 3B Q4_K_M