
Digital Event Horizon

The Dark Side of Whisper: How an AI Transcription Tool is Confabulating Medical Records


Despite warnings from OpenAI itself, thousands of healthcare workers are relying on an AI-powered transcription tool to transcribe patient visits, with alarming results. Researchers have found that the tool regularly produces fabricated text, posing significant risks in healthcare settings where precision is paramount.

  • Thousands of medical professionals are using OpenAI's Whisper transcription tool to transcribe patient visits, despite warnings from the company.
  • The tool regularly produces fabricated text, known as "confabulation" or "hallucination": one researcher found it in up to 80% of the public meeting transcripts he examined, and another developer found it in nearly all of 26,000 test transcriptions.
  • The false text can take many forms, including inaccurate medical terminology and explicit racist or violent language.
  • The problems extend beyond healthcare: researchers found that Whisper added nonexistent violent content and racial commentary to neutral speech.
  • Many healthcare systems are still relying on Whisper-powered AI copilot services, despite concerns about accuracy and reliability.
  • Nabla's Whisper-powered service erases original audio recordings "for data safety reasons", raising concerns about verification of accuracy.



    A recent Associated Press investigation has uncovered a disturbing trend: despite warnings from OpenAI itself, thousands of healthcare workers have adopted the company's Whisper transcription tool to transcribe patient visits, often with alarming results.

    The AP report reveals that Whisper, which was touted as approaching "human level robustness" in audio transcription accuracy upon its release in 2022, is regularly producing fabricated text in medical and business settings. This phenomenon, known as "confabulation" or "hallucination" in the AI field, poses significant risks in healthcare settings where precision is paramount.
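
    For context, Whisper is available both through OpenAI's API and as an open-source Python package, and the basic transcription call is only a few lines. The sketch below is a minimal illustration using the open-source whisper package; the audio file name is a placeholder. Notably, nothing in the output marks fabricated passages, which come back looking just as fluent as accurately transcribed speech.

        # Minimal transcription sketch using the open-source "whisper" package
        # (pip install openai-whisper). "patient_visit.wav" is a placeholder file.
        import whisper

        model = whisper.load_model("base")             # smaller checkpoints trade accuracy for speed
        result = model.transcribe("patient_visit.wav")

        # The result holds the full text plus timestamped segments; hallucinated
        # passages are returned exactly like correctly transcribed ones, with no
        # flag distinguishing them.
        print(result["text"])
        for seg in result["segments"]:
            print(f'{seg["start"]:7.2f}-{seg["end"]:7.2f}  {seg["text"]}')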

    According to the AP, researchers have found that Whisper creates false text in up to 80 percent of public meeting transcripts examined, while another developer claimed to have found invented content in almost all of his 26,000 test transcriptions. The fabrications can take many forms, from inaccurate medical terminology to explicit racist or violent language.

    One disturbing example cited by the AP is a case where Whisper added fictional text specifying that two girls and one lady were Black when they did not mention anything about race in their conversation. In another instance, the audio said "He, the boy, was going to, I'm not sure exactly, take the umbrella," while Whisper transcribed it as "He took a big piece of a cross, a teeny, small piece … I'm sure he didn't have a terror knife so he killed a number of people."

    The potential problems with Whisper extend beyond healthcare. Researchers from Cornell University and the University of Virginia studied thousands of audio samples and found that Whisper added nonexistent violent content and racial commentary to neutral speech. They discovered that 1 percent of samples included "entire hallucinated phrases or sentences which did not exist in any form in the underlying audio," while 38 percent of those included "explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority."
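
    The researchers' exact methodology is not reproduced here, but the basic idea behind measuring hallucination, aligning the model's transcript against a trusted reference and flagging material that appears only in the model output, can be sketched simply. The illustration below is a rough simplification, not the study's code; the word lists echo the umbrella example quoted above.

        # Rough illustration of flagging hallucinated insertions by aligning a
        # model transcript against a trusted reference transcript. Not the
        # Cornell/UVA methodology; it only shows the underlying idea.
        from difflib import SequenceMatcher

        reference = "he the boy was going to i'm not sure exactly take the umbrella".split()
        hypothesis = "he took a big piece of a cross a teeny small piece".split()

        matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ("insert", "replace"):
                # Words present in the model output but absent from (or different
                # in) the reference transcript.
                print(tag, " ".join(hypothesis[j1:j2]))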

    Despite these alarming findings, many healthcare systems are still relying on Whisper-powered AI copilot services to transcribe patient visits. According to the AP report, over 30,000 medical workers now use Whisper-based tools, with some health systems noting that the tools have been fine-tuned on medical terminology.

    Nabla, a medical tech company that offers a Whisper-powered AI copilot service, acknowledges that Whisper can confabulate, yet it erases the original audio recordings "for data safety reasons." That leaves doctors with no source material against which to verify a transcript's accuracy. Deaf and hard-of-hearing patients are especially exposed, since they have no way to check a mistaken transcript against what was actually said.
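
    It is worth noting that Whisper's own output already contains the hooks needed for that kind of verification: each transcribed segment carries start and end timestamps, so if the source audio were retained, a reviewer could replay the exact clip behind any suspicious sentence. The sketch below shows one such spot-check workflow; it assumes the recording is kept, the file names are placeholders, and pydub (with ffmpeg) is a third-party dependency rather than part of Whisper.

        # Spot-check sketch: export the audio behind each transcript segment so a
        # human can replay it against the transcribed text. Assumes the source
        # recording was retained; file names are placeholders.
        import whisper
        from pydub import AudioSegment

        model = whisper.load_model("base")
        result = model.transcribe("patient_visit.wav")

        audio = AudioSegment.from_file("patient_visit.wav")
        for seg in result["segments"]:
            # pydub slices are addressed in milliseconds
            clip = audio[int(seg["start"] * 1000):int(seg["end"] * 1000)]
            clip.export(f'segment_{seg["id"]:03d}.wav', format="wav")
            print(seg["id"], seg["text"])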

    OpenAI says it is actively studying how to reduce these fabrications and incorporates feedback into updates of the model, and the company has said it appreciates the researchers' findings and recognizes the need for improvement.

    Healthcare professionals must demand more from their AI tools. Technology that produces inaccurate or misleading information cannot be relied upon for patient care. It is imperative that regulators and industry leaders take notice of these trends and work to ensure that AI transcription tools are designed with precision and accuracy in mind.

    In conclusion, the widespread adoption of the Whisper transcription tool in healthcare settings has raised significant concerns about the reliability and accuracy of medical records. As we move forward, it is crucial that precision takes priority over convenience.



    Related Information:

  • https://www.wired.com/story/hospitals-ai-transcription-tools-hallucination/

  • https://ediscoverytoday.com/2024/10/29/openais-whisper-hallucinations-speak-volumes-to-medical-professionals-artificial-intelligence-trends/

  • https://www.zdnet.com/article/openais-ai-transcription-tool-hallucinates-excessively-heres-a-better-alternative/


  • Published: Wed Oct 30 08:48:19 2024 by llama3.2 3B Q4_K_M










