Of 20 artificial intelligence-based clinical record-writing systems approved by Ontario and introduced at medical sites, 12 recorded drug names that differed from actual prescriptions in a simulated consultation assessment.
On May 18 local time, online media outlet Gigazine reported that the office of Ontario's auditor general said it found repeated serious errors in AI record tools for medical institutions after inspecting provincial government AI systems.
The systems in question are 'AI Scribe' systems that record conversations between doctors and patients during consultations and then automatically generate clinical notes. The audit played simulated consultation recordings to 20 approved systems and compared the outputs. It found that 12 systems recorded drugs that differed from those actually prescribed, and 9 systems produced treatment plans that were not in the recordings.
In addition, 17 systems missed important information related to patients' mental health. Clinical note automation was introduced as a tool to reduce data-entry burdens, but the findings showed that when basic information such as drug names, symptoms and treatment direction is wrong, the error can go beyond a simple documentation mistake and affect actual care.
The audit report said there were problems not only with the systems but also with the review process. In Ontario's evaluation of AI medical record systems, whether a company had a local base in the province accounted for 30 percent of the overall score. By contrast, accuracy of clinical notes was 4 percent, bias response was 2 percent, and threat, risk and privacy assessments were 2 percent each. The report warned that such 'inappropriate weighting in the evaluation' could result in selecting AI tools that create inaccurate medical records or fail to adequately protect sensitive health information.
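To illustrate how lopsided that weighting is, the short sketch below compares two vendors on just two of the reported criteria. It is an illustration only: the two vendors and their ratings are hypothetical, and the assumption that each criterion contributes its weight multiplied by a rating between 0 and 1 is not drawn from the audit report, which describes only the weights.

```python
# Weights reported in the audit, in percentage points of the overall score.
WEIGHTS = {"local_presence": 30, "note_accuracy": 4}


def partial_score(ratings):
    """Sum weight * rating over the two criteria above.

    Assumption (not from the audit): ratings are fractions in [0, 1]
    and the score is a simple linear weighted sum.
    """
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


# Hypothetical vendors, invented for illustration only.
vendor_local = {"local_presence": 1.0, "note_accuracy": 0.5}
vendor_accurate = {"local_presence": 0.0, "note_accuracy": 1.0}

score_local = partial_score(vendor_local)        # 30*1.0 + 4*0.5
score_accurate = partial_score(vendor_accurate)  # 30*0.0 + 4*1.0

print(score_local, score_accurate)
```

Under these assumptions, a vendor with a local base and only half marks for accuracy scores 32 points on these two criteria, while a vendor with perfect accuracy and no local base scores 4. Accuracy can move the total by at most 4 points, so it cannot make up for the 30-point local presence criterion.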
Safety checks in operations also appeared insufficient. OntarioMD, a provincial medical IT support organization, has advised doctors to manually review notes created by AI. But approved systems did not have a function that required a doctor's final confirmation. That left room for incorrectly written notes to be used as they were.
The audit results are based on validating approved AI systems with simulated consultation recordings. Even so, they clearly showed that medical AI can get basic clinical information wrong and that pre-adoption screening and post-adoption verification systems did not sufficiently filter such errors out. A spokesperson for Ontario's health ministry said more than 5,000 doctors participate in the AI scribe programme and that no reports of patient harm linked to the technology have been confirmed so far.
The audit results show that in adopting medical AI, verification systems should take priority over speed. AI scribe tools may reduce administrative burdens for medical staff, but incorrectly recording items directly tied to clinical judgement, such as drug names, treatment plans and mental health information, can lead to patient safety issues.
This has prompted calls for medical AI systems to include not only pre-adoption accuracy assessments and bias and privacy protection verification, but also safety checks that force a doctor's final confirmation during actual use. The Ontario case shows that as the use of AI expands in medical settings, balancing automation convenience and clinical responsibility has emerged as a key task.