AI models outperformed doctors at ward diagnosis in a large public hospital, according to an experiment by South African researchers.
The result raises the prospect of developing reliable AI tools to help reduce the workloads of overstretched healthcare staff.
While free AI chatbots have been found to provide unreliable health information, there is growing evidence that commercially available AI systems of the kind tested by the researchers are much more dependable.
“There is really exciting potential to alleviate the huge pressures on low- and middle-income country healthcare workers and the systems they work in,” said Bruce Bassett, distinguished professor of AI at Wits University and lead author of a study describing the work, which was published last month on the preprint server arXiv.
The paper has not been peer reviewed but is broadly in line with a US study published last month in the peer-reviewed journal Science that found AI systems did better than doctors at emergency room diagnosis and triage.
For the South African study, researchers asked pairs of expert doctors to scrutinise 300 sets of in-patient files from Chris Hani Baragwanath Academic Hospital and determine a diagnosis based on their analysis of the written records.
All the cases were complex, with files containing the results of multiple diagnostic tests, including images from X-rays and MRIs, laboratory tests and vital sign measurements such as blood pressure and temperature.
The experts’ findings were used as a benchmark against which to score the diagnoses reached by hospital staff and 10 different AI systems. These included Anthropic’s Claude 4.1 Opus and 4.5 Sonnet; Google’s Gemini 3 Pro, 2.5 Pro and 2.5 Flash; OpenAI’s GPT-5.1, o3, o4-mini and GPT-5.1 mini; and xAI’s Grok 4.1 Fast Reasoning.
OpenAI’s GPT-5.1 scored best among the models, while Claude 4.1 Opus scored worst, but all of the models consistently outdid the ward diagnoses made by hospital staff.
The researchers anonymised the patient data to ensure the study was compliant with the Protection of Personal Information Act, said Bassett.
There was a 15% variation in performance between the cheapest and most expensive AI models, which ranged in cost from 1 US cent to 50 US cents a case. These models were significantly cheaper than the pairs of expert doctors, who cost $40 a case.
The AI models were cheap even when compared with the cost of physicians in countries where public sector salaries are far lower than in South Africa. In Nigeria, for example, where physician salaries are about $1,200 a year, the average cost of a case would be $2, the researchers said.
Given the rapid fall in AI costs in recent years, high-quality diagnoses are likely to become even more affordable, they said.
“We are entering the era of cheap, good-quality (AI) diagnosis,” Bassett said.






