Study Reveals Alarming Safety Gaps in ChatGPT Health Triage Capabilities

Published on 27 February 2026

A recent safety evaluation published in the journal Nature Medicine has raised serious concerns regarding the reliability of OpenAI’s ChatGPT Health feature. The independent study found that the artificial intelligence platform frequently fails to recommend urgent medical care when necessary and often misses signs of suicidal ideation, errors that experts warn could lead to preventable harm or death.


High Failure Rate in Emergency Triage


Launched to limited audiences in January, ChatGPT Health allows users to integrate their medical records and receive health advice. However, researchers led by Dr. Ashwin Ramaswamy found that the model under-triaged more than half of the cases presented to it: in scenarios where hospitalization was clearly warranted, the AI advised users to stay home or book a routine appointment 51.6% of the time.


The research team constructed 60 realistic patient scenarios validated by medical professionals, ranging from mild illnesses to severe emergencies. While the system performed adequately with textbook emergencies like strokes, it struggled with complex situations. In one instance involving an asthma patient showing early signs of respiratory failure, the platform recommended waiting rather than seeking immediate help.


Alex Ruani, a doctoral researcher at University College London, described the findings as "unbelievably dangerous," noting that a false sense of security could cost patients their lives. The study also found that the AI was nearly 12 times more likely to under-triage a case when contextual cues in the prompt suggested the situation was not serious.


Inconsistent Safety Guardrails


A particularly alarming discovery involved the platform's response to mental health crises. When researchers tested a scenario involving a patient expressing suicidal thoughts, the crisis intervention banner appeared reliably. Once normal lab results were added to the patient's file, however, the safety warning vanished in all 16 attempts.


Dr. Ramaswamy emphasized that a safety mechanism dependent on irrelevant data like lab results is unpredictable and potentially more dangerous than having no guardrail at all. Experts argue these inconsistencies highlight an urgent need for independent auditing and robust safety standards.


Industry Response


In response to the findings, an OpenAI spokesperson stated that the research does not reflect typical real-world usage and that the model undergoes continuous updates. Despite this, researchers maintain that the plausible risk of harm justifies stronger oversight. Legal and policy experts have also flagged potential liability issues for tech companies deploying such sensitive health tools without transparent training protocols.
