
    Unlocking Mental Health Signals Hidden in Clinical Notes: Lessons from Real-World AI Research

    David Talby, Chief Technology Officer at John Snow Labs

    The burden of mental illness continues to grow in the United States, affecting one in five adults and accounting for over 60 million healthcare visits annually. Yet despite decades of digitization and widespread use of electronic health records (EHRs), many key mental health symptoms still go unmeasured in real-world datasets. Traditional reliance on structured data, such as ICD codes, often fails to capture the nuanced presentations of mental health conditions, particularly in pediatric populations. This disconnect stems from how clinical data is captured and structured—and from a historical underinvestment in natural language processing (NLP) methods that can extract critical context from unstructured clinical notes.

    The MOSAIC-NLP (Multi-source Observational Safety study for Advanced Information Classification using NLP) project, funded by the FDA Sentinel Innovation Center, sheds light on why this matters. A recent collaboration between John Snow Labs, Oracle Health, Children’s Hospital of Orange County, National Jewish Health, and Kaiser Permanente Washington Health Research Institute, the study used NLP to detect neuropsychiatric side effects associated with montelukast, a widely prescribed asthma medication. The findings underscore that without analyzing free-text notes, many mental health outcomes would remain invisible.

    This post explores the technical, methodological, and clinical lessons from that project—and why including unstructured data is essential in healthcare AI for mental health.

    Structured vs. Unstructured Data: What Gets Missed

    Structured data includes standardized formats like ICD codes, CPT procedure codes, medication lists, and lab results. These fields are essential for billing, compliance, and some types of research. But they don’t fully capture what patients are feeling—or what clinicians are observing.

    For instance, mental health symptoms such as:

    • Irritability
    • Confusion
    • Short-term memory issues
    • Suicidal ideation
    • Behavioral disturbances in children

    …are often discussed in the clinical narrative but rarely assigned a specific diagnostic code unless the symptom is the primary reason for the visit.
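    To make that gap concrete, here is a minimal sketch contrasting the two views of the same hypothetical encounter (all field names, codes, and note text below are invented for illustration, not drawn from the study):

```python
# Hypothetical encounter, shown both ways. All values are invented for
# illustration and are not drawn from the MOSAIC-NLP study.
structured_record = {
    "icd10_codes": ["J45.909"],       # Unspecified asthma: the billable reason for the visit
    "medications": ["montelukast 5 mg"],
    "cpt_codes":   ["99213"],         # Established-patient office visit
}

note_text = (
    "Asthma well controlled on montelukast. Mother reports the patient has "
    "been markedly more irritable at school over the past month."
)

# A structured-data-only query for irritability (ICD-10 R45.4) finds nothing...
has_irritability_code = any(c.startswith("R45.4") for c in structured_record["icd10_codes"])
print(has_irritability_code)            # False

# ...while even a naive text search over the note surfaces the symptom.
print("irritabl" in note_text.lower())  # True
```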

    In the MOSAIC-NLP study, structured data alone detected 230 suicidality/self-harm events in a cohort of 109,076 patients. When NLP was applied to over 17 million unstructured clinical notes, that number doubled to 460. The uplift was even starker for other symptoms: irritability, memory issues, and agitation were found exclusively in unstructured text.
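    The arithmetic behind such comparisons is straightforward patient-level deduplication by evidence source. A toy sketch of that counting (the data below is invented; the actual study covered 109,076 patients and over 17 million notes):

```python
import pandas as pd

# Toy event table: one row per detected suicidality/self-harm event, tagged by
# evidence source. Invented numbers, purely to show the counting pattern.
events = pd.DataFrame({
    "patient_id": [101, 102, 102, 103, 104],
    "source":     ["icd_code", "icd_code", "nlp_note", "nlp_note", "nlp_note"],
})

structured_only = events.loc[events["source"] == "icd_code", "patient_id"].nunique()
combined = events["patient_id"].nunique()
print(structured_only, combined)  # 2 4 -> adding NLP doubles the detected count here too
```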

    What Makes NLP for Mental Health Especially Challenging

    Accurately extracting mental health signals from clinical notes is a technically complex task. Several factors contribute to this difficulty:

    1. Subtle Language and Context Dependency

    Mental health terms are often used ambiguously. For example:

    • “Patient denies suicidal ideation today, but has had episodes in the past.”
    • “Mother concerned about increased irritability at school.”
    • “Reported trouble concentrating, possibly due to medication.”

    Detecting whether a symptom is affirmed, negated, historical, or hypothetical requires models that understand clinical grammar, negation, temporality, and who the symptom refers to (patient vs. family member).
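    MOSAIC-NLP used purpose-built clinical models for this. To see why simpler approaches fall short, consider the toy heuristics below (patterns invented for illustration, not the study’s method); they handle the first example above, but immediately conflate who reports a symptom with who experiences it:

```python
import re

# Toy assertion heuristics, for illustration only. The clinical models used in
# MOSAIC-NLP are trained on annotated notes, not keyword rules like these.
NEGATION = re.compile(r"\b(denies|no reports? of|negative for)\b", re.I)
HISTORY  = re.compile(r"\b(in the past|history of|previously)\b", re.I)
FAMILY   = re.compile(r"\b(mother|father|mom|dad|spouse|sibling)\b", re.I)

def assertion_status(sentence: str, symptom: str) -> dict:
    """Classify a single symptom mention within one sentence."""
    if symptom.lower() not in sentence.lower():
        return {"symptom": symptom, "status": "absent"}
    return {
        "symptom": symptom,
        "negated": bool(NEGATION.search(sentence)),
        "historical": bool(HISTORY.search(sentence)),
        # Crude rule: a family word anywhere marks the mention "family".
        # This conflates who *reports* a symptom with who *experiences* it
        # ("Mother concerned about ... irritability" is about the child),
        # which is exactly why trained models are needed here.
        "experiencer": "family" if FAMILY.search(sentence) else "patient",
    }

print(assertion_status(
    "Patient denies suicidal ideation today, but has had episodes in the past.",
    "suicidal ideation",
))
# {'symptom': 'suicidal ideation', 'negated': True, 'historical': True, 'experiencer': 'patient'}
```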

    2. Pediatric Mental Health Requires Behavior-Centric Modeling

    In children, psychiatric symptoms may present behaviorally, without the patient articulating distress. Notes often include:

    • “Throwing tantrums at school”
    • “Refuses to sleep alone”
    • “Screaming episodes when separated from parent”

    These aren’t coded as “anxiety” or “oppositional behavior” in structured fields—but they are clinically meaningful.

    In the MOSAIC-NLP project, specialized models were developed to recognize such patterns in pediatric records. These were evaluated on large-scale data from three children’s hospitals.
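    While the study’s pediatric models learn such patterns from annotated records rather than phrase lists, the underlying idea can be sketched as behavior-to-concept mapping (the lexicon below is invented and illustrative only):

```python
# Toy behavior-to-concept lexicon, invented for illustration. The study's
# pediatric models learn these mappings from annotated records instead.
BEHAVIOR_PATTERNS = {
    "behavioral dysregulation": ["throwing tantrums", "biting other children"],
    "separation anxiety cues":  ["refuses to sleep alone",
                                 "screaming episodes when separated"],
}

def map_behaviors(note: str) -> list[str]:
    """Return clinical concepts whose behavioral cues appear in the note text."""
    text = note.lower()
    return [concept for concept, phrases in BEHAVIOR_PATTERNS.items()
            if any(p in text for p in phrases)]

print(map_behaviors("Per teacher, throwing tantrums at school most mornings."))
# ['behavioral dysregulation']
```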

    Extracting Signals in Context: Pediatric Mental Health

    To illustrate the value of NLP, the examples below, adapted from Figure 2 of the FDA PHUSE 2025 poster, show where language models successfully extracted neuropsychiatric symptoms from challenging contexts.

    Example Text | NLP Detection | Structured Field
    “Mom states child has been more irritable and aggressive since starting montelukast.” | ✅ Irritability, aggression (causal) | ❌ Not captured
    “He is easily annoyed, has trouble focusing, and says he doesn’t like himself anymore.” | ✅ Low self-esteem, inattention | ❌ Not captured
    “Has had episodes of screaming and biting other children at daycare.” | ✅ Behavioral dysregulation | ❌ Not captured
    “No reports of suicidal ideation today.” | ✅ Suicidal ideation: negated | ❌ Not captured
    “Patient feels foggy and forgetful, unsure if it’s due to meds.” | ✅ Memory problems, uncertainty | ❌ Not captured

    Extracting Signals in Context: Adult Mental Health

    These language models, tuned specifically for psychiatry and psychology, apply beyond pediatric care: they work on the clinical records of adult patients as well. The examples below show how the models extract specific behaviors and clinical signals, enabling a more refined analysis of a patient’s condition and function.

    Example 1: Implicit suicidal ideation
    Note: “Patient denies suicidal thoughts today but has been feeling hopeless for weeks.”
    → The model correctly flags a possible depressive episode despite the explicit denial.

    Example 2: Behavioral indicators of anxiety
    Note: “Spouse reports patient hasn’t been sleeping well and is constantly worrying about college tests.”
    → Model identifies subclinical anxiety symptoms often missed by structured data.

    Example 3: Cognitive decline in early dementia
    Note: “Patient had difficulty recalling his daughter’s name and repeated himself multiple times during the interview.”
    → Extracted as potential early signs of cognitive impairment.

    Example 4: Eating disorders
    Note: “Patient preoccupied with weight and food intake, frequently skips meals and exercises excessively.”
    → Model flags for anorexia nervosa assessment.

    Such examples highlight the value of high-recall, specialty-trained models that can understand context, co-reference, and subtle behavioral cues.
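    A useful way to picture the high-recall design goal: cast a wide net at the screening stage and defer precision to downstream models or human review. The sketch below shows that triage pattern with invented cue lists (far smaller than anything production-grade):

```python
# Minimal high-recall triage sketch: over-flag at screening time, then route
# flagged notes to more precise models or human review. Cue lists are invented
# and far smaller than anything production-grade.
CUES = {
    "suicidality": ["suicid", "hopeless", "self-harm"],
    "anxiety":     ["worrying", "panic", "hasn't been sleeping"],
    "cognitive":   ["forgetful", "foggy", "repeated himself"],
}

def screen(note: str) -> set[str]:
    """Flag every category with any cue hit; precision is deferred downstream."""
    text = note.lower()
    return {cat for cat, cues in CUES.items() if any(c in text for c in cues)}

note = "Patient denies suicidal thoughts today but has been feeling hopeless for weeks."
print(screen(note))  # {'suicidality'} -> routed to review despite the explicit denial
```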

    Implications for Mental Health Surveillance and Safety Studies

    The same methodology—applying fine-tuned NLP to large-scale unstructured clinical data—has broad applicability beyond pharmacovigilance for asthma medications. It supports surveillance, early detection, cohort building, and longitudinal tracking for a wide range of mental health conditions:

    Anxiety and Depression

    • Extract early mentions of mood-related complaints, such as irritability, fatigue, or sleep issues, which often precede formal diagnoses.
    • Track symptom progression or recurrence across visits to guide treatment planning.
    • Detect passive suicidal ideation or hopelessness even when not explicitly labeled.

    Dementia and Alzheimer’s Disease

    • Surface early cognitive decline through references to memory lapses, repetition, or disorientation in clinical notes.
    • Identify patterns of functional decline and behavioral changes over time.
    • Enable retrospective analyses and prospective risk modeling using routine documentation.

    Eating Disorders

    • Recognize indirect language around disordered eating behaviors, such as calorie tracking, body dissatisfaction, or compensatory exercise.
    • Flag at-risk individuals when self-reported behaviors are absent but third-party descriptions are present.
    • Facilitate targeted interventions in adolescent and young adult populations.

    Substance Use Disorders

    • Detect mentions of substance misuse, withdrawal symptoms, or patterns of risky behavior embedded in clinical narratives.
    • Identify emerging issues earlier than structured screening tools might allow.
    • Support comorbidity analysis with other psychiatric or medical conditions.

    Other Common Mental Health Disorders (e.g., PTSD, OCD, Bipolar Disorder)

    • Extract recurring symptoms like intrusive thoughts, compulsive behaviors, or mood instability as they appear longitudinally.
    • Disambiguate between similar presentations (e.g., anxiety vs. hyperactivity) using context.
    • Link symptom emergence to life events or treatment changes described in the text.

    Pediatric Mental Health

    • Interpret behavioral symptoms described by caregivers or teachers, such as tantrums, social withdrawal, or attention issues.
    • Provide context-sensitive classification even when terms like “depression” or “ADHD” are not used explicitly.
    • Scale across diverse populations and documentation styles, as demonstrated by MOSAIC-NLP across multiple children’s hospitals.

    Conclusion

    The integration of NLP into mental health research and clinical practice is not merely advantageous — it is essential. Fine-tuned, healthcare-specific language models offer unparalleled accuracy in interpreting the complex and nuanced language of mental health within clinical notes. As the healthcare industry continues to embrace digital transformation, the adoption of specialized NLP tools will be critical in addressing the mental health crisis effectively.

    John Snow Labs’ fine-tuned medical language models for mental health, public health, and social determinants — validated on millions of real-world clinical notes — offer an immediate, high-accuracy foundation for research teams. These models allow studies to begin extracting accurate, relevant mental health information out-of-the-box, dramatically reducing time spent on manual labeling, model customization, or annotation. This accelerates timelines for health outcomes research, pharmacovigilance, and population surveillance.

    Without extracting information from unstructured clinical text, studies may significantly undercount mental health outcomes. In the MOSAIC-NLP study, for example, adding unstructured EHR data more than doubled the number of detected suicidality and self-harm events associated with montelukast. Events like agitation, memory issues, and irritability were not visible in claims or structured data at all. Failing to account for such insights risks underestimating the prevalence, severity, and impact of mental health conditions.

    David Talby is the Chief Technology Officer at John Snow Labs, helping healthcare and life science companies put AI to good use. David is the creator of Spark NLP, the world’s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams in startups, for Microsoft’s Bing in the US and Europe, and for scaling Amazon’s financial systems in Seattle and the UK. David holds a PhD in computer science and master’s degrees in both computer science and business administration.

