A 2024 meta-analysis covering 33 trials of AI chatbots for depression found small-to-moderate effect sizes on symptom reduction: meaningful, but smaller than those reported for in-person therapy. Effect sizes shrink further outside clinical populations, and roughly half of users disengage within four weeks. That is the honest summary of where AI mental health tools stand. Below: the apps that have earned their place, the ones that have not, and the line between the two.
The current landscape
Three categories matter today.
- Structured CBT chatbots. Woebot, Wysa, and similar products built around a specific therapeutic framework, tested in randomized trials.
- Mood and journaling apps with AI summaries. Reflectly, Stoic, Daylio — less ambitious clinically, more about pattern recognition.
- Open-ended LLM companions. Replika, Pi, Character.AI. Not designed as therapy. Heavily used as such anyway.
What the evidence supports
For CBT-based chatbots, the published data is reasonably consistent: short-term reductions in mild-to-moderate depression and anxiety symptoms over 4 to 8 weeks, which some users sustain and others lose when usage drops. The effect is real, and it is smaller than that of in-person therapy. Both things can be true.
Where caution is warranted
Open-ended LLM chatbots that were never designed for therapy can produce harmful responses. Investigations and case reports have documented chatbots failing to escalate suicide risk, validating distorted thinking, and reinforcing dependence in ways that look like grooming. Several services rewrote their guardrails after specific incidents.
Who these apps are appropriate for
For mild-to-moderate symptoms in people who cannot access traditional therapy quickly — long waitlists, no insurance, rural location — evidence-based apps are a reasonable bridge. For active suicidal ideation, severe depression, psychosis, eating disorders, or trauma processing, they are not a substitute for clinical care.
What separates evidence-based tools from marketing-led ones
- Published peer-reviewed studies of the specific app, not just the underlying therapy framework. CBT being effective is not the same as this app being effective.
- Clear crisis-escalation protocols. The product should route users in crisis to hotlines and emergency services, not engage with their distress as a content prompt.
- Transparent privacy practices. Therapy data is sensitive. Apps that sell or share it should be treated with skepticism.
The apps
Woebot
The longest research track record in the category. Originally a consumer app, now pivoting to B2B and clinical partnerships. The CBT scaffolding is solid, and the crisis-escalation protocol works.
Wysa
Multiple RCTs. Partnership with the UK NHS. Operates as both self-help app and human-coached service. CBT exercises are competently written. Among current consumer options, this is the one we would recommend to a friend with mild anxiety.
Replika
Marketed as an AI companion, not therapy. After widely reported safety incidents, the product added guardrails. We would not direct anyone in active distress to it. As a general-purpose chatbot, it is fine.
The privacy question
Mozilla’s Privacy Not Included project has repeatedly flagged mental health apps as among the worst categories for data handling. Read the privacy policy. If you cannot determine what happens to your data, assume the worst, the same posture you would take with a free social network.
Bottom line
AI mental health apps are useful adjuncts for many people, dangerous in some hands, and not a substitute for clinical care when symptoms are serious. The category is improving. The gap between marketing and evidence is still wide. If you are in crisis, contact a person — 988 in the US, Samaritans (116 123) in the UK, or your local emergency number.