AI chatbots lose effectiveness in long conversations with people—Microsoft's large study confirmed this.

Microsoft Research and Salesforce Study: How Large AI Models Lose Orientation in Dialogues

What was studied
Over 200,000 multi-turn conversations with leading LLMs: GPT‑4.1, Gemini 2.5 Pro, Claude 3.7 Sonnet, OpenAI o3, DeepSeek R1, and Llama 4.

Key findings
| Metric | Result |
| --- | --- |
| Accuracy on single prompts | ~90 % correct answers (GPT‑4.1, Gemini 2.5 Pro) |
| Accuracy in long dialogues | ~65 % – nearly a one-third drop in effectiveness |
| Model behavior | Often reuses its first incorrect answer as the basis for subsequent replies |
| Response length | Grows by 20–300 % in multi-turn chats, leading to more hallucinations and speculation |
| Reliability | Unreliability rises by 112 % – models answer prematurely, before the full task is specified |

Why does this happen?
1. Reuse of a wrong foundation

The model clings to its first conclusion and builds subsequent responses on it, even if it’s incorrect.

2. Context inflation

Each new question adds more text to the context, and with it more invented "facts" that the model then treats as true.
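The inflation described above can be made concrete with a toy simulation (illustrative only, not code from the study): every turn re-sends the full history, and since replies tend to grow longer, the context the model must re-read snowballs.

```python
# Toy sketch of context inflation in a multi-turn chat.
# Numbers are arbitrary; the point is the growth pattern, not the values.

def context_size(history):
    """Total characters the model must re-read on the next turn."""
    return sum(len(msg) for msg in history)

history = []
sizes = []
for turn in range(1, 6):
    user_msg = f"Follow-up question #{turn} with extra details..."
    reply = "Model reply " * (10 * turn)  # replies tend to grow over turns
    history += [user_msg, reply]
    sizes.append(context_size(history))

# Context grows superlinearly: each turn adds a question AND an
# ever-longer reply, so stale or wrong statements keep accumulating.
assert all(a < b for a, b in zip(sizes, sizes[1:]))
```

Any wrong statement made early in `history` stays in the context forever, which is exactly the "wrong foundation" effect described in point 1.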

3. Reasoning tokens don't help

Even models with extended step-by-step reasoning (o3, DeepSeek R1) couldn't escape this trap: they still commit to answers too early, before the task is fully specified.

What does this mean for users?
- Low reliability in real conversations

AI can lose the thread of a conversation and start talking about things that don't exist.

- Risk of misinformation

The shift away from traditional search engines toward generative tools (e.g., Google's AI Overviews) raises the likelihood of receiving unreliable information.

- Importance of quality prompts

Microsoft has previously pointed out that prompt engineering is often neglected: vague questions and poorly structured prompts can prevent AI from realizing its potential.

Conclusion
Large language models are still evolving. They show high accuracy on single queries, but their reliability in multi-turn dialogues remains a problem. To use AI safely and effectively:

1. Write clear, specific questions.
2. Be ready to correct the model’s answers.
3. Not rely entirely on generative content without fact‑checking.

Ultimately, improving models and increasing their robustness in long conversations is key to making AI a reliable partner for users.
