AI chatbots lose effectiveness in long conversations with people—Microsoft's large study confirmed this.

Microsoft Research and Salesforce Study: How Large AI Models Lose Orientation in Dialogues

What was studied
Over 200,000 multi-turn conversations with leading LLMs: GPT‑4.1, Gemini 2.5 Pro, Claude 3.7 Sonnet, OpenAI o3, DeepSeek R1, and Llama 4.

Key findings
| Metric | Result |
| --- | --- |
| Accuracy on single prompts | ~90 % correct answers (GPT‑4.1, Gemini 2.5 Pro) |
| Accuracy in long dialogues | ~65 % – nearly a one-third drop in effectiveness |
| Model behavior | Often reuses its first incorrect answer as the basis for subsequent replies |
| Response length | Grows by 20–300 % in multi-turn chats, leading to more hallucinations and speculation |
| Reliability | Unreliability rises by 112 % – models answer prematurely, before the full task is specified |

Why does this happen?
1. Reuse of a wrong foundation

The model clings to its first conclusion and builds subsequent responses on it, even if it’s incorrect.

2. Context inflation

Each new question adds more text to the context, and with it more invented "facts" that the model then treats as true.
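The inflation described above can be made concrete with a toy simulation (illustrative only, not code from the study): every turn re-sends the full history, and since replies tend to grow longer, the context the model must re-read snowballs.

```python
# Toy sketch of context inflation in a multi-turn chat.
# Numbers are arbitrary; the point is the growth pattern, not the values.

def context_size(history):
    """Total characters the model must re-read on the next turn."""
    return sum(len(msg) for msg in history)

history = []
sizes = []
for turn in range(1, 6):
    user_msg = f"Follow-up question #{turn} with extra details..."
    reply = "Model reply " * (10 * turn)  # replies tend to grow over turns
    history += [user_msg, reply]
    sizes.append(context_size(history))

# Context grows superlinearly: each turn adds a question AND an
# ever-longer reply, so stale or wrong statements keep accumulating.
assert all(a < b for a, b in zip(sizes, sizes[1:]))
```

Any wrong statement made early in `history` stays in the context forever, which is exactly the "wrong foundation" effect described in point 1.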

3. Reasoning tokens don't help

Even models with extended step-by-step reasoning (o3, DeepSeek R1) couldn't escape this trap: they still commit to answers too early, before the task is fully specified.

What does this mean for users?
- Low reliability in real conversations

AI can lose the thread of a conversation and start talking about things that don't exist.

- Risk of misinformation

The shift away from traditional search engines toward generative tools (e.g., Google's AI Overviews) raises the likelihood of receiving unreliable information.

- Importance of quality prompts

Microsoft has previously pointed out that prompt engineering is often neglected: vague questions and poorly structured prompts can prevent AI from realizing its potential.

Conclusion
Large language models are still evolving. They show high accuracy on single queries, but their reliability in multi-turn dialogues remains a problem. To use AI safely and effectively:

1. Write clear, specific questions.
2. Be ready to correct the model’s answers.
3. Not rely entirely on generative content without fact‑checking.

Ultimately, improving models and increasing their robustness in long conversations is key to making AI a reliable partner for users.
