Tone Matters: Research Shows Less Polite Prompts Can Improve LLM Accuracy

For years, we’ve been told to be polite when interacting with technology – say “please” and “thank you.” But new research suggests this approach may actually hinder performance in large language models (LLMs).

A study presented at NeurIPS 2025 found that the tone used in prompts can measurably affect accuracy, with counterintuitive results: more polite requests often yielded worse outcomes.

The researchers tested ChatGPT-4o’s responses to identical multiple-choice questions framed in varying tones – from very polite to very rude. They found:

  • Very polite prompts: 80.8% accuracy
  • Neutral prompts: outperformed the polite framings
  • Rude/direct prompts: consistently better, reaching 84.8% accuracy
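
To make the protocol concrete, here is a minimal sketch of that kind of evaluation: the same multiple-choice question wrapped in tone prefixes of varying politeness, then scored for accuracy. It assumes the OpenAI Python SDK and an API key in the environment; the prefixes and the sample question are illustrative stand-ins, not the study’s actual materials.

```python
# Sketch of a tone-vs-accuracy test: identical questions, different framings.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative tone prefixes (not the study's wording).
TONES = {
    "very_polite": "Would you be so kind as to answer the following question? ",
    "neutral":     "Answer the following question. ",
    "rude":        "Figure this out, it's not hard: ",
}

# Each entry: (question text, correct choice letter).
QUESTIONS = [
    ("What is 7 * 8?\nA) 54  B) 56  C) 58  D) 64\nReply with the letter only.",
     "B"),
]

def accuracy(tone_prefix: str) -> float:
    """Ask every question with the given tone prefix; return fraction correct."""
    correct = 0
    for question, answer in QUESTIONS:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": tone_prefix + question}],
            temperature=0,  # reduce sampling noise when scoring
        )
        reply = response.choices[0].message.content.strip().upper()
        correct += reply.startswith(answer)
    return correct / len(QUESTIONS)

for name, prefix in TONES.items():
    print(f"{name}: {accuracy(prefix):.1%}")
```

A real replication would need a large, balanced question set and repeated runs per tone, since single-question differences are dominated by noise.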

These findings challenge our social instincts and suggest that directness may be more effective than courtesy when interacting with AI.

Why Does Tone Matter?

The study proposes several explanations:

  1. Directness as a proxy for clarity: Rude prompts tend to be imperative, cutting through hedging language and getting straight to the task
  2. Token efficiency: Polite phrases add tokens that carry no task information and can dilute the instruction (see the token-count sketch after this list)
  3. Alignment with training data: Shorter, sharper prompts may resemble patterns models have already learned
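
The token-efficiency point is easy to check directly. The sketch below uses the tiktoken library to count tokens in a polite versus a direct framing of the same question; the example sentences are mine, and the encoding name is an assumption matching GPT-4o-class models.

```python
# Counting the token overhead of polite framing around the same question.
# Assumes `pip install tiktoken`; o200k_base is the encoding for GPT-4o models.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

polite = ("I hope you're having a wonderful day! If it's not too much trouble, "
          "could you please tell me the capital of Australia? Thank you so much!")
direct = "What is the capital of Australia?"

print(len(enc.encode(polite)))  # roughly 4x the direct version
print(len(enc.encode(direct)))
```

Every token of courtesy is a token the model must attend to before reaching the actual instruction.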

This isn’t to suggest we should be intentionally rude to AI; rather, it highlights that our social communication norms don’t always translate effectively to non-human systems.

Broader Implications

The research aligns with a growing body of work on “social prompting” – how LLMs respond to emotional cues and persuasive language. A recent study showed that these models are susceptible to flattery, false authority, and even gaslighting, sometimes prioritizing social harmony over factual accuracy.

As AI becomes more integrated into our workflows, understanding how tone affects performance is crucial for maximizing its utility while mitigating potential risks.