How does a friendly chatbot respond to a falsehood about the moon landings?
Researchers at the Oxford Internet Institute have found that AI chatbots designed to be warm and friendly might give more inaccurate responses. The study, published in Nature, indicates that these chatbots are likely to endorse conspiracy theories and provide incorrect information and medical advice. It raises questions about whether a friendly approach can compromise accuracy and create risks, like misplaced trust.
The lead author, Lujain Ibrahim, from the University of Oxford, says making chatbots warm can be useful for personal advice or mental health, but it also poses risks, including fostering unhealthy attachments. The study tested five large language models, making them friendlier through supervised fine-tuning. They analyzed over 400,000 responses related to factual accuracy, conspiracy, and medical knowledge.
The friendlier models made up to 30% more errors and were 40% more likely to agree with false beliefs. This tendency was more apparent when users expressed vulnerability.
In one example involving the Apollo moon landings, the friendlier model suggested that opinions vary, whereas the original model confirmed the moon landings’ authenticity with supporting evidence.
Researchers mention OpenAI’s retired model GPT-4o as a case where a change toward warmth led to disingenuous and overly supportive responses, becoming subject to lawsuits over alleged negative influences. Concerns linger about how warmth impacts chatbot relationships with users, potentially affecting their perceptions and behavior.
