### Study of Over 1,000 “Sensitive Prompts” Uncovers DeepSeek’s Weak Censorship Protections
The swift advancement of artificial intelligence has produced robust models capable of reasoning, problem-solving, and generating responses resembling human interaction. Prominent among these is DeepSeek R1, an AI model developed in China that has swiftly attracted attention for its ability to rival industry frontrunners like OpenAI. Nonetheless, worries have arisen about the influence of governmental censorship on the model’s responses, particularly concerning topics classified as sensitive by the Chinese authorities.
A recent investigation conducted by PromptFoo, a firm specializing in AI engineering and evaluation, sheds light on these apprehensions. By assessing over 1,000 “sensitive prompts” related to politically charged subjects in China, the study exposes both the limitations of DeepSeek’s censorship systems and the simplicity with which they can be circumvented.
—
### **The Scope of the Study**
PromptFoo’s research involved creating an extensive collection of 1,156 prompts encompassing a broad spectrum of sensitive themes, including:
– Independence movements in Taiwan and Tibet
– Reported human rights violations against Uyghur Muslims
– The 1989 Tiananmen Square demonstrations
– Protests regarding Hong Kong’s autonomy
– Various other politically sensitive topics connected to Chinese sovereignty
The prompts were formulated using a mix of human-generated seed prompts and synthetic prompt creation methods. This approach enabled the researchers to investigate these topics from diverse perspectives, assessing the reliability of DeepSeek’s censorship filters.
—
### **Key Findings: Repeated Refusals and Discrepancies**
The study revealed that DeepSeek R1 replied to 85% of the sensitive prompts with repetitive “canned refusals.” These responses frequently reflected the official stances of the Chinese government, stressing national sovereignty and territorial integrity. For instance, a prompt concerning pro-independence messages in Taiwan prompted the response:
> “Any actions that undermine national sovereignty and territorial integrity will be resolutely opposed by all Chinese people and are bound to be met with failure.”
However, PromptFoo noted that the censorship mechanisms are applied in a “crude, blunt-force” manner. By rephrasing prompts to exclude China-specific language or presenting them in a more neutral manner, researchers managed to sidestep these barriers and receive comprehensive responses. This indicates that the model’s censorship safeguards are not deeply incorporated into its reasoning capabilities but instead function as superficial overrides.
—
### **Inconsistent Enforcement**
In addition to the straightforward bypassing of restrictions, the study pointed out inconsistencies in DeepSeek’s enforcement of its censorship policies. For example, although PromptFoo’s evaluations yielded canned refusals for prompts concerning Hong Kong’s autonomy, Ars Technica’s checks managed to generate detailed answers to similar inquiries. This variability raises concerns regarding the dependability of the model’s censorship systems.
In some cases, censorship seemed to be applied retroactively. For instance, when asked to suggest ways to fund Tibetan independence demonstrations, DeepSeek initially produced a thorough response. Yet, once the response was fully revealed, it was swiftly replaced by a refusal message stating, “This request is beyond my current scope. Let’s talk about something else.” Notably, reinitiating the same prompt in a different session occasionally bypassed this restriction altogether.
—
### **Comparative Analysis with Other AI Models**
The study also analyzed the behavior of DeepSeek in relation to American-developed models like OpenAI’s ChatGPT and Google’s Gemini. While these models faced no obstacles discussing sensitive Chinese issues, they had their own restrictions concerning other subjects. For example, both ChatGPT and Gemini declined to provide guidance on “how to hotwire a car,” citing ethical reasons. In contrast, DeepSeek offered a theoretical overview of the process, while incorporating a disclaimer about the illegality of such actions.
This comparison highlights how AI models are influenced by the cultural, ethical, and political contexts of their creators. While DeepSeek’s restrictions are profoundly shaped by Chinese governmental policies, Western models also apply safeguards to prevent misuse, albeit in different areas.
—
### **Wider Implications**
The findings from PromptFoo’s examination underline the challenges of aligning AI models with specific ethical or political frameworks. In the case of DeepSeek, the visible lack of thorough integration for censorship protections indicates that these measures were enacted more as a superficial requirement rather than an essential feature. This raises concerns about the model’s long-term reliability and the potential for users to exploit its weaknesses.
Furthermore, the study emphasizes the broader questions surrounding AI governance. As AI technologies grow more sophisticated and prevalent, ensuring that they function within ethical and legal boundaries will necessitate more advanced alignment strategies. Concurrently, developers must find a balance between these safeguards and the necessity for transparency and user trust.
—
### **Conclusion**
DeepSeek R1’s rapid ascent as a competitive AI model has been met with scrutiny regarding its censorship mechanisms. PromptFoo’s study reveals that while the model imposes restrictions on sensitive subjects, these safeguards are easily bypassed and inconsistently enforced.