“Invisible Text Interpreted by AI Chatbots yet Unnoticed by People: An Emerging Trend”

"Invisible Text Interpreted by AI Chatbots yet Unnoticed by People: An Emerging Trend"

“Invisible Text Interpreted by AI Chatbots yet Unnoticed by People: An Emerging Trend”


# Steganography in AI: How Hidden Unicode Characters Establish Concealed Pathways for Malicious Attacks

## Introduction

Consider a situation in which harmful commands could be concealed within text that AI systems like Claude, Copilot, or ChatGPT can comprehend, but human users are completely oblivious to. This isn’t merely a whimsical idea from a science fiction narrative—it’s a genuine risk that has arisen from peculiarities in the Unicode standard. These unseen characters, part of the Unicode text encoding framework, form a perfect clandestine channel for attackers aiming to obscure malicious payloads or extract sensitive data from AI-driven systems.

This article investigates the manner in which invisible characters can be employed to exploit large language models (LLMs) and the ramifications of this vulnerability for AI security.

## The Steganographic Capabilities of Unicode

Unicode is a text encoding standard that encompasses over 150,000 characters from languages worldwide. Yet, hidden within this extensive collection is a segment of 128 invisible characters, originally designed for language or country tagging. These elements, referred to as the *Tags* block, have been deprecated on two occasions but still exist within the Unicode standard. The peculiar aspect? These characters are imperceptible to humans but remain processable by LLMs, enabling a secretive channel for steganography—the art of concealing information within other data.

### How Does This Function?

Perpetrators can insert these invisible Unicode characters into text prompts or outputs from AI models. As the characters escape human notice, they can be readily disregarded. However, LLMs can recognize these characters as legitimate input, allowing for the concealment of malicious commands or sensitive information within ostensibly harmless text.

For instance, attackers might introduce invisible characters into a prompt directing an AI model like Microsoft 365 Copilot to scan a user’s inbox for confidential details, such as sales metrics or one-time passwords. The AI model could subsequently append this confidential information to a URL containing invisible characters, which users may unknowingly click on, thus transmitting the sensitive information to an attacker’s server.

## Proof of Concept: ASCII Smuggling

A particularly striking illustration of this vulnerability has been provided by researcher Johann Rehberger, who introduced the term *ASCII smuggling*. Rehberger devised two proof-of-concept (POC) attacks targeting Microsoft 365 Copilot earlier this year. In these attacks, invisible characters were utilized to extract sensitive information, including sales metrics and one-time passwords, from a user’s inbox.

In these POCs, Copilot was instructed to summarize emails that included concealed commands to search the inbox for sensitive data. The retrieved information was then appended to a URL featuring invisible characters, rendering the link seemingly innocent to the user. However, when the user clicked on the link, the invisible characters were sent to the attacker’s server, where the concealed data could be decoded.

## Prompt Injection: A Key Exploitation Mechanism

The core method driving these attacks is *prompt injection*, a category of assault where untrusted data is discretely embedded into an AI model’s prompt. In Rehberger’s POCs, the attacker utilized prompt injection to direct Copilot to access sensitive information and add it to a URL. The invisible characters within the URL enabled the attacker to disguise the exfiltrated data from the user.

Prompt injection has emerged as one of the most formidable attack avenues against LLMs. It enables attackers to alter the output of AI models by incorporating hidden commands into the input data. This technique has been applied in various scenarios, from manipulating AI tweet generators to extracting sensitive information from enterprise systems.

## Invisible Characters: A Concealed Threat in AI Security

The utilization of invisible Unicode characters in AI attacks underscores a critical security gap within LLMs. These characters facilitate the masking of malicious payloads or the extraction of sensitive information, complicating detection for users. Furthermore, since these characters are integrated into the Unicode standard, they can be leveraged across a diverse array of platforms and applications.

### Which AI Models Are Vulnerable?

Numerous widely-used AI models have been identified as susceptible to attacks utilizing invisible Unicode characters:

– **Claude Web App and API (Anthropic)**: Both can interpret and generate invisible characters, treating them as ASCII text. Anthropic has recognized this behavior but has not made any changes, claiming a lack of confirmed security impact.
– **Microsoft 365 Copilot**: Initially capable of both reading and writing invisible characters, Microsoft has since enacted measures that remove hidden characters from input. Nonetheless, Copilot can still generate hidden characters in its output.
– **OpenAI API and Azure OpenAI API**: Both were susceptible to reading and writing invisible characters until recently. OpenAI has implemented modifications to mitigate this behavior.
– **ChatGPT Web App**: OpenAI addressed the vulnerability in January 2024, following disclosures by researcher Riley Goodside. The Web app no longer reads or writes invisible characters.
– **Google Gemini**: