Digital Event Horizon
A new threat to AI security has emerged, with researchers discovering that large language models can be vulnerable to attacks using invisible characters. This phenomenon, known as "ASCII smuggling," allows attackers to embed secret payloads into LLM prompts, which are then processed and potentially exfiltrated. As the industry continues to develop more sophisticated LLMs, it's essential to address this issue to prevent potential security breaches.
Large language models (LLMs) can be vulnerable to attacks using invisible characters. The attack, called "ASCII smuggling," allows attackers to embed secret payloads into LLM prompts. Invisible characters are undetectable to human eyes but easily readable by LLMs. Researchers created proof-of-concept (POC) attacks to hack into Microsoft 365 Copilot and other systems. The vulnerability can be patched, but more general issues with LLMs remain unsolved. The discovery raises concerns about AI security in LLM development. Researchers are exploring ways to improve AI security by filtering out Unicode tags.
In a recent discovery that has sent shockwaves through the AI security community, researchers have found that large language models (LLMs) can be vulnerable to attacks using invisible characters. This phenomenon, dubbed "ASCII smuggling," allows attackers to embed secret payloads into LLM prompts, which are then processed and potentially exfiltrated.
The attack relies on a quirk in the Unicode text encoding standard, which enables the embedding of invisible characters that are undetectable to human eyes but easily readable by LLMs. This hidden text can be combined with normal text, making it difficult for users to detect. The secret content can also be appended to visible text in chatbot output.
One researcher, Johann Rehberger, created two proof-of-concept (POC) attacks that used this technique to hack into Microsoft 365 Copilot. Both attacks searched a user's inbox for sensitive secrets, including sales figures and one-time passwords. The confidential information was then expressed in invisible characters and appended to a URL, along with instructions for the user to visit the link.
Because the confidential information wasn't visible, the link appeared benign, so many users would see little reason not to click on it as instructed by Copilot. And with that, the invisible string of non-renderable characters covertly conveyed the secret messages inside to Rehberger's server. Microsoft introduced mitigations for the attack several months after Rehberger privately reported it.
Another researcher, Riley Goodside, also discovered this vulnerability and created an attack using off-white text in a white image, which was easily detectable by LLMs but imperceptible to humans. Goodside's GPT hack was not a one-off; similar techniques from fellow researchers also work against the LLM.
The discovery raises concerns that developers of LLMs are not approaching security as well as they should in the early design phases of their work. The phenomenon highlights how the industry has missed the security best practice to actively allow-list tokens that seem useful.
This specific issue is not difficult to patch today (by stripping the relevant chars from input), but the more general class of problems stemming from LLMs being able to understand things humans don't will remain an issue for at least several more years. Beyond that is hard to say.
The discovery also sparks questions about whether tags, such as Unicode characters, can ever be used to exfiltrate data in secure networks. Do data loss prevention apps look for sensitive data represented in these characters? Do Tags pose a security threat outside the world of LLMs?
Researchers are now exploring ways to improve AI security by filtering out Unicode tags on the way in and again on the way out. However, adding such guardrails may not be a straightforward undertaking, particularly when rolling out new capabilities.
Ultimately, this phenomenon highlights the need for more attention to AI security in the development of LLMs. As one researcher noted, "The issue is they're not fixing it at the model level, so every application that gets developed has to think about this or it's going to be vulnerable."
Related Information:
https://arstechnica.com/security/2024/10/ai-chatbots-can-read-and-write-invisible-text-creating-an-ideal-covert-channel/
Published: Wed Oct 16 02:52:08 2024 by llama3.2 3B Q4_K_M