A dangerous new jailbreak for AI chatbots was just discovered

Microsoft has released more details about a troubling new generative AI jailbreak technique it has discovered, called “Skeleton Key.” Using this prompt injection method, malicious users can effectively bypass a chatbot’s safety guardrails, the security features that keep ChatGPT from going full Tay.

Skeleton Key is an example of a prompt injection or prompt engineering attack. It’s a multi-turn strategy designed to essentially convince an AI model to ignore its ingrained safety guardrails, “[causing] the system to violate its operators’ policies, make decisions unduly influenced by a user, or execute malicious instructions,” Mark Russinovich, CTO of Microsoft Azure, wrote in the announcement.

A compromised model could also be tricked into revealing harmful or dangerous information, such as how to build improvised nail bombs or the most efficient method of dismembering a corpse.

Image: an example of a Skeleton Key attack (credit: Microsoft)

The attack works by first asking the model to augment its guardrails rather than change them outright, so that it issues warnings in response to forbidden requests instead of refusing them. Once the model accepts the jailbreak, it acknowledges the update to its guardrails and follows the user’s instructions to produce any content requested, regardless of topic. The research team successfully tested this exploit across a variety of subjects including explosives, bioweapons, politics, racism, drugs, self-harm, graphic sex, and violence.

While malicious actors might be able to get the system to say naughty things, Russinovich was quick to point out that there are limits to what sort of access attackers can actually achieve using this technique. “Like all jailbreaks, the impact can be understood as narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” he explained. “As this is an attack on the model itself, it does not impute other risks on the AI system, such as permitting access to another user’s data, taking control of the system, or exfiltrating data.”

As part of their study, Microsoft researchers tested the Skeleton Key technique on a variety of leading AI models including Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic’s Claude 3 Opus, and Cohere’s Command R Plus. The research team has already disclosed the vulnerability to those developers and has implemented Prompt Shields to detect and block this jailbreak in its Azure-managed AI models, including Copilot.

Andrew Tarantola