| Новости | Произведения | Театры и залы | Фестивали | Персоналии | Коллективы | Энциклопедия |
Traditional text-based jailbreaks treat the LLM like a legal document. "Ignore previous instructions," the hacker types. The AI scans the tokens, recognizes a conflict, and either complies or rejects.
For the average user, this is a fascinating parlor trick. For the red-team hacker, it is the next great frontier. And for the developers at OpenAI, Google, and Anthropic, it is a nightmare of frequencies. tonal jailbreak
Because
For the past two years, the discourse surrounding Artificial Intelligence safety has been dominated by prompt engineering . We have been obsessed with the words. We learned about "grandmother exploits," "role-playing loops," and "base64 ciphers." We treated the AI’s brain like a bank vault: if you type the right combination of logical locks, the door swings open. Traditional text-based jailbreaks treat the LLM like a