
prompting · foundations

Jailbreak

A technique for manipulating an AI model into ignoring its safety guidelines or producing content it was trained to refuse, typically by crafting prompts designed to circumvent those guidelines.

Last updated 2026-05-12