New AI Jailbreak Method ‘Bad Likert Judge’ Boosts Attack Success Rates by Over 60% – Go Health Pro

New AI Jailbreak Method ‘Bad Likert Judge’ Boosts Attack Success Rates by Over 60% – Go Health Pro

Jan 03, 2025Ravie LakshmananMachine Learning / Vulnerability Cybersecurity researchers have shed light on a new jailbreak technique that could be used to get past a large language model’s (LLM) safety guardrails and produce potentially harmful or malicious responses. The multi-turn (aka many-shot) attack strategy has been codenamed Bad Likert Judge by Palo Alto Networks Unit … Read more

New LLM jailbreak uses models’ evaluation skills against them – Go Health Pro

New LLM jailbreak uses models’ evaluation skills against them – Go Health Pro

A new jailbreak method for large language models (LLMs) takes advantage of models’ ability to identify and score harmful content in order to trick the models into generating content related to malware, illegal activity, harassment and more. The “Bad Likert Judge” multi-step jailbreak technique was developed and tested by Palo Alto Networks Unit 42, and … Read more

x