New LLM jailbreak uses models’ evaluation skills against them – Go Health Pro
A new jailbreak method for large language models (LLMs) takes advantage of models’ ability to identify and score harmful content in order to trick the models into generating content related to malware, illegal activity, harassment and more. The “Bad Likert Judge” multi-step jailbreak technique was developed and tested by Palo Alto Networks Unit 42, and … Read more