“Emergent Misalignment” in LLMs – Schneier on Security

Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs”:

Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are …

Race Condition Attacks against LLMs

These are two attacks against the system components surrounding LLMs:

We propose that LLM Flowbreaking, following jailbreaking and prompt injection, joins as the third on the growing list of LLM attack types. Flowbreaking is less about whether prompt or response guardrails can be bypassed, and more about whether user inputs …
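The attack class described here hinges on timing: a streamed response can reach the user before a slower guardrail check fires and retracts it. A minimal sketch of that kind of race, in plain Python with standard-library threading (all names here are hypothetical, illustrating the concept rather than the paper's actual systems):

```python
import threading
import time

def stream_with_late_guardrail(tokens, is_harmful, check_delay=0.05):
    """Illustrative race condition: tokens stream to the user while a
    slower moderation check runs concurrently. Any token emitted before
    the check completes has already been seen, even if the response is
    retracted afterward."""
    delivered = []               # tokens the user has already seen
    stop = threading.Event()

    def guardrail():
        time.sleep(check_delay)  # moderation takes time to run
        if is_harmful:
            stop.set()           # retract the remaining response

    checker = threading.Thread(target=guardrail)
    checker.start()
    for tok in tokens:
        if stop.is_set():        # guardrail fired: stop streaming
            break
        delivered.append(tok)
        time.sleep(0.02)         # streaming cadence per token
    checker.join()
    return delivered

leaked = stream_with_late_guardrail(
    ["secret", "step", "1", "2", "3"], is_harmful=True
)
print(leaked)  # some prefix of the response was delivered before the check fired
```

The point of the sketch is that the guardrail and the stream are not serialized: whatever prefix escapes before the check completes cannot be unseen, which is why these attacks target the system components around the model rather than the model itself.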
