“Emergent Misalignment” in LLMs – Schneier on Security
Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs”:

Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are …