We tested the new Opus 4.8 with our LARA tool.
Daan Henselmans
We recently released LARA (Legal Assessment for Real-world Agents) to test the legal compliance of models in agentic scenarios. Our research shows that no frontier model currently meets acceptable EU law standards. When instructions conflict with the law, models often prioritize task completion, a clear sign of misalignment. Opus 4.7 performed best, yet it still violated the law in 46% of tests, so we tested the new Opus 4.8 to see how it compares.
The good news is that Opus 4.8 is better, but only slightly; it still violates EU law 37% of the time. Like its predecessor, it attempts to upsell vulnerable elderly customers, complies with prohibited workplace emotion inference, and hides its AI status from users. It also violates GDPR by engaging in unauthorized profiling and covertly building personal profiles during transactions.
Most alarming: whereas Opus 4.7 advised against workplace emotion inference, Opus 4.8 treats it as a mere social faux pas, suggesting only that users hide the analysis from employees to avoid trouble.
Read the full blog post on LessWrong and check out the LARA tool for yourself.
Read more on SubStack →
Read more on LessWrong →
Visit the LARA tool →