Stanford HAI’s 2026 AI Index Report, published this week, documents a widening gap between frontier AI capability benchmarks and responsible AI safety benchmarks. Nearly every frontier model developer reports results on capability benchmarks; the same is not true for responsible AI benchmarks covering safety, fairness, and human agency. Only one frontier model, Anthropic’s Claude Opus 4.5, has reported results on more than two responsible AI benchmarks.

The report also challenges the assumption of a durable US lead. US and Chinese models have traded the top performance position multiple times since early 2025, with Anthropic’s top model leading by only 2.7% as of March 2026, and China now leads in AI publication volume, citation share, and patent grants.

For EU law enforcement procurement, the benchmark gap has direct AI Act implications: high-risk AI systems under Annex III require documented risk management and testing, yet no common external benchmark framework currently exists against which deployers can independently verify vendors’ safety claims.