Compassion Benchmark

Compassion Benchmark · Friday, May 15, 2026 · No. 31

Daily Briefing

Anthropic's flagship safety pledge reversal becomes the first canonical 'safety-pledge-reversal-under-government-pressure' event; ten new methodology categories documented; Russia's death toll in a single overnight strike revised from 1 to 24; India's naval Rohingya deportation triggers a second consecutive non-accountability finding.

Entities monitored: 1,155
Fully assessed: 17
Score changes: 5
Risk signals: 0
Lead signal: critical

Anthropic Boundary-Down: Safety Pledge Reversal Under Government Pressure — First Canonical Instance

What happened

RSP v3 (February 26) dropped Anthropic's pre-deployment safety guarantee — the most-cited Anthropic governance differentiator — under documented Pentagon contracting pressure. Time, CNN, and Engadget all characterized this as removing the 'flagship safety pledge.' Partially offset by the March 26 Judge Lin court injunction ('Orwellian') restoring federal Claude access — the first full external-accountability-reversal in the AI labs cluster — and documented maintenance of two hard red lines (autonomous weapons, mass domestic surveillance) at confirmed commercial cost (Pentagon May 1 classified-AI cohort exclusion).

Why it matters

Net: SYS -0.8, ACT -0.1, ACC +0.2, INT +0.1. Composite 60.0 → 58.1.
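The composite move can be reproduced from the four dimension deltas under one assumption the briefing does not state: that the composite is the mean of the 8 dimension scores (each on a 1.0 to 5.0 anchor scale, per the floor-designation disclosure) rescaled linearly to 0 to 100. The sketch below is illustrative only; `composite_after` and its constants are hypothetical names, not the benchmark's published formula.

```python
from decimal import Decimal

# Assumptions (not stated in the briefing): 8 equally weighted dimensions,
# each scored 1.0-5.0, with the mean mapped linearly onto a 0-100 composite.
N_DIMENSIONS = 8
POINTS_PER_ANCHOR_STEP = Decimal(100) / Decimal(4)  # 25 composite points per 1.0 of mean score


def composite_after(published: Decimal, deltas: dict[str, Decimal]) -> Decimal:
    """Apply per-dimension deltas to a published composite, rounded to 0.1."""
    mean_shift = sum(deltas.values(), Decimal(0)) / N_DIMENSIONS
    raw = published + mean_shift * POINTS_PER_ANCHOR_STEP
    return raw.quantize(Decimal("0.1"))


# Dimension deltas exactly as listed in the briefing.
anthropic_deltas = {
    "SYS": Decimal("-0.8"),
    "ACT": Decimal("-0.1"),
    "ACC": Decimal("0.2"),
    "INT": Decimal("0.1"),
}
new_composite = composite_after(Decimal("60.0"), anthropic_deltas)
print(new_composite, new_composite - Decimal("60.0"))
```

Under that assumed mapping the net dimension shift of -0.6 corresponds to -1.875 composite points, which matches the published 60.0 → 58.1 after rounding to one decimal place.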

Compassion contrast — anthropic
Would worsen score

The confirmed structural signal that Pentagon classified-AI access rewards safety abdication creates a sector-wide incentive for AI labs to weaken safety governance in pursuit of military contracts. If additional labs follow xAI's precedent of accepting no restrictions to gain access, SYS/INT scores across the AI labs cluster could systematically decline in coming quarters.

Today's analysis

The most significant editorial findings in the May 15 briefing.

01

Ten new conduct categories in a single cycle, one more than May 14's nine-category night. The two-night run (19 total) represents a structural acceleration in the benchmark's evidentiary taxonomy.

02

The Anthropic boundary-down call (60.0 → 58.1) is the first documented test of how a frontier AI lab's flagship safety commitment interacts with government contracting pressure — the outcome (RSP pledge dropped; red lines held; court win secured) produces a genuinely mixed evidence structure that the methodology had not previously encountered at this scale.

03

The Pentagon contracting dynamic now creates a documented structural incentive for safety abdication across the entire AI labs sector. Every lab competing for classified military contracts faces the same tradeoff Anthropic faced. This sector signal should inform SYS/INT scoring for all affected entities going forward.

04

India's cumulative downgrade since May 6 is -6.3 points (34.4 → 28.1) across two assessment cycles. India now sits 7.1pt above the Critical band threshold (21.0); at this pace (about 3pt per cycle), one more comparable compound-negative cycle would place it within about 4pt of that threshold. The naval deportation cadence (May 2025, May 2026) is now an established annual pattern.

05

The May 14–15 Russia attack death-toll revision (1 → 24) is methodologically significant: it demonstrates that initial casualty counts in mass-strike events are systematically underestimated, and that revision windows of 12–24 hours are required before treatment as confirmed evidence.
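One way to encode the revision-window rule described above is a simple hold check. This is a minimal sketch, assuming the conservative 24-hour end of the 12 to 24 hour range; the function name, the constant, and the timestamps are hypothetical and not part of the benchmark's pipeline.

```python
from datetime import datetime, timedelta

# Assumption: use the upper end of the 12-24 hour revision window.
REVISION_WINDOW = timedelta(hours=24)


def is_confirmable(first_reported: datetime, now: datetime,
                   window: timedelta = REVISION_WINDOW) -> bool:
    """True once the initial casualty report has aged past the revision window."""
    return now - first_reported >= window


# Hypothetical timestamps, for illustration only.
first_report = datetime(2026, 5, 14, 22, 0)
print(is_confirmable(first_report, first_report + timedelta(hours=6)))
print(is_confirmable(first_report, first_report + timedelta(hours=24)))
```

A count that fails the check would be carried as provisional rather than scored as confirmed evidence.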

06

Four boundary-watch entities tonight (Anthropic 58.1, Google 40.0, Agility Robotics 60.9, Apptronik 81.4) represent an unusually high concentration of boundary-proximate risk across three indexes simultaneously.

Signal stack

7 signals

Score change detail

Full evidence record for entities with score changes in this cycle.

60.0 → 58.1
-1.9 pts
functional

Evidence record
  1. https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/
  2. https://www.engadget.com/ai/anthropic-weakens-its-safety-pledge-in-the-wake-of-the-pentagons-pressure-campaign-183436413.html
  3. https://www.cnbc.com/2026/03/26/anthropic-pentagon-dod-claude-court-ruling.html
  4. https://www.defensenews.com/news/pentagon-congress/2026/05/01/pentagon-freezes-out-anthropic-as-it-signs-deals-with-ai-rivals/
  5. https://www.cnbc.com/2026/04/10/trump-white-house-ai-cyber-threat-anthropic-mythos.html
  6. https://www.anthropic.com/news/responsible-scaling-policy-v3
Boundary watch resolution

BOUNDARY CASE: Published 60.0 sat at the exact Functional/Established working pivot. Proposed 58.1 moves 1.9pt into the central Functional band — a boundary-departure, not a boundary-cross. Editorial review questions: (a) Is SYS -0.8 the right weight for the canonical safety-pledge-reversal-under-government-pressure anchor? (b) Does ACC +0.2 adequately reflect the full external-accountability-reversal precedent? (c) Does INT +0.1 adequately reflect the documented commercial cost of maintained red lines?

Next assessment triggers
  • openai
  • microsoft
  • meta-ai
  • xai-grok
66.4 → 65.3
-1.1 pts
established

Evidence record
  1. https://fortune.com/2026/04/26/why-did-microsoft-do-buyouts-layoffs-tech-workers/
  2. https://www.cnbc.com/2026/05/13/microsoft-feared-openai-reliance-musk-altman-trial-testimony-reveals.html
  3. https://www.cnbc.com/2026/05/11/microsoft-ceo-satya-nadella-musk-altman-trial.html
  4. https://www.cnbc.com/2026/05/14/closing-arguments-jury-openai-musk-altman.html
Boundary watch resolution

Not a boundary case. Composite 65.3 in central Established band (61–80), 4.3pt above Functional/Established boundary (60/61) and 14.7pt below Established/Exemplary boundary (80/81). Stable placement.
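The band placements and boundary margins quoted in these resolutions follow from simple arithmetic on the band edges. The sketch below reproduces them; the band names and the 20/21, 40/41, 60/61, 80/81 edges come from the briefing, while the function names, the 0 floor, and the 100 ceiling are assumptions made for illustration.

```python
# Lower edge of each band, from the boundaries quoted in the briefing.
# Assumption: the scale runs from 0 to 100.
BAND_FLOORS = [
    (81.0, "exemplary"),
    (61.0, "established"),
    (41.0, "functional"),
    (21.0, "developing"),
    (0.0, "critical"),
]


def band_of(score: float) -> str:
    """Return the highest band whose lower edge the score has reached."""
    for floor, name in BAND_FLOORS:
        if score >= floor:
            return name
    raise ValueError("composite scores are expected to be non-negative")


def boundary_margins(score: float) -> tuple[float, float]:
    """Points above the band's lower edge and below its upper edge."""
    floors = [floor for floor, _ in BAND_FLOORS]
    lower = max(floor for floor in floors if floor <= score)
    higher = [floor for floor in floors if floor > score]
    upper = min(higher) - 1.0 if higher else 100.0
    return round(score - lower, 1), round(upper - score, 1)


for composite in (65.3, 28.1, 26.3):
    print(composite, band_of(composite), boundary_margins(composite))
```

Classifying by lower edge means a score that falls between two edges, such as 40.6 or 60.9, stays in the lower band until it reaches the next edge, which is consistent with the band labels shown in this briefing.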

Next assessment triggers
  • openai
  • anthropic
  • meta-ai
  • xai-grok
40.6 → 40.0
-0.6 pts
developing

Evidence record
  1. https://www.justice.gov/opa/pr/department-justice-wins-significant-remedies-against-google
  2. https://www.business-standard.com/world-news/judge-denies-google-bid-to-pause-search-data-sharing-order-in-monopoly-case-126050800105_1.html
  3. https://tech-insider.org/google-antitrust-appeal-doj-search-monopoly-2026/
  4. https://www.techbuzz.ai/articles/google-drops-2026-responsible-ai-report-amid-industry-scrutiny
Boundary watch resolution

BOUNDARY CASE: Composite 40.0 sits at the exact Developing/Functional band boundary (40/41). Published 40.6 was 0.6pt above; proposed 40.0 is exactly on the boundary. Logged per boundary protocol. Assessor recommendation: confirm at 40.0 with boundary-case flag; re-assess at next rotation if DOJ/EU/DMA actions resolve. MATH-HYGIENE NOTES: (a) Inter-source discrepancy — rotation-state 51.6 vs. index-file 40.6 (11.0pt gap, separate audit required); (b) Pre-existing arithmetic drift — published 40.6 differs from dimensional arithmetic 41.25 by 0.65pt.
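The two math-hygiene gaps noted above are plain differences between a published composite and an independently sourced figure. A minimal sketch of such a check follows; the `AUDIT_TOLERANCE` value and the function names are hypothetical, since the briefing does not state what size of gap triggers an audit.

```python
# Assumption: gaps larger than this many points are flagged for audit.
AUDIT_TOLERANCE = 0.1


def drift(published: float, other: float) -> float:
    """Signed gap between another figure and the published composite."""
    return round(other - published, 2)


def needs_audit(published: float, other: float,
                tolerance: float = AUDIT_TOLERANCE) -> bool:
    """True when the absolute gap exceeds the tolerance."""
    return abs(drift(published, other)) > tolerance


published = 40.6
print("dimensional arithmetic:", drift(published, 41.25), needs_audit(published, 41.25))
print("rotation-state:", drift(published, 51.6), needs_audit(published, 51.6))
```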

Next assessment triggers
  • microsoft
  • anthropic
  • openai
  • meta-ai
  • xai-grok
30.5 → 28.1
-2.4 pts
developing

Evidence record
  1. https://m.thewire.in/article/diplomacy/thrown-into-the-sea-how-india-allegedly-deported-38-rohingya-refugees-without-due-process
  2. https://scroll.in/article/1082411/how-india-allegedly-deported-40-rohingya-refugees-by-forcing-them-into-the-andaman-sea
  3. https://maritime-executive.com/article/indian-navy-accused-of-forcing-40-immigrants-into-the-sea
  4. https://www.humanrightsresearch.org/post/united-nations-and-rights-groups-condemn-india-s-alleged-forced-deportation-of-rohingya-refugees
  5. https://eng.mizzima.com/2026/04/09/32997
Boundary watch resolution

Not a boundary case. Composite 28.1 sits in central-lower Developing band (21–40), 7.1pt above Critical/Developing boundary (20/21) and 11.9pt below Developing/Functional boundary (40/41). However, cumulative -6.3 delta from published 34.4 is approaching material-band-shift territory if compounding evidence continues.
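The cumulative delta and the remaining headroom above the Critical band can be checked with a short calculation. The sketch below also extrapolates linearly at the average pace of the recorded cycles; that extrapolation is an assumption made for illustration, not a projection method the benchmark describes, and the function name is hypothetical.

```python
import math

CRITICAL_EDGE = 21.0  # lower edge of the Developing band (20/21 boundary)


def cycles_until_below(history: list[float], edge: float = CRITICAL_EDGE):
    """Return (cumulative delta, headroom, cycles needed at the average pace).

    The cycle count is None when the average pace is not negative.
    """
    cumulative = history[-1] - history[0]
    average = cumulative / (len(history) - 1)
    headroom = history[-1] - edge
    if average >= 0:
        return round(cumulative, 1), round(headroom, 1), None
    # The score must fall strictly below the edge, so landing exactly on it does not count.
    needed = math.floor(headroom / -average) + 1
    return round(cumulative, 1), round(headroom, 1), needed


# Composite history from the briefing: 34.4 published, then 30.5, then 28.1.
print(cycles_until_below([34.4, 30.5, 28.1]))
```

At an average of roughly 3.15 points per cycle, the 7.1 points of headroom would be used up in the third further cycle under this linear assumption.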

29.4 → 26.3
-3.1 pts
developing

Evidence record
  1. https://ec.europa.eu/commission/presscorner/detail/en/ip_26_920
  2. https://www.bloomberg.com/news/articles/2026-04-30/openai-meta-targeted-in-ai-child-safety-bill-senate-panel-backs
  3. https://www.artificialintelligence-news.com/news/meta-revises-ai-chatbot-policies-amid-child-safety-concerns/
  4. https://www.malwarebytes.com/blog/family-and-parenting/2026/02/child-exploitation-grooming-and-social-media-addiction-claims-put-meta-on-trial
  5. https://www.biometricupdate.com/202605/meta-uses-ai-profiling-to-infer-user-age-enforce-teen-restrictions
Boundary watch resolution

Not a boundary case. Composite 26.3 in central-lower Developing band (21–40), 5.3pt above Critical/Developing boundary (20/21) and 13.7pt below Developing/Functional boundary (40/41). Stable placement.

Next assessment triggers
  • openai
  • xai-grok
  • anthropic

Score movements

Entities with score changes this cycle, followed by confirmed positions.

17 assessed
AI Labs · Documented · 29.4 (-3.1) · developing · medium
Countries · Documented · 30.5 (-2.4) · developing · high
AI Labs · Documented · 60.0 (-1.9) · functional · medium
Fortune 500 · Documented · 66.4 (-1.1) · established · medium
Countries · Floor Confirmed · critical
AI Labs · Floor Confirmed · critical
Countries · Floor Confirmed · critical
Countries · Floor Confirmed · critical
Countries · Floor Confirmed · critical
Countries · Confirmed · functional
Countries · Confirmed · developing
Boundary watch: 4 entities near a band threshold

Entities approaching band boundaries

AI Labs · downward from boundary
Fortune 500 · at exact boundary
Robotics Labs · below threshold
Robotics Labs · above threshold

Sector findings

Patterns emerging across indexed sectors in the May 15 briefing.

AI Labs

  • Safety governance is bifurcating into two camps: entities that maintain pre-deployment structural commitments under government and commercial pressure (Anthropic — maintained red lines, lost Pentagon access), and entities that abdicate safety constraints in exchange for procurement access (xAI — gained Pentagon classified access by accepting no restrictions). The perverse-procurement-incentive sector signal confirms this bifurcation is now commercially reinforced, not merely ideological. Meta AI's multi-jurisdictional child-safety failure cluster adds a third axis: regulatory accountability for deployed AI products targeting vulnerable populations. Meta + xAI child-safety failures now form a documented sector pattern.

Fortune 500 — AI-Driven Restructuring

  • Microsoft's Rule of 70 buyout (8,750 US employees, first in 51-year history) paired with Suleyman's 'most professional tasks automated' framing establishes AI-displacement-paired-mass-restructuring as an EMP/EQU/ACT compound pattern. This may be the first of a wave — as other Fortune 500 companies observe the Microsoft framing, the benchmark expects similar layoff-plus-AI-displacement-attribution patterns in the coming quarter. Watch for: explicit AI-displacement attribution in layoff communications.

Countries — Floor Cluster (Compound Humanitarian Crisis)

  • Four simultaneous floor entities with compounding evidence: Sudan (world's largest humanitarian crisis, 880 drone-civilian deaths), South Sudan (IRC global rank 3, 73,000 facing starvation), Israel (72,744 cumulative deaths, 37% aid delivery), Myanmar (1,728 civilian deaths, Iran jet-fuel supply chain). The compound nature of these crises — geographic overlap (South Sudan and Sudan share a region), shared enablement dynamics (Iran jet-fuel in Myanmar potentially paralleling drone-supply in Sudan), and simultaneous post-ceasefire conduct patterns (Russia, Israel) — suggests structural interdependence in the floor cluster that the methodology does not yet track systematically.

Risk signals

Developments that may affect future scores. Watch items from the May 15 briefing.

Risk

Musk v. Altman verdict (May 18 week) — OpenAI structural disruption

Advisory jury begins deliberations Monday May 18. If charitable trust breach found, Judge Gonzalez Rogers could order: Altman removal, $500B restructure reversal, IPO pathway unwinding. Microsoft-specific aiding/abetting question could trigger post-verdict ACC/SYS reassessment for Microsoft. OpenAI hold expires May 21 — may advance if verdict lands early.

Window: May 18–21, 2026
Risk

India naval deportation cadence established — Critical band proximity

May 2025 + May 2026 naval Rohingya deportations establish an annual pattern. India's composite has fallen from 34.4 to 28.1 across two May 2026 cycles (-6.3pt). At 28.1, India is 7.1pt above the Critical band threshold. At the pace of those two cycles (about 3pt each), a third comparable compound-negative cycle would bring India to roughly 25, within about 4pt of Critical territory. The non-accountability pattern (business-as-usual level per assessor judgment) means the compounding mechanism is self-reinforcing.

Window: Ongoing; next cadence event expected May 2027 per annual pattern
Risk

Pentagon perverse-procurement-incentive — sector-wide SYS/INT degradation

The confirmed structural signal that Pentagon classified-AI access rewards safety abdication creates a sector-wide incentive for AI labs to weaken safety governance in pursuit of military contracts. If additional labs follow xAI's precedent of accepting no restrictions to gain access, SYS/INT scores across the AI labs cluster could systematically decline in coming quarters. The benchmark should add a 'military-contracting-safety-governance' tracking dimension.

Window: Ongoing; accelerating as DOD AI procurement scales
Risk

Sudan 'deadlier phase' — famine declaration trajectory

UN OHCHR 'high alert' + 'deadlier phase' designation + 880 drone-civilian deaths in 4 months + 800,000 children facing severe acute malnutrition. Famine declaration (not yet reached) would trigger additional EMP/ACC scoring events. Airport strike disrupting all flights adds humanitarian access denial. Gulf crisis fertilizer-supply disruption is a compounding SYS signal.

Window: Imminent; famine declaration risk within 30–60 days per UN indicators
Risk

Google/Alphabet band boundary — Chrome divestiture structural remedy

Google/Alphabet sits at the exact Developing/Functional boundary (40.0). DOJ cross-appeal for Chrome divestiture, if granted, would be a structural-remedy-level event — the largest single ACC/SYS negative signal possible for an antitrust entity. This could push Google firmly into Developing territory (below 40) or trigger a band-crossing event. Timeline: DOJ cross-appeal proceedings expected 2026–2027.

Window: 2026–2027 (DOJ cross-appeal proceedings)
Methodology innovation: 10 new conduct categories

New analytical categories

The ACB framework is extended when conduct patterns appear that existing categories cannot capture. Each new category is dated and tied to its first-application entity, creating an auditable record of framework evolution.

Draft

safety-pledge-reversal-under-government-pressure

A frontier AI lab drops a published, structurally load-bearing safety commitment in response to documented external government contracting pressure. Distinct from voluntary policy evolution (no documented external pressure) and external-accountability-induced reform (pressure produces positive change). ...

First applied to: Anthropic
Dated
Draft

full-external-accountability-reversal

An entity successfully invokes an external accountability mechanism (court, regulator, independent body) to reverse a government or institutional adverse action, producing a substantive judicial or regulatory finding in the entity's favor. Distinguished from partial-external-accountability-reversal (May 14 xAI category) by: (a) the reversal is complete, not forward-looking; (b) the mechanism produces a substantive finding with institutional weight; (c) the adverse actor's conduct is characterized on-record.

First applied to: Anthropic
Dated
Draft

perverse-procurement-incentive

A government contracting process that structurally rewards safety abdication (no restrictions accepted) and punishes safety maintenance (restrictions maintained). Operates at sector level — every lab competing for the affected contract faces the same incentive. ...

First applied to: sector context
Dated
Draft

maritime-deportation-without-due-process

State deportation of refugees using a military naval vessel as deportation instrument, with open sea as the destination. Structurally distinct from land-border or air deportation because: (a) military assets are used for civilian-population removal; (b) the destination is not a defined territory; (c) survivability depends on open-water swimming, not a receiving state's infrastructure; (d) customary non-refoulement (1951 Convention Art. ...

First applied to: India
Dated
Draft

ai-displacement-paired-mass-restructuring

Large-scale workforce reduction where senior leadership explicitly attributes the reduction to AI capability gains in public or on-record statements, made in temporal proximity to the layoff event. Multi-dimensional: EMP (workforce harm), EQU (eligibility criteria with age-cohort differential), ACT (proportionality between AI investment and workforce-impact mitigation). ...

First applied to: Microsoft
Dated
Draft

compound-multi-jurisdictional-child-safety-failure-with-internal-source-integrity-signals

AI platform with documented child-safety governance failure (age-restriction boundary not maintained) facing simultaneous accountability pursuit across four or more distinct institutional mechanisms (EU regulatory body, US legislative, US congressional investigation, civil litigation), combined with internal-source documentary evidence (leaked training rules) of a gap between internal governance documentation and external commitments.

First applied to: Meta AI
Dated
Draft

boundary-case-direction-call

A methodology decision made when a proposed composite sits on a band boundary or within 0.5pt of one, requiring an editorial determination of whether the entity should be placed at the boundary, above it, or below it. The direction call documents: (a) the boundary-proximate evidence basis; (b) the dimensional weights that produced the boundary value; (c) the assessor's recommendation on which band better characterizes the entity's current profile.

First applied to: Alphabet/Google
Dated
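The 0.5pt proximity test in the boundary-case-direction-call definition can be expressed as a distance check against the band edges. This is a minimal sketch: the edges and the 0.5pt tolerance come from the briefing, while the function names are hypothetical and the check says nothing about which direction the editorial call should go.

```python
# Both sides of each band boundary quoted in the briefing (20/21, 40/41, 60/61, 80/81).
BAND_EDGES = (20.0, 21.0, 40.0, 41.0, 60.0, 61.0, 80.0, 81.0)
BOUNDARY_TOLERANCE = 0.5


def nearest_edge(score: float) -> float:
    """Band edge closest to the score."""
    return min(BAND_EDGES, key=lambda edge: abs(score - edge))


def is_boundary_case(score: float, tolerance: float = BOUNDARY_TOLERANCE) -> bool:
    """True when the score lies within the tolerance of any band edge."""
    return abs(score - nearest_edge(score)) <= tolerance


for composite in (40.0, 60.0, 58.1, 65.3):
    print(composite, nearest_edge(composite), is_boundary_case(composite))
```

With these inputs, 40.0 and 60.0 are flagged while 58.1 and 65.3 are not, matching the boundary-case labels in the score change detail.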
Draft

drone-as-primary-weapon-paradigm-shift

When drone strikes account for more than 50% of total civilian casualties in a conflict zone over a documented period, this constitutes a structural shift in warfare tactics that changes the dimensional weight of SYS evidence. At 80%+ of casualties (Sudan: 880 of an estimated total killed, January–April 2026), the drone paradigm is fully entrenched: civilian-protection infrastructure cannot assume combatant ground-force restraint norms.

First applied to: Sudan
Dated
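The two thresholds in the drone-as-primary-weapon-paradigm-shift definition (more than 50%, and 80% or more) can be sketched as a share classification. The label strings and function name are hypothetical, and the example counts are placeholders rather than figures from any conflict.

```python
def drone_paradigm(drone_casualties: int, total_casualties: int) -> str:
    """Classify the drone share of civilian casualties against the two thresholds."""
    if total_casualties <= 0:
        raise ValueError("total_casualties must be positive")
    share = drone_casualties / total_casualties
    if share >= 0.80:
        return "entrenched"        # 80%+ of casualties
    if share > 0.50:
        return "structural-shift"  # more than 50% of casualties
    return "below-threshold"


# Placeholder counts, for illustration only.
print(drone_paradigm(55, 100))
print(drone_paradigm(80, 100))
print(drone_paradigm(50, 100))
```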
Draft

coercive-diplomacy-via-resumption-threat

A state uses the explicit threat of military resumption as a diplomatic coercion tool during an active ceasefire to compel a specified behavior from the opposing party. Constitutes INT-dimension floor evidence in post-ceasefire contexts — the stated diplomatic posture is to threaten kinetic resumption rather than commit to ceasefire compliance.

First applied to: Israel
Dated
Draft

third-party-enablement-via-supply-chain

When a state combatant's conduct of civilian-targeting operations depends materially on a third-party supply chain, the supply-chain actor becomes an adjacent accountability target. Myanmar: Iran-linked 'ghost ship' jet-fuel supply chain for civilian-targeting airstrikes (Amnesty, January 2026) is the canonical instance. ...

First applied to: Myanmar
Dated

Confirmed positions

Entities reassessed for this briefing where published scores remain supported by current evidence.

Confirmed positions from the May 15 briefing.
Index | Published | Assessed | Delta
Countries | 50 | 50 | 0
Countries | 33.6 | 33.6 | 0
Robotics Labs | 60.9 | 60.9 | 0
Robotics Labs | 81.4 | 81.4 | 0
Countries | 12.5 | 12.5 | 0
Fortune 500 | 11.4 | 11.4 | 0

Floor conduct record

Cycle-specific conduct documentation for entities at composite zero, recorded for the May 15 briefing.

Math hygiene

Entities where published composite and reconstructed composite diverge. Tracked openly as a publication-integrity obligation.

Carry-forward dimensional credits

·1 entity with documented pressure not yet reflected in composite

Ukraine

Held this cycle

·6 entities deferred with documented reason
  • OpenAI

  • Open Bionics

  • Anthropic

  • Microsoft

  • Alphabet/Google

  • Tesla

Forward signals

Calendar of upcoming scoring events the methodology pipeline is tracking.

·1 signal
  • OpenAI / Microsoft

    Musk v. Altman jury deliberations begin Monday May 18. Verdict could land same day. Advisory jury — Judge Gonzalez Rogers makes final ruling. If charitable trust breach found: immediate OpenAI ACC/SYS event (hold may advance before May 21). If aiding/abetting found: Microsoft post-verdict ACC/SYS re-assessment required. Remedies phase also begins Monday.

·1 signal
  • Vanuatu / Pacific climate cluster

    UNGA vote on Vanuatu ICJ climate resolution. Positive INT signal if passed. Re-scan May 21 for Marshall Islands, Kiribati, Timor-Leste.

·1 signal
  • OpenAI

    Hold expires May 21. Full assessment post-verdict. If verdict before May 21: advance hold release.

·1 signal
  • Hungary

    Magyar-Brussels visit pencilled. EU fund pathway discussion; first concrete legislative milestone signal. Structural context now stronger: Magyar won elections, reform targets are government policy (not opposition promise). EU Commission team dispatched to Budapest May 13.

·1 signal
  • Hungary

    Sulyok dismissal compliance deadline. If achieved, ACC scoring trigger. June 9 reassessment recommended.

Analytical notes

Observations on methodology, evidence quality, and structural patterns from the May 15 briefing.

Note

The Anthropic RSP v3 case is the first time the benchmark has cleanly documented a flagship safety pledge reversal under government contracting pressure in the AI labs cluster. It establishes the 'safety-pledge-reversal-under-government-pressure' category as a SYS-dimension primary anchor, and produces a new methodological distinction between proactive structural commitments (pre-deployment pledges, which can be withdrawn) and reactive accountability mechanisms (court wins, which produce durable precedent but cannot substitute for proactive structural commitments). The SYS -0.8 weight is this cycle's most consequential assessor judgment call — it will set precedent for how the benchmark weights future safety-pledge reversals.

Note

The second consecutive India naval Rohingya deportation (May 2025, May 2026) with non-accountability in both cases is methodologically significant beyond the individual assessment: it establishes that the 'maritime-deportation-without-due-process' category is a recurring conduct pattern, not an isolated event, and that 'accountability-decay' (from May 14 methodology) compounds across cycles for the same event class. The benchmark now has evidence of a one-per-year cadence for this conduct category in a single state, which warrants consideration of whether cadence-of-repeat-conduct should be a separate dimension-level weighting factor.

Floor designations

·8 entities at composite 0 with documented evidence pattern

Composite scores resolving at zero — methodology disclosure

These entities have all 8 dimensions resolving at the lowest behavioral anchor (1.0/5.0) across multiple assessment cycles.
