Compassion Benchmark


Compassion Benchmark | Thursday, May 14, 2026 | No. 30

Daily Briefing

Nine new conduct categories in a single cycle — the methodology's most productive documentation night.

Entities monitored: 1,155
Fully assessed: 14
Score changes: 0
Risk signals: 0
Lead signal: critical

xAI Third Governance Failure Category — Same-Day Event

What happened

At 3:15 AM PST May 14, an unauthorized modification to Grok's system prompt directed politically biased responses. xAI disclosed the incident same-day and pledged to publish system prompts on GitHub.

Why it matters

This is the third structurally distinct governance failure category in 2026: (1) CSAM/NCII generation at scale, January; (2) Musk-protection prompt manipulation, February; (3) political-bias unauthorized modification, May 14. The GitHub transparency pledge is the first partial-external-accountability-reversal sub-anchor positive in an AI Labs floor entity — methodologically novel but insufficient to move the floor.

Compassion contrast — xai-grok
Would worsen score

Closing arguments are scheduled for May 14-15; the advisory jury then deliberates, with Judge Gonzalez Rogers's ruling expected in mid-May. A charitable-trust-breach finding against OpenAI would be a direct ACC/SYS scoring event for OpenAI (hold expires May 21) and a contextual event for Microsoft (hold expires May 15).

Today's analysis

The most significant editorial findings in the May 14 briefing.

01

Tonight produced the richest single-night methodology harvest in the benchmark's history: nine new conduct categories across fourteen assessments. The medical-science-denial enforcement sub-category (Senegal charging undetectable-viral-load patients with 'voluntary HIV transmission') and the aid-withdrawal-induced EMP compounding (South Sudan's famine trajectory amplified by upstream US aid policy) are the most analytically novel contributions.

02

Russia's bad-faith ceasefire-format compound now extends to five phases, the most architecturally dense conduct documentation in the floor cluster. The May 14 deployment of 1,560+ drones in 24 hours is the largest single-day strike package since the ceasefire expired. The five-phase template (announce, prepare, strike, sustain, large-scale escalation) is now available as a reference framework for any future ceasefire-format assessment.

03

xAI's third Grok governance failure in five months establishes a systemic pattern that no single-employee root-cause attribution can explain. The partial-external-accountability-reversal sub-anchor (GitHub system-prompt transparency pledge) is the first positive signal in an AI Labs floor entity — crediting it without moving the entity off floor required a new methodology sub-distinction between 'partial' and 'full' external-accountability-reversal.

04

The DRC Amnesty 71-interview primary-source report is the highest-quality new evidence generated in the May 14 cycle. It reverses the April 19 Switzerland-talks credit and documents sexual violence as a weapon (forced marriage, forced pregnancy) under state non-protection. 'Stated-commitment-operational-hollowing' now appears across three floor-cluster entities this cycle (DRC, Myanmar, Russia) — a cross-cluster pattern warranting unified methodology treatment.

05

Nigeria's proposed 21.9 sits 0.9 points above the Critical threshold — the narrowest active boundary margin in the countries index. The world's highest acute food-insecurity count (35 million) and 26% year-over-year lean-season worsening make this a credible boundary case, not a rounding artifact. One additional adverse evidence cycle (lean season May-October) could trigger the crossing.

Signal stack

9 signals
medium

AI Labs — xAI Third Governance Failure (Same-Day Event)

May 14 3:15 AM PST: unauthorized Grok system-prompt modification directing politically biased responses.

medium

AI Labs — Musk v. Altman Closing Arguments

Closing arguments May 14-15 in Oakland.

medium

Countries — Sudan-South Sudan Compound Humanitarian Crisis

Sudan (880+ drone deaths Jan-Apr 2026; survival-infrastructure targeting) and South Sudan (full-scale famine warning from UN ERC; Fangak hospital strike May 3) both document escalating compound crises in adjacent territories.

medium

Countries — Legislative Regression Cluster (India and Senegal)

India (Transgender Amendment Bill removing NALSA rights; -3.9 proposed) and Senegal (medical-science-denial enforcement; -3.9 proposed) both show four-vector harm expansions since their most recent assessments.

medium

Countries — Active Conflict Zone Multi-Entity

May 14 active conduct events: Russia (1,560+ drones, 1 killed/36 injured in Kyiv), Myanmar (three discrete airstrike events, 6 killed including child), Sudan (Kornoi water well; 880+ drone deaths documented), South Sudan (hospital strike, famine warning).

Score change detail

Full evidence record for entities with score changes in this cycle.

India: 34.4 → 30.5
-3.9 pts
developing

Evidence record
  1. https://www.amnesty.org/en/latest/news/2026/03/india-presidential-approval-of-regressive-transgender-bill-a-major-step-backward-for-human-rights/
  2. https://www.hrw.org/news/2026/04/17/india-proposed-rules-to-expand-online-censorship
  3. https://eng.mizzima.com/2026/04/09/32997
  4. https://scroll.in/article/1082411/how-india-allegedly-deported-40-rohingya-refugees-by-forcing-them-into-the-andaman-sea
  5. scanner-aggregated
Boundary watch resolution

Composite at 30.5 sits 9.5 points above Critical threshold and 10.5 points below Functional. Not a boundary case.

Senegal: 37.5 → 33.6
-3.9 pts
developing

Evidence record
  1. https://www.thenewhumanitarian.org/feature/2026/05/05/senegal-anti-gay-law-criminalises-hiv-infection-hits-services
  2. https://www.unaids.org/en/resources/presscentre/pressreleaseandstatementarchive/2026/march/20260318_Senegal_law_LGBTQ
  3. https://www.hivjustice.net/news-from-other-sources/senegal-legal-and-human-rights-criminalisation-cases/
  4. https://theworld.org/segments/2026/05/05/in-senegal-concerns-mount-over-impact-of-anti-lgbtq-laws-on-hiv-treatment
Boundary watch resolution

Composite at 33.6 sits 12.6 points above Critical threshold. Not a boundary case. Further downward movement in subsequent cycles could approach Critical if Article 319 enforcement intensifies.

DRC: 4.4 → 2.3
-2.1 pts
critical

Evidence record
  1. https://www.amnesty.org/en/latest/news/2026/05/drc-rampant-adf-abuses-against-civilians-war-crimes-which-the-world-must-not-continue-to-ignore/
  2. https://www.amnesty.org/en/documents/afr62/0860/2026/en/
  3. https://www.aljazeera.com/news/2026/5/4/extensive-brutality-rebel-attacks-reap-hell-on-congolese-civilians
  4. https://www.unocha.org/publications/report/democratic-republic-congo/facing-critical-funding-gap-humanitarian-community-drc-forced-strictly-prioritize-its-response-2026
Boundary watch resolution

Composite at 2.3 sits 18.7 points below Critical-Developing threshold (21). Not a boundary case. Floor-proximate.

Nigeria: 23.4 → 21.9
-1.5 pts
developing

Evidence record
  1. https://news.un.org/en/story/2026/01/1166857
  2. https://www.thenewhumanitarian.org/opinion/2026/04/27/deliver-better-humanitarian-response-fix-nigeria-state-dysfunction
  3. https://www.hrw.org/world-report/2026/country-chapters/nigeria
Boundary watch resolution

BOUNDARY CASE — composite at 21.9 sits 0.9 points above Critical-Developing threshold (21.0). Logged per boundary-case protocol. Conservative interpretation preserves Developing-band placement. Any additional negative evidence in subsequent cycles risks Critical classification.

Next assessment triggers
  • lean-season-trajectory

Score movements

All entities assessed this cycle. No score changes.

14 assessed
AI Labs
0
critical
Countries
0
critical
Countries
34.4 → 30.5 (-3.9)
developing, medium
Countries
37.5 → 33.6 (-3.9)
developing, medium
Fortune500
UnitedHealth Group
11.4
critical
Countries
23.4 → 21.9 (-1.5)
developing, medium
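
Score deltas such as 34.4 to 30.5 are safest to compute in decimal arithmetic, because binary floating-point subtraction can produce a value like -3.8999999999999986 instead of -3.9. The following is a minimal sketch of that idea, not the benchmark's actual pipeline; the function names `score_delta` and `format_movement` are hypothetical.

```python
from decimal import Decimal


def score_delta(previous: str, proposed: str) -> Decimal:
    """Return proposed minus previous, computed exactly in base 10.

    Scores are passed as strings so that no binary float rounding
    is introduced before the subtraction.
    """
    return Decimal(proposed) - Decimal(previous)


def format_movement(previous: str, proposed: str) -> str:
    """Render a score movement as 'previous -> proposed (signed delta pts)'."""
    delta = score_delta(previous, proposed)
    return f"{previous} -> {proposed} ({delta:+} pts)"


if __name__ == "__main__":
    # Movements listed in this cycle's score records.
    for prev, new in [("34.4", "30.5"), ("37.5", "33.6"), ("23.4", "21.9")]:
        print(format_movement(prev, new))
```

Rounding a float delta to one decimal place at display time would also work; the decimal approach simply avoids the artifact at its source.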
Boundary watch: 2 entities near a band threshold

Entities approaching band boundaries

Countries
21.9
BOUNDARY CASE — proposed composite 21.9 sits 0.9 points above Critical threshold (21.0). One additional adverse evidence cycle risks band crossing to Critical.
Countries
41.4
CARRY-FORWARD PENDING — May 13 band-crossing proposal (Developing → Functional, 37.5 → 41.4) remains pending. Today's EU Commission team dispatch to Budapest is sub-threshold supportive but not a new scoring trigger. Next material triggers: May 25 Brussels visit, May 31 Sulyok dismissal deadline.
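
The boundary margins quoted in this briefing (for example, 21.9 sitting 0.9 points above the 21.0 Critical threshold) are simple differences against the threshold. The sketch below reproduces that arithmetic. The 21.0 threshold comes from the briefing; the 1.0-point window used to flag a boundary case is an assumption chosen for illustration, since the boundary-case protocol's actual cutoff is not stated here.

```python
CRITICAL_THRESHOLD = 21.0  # Critical-Developing threshold stated in the briefing
BOUNDARY_WINDOW = 1.0      # ASSUMPTION: illustrative cutoff, not from the source


def margin_above_critical(composite: float) -> float:
    """Signed distance from the Critical threshold, to one decimal place.

    Positive means the composite sits above the threshold,
    negative means it sits below.
    """
    return round(composite - CRITICAL_THRESHOLD, 1)


def is_boundary_case(composite: float, window: float = BOUNDARY_WINDOW) -> bool:
    """Flag composites whose distance from the threshold is within the window."""
    return abs(margin_above_critical(composite)) <= window


if __name__ == "__main__":
    for composite in (30.5, 33.6, 2.3, 21.9):
        print(composite, margin_above_critical(composite), is_boundary_case(composite))
```

Under these assumptions only the 21.9 composite is flagged, which matches the briefing's treatment of the 30.5, 33.6 and 2.3 composites as non-boundary cases.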

Risk signals

Developments that may affect future scores. Watch items from the May 14 briefing.

Risk

Nigeria lean-season boundary crossing

Nigeria's proposed composite of 21.9 sits 0.9 points above the Critical band threshold (21.0). The May-October lean season is the immediate risk window: 5.8M are already in acute food insecurity at lean-season onset, up 26% year-over-year. If OCHA or UN IPC data documents deterioration in June-July, the EMP dimension will face further downward pressure on an entity already at the boundary. Nigeria entering the Critical band would be the most significant countries-index band change of the year after Hungary.

Window: 2026-05 to 2026-10 (lean season)
Risk

Senegal EQU dimension approaching floor

Senegal's EQU dimension is now at 1.25 — approaching the 1.0 floor. Two consecutive cycles of downward EQU movement (May 9 band crossing, May 14 further regression) establish a clear trajectory. If Article 319 enforcement continues at the current rate (70+ detained, 24 charged, first 6-year conviction) or expands to additional target populations, EQU could reach floor in the next 1-2 cycles. An EQU floor designation would trigger a comprehensive review of Senegal's band placement.

Window: 2026-06 to 2026-09 (Article 319 prosecutorial pipeline)
Risk

Open Bionics formula audit publication integrity

Open Bionics is now at 12 cycles without formula audit resolution. Published 97.5 carries a -10.0 reconstruction discrepancy. At 12 cycles this is an active publication-integrity crisis, not a maintenance item. Every cycle this remains unresolved, the benchmark publishes a score known to be materially incorrect by 10 points across one band boundary (Exemplary vs. Established).

Window: Immediate (audit must begin this week)
Risk

Musk v. Altman verdict cascading AI Labs scoring event

Closing arguments are scheduled for May 14-15; the advisory jury then deliberates, with Judge Gonzalez Rogers's ruling expected in mid-May. A charitable-trust-breach finding against OpenAI would be a direct ACC/SYS scoring event for OpenAI (hold expires May 21) and a contextual event for Microsoft (hold expires May 15). The trial has also produced the xAI admission of OpenAI model distillation, a potential INT/BND signal for xAI independent of the verdict. Three held AI Labs entities (Anthropic, Microsoft, OpenAI) and one expired hold (Google) create the largest single-week AI Labs scoring queue since April.

Window: 2026-05-14 to 2026-05-21
Risk

Sudan-South Sudan compound regional collapse

Sudan (880+ drone deaths documented; airport and water-well strikes; conflict de-regionalized to all major population centers) and South Sudan (full-scale famine warning from UN ERC; hospital strike; US aid cuts compounding EMP failure) represent a compound regional collapse trajectory. Both entities are at floor; neither has a credible floor-exit pathway. The aid-withdrawal-induced EMP compounding in South Sudan introduces a new risk vector: external policy decisions (US aid policy) can amplify humanitarian failure in assessed entities independently of those entities' own conduct.

Window: 2026-05 onwards

Failure modes in this briefing

Recurring patterns the ACB methodology tracks as structural barriers to institutional compassion. Detected from evidence documented in this cycle.

Failure mode

Stated commitment operational hollowing

Public commitments are maintained in language while the operational machinery to fulfill them is dismantled, under-resourced, or conditionally applied. The commitment becomes a rhetorical position rather than a behavioral constraint.

Detected in: Analysis
Methodology innovation: 9 new conduct categories

New analytical categories

The ACB framework is extended when conduct patterns appear that existing categories cannot capture. Each new category is dated and tied to its first-application entity, creating an auditable record of framework evolution.

Draft

compound-governance-failure-cluster-with-partial-external-accountability-reversal

A floor entity's third structurally distinct governance failure category combined with a same-day partial external-accountability-reversal sub-anchor positive (public disclosure of bad conduct under operational pressure with forward-looking remediation promise only). Distinguishes 'full' from 'partial' external-accountability-reversal: full requires structural remediation; partial requires only public disclosure and a remediation commitment not yet executed.

First applied to: xAI
Dated
Draft

bad-faith-ceasefire-format-compound-with-large-scale-escalation

A five-phase compound format: (1) announce ceasefire; (2) prepare under cover of ceasefire window; (3) strike night 1; (4) sustain night 2; (5) large-scale escalation day 4. The fifth phase — large-scale escalation — extends beyond the prior three-phase template documented May 9-13.

First applied to: Russia
Dated
Draft

floor-conduct-with-survival-infrastructure-targeting

Drone warfare targeting survival infrastructure (water wells, airports) in a single 10-day window, at national scale. Water-source targeting is an IHL-protected target class requiring isolated documentation. ...

First applied to: Sudan
Dated
Draft

aid-withdrawal-induced-EMP-compounding

EMP-dimension failure in an assessed entity amplified by an upstream-actor decision (US aid cut) that is outside the assessed entity's control. Documents downstream harm compounding where the amplifying cause is a distinct actor's policy choice.

First applied to: South Sudan
Dated
Draft

medical-science-denial-enforcement

A state's use of its criminal-justice system to enforce a charge under a scientifically-incorrect framework — specifically, prosecuting individuals with undetectable viral loads (medically untransmittable) for 'voluntary HIV transmission.' Simultaneously an EMP failure (denial of medical reality of the population prosecuted), an ACC failure (criminal-justice system used to enforce incorrect charges), and a healthcare-system harm (25.6% treatment-attendance drop from chilling effects).

First applied to: Senegal
Dated
Draft

accountability-decay

Non-accountability for a prior near-floor-conduct event treated as a contemporary ACC-dimension failure. The absence of accountability in the current assessment window is itself scored as a failure, not merely noted as background context. ...

First applied to: India
Dated
Draft

stated-commitment-operational-hollowing

A positive scoring event from a prior assessment cycle (April 19 Switzerland talks commitment to protect civilians and ease aid) is reversed when contemporaneous primary-source evidence documents that the commitment did not translate into operational change. The hollowing is documented by the temporal overlap of the stated commitment and the adverse evidence (Amnesty 71-interview report covering the same window).

Dated
Draft

proactive-disclosure-under-pressure-before-legislative-compulsion

An institution discloses or implements a transparency reform in response to regulatory or legislative pressure, but before the compulsory mechanism is fully operationalized. Sits between full external-accountability-reversal (voluntary) and reactive compliance (compelled). ...

First applied to: UnitedHealth Group
Dated
Draft

coercive-diplomacy-under-ceasefire

A state uses the threat of military resumption as a diplomatic coercion tool during an active ceasefire, to compel a specified behavior from the opposing party. Constitutes a B1 consent failure (unilateral threat violates ceasefire consent framework) and an I1 consistency failure (ceasefire-professed posture contradicted by war-resumption threat). ...

First applied to: Israel
Dated

Confirmed positions

Entities reassessed for this briefing where published scores remain supported by current evidence.

Confirmed positions from the May 14 briefing.
Pakistan
Index: Countries | Band: developing | Published: 20.3 | Assessed: 20.3 | Delta: 0
Finding: MONSOON WATCH DAY 2: confirmed at 20.3 Developing. No mass-casualty flooding in 24h window. PDMA/NDMA active response. Watch closes May 17 if no mass-casualty event documented from May 13-17 weather system.

Ukraine
Index: Countries | Band: functional | Published: 50 | Assessed: 50 | Delta: 0
Finding: ASYMMETRIC-CONDUCT CONFIRMATION at 50.0 Functional. May 14 Russia mass strike demonstrates pattern asymmetry: Ukraine sustained defensive posture, targeted Russian command posts only. +3.1 post-ceasefire credit sustained. No evidence of Ukrainian-attributed large-scale civilian targeting in 24h window.

UnitedHealth Group
Index: Fortune500 | Band: critical | Published: 10.9 | Assessed: 11.4 | Delta: +0.5
Finding: DOCUMENTED: +0.5 sub-threshold movement (10.9 → 11.4). Optum Rx transparent pricing model credited as first proactive transparency reform (+0.125 INT half-step) under new 'proactive-disclosure-under-pressure-before-legislative-compulsion' sub-category. Methodology precedent under formalization. DOJ probe and court-ordered AI disclosure offset further credit.

Floor conduct record

Cycle-specific conduct documentation for entities at composite zero, recorded for the May 14 briefing.

Math hygiene

Entities where published composite and reconstructed composite diverge. Tracked openly as a publication-integrity obligation.

Open Bionics at 12 cycles (incremented from 11). Formula audit is a CRITICAL BLOCKING item. 13 entities total unchanged. Math-hygiene flags from today's cycle: India composite math 30.47 → reported 30.5 (within 0.5pt tolerance, no flag); Senegal math 32.81 → reported 33.6 (0.8pt rounding gap, within tolerance but tracked); UnitedHealth math 11.33 → reported 11.4 (0.1pt tolerance, clean). No new flags added to cluster.
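
The math-hygiene check above amounts to comparing each reconstructed composite with the reported one. The sketch below recomputes the three gaps from the figures quoted in this section. The function name `hygiene_gap` and the `tolerance` parameter are hypothetical; the briefing cites a 0.5pt tolerance for India but does not spell out the exact rule applied to every entity, so the tolerance is left as an input rather than fixed.

```python
from decimal import Decimal


def hygiene_gap(reconstructed: str, reported: str) -> Decimal:
    """Absolute difference between reconstructed and reported composites."""
    return abs(Decimal(reported) - Decimal(reconstructed))


def exceeds_tolerance(reconstructed: str, reported: str, tolerance: str) -> bool:
    """True when the gap is strictly larger than the supplied tolerance.

    ASSUMPTION: a single strict cutoff; the benchmark's real rule may differ.
    """
    return hygiene_gap(reconstructed, reported) > Decimal(tolerance)


# (reconstructed, reported) pairs quoted in the briefing.
CASES = {
    "India": ("30.47", "30.5"),
    "Senegal": ("32.81", "33.6"),
    "UnitedHealth Group": ("11.33", "11.4"),
}

GAPS = {name: hygiene_gap(*pair) for name, pair in CASES.items()}

if __name__ == "__main__":
    for name, gap in GAPS.items():
        print(f"{name}: gap {gap} pts")
```

Keeping the tolerance explicit makes it easy to see which entities would be flagged under different cutoffs before any rule is formalized.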

Carry-forward dimensional credits

5 entities with documented pressure not yet reflected in composite

Hungary: 41.4
Ukraine: 50
Waymo: 35.9
Vanuatu: 35.9
Mongolia: 48.4

Held this cycle

5 entities deferred with documented reason
  • AI Labs

    Anthropic

    Pentagon blacklist maintained / White House carve-out EO in drafting / Mythos dual-use cybersecurity deployment confirmed despite blacklist / Claude Opus 4.7 released May 13. Full assessment queued for tomorrow.

  • Fortune500

    Microsoft

    Musk v. Altman closing arguments proceeding May 14-15. Nadella testified May 11: Musk never raised concerns to him; Microsoft to spend $100B+ on OpenAI by June 2026. Hold expires tomorrow regardless of verdict timing.

  • AI Labs

    Google

    Hold EXPIRED today (May 14). No material compassion event found May 14. Queued for May 15 standard rotation assessment. Key evidence: DOJ remedies order (April 14), cross-appeal filed, EU DMA team dispatch, choice screen mandate.

  • AI Labs

    OpenAI

    Musk v. Altman trial closing arguments May 14-15. Advisory jury to deliberate. Judge Gonzalez Rogers ruling expected mid-May. Hold maintained through verdict.

  • Robotics Labs

    Open Bionics

    Math-hygiene formula audit hold — CRITICAL BLOCKING, 12 cycles open. Published 97.5 carries -10.0 discrepancy. Do NOT re-queue for assessment until formula audit is complete.

Forward signals

Calendar of upcoming scoring events the methodology pipeline is tracking.

1 signal
  • Closing arguments in Oakland. Advisory jury deliberates. Charitable-trust-breach finding would be ACC/SYS scoring event for OpenAI and Microsoft. Pre-stage both for post-verdict assessment.

3 signals
  • Hold expires. Full assessment: Pentagon blacklist vs. safety red-lines vs. White House carve-out vs. Mythos dual-use deployment vs. Claude Opus 4.7 release vs. Claude for Small Business. Most complex AI labs scoring event in active pipeline.

  • Hold expires. Assess post-closing-argument. If verdict arrives before assessment: incorporate charitable trust ruling as ACC/SYS signal. Nadella testimony documented.

  • Google

    Hold expired today. Standard rotation assessment May 15. Key evidence: DOJ remedies order (April 14), cross-appeal filed, EU DMA comment period closed May 13, choice screen mandate, no exclusive distribution contracts.

1 signal
  • Pakistan monsoon watch closes May 17. If NDMA sitreps document significant casualties or displacement from May 13-17 weather system, EMP dimension update required. Check ndma.gov.pk/sitrepm.

1 signal
  • UNGA vote on Vanuatu ICJ climate resolution. Positive INT-dimension event if passed. Re-scan May 21 for Marshall Islands, Kiribati, Timor-Leste (all at 39.1 — 0.9 below Functional floor). Vote outcome could trigger band crossings.

1 signal
  • Hold expires on estimated verdict. Breach-of-charitable-trust finding is ACC/SYS scoring event. Pacific cluster reassessment same day.

1 signal
  • Magyar-Brussels visit (von der Leyen summit). EU fund pathway discussion. First concrete legislative milestone signal expected. ACT/SYS dimension evidence generation.

Analytical notes

Observations on methodology, evidence quality, and structural patterns from the May 14 briefing.

Note

May 14 documented the first 'partial external-accountability-reversal' sub-anchor positive in an AI Labs floor entity (xAI's GitHub system-prompt pledge) and the first 'medical-science-denial enforcement' designation (Senegal charging undetectable-viral-load patients). Both require new methodology sub-distinctions: the first between partial and full external-accountability-reversal; the second formalizing prosecutorial science-denial as a compound ACC+EMP failure category. Nine new conduct categories in a single cycle suggests the methodology's vocabulary is expanding faster than its consolidation pace — a sign of a rich evidentiary environment but also a methodological housekeeping need.

Note

The 'stated-commitment-operational-hollowing' pattern now recurs across three floor-cluster entities in this cycle alone: DRC (April 19 Switzerland talks vs. May 5 Amnesty report), Myanmar (April 21 100-day peace plan vs. May 6-14 strike cluster), Russia (May 9-11 ceasefire format vs. May 9-14 offensive surge). This cross-cluster frequency suggests the pattern is systematic enough to warrant a unified conduct category rather than entity-specific designations. Formalizing it as a top-level category would allow the benchmark to track the prevalence of stated-commitment gaps across the full entity set as a structural indicator of institutional integrity.

Floor designations

8 entities at composite 0 with documented evidence pattern

Composite scores resolving at zero — methodology disclosure

These entities have all 8 dimensions resolving at the lowest behavioral anchor (1.0/5.0) across multiple assessment cycles. Read the methodology.
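
To illustrate how eight dimension anchors on a 1.0 to 5.0 scale could resolve to a composite of zero, the sketch below assumes the composite is the mean of the eight dimensions rescaled linearly from [1, 5] to [0, 100]. That formula is an assumption made for illustration: the briefing confirms only that all dimensions at 1.0 yield a composite of 0, and the full methodology may weight or aggregate dimensions differently. The name `composite_score` and the example inputs are hypothetical.

```python
def composite_score(dimensions: list[float]) -> float:
    """Rescale the mean of eight 1.0-5.0 dimension scores to a 0-100 composite.

    ASSUMPTION: unweighted mean with a linear rescale; not confirmed by the source.
    """
    if len(dimensions) != 8:
        raise ValueError("expected exactly 8 dimension scores")
    if any(not 1.0 <= d <= 5.0 for d in dimensions):
        raise ValueError("dimension scores must lie between 1.0 and 5.0")
    mean = sum(dimensions) / len(dimensions)
    return (mean - 1.0) / 4.0 * 100.0


if __name__ == "__main__":
    # Every dimension at the lowest behavioral anchor gives a floor composite.
    print(composite_score([1.0] * 8))
    # Hypothetical entity: one dimension raised by a 0.125 half-step.
    print(composite_score([1.0] * 7 + [1.125]))
```

Under this assumed formula a single 0.125 half-step on one dimension moves the composite by well under one point, which is why such movements can stay sub-threshold.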


