Compassion Benchmark


Tuesday, May 5, 2026 | No. 21

Daily Briefing

Daily compassion intelligence across 1,155 indexed entities.

Entities monitored: 1,156
Fully assessed: 14
Score changes: 0
Risk signals: 0

Today's analysis

The most significant editorial findings in the May 5 briefing.

01

Math-hygiene cluster grew from 3 to 8 entities in one cycle, now spanning all five bands — bidirectional discrepancies rule out a single formula error; bulk data-team audit overdue.

02

Four floor entities documented new conduct categories: South Sudan (UNMISS troop cut during civil war reignition — most structurally significant), xAI/Grok (Musk under-oath distillation admission — highest evidentiary weight), Israel (second judicial detention extension, custodial mistreatment), Myanmar (definitive UN SR airstrike trajectory: 9 → 1,140 in four years).

03

Brockman and Musk produced sworn contradictions of prior public representations within five days in the same trial — trial-record evidence is qualitatively different from self-reported conduct and represents a recurring high-evidentiary-weight source.

04

Ethiopia as floor-designation precursor: TPLF council reinstatement triggered EU 'imperative' warning; Pretoria Agreement collapse risk is now the most acute non-floor governance crisis in the countries index.

05

14 entities assessed, 0 proposals generated, 5 holds respected (Anthropic, Hungary, Microsoft, Meta Platforms, Palantir AI) — all floor-entity composite scores held at 0 per floor protocol despite new conduct documentation.

Signal stack

6 signals
medium

Countries — Competing Ceasefires

Russia and Ukraine declared competing ceasefire frameworks (Russia: May 8–9; Ukraine: May 5–6).

medium

AI Labs — Governance Transparency Crisis

Brockman testimony (May 4–5) adds journal-entry evidence that OpenAI leadership knew the for-profit conversion 'would make us liars.' First public IPO confirmation.

medium

Countries — Conflict Floor Cluster

All four floor or floor-proximate entities received new conduct documentation this cycle.

medium

Data Integrity — Math-Hygiene Cluster

Math-hygiene cluster grew from 3 to 8 entities in one cycle, now spanning all five bands.

medium

Countries — Hungary Structural Transition

Magyar swearing-in confirmed for May 9.

medium

AI Labs — Anthropic Dual Trigger

May 15 Phase 1 safeguard inventory delivery deadline and May 19 DC Circuit oral arguments (Pentagon blacklist).

Score movements

All entities assessed this cycle. No score changes.

14 assessed
Countries
63.9
established

Evidence ledger

Primary sources reviewed in this briefing cycle. 10 sources linked.

Source | Type | Entity
euronews.com | News | Countries: Competing Ceasefires
aljazeera.com | News | Countries: Competing Ceasefires
creati.ai | Source | AI Labs: Governance Transparency Crisis
techcrunch.com | Source | AI Labs: Governance Transparency Crisis
press.un.org | Government | Countries: Conflict Floor Cluster
aljazeera.com | News | Countries: Conflict Floor Cluster
ohchr.org | Source | Countries: Conflict Floor Cluster
borkena.com | Source | Countries: Conflict Floor Cluster
english.nv.ua | Source | Countries: Hungary Structural Transition
mlex.com | Source | AI Labs: Anthropic Dual Trigger

Sector findings

Patterns emerging across indexed sectors in the May 5 briefing.

AI Labs — Governance Transparency Under Oath

  • Two sworn testimonies within five days have produced the highest-evidentiary-weight entries in their respective floor or near-floor records. On April 30, Musk admitted under oath that xAI used OpenAI model distillation to train Grok. On May 4–5, Brockman's personal journal was admitted into evidence, recording his internal awareness that the for-profit conversion 'would make us liars.' These are not press-release retractions or leaked communications — they are primary-source trial-record evidence with full evidentiary weight under oath. The pattern is analytically significant: two senior AI sector figures, in the same trial, produced sworn contradictions of prior public representations within five days.
  • The Brockman $30B equity stake disclosure and IPO confirmation are structurally distinct from the journal-entry evidence. The governance architecture of a non-profit-origin organization providing its president a $30B equity stake without capital contribution is a SYS finding, not an INT finding. The trial has now produced material evidence across both dimensions for OpenAI, reinforcing the sub-threshold but evidentially rich -1.5 assessed delta.
  • xAI/Grok's distillation admission has dual relevance: it is floor-notation evidence for xAI (INT category: public model-lineage framing contradicted by under-oath testimony) and collateral corroborating evidence for OpenAI (confirming that OpenAI training material was extracted without license by a competitor). The OpenAI assessment correctly weights this as supporting context, not scored evidence.

AI Labs — Safety Standards: Verified Worst-Case Cluster

  • DeepSeek: 0% harmful prompt blocking per Cisco study. US State Department global diplomatic warning naming DeepSeek. Anthropic fraudulent-account extraction allegation. Hidden data-transmission capability to China Mobile. The cumulative safety and integrity record is the worst-documented in the AI labs index.
  • Mistral AI: 60x more likely to generate CSEM than OpenAI models; 18–40x more likely to generate CBRN content per independent Enkrypt AI study (Euronews, May 2025, verified as prior cycle evidence). French Ministry of Armed Forces 3-year defense framework adds defense-sector deployment scope to the CSEM/CBRN baseline.
  • xAI/Grok: CSAM safeguard failure documented in January 2026 California investigation; under-oath distillation admission April 30; no publicly documented remediation for either finding.

DeepSeek, Mistral AI, and xAI/Grok constitute a safety-standard cluster that stands in systematic contrast to the rest of the AI labs index. Their combined profile warrants coordinated review rather than individual cycle-by-cycle assessment.

Countries — Conflict Floor Cluster: Structural Deterioration

  • All four floor-designated or floor-proximate entities received new conduct documentation this cycle, but each in a qualitatively distinct category. Israel: judicial extension of detention and documented mistreatment in custody of foreign civilian humanitarian actors — the extraterritorial and custodial dimensions are new. Myanmar: definitive quantified four-year airstrike escalation trajectory (9 → 1,140 airstrikes) from UN SR final report. South Sudan: UNMISS troop ceiling reduction by 26% concurrent with active civil war reignition — a structural international protection regression. Ethiopia: TPLF council reinstatement threatening Pretoria Agreement collapse — a peace-accord integrity event.
  • South Sudan is the most structurally significant of the four. The UNSC formally reduced the authorized peacekeeping capacity in an active conflict zone — this is not a failure of on-the-ground implementation but a deliberate Security Council decision. The combination of UNMISS troop reduction, civil war reignition, and over a year of humanitarian access denial in Nasir County represents the worst protection regression in the floor cluster this cycle.
  • Ethiopia's Pretoria Agreement trajectory is the operationally critical monitoring item beyond the floor cluster. If the federal government responds to the TPLF council reinstatement with force, or if TPLF advances beyond administrative assertion into territorial recapture, Ethiopia crosses the floor-designation threshold. The EU 'imperative' warning on May 1 reflects external recognition of this risk. Ethiopia at 10.6 is the most probable floor-designation candidate in the pipeline.
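The two headline ratios in this section can be checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the inputs are the figures quoted in this briefing (9 and 1,140 airstrikes for Myanmar; a UNMISS troop ceiling cut from 17,000 to 12,500).

```python
def fold_increase(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


def percent_reduction(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Figures as quoted in this briefing.
airstrike_fold = fold_increase(9, 1_140)            # Myanmar, 2021 vs 2025
troop_cut_pct = percent_reduction(17_000, 12_500)   # UNMISS troop ceiling

print(f"Airstrike escalation: about {airstrike_fold:.0f}x")
print(f"UNMISS ceiling reduction: about {troop_cut_pct:.0f}%")
```

Rounded to whole numbers, these reproduce the 127x and 26% figures cited for Myanmar and South Sudan.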

Countries — Microstate Rotation: Math-Hygiene Cluster Cross-Band Expansion

  • The math-hygiene cluster grew from 3 to 8 entities in this cycle alone. The five new flags added today are Nigeria (+5.0, Critical band), Ethiopia (canonical published score reconciled), San Marino (−3.0, Established band), Seychelles (−3.0, Established band), and Malta (−3.0, Established band). They join the three existing flags from the May 4 cycle: Open Bionics (Exemplary), Costco (Established), PayPal (Established).
  • The bidirectional character of the discrepancies rules out a single formula error in one direction. Upper-band entities (Open Bionics, Costco, PayPal, San Marino, Seychelles, Malta) show published composites higher than dimension reconstructions. Nigeria shows the reverse. The cross-band spread (Critical through Exemplary) and the bidirectional discrepancy pattern together indicate that at least two distinct formula variants or evidence-source patterns were used in original scoring.
  • The microstate rotation backfill is the trigger event for this cycle's expansion. San Marino, Seychelles, and Malta all showed the same −3.0 discrepancy, suggesting a possible systematic offset applied to microstate scoring that did not propagate into the composite formula. A data-team audit of all microstate composites in the Established and Exemplary bands is the minimum recommended response.
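The bidirectional pattern can be made concrete with the four flags for which this briefing quotes both a published composite and a reconstruction. This is a minimal sketch: the `Flag` class and `directions` helper are hypothetical names, and the sign convention (reconstruction minus published) is an assumption inferred from the per-entity math-hygiene notes. Open Bionics, Costco, and PayPal are omitted because their figures are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    entity: str
    band: str
    published: float
    reconstructed: float

    @property
    def discrepancy(self) -> float:
        # Assumed convention: reconstruction minus published composite.
        return round(self.reconstructed - self.published, 1)


# Values as quoted in the per-entity math-hygiene notes of this briefing.
FLAGS = [
    Flag("Nigeria", "Critical", 18.4, 23.4),
    Flag("San Marino", "Established", 65.5, 62.5),
    Flag("Seychelles", "Established", 63.9, 60.9),
    Flag("Malta", "Established", 63.9, 60.9),
]


def directions(flags):
    """Return the set of discrepancy signs observed across the flags."""
    signs = set()
    for flag in flags:
        if flag.discrepancy > 0:
            signs.add("+")
        elif flag.discrepancy < 0:
            signs.add("-")
    return signs


for flag in FLAGS:
    print(f"{flag.entity:<12}{flag.band:<13}{flag.discrepancy:+.1f}")
print("bidirectional:", directions(FLAGS) == {"+", "-"})
```

A single additive offset applied in one direction would produce only one sign, which is why the mixed result points away from a single formula error.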

Risk signals

Developments that may affect future scores. Watch items from the May 5 briefing.

Risk

Math-hygiene cluster bulk audit operationally overdue (HIGH — data integrity). The cluster grew from 3 to 8 entities in one cycle spanning all five bands. If discovery continues at this rate — 5 new flags per rotation backfill batch — the pipeline will accumulate 20+ open flags before a bulk review can complete. Open Bionics has now carried for 4 cycles; this cannot continue indefinitely. Data team should prioritize a complete audit of all published composites against dimension reconstruction before the next backfill cycle.
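The backlog projection above can be sketched as a simple linear extrapolation. The function name is hypothetical, and the sketch assumes that no flags are closed between batches and that each rotation backfill batch adds the same 5 flags seen this cycle.

```python
import math


def batches_until(threshold: int, open_flags: int, new_per_batch: int) -> int:
    """Return how many further batches it takes for open flags to reach `threshold`."""
    if new_per_batch <= 0:
        raise ValueError("new_per_batch must be positive")
    remaining = threshold - open_flags
    if remaining <= 0:
        return 0
    return math.ceil(remaining / new_per_batch)


# Starting point from this briefing: 8 open flags, 5 new flags per batch.
batches = batches_until(20, open_flags=8, new_per_batch=5)
projected = 8 + 5 * batches

print(f"Batches until 20+ open flags: {batches}")
print(f"Open flags at that point: {projected}")
```

Under these assumptions the 20-flag mark is passed on the third further batch, with 23 flags open.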

Risk

Ethiopia floor-designation trigger (HIGH — countries index). TPLF council reinstatement is now the most acute Pretoria Agreement integrity event to date. If the federal government responds with force, or if TPLF advances beyond administrative assertion into territorial recapture, the assessor faces a floor-designation threshold call. Scanner should flag all Ethiopia/Tigray conflict reporting as emergency re-queue triggers.

Risk

Ukraine May 9 ceasefire verification (HIGH — countries index). Emergency re-queue protocol is active. Russia's May 8–9 window and Ukraine's May 5–6 window are running in sequence, not simultaneously. If either window produces a de facto cessation of hostilities, the May 9–10 assessment must evaluate whether the combined effect constitutes a meaningful humanitarian signal. If both windows are violated, the assessment logic shifts to compound-escalation territory.

Risk

OpenAI May 21 advisory verdict (HIGH — AI labs index). The evidentiary record has been materially strengthened by Brockman's journal-entry testimony. The advisory verdict is the highest-consequence binary event in the pipeline. Breach of charitable trust: ACC/SYS event. Fraudulent-inducement finding: INT event. No-breach finding: stabilizes current assessed range.

Risk

Anthropic dual trigger (MEDIUM — AI labs index). May 15 Phase 1 safeguard inventory delivery and May 19 DC Circuit oral arguments are the two near-term determinants. Anthropic at exactly 60.0 is the boundary case with the highest confirmed volatility flag in the pipeline — any material evidence in either direction produces a band change.

Risk

Hungary May 9 structural transition (MEDIUM — countries index). Magyar swearing-in confirmed. First-day actions (state media suspension, new ministry creation, EU dialogue initiation) are the operative scoring inputs. First major potential positive band-change event in the countries index in several cycles.

Risk

EU AI Act August 2 enforcement deadline (MEDIUM: AI labs index, systemic). The August 2 deadline for the EU AI Act's provisions on prohibited AI systems is 89 days away. DeepSeek, Mistral AI (through its military contract), and xAI have the highest documented exposure. Scanner should pre-stage a compliance-status review for the AI labs index.
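The 89-day countdown follows from the briefing date (May 5, 2026) and the August 2 deadline, assuming the deadline falls in the same year. A minimal check using the standard library:

```python
from datetime import date

briefing_date = date(2026, 5, 5)  # date of this briefing
deadline = date(2026, 8, 2)       # assumed to be August 2 of the same year

# Subtracting two dates yields a timedelta; .days is the whole-day gap.
days_remaining = (deadline - briefing_date).days
print(f"Days until deadline: {days_remaining}")
```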

Confirmed positions

Entities reassessed for this briefing where published scores remain supported by current evidence.

Confirmed positions from the May 5 briefing.
Entity | Index | Band | Published | Assessed | Delta

Israel | Countries | critical | 0 | 0 | 0
Finding: Israel floor confirmed with two new procedural conduct categories. May 5: a second judicial extension of detention was issued for the approximately 175 flotilla crew members seized in international waters April 29–30. Amnesty International documents a structured mistreatment pattern: blindfolding during medical examinations, continuous 24/7 lighting, isolation from other detainees, beating to unconsciousness (activist Thiago Ávila twice), and a reported hunger strike. Spain and Brazil formally characterized the interception as 'kidnapping' in diplomatic communications, a materially stronger legal framing than 'illegal interception.' New conduct categories appended to the floor-exit criteria record: (1) second judicially authorized extension of detention of foreign civilian humanitarian actors seized in international waters; (2) documented isolation, sensory manipulation, and physical mistreatment in custody. Composite remains 0; no floor-exit criteria met.

Myanmar | Countries | critical | 0 | 0 | 0
Finding: Myanmar floor confirmed with new quantified airstrike trajectory documentation. UN Special Rapporteur Tom Andrews' April 2026 final report provides the most comprehensive quantification of the junta's air campaign to date: 9 airstrikes in 2021 (coup year) scaling to 1,140 airstrikes in 2025, a 127x escalation over four years. Total cumulative airstrikes since the coup: 9,400+; 3,800+ civilian deaths from airstrikes alone. 3.6M+ internally displaced; 12M+ people in acute hunger. The SR explicitly called for international action. April 10: Min Aung Hlaing was sworn in as president, closing the 'transitional' framing the junta had maintained since 2021. April 26: martial law imposed across 60 townships following inauguration. Junta controls fewer than 40% of townships nationwide. Floor classification confirmed; composite remains 0.

South Sudan | Countries | critical | 0 | 0 | 0
Finding: South Sudan floor confirmed: the most structurally significant floor documentation of this cycle. Three simultaneous developments mark a qualitative deterioration in international protection architecture: (1) Civil war has reignited in Jonglei State, with 267,000+ newly displaced as of late April 2026. (2) UNSC Resolution 2824 (April 30) extended UNMISS to April 2027 but cut the troop ceiling from 17,000 to 12,500, a 26% reduction in authorized peacekeeping capacity at precisely the moment conflict has re-escalated. Pakistan publicly warned the reduction was unjustified. (3) Humanitarian access has been denied to Nasir County for more than one year. The UNMISS troop ceiling cut while civil war is active is the most structurally significant international protection regression in the floor cluster this cycle: it represents the UN Security Council formally reducing its commitment to civilian protection in an active conflict zone.

xAI/Grok | AI Labs | critical | 0 | 0 | 0
Finding: xAI/Grok floor confirmed with highest-evidentiary-weight entry in the floor record to date. On April 30, Elon Musk testified under oath in the Musk v. Altman trial that xAI used OpenAI model distillation to train Grok. When asked directly, Musk responded 'Partly.' This constitutes primary-source trial-record testimony directly contradicting xAI's prior public framing of independent model development. The evidentiary weight of sworn trial testimony is categorically higher than press releases, blog posts, or executive statements. INT floor notation added: the distillation admission means that xAI's public claims about Grok's independent lineage were materially false. This is also collateral evidence for the OpenAI assessment: it corroborates that OpenAI's training data and model outputs were used without license by a competitor. Floor composite remains 0; no floor-exit criteria met.

OpenAI | AI Labs | developing | 27.5 | 26 | -1.5
Finding: OpenAI assessed at 26.0 (-1.5 sub-threshold; Developing band sustained). Greg Brockman's trial testimony on May 4–5 adds three materially significant primary-source disclosures: (1) Brockman received approximately $30B in equity stake without proportionate capital contribution, a governance structure that INT and SYS dimensions must weigh; (2) Brockman's personal journal entry, admitted into evidence, records that the for-profit conversion would make him and Altman 'liars', a first-party admission of internal awareness that public commitments were being breached; (3) Brockman confirmed on the stand that an IPO is planned, the first public confirmation by a company officer. Per methodology: verdict-pending hold is preserved for the May 21 advisory verdict. The journal admission is the most consequential primary-source OpenAI disclosure since the trial began. INT -0.1 (sworn evidence of internal awareness of commitment breach), SYS -0.1 (governance structure generating $30B stake without capital). Musk v. Altman advisory verdict expected approximately May 21.

Ukraine | Countries | functional | 50 | 50 | 0
Finding: Ukraine confirmed at 50.0 under emergency re-queue protocol. Russia and Ukraine declared competing (not coordinated) ceasefire frameworks: Russia announced a unilateral May 8–9 Victory Day pause; Ukraine proposed a May 5–6 ceasefire window. These are adversarial framings, not a coordinated cessation. Per bad-faith-ceasefire protocol: unilateral declarations with adversarial framing are signal-level not resolution-level events. The April 10 Orthodox Easter ceasefire (32 hours, held without documented violations) remains the only successfully implemented ceasefire of the conflict. Score held at 50.0. Emergency re-queue mandatory for May 9–10 following Victory Day; any de facto cessation of hostilities around that window triggers immediate reassessment.

Ethiopia | Countries | critical | 10.9 | 10.6 | -0.3
Finding: Ethiopia confirmed at 10.6 (-0.3 sub-threshold; Critical band sustained). The April 20–28 TPLF unilateral reinstatement of the pre-coup Tigray regional council is the most acute Pretoria Agreement integrity event since the peace accord was signed. Tigray Interim Administration President Getachew Reda stated the move 'nullifies the Pretoria Agreement.' The EU issued an 'imperative' warning on May 1, calling on all parties to avoid another devastating conflict. Active EU and US diplomatic engagement is underway. BND -0.3 dock for peace-accord stability erosion. Ethiopia remains the most probable floor-designation candidate among non-floor-designated active-conflict countries: the Pretoria Agreement framework is the sole structural positive holding the score above zero. Math-hygiene note: assessor reconstruction from documented dimension scores yields 10.9; clobbered-file publishedScore of 5.9 was an artifact of the prepare-updates error. Canonical published score is 10.9.

Mistral AI | AI Labs | functional | 46.9 | 45.6 | -1.3
Finding: Mistral AI assessed at 45.6 (-1.3 sub-threshold; Functional band sustained). French Ministry of Armed Forces 3-year defense framework (signed December 16 2025, announced January 8 2026) covers all armed services, intelligence units, and research agencies (CEA, ONERA, naval hydrographic units). Deployment architecture is sovereign (on-premises, private-cloud, or self-hosted via AMIAD, the ministerial AI defense agency), which partially mitigates the consent-boundary concerns of commercial cloud defense contracts. BND -0.1 for defense-framework scope expansion; SYS -0.1 for AMIAD integration without public accountability mechanism. The CSEM/CBRN Enkrypt AI findings (prior cycle, April 16 baseline) are not re-scored here, to avoid double-counting. Cumulative baseline remains: April 16 -29.5 composite downgrade (76.4 → 46.9) from CSEM/CBRN findings, Thorn partnership context, and EU AI Act compliance preparation.

DeepSeek | AI Labs | critical | 18.8 | 16.5 | -2.3
Finding: DeepSeek assessed at 16.5 (-2.3 sub-threshold; Critical band sustained). Three new developments compound the prior baseline: (1) US State Department issued a global diplomatic cable warning on April 25 naming DeepSeek among AI models distilled from US proprietary sources without license; this is the first government-level formal international warning against a specific AI lab in this pipeline. (2) Anthropic in February 2026 alleged DeepSeek used thousands of fraudulent accounts to extract millions of Claude conversations, automated extraction at scale consistent with model distillation. (3) Cisco safety study confirms 0% harmful prompt blocking: DeepSeek failed to block a single harmful prompt across the full test suite, against GPT-4o at 86% and Gemini at 64%. ACT -0.1 (state-level warning and distillation allegations), BND -0.1 (0% harmful-prompt blocking), INT -0.1 (fraudulent account allegations).

Nigeria | Countries | critical | 18.4 | 18.4 | 0
Finding: Nigeria first baseline established at 18.4 in the Critical band. HRW World Report 2026 documents Boko Haram (JAS faction) resurgence in Borno: May 2025 Mallam Karamti/Kwatandashi attacks (57+ killed, approximately 70 missing); September 2025 Darul Jamal/Bama LGA attack (60+ killed including soldiers). ISWAP continued northeast operations. Nigerian security forces implicated in airstrike civilian deaths and widespread security failures. The Tinubu administration has commenced a review of the VAPP (Violence Against Persons Prohibition) Act. On May 4, the Borno Police Commissioner publicly warned officers against torture and rights abuses, an internal acknowledgment-of-conduct signal that is assessable. Math-hygiene flag: published composite 18.4 cannot be reconstructed from documented dimension means (reconstruction yields 23.4, a +5.0 discrepancy). Referred to data team for review; Critical band designation is not in dispute. Population scale: approximately 220 million, Africa's most populous country.

San Marino | Countries | established | 65.5 | 65.5 | 0
Finding: San Marino first baseline established at 65.5 in the Established band (rotation backfill). Stable parliamentary democratic republic; Council of Europe member; EU-aligned framework via 2018 cooperation agreement; UN member state. Universal healthcare; strong education indicators. No active news in the May 2026 window. Math-hygiene flag: published composite 65.5 cannot be reconstructed from documented dimension means (reconstruction yields 62.5, a −3.0 discrepancy). Referred to data team for review; Established band designation is not in dispute. Confidence rated low reflecting absence of primary-source in-window evidence for a microstate with limited news coverage.

Seychelles | Countries | established | 63.9 | 63.9 | 0
Finding: Seychelles first baseline established at 63.9 in the Established band (rotation backfill). Small island developing state; Commonwealth and African Union member. The 2020 presidential election represented the first peaceful power transfer since independence, a significant SYS milestone. Freedom House 'Free' designation. SIDS climate-vulnerability framework engagement; UNEP partnerships. Universal healthcare; high HDI ranking among African states. No active news in the May 2026 window. Math-hygiene flag: published composite 63.9 cannot be reconstructed from documented dimension means (reconstruction yields 60.9, a −3.0 discrepancy). Referred to data team for review; Established band designation is not in dispute.

Malta | Countries | established | 63.9 | 63.9 | 0
Finding: Malta first baseline established at 63.9 in the Established band (rotation backfill). EU member state with full Schengen integration. ILGA-Europe Rainbow Map consistently ranks Malta #1 in LGBTQI+ rights protection, the strongest documented EQU positive in the microstate cluster. Universal healthcare; high HDI. The 2017 assassination of journalist Daphne Caruana Galizia is the operative ACC deficit: the ongoing accountability and judicial reform process for her killing has not produced full accountability for those who ordered it, constituting a sustained press-freedom and accountability gap. No active news in the May 2026 window. Math-hygiene flag: published composite 63.9 cannot be reconstructed from documented dimension means (reconstruction yields 60.9, a −3.0 discrepancy). Referred to data team for review; Established band designation is not in dispute.

Switzerland | Countries | exemplary | 84.4 | 84.4 | 0
Finding: Switzerland confirmed at 84.4; staleness rotation cleared. April 17 2026 baseline sustained: Geneva humanitarian-hub status, ICRC partnership, UN Geneva headquarters, mandatory health insurance system. Ukraine S-status protection extended to March 2026; refugee university pilot launched March 2025. Neutrality framework under ongoing parliamentary debate; SNP restrictions on military alignment maintained. UN humanitarian funding cuts in 2025 impacted Switzerland-based organizations, a sustained negative signal. No active news in the May 2026 window. No math-hygiene flag: assessed composite reconstructs consistently from documented dimension scores.
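The Delta column can be cross-checked against the Published and Assessed columns. The sketch below assumes delta means assessed minus published, which matches every row quoted above. The tuple layout and the `delta` helper are illustrative, and only rows with a non-zero delta plus one unchanged row are included.

```python
# (entity, published, assessed, delta as listed), copied from the entries above.
POSITIONS = [
    ("OpenAI", 27.5, 26.0, -1.5),
    ("Ethiopia", 10.9, 10.6, -0.3),
    ("Mistral AI", 46.9, 45.6, -1.3),
    ("DeepSeek", 18.8, 16.5, -2.3),
    ("Ukraine", 50.0, 50.0, 0.0),
]


def delta(published: float, assessed: float) -> float:
    """Assumed convention: assessed minus published, rounded to one decimal."""
    return round(assessed - published, 1)


# Collect any entity whose listed delta disagrees with the recomputed one.
mismatches = [
    entity
    for entity, published, assessed, listed in POSITIONS
    if delta(published, assessed) != listed
]

for entity, published, assessed, _ in POSITIONS:
    print(f"{entity:<11}{delta(published, assessed):+.1f}")
print("mismatches:", mismatches)
```

Rounding to one decimal matters here: raw floating-point subtraction of values such as 10.6 and 10.9 does not compare equal to -0.3 without it.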

Analytical notes

Observations on methodology, evidence quality, and structural patterns from the May 5 briefing.

Note

The math-hygiene cluster's cross-band expansion is the most significant data-integrity event in the pipeline to date. When the cluster was three entities (all Established or Exemplary), it was plausible that upper-band original scoring had applied a different composite formula. With eight entities spanning all five bands and bidirectional discrepancies, the evidence now points to at least two distinct formula variants or two evidence-source conventions used in original scoring. This is not a cosmetic issue: it means some published scores are systematically not comparable with others, which undermines the cross-entity rankings that are the pipeline's primary product.

Note

The four floor-entity conduct documentations this cycle share a structural feature: each records a new procedural or evidentiary category, not merely additional instances of existing conduct. Israel adds judicially authorized extended detention and documented custodial mistreatment of foreign humanitarian actors. Myanmar adds a quantified four-year airstrike escalation trajectory from the UN SR final report. South Sudan adds a formal Security Council authorization reducing peacekeeping capacity in an active conflict. xAI/Grok adds under-oath primary-source testimony contradicting public model-lineage framing. New categories individually complicate floor-exit criteria — they are not incrementally cumulative.

Note

The Brockman and Musk sworn-testimony parallel is analytically significant beyond the individual entity scores. Two of the most prominent figures in the AI sector produced, within five days in the same trial, primary-source contradictions of prior public representations — one about governance commitments, one about model lineage. The pattern is not coincidental: it reflects the structural effect of adversarial legal discovery on an industry whose public-facing conduct record has been largely self-reported. Trial-record evidence is qualitatively different from press releases and may become a recurring source of high-evidentiary-weight findings.

Note

Ukraine's competing-ceasefire situation is precisely the event the May 4 emergency re-queue flag was designed to detect. The bad-faith-ceasefire rule's application here is analytically correct: dual unilateral declarations with adversarial framing are signal-level, not resolution-level. The pipeline's ability to hold the score at 50.0 while accurately characterizing the significance of the diplomatic moment — and setting a precise re-queue trigger — is the correct methodological posture. The May 9–10 assessment will be the determinative one.

Floor designations

8 entities at composite 0 with documented evidence pattern

Composite scores resolving at zero — methodology disclosure

These entities have all 8 dimensions resolving at the lowest behavioral anchor (1.0/5.0) across multiple assessment cycles.
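One way to see why eight dimensions at the lowest anchor give a composite of 0 is a linear rescaling of the 1–5 dimension mean onto 0–100. This is only an assumption for illustration: the published methodology is not reproduced in this briefing, and the function name, the 5.0 upper anchor's role, and the rounding are hypothetical.

```python
from statistics import mean

# Assumed anchors: the briefing states only that 1.0/5.0 is the lowest anchor.
LOW_ANCHOR = 1.0
HIGH_ANCHOR = 5.0


def composite_from_dimensions(dimension_scores):
    """Linearly rescale the mean of 1-5 dimension scores to a 0-100 composite.

    Hypothetical formula, not the benchmark's documented method.
    """
    if not dimension_scores:
        raise ValueError("at least one dimension score is required")
    average = mean(dimension_scores)
    scaled = (average - LOW_ANCHOR) / (HIGH_ANCHOR - LOW_ANCHOR) * 100.0
    return round(scaled, 1)


floor_entity = [1.0] * 8  # all 8 dimensions at the lowest anchor
print(composite_from_dimensions(floor_entity))
```

Under this assumed rescaling, a floor entity resolves to 0.0, and any single dimension rising above 1.0 would lift the composite off the floor.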
