Compassion Benchmark
Special Briefing
Flagship Annual (inaugural; thereafter yearly) — the benchmark's signature state-of-the-field report
June 16, 2026

The State of Institutional Compassion — 2026

This is the first comprehensive read on how institutions worldwide recognize, respond to, and reduce suffering. Across seven indexes, 1,156 institutions — every kind, from sovereign states to single-product labs — are scored on one shared 0–100 framework. The headline is sobering and consistent: the modal institution is mediocre, the tails are thin, and almost every institution on Earth, from the worst to the very best, is weakest at the same thing — fairness to those with the least power. This is the state of the field as of mid-2026.

Scope: All 7 indexes, 1,156 scored entities — governments, corporations, AI labs, robotics labs, US states, US cities, and global cities — as of mid-2026. Synthesizes the full field distribution, the cross-cutting Equity pattern, the nine deep-dive Special Briefings published this year, and the ~20 applied scoring cycles since April 2026.

Cohort: 1,156 entities scored across 7 indexes: countries (193), Fortune 500 (448), global cities (250), AI labs (50), robotics labs (50), US states (21), US cities (144). · The five-band distribution: Critical 177 (15.3%) · Developing 535 (46.3%) · Functional 248 (21.5%) · Established 133 (11.5%) · Exemplary 63 (5.4%). · The modal institution is Developing; the field median composite is 35.9 (a high-Developing score); the mean is 38.9. · Equity is the weakest or tied-weakest dimension for 1,046 of 1,156 entities — 90.5% of the field — and the lone below-threshold dimension for 36 of the 63 Exemplary entities.

If you remember one thing

The state of the field is a thin top, a thin bottom, and a vast mediocre middle. Two-thirds of all assessed institutions (783 of 1,156, 67.7%) sit in the Developing or Functional band. The most common single band is Developing, holding 535 entities — nearly half the field. The dramatic floor and ceiling cases together are under 9% of the record. The institution a reader is most likely to encounter is a middling one.

Key Findings

  1. The state of the field is a thin top, a thin bottom, and a vast mediocre middle. Two-thirds of all assessed institutions (783 of 1,156, 67.7%) sit in the Developing or Functional band. The most common single band is Developing, holding 535 entities — nearly half the field. The dramatic floor and ceiling cases together are under 9% of the record. The institution a reader is most likely to encounter is a middling one.
  2. Equity is the single most diagnostic finding in the entire record. It is the weakest or tied-weakest of the eight dimensions for 90.5% of all entities and the strictly lowest for 51.3%. No other dimension comes close. The pattern holds for Switzerland (97.5) and Sudan (0.0) alike — it is not a property of bad institutions, it is a near-constant of institutional conduct. The world's best institutions are, almost without exception, weakest at fairness to those with the least power.
  3. The same 0–100 scale judges a sovereign state, a corporation, and a single-product lab — and that cross-type comparability is the institution's unique asset and its sharpest limit. No other benchmark scores robotics labs on harm accountability at all. But severity and excellence are calibrated *within* a type, not across it: a health insurer at 10.2 and an authoritarian state at 10.3 share a number, not a kind of failure.
  4. The fastest route to the bottom of the scale in 2026 is a signature, not a massacre. The deepest movers this year — Bolivia (a four-cycle descent from 35.9 to 6.3), the United States (into Critical at 17.5), Turkey (10.3), India (15.6) — fell on documented *legal* acts: emergency-powers laws, judicial removal of opposition, enforcement rollbacks adjudicated unlawful. Democratic backsliding, scored on governance evidence, is the live entry pattern.
  5. The benchmark prices proof, not accusation — and the discipline runs both ways. A 42-state subpoena did not move OpenAI's score; one documented coercive-severance clause moved Oracle into Critical. The same rule that discounts an allegation also withholds a top score from forced improvement: Microsoft's compelled human-rights remedy is held below a self-initiated one, and Hungary's genuine multi-cycle recovery reaches only the middle. The benchmark grades sustained, proven, self-initiated conduct.
  6. What an institution builds tracks how it scores. In both technology indexes, sorting by what the product is *for* produces a single gradient: defense, surveillance, and weapons cluster at the floor; healthcare, accessibility, and assistive technology cluster at the ceiling. The four entities at the absolute 0.0 floor across both indexes — Ghost Robotics, Palantir AI, xAI/Grok, Character AI — are floored by what they build, not merely how they behave.
  7. Rare upgrades show the field is not frozen. Against a year of downgrades, a handful of institutions climbed on documented, self-initiated reform: Bangladesh exited the world's ten-worst for workers' rights for the first time since 2017 (+5.4); Hungary recovered across multiple cycles to the middle; Botswana and Venezuela posted gains. The benchmark credits genuine reversal — but only when it is proven and sustained, and most recovery stories stall in the middle.

The field

1,156 entities across the five bands — the full distribution this briefing draws from.

[Bar chart: entity count and share of field per band. Bands: Critical 0–20, Developing 20–40, Functional 40–60, Established 60–80, Exemplary 80–100]
Source: Compassion Benchmark · CC-BY

1. Frame

This is the inaugural edition of the Compassion Benchmark's flagship annual report — the institution's signature artifact, the read-it-first synthesis of everything the benchmark measures. Where the daily pipeline scans all entities for what changed in the last fortnight, and the year's Special Briefings each took one pattern and went deep, this report does the opposite of both: it pulls back to the whole field and asks the simplest, hardest question the benchmark exists to answer.

As of mid-2026, how do the world's institutions recognize, respond to, and reduce suffering — and what does the complete record actually show?

The benchmark applies one 0–100 framework, built on eight dimensions of conduct, to every institution it assesses: a sovereign state, a Fortune 500 corporation, an AI lab, a robotics lab, a US state, a US city, a global city. As of this writing it has scored 1,156 entities across seven indexes. That single shared scale is the institution's defining bet: that "how an institution treats those it affects" is a coherent thing to measure across radically different kinds of institution — and that putting them on one axis reveals patterns no single-sector ranking can.

This report is an honest first snapshot. The benchmark's structured data covers 2026 only; this is not a multi-year trend report and makes no claim to be one. What it can do, and does, is three things at once: describe the state of the field (the full distribution across the five bands), name the one cross-cutting pattern that defines it (the Equity gap), and synthesize the year's evidence — the nine deep-dives and the roughly twenty scoring cycles applied since April — into a single, citable read.

The central thesis is plain and uncomfortable: the state of institutional compassion in 2026 is mediocrity at scale. The modal institution is Developing; the field median is 35.9; the tails — the genuinely failed and the genuinely exemplary — are thin. And underneath the distribution sits one near-universal pattern: institutions of every kind, at every level of overall performance, are weakest at equity — fairness toward those with the greatest need and the least power. That single finding unifies the top and the bottom of the scale and is, by some distance, the most diagnostic thing the record contains.


2. The state of the field — the signature finding

All counts in this section are recomputed directly from rankings[] in each index JSON, using the canonical band cutoffs (Critical ≤ 20, Developing 20–40, Functional 40–60, Established 60–80, Exemplary ≥ 80). They reconcile exactly with the published per-index data and with the field figures reported in this year's Middle of the Scale briefing.
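The cutoffs above translate into a small lookup. The sketch below is illustrative only: the function name `band` is hypothetical, and the treatment of scores that land exactly on 40 or 60 is an assumption, since the text fixes only the outer edges (Critical ≤ 20, Exemplary ≥ 80). It is not the benchmark's own code.

```python
def band(composite: float) -> str:
    """Map a 0-100 composite score to one of the five bands.

    Assumption: 20.0 is Critical and 80.0 is Exemplary (both stated in the
    text); 40.0 and 60.0 are treated as the start of the higher band (guessed).
    """
    if not 0.0 <= composite <= 100.0:
        raise ValueError("composite must lie in [0, 100]")
    if composite <= 20.0:
        return "Critical"
    if composite < 40.0:
        return "Developing"
    if composite < 60.0:
        return "Functional"
    if composite < 80.0:
        return "Established"
    return "Exemplary"


# Composite scores quoted elsewhere in this report.
examples = {
    "United States": 17.5,
    "Bangladesh": 39.8,
    "Hungary": 50.2,
    "Procter & Gamble": 79.0,
    "Switzerland": 97.5,
}
labels = {name: band(score) for name, score in examples.items()}
print(labels)
```

Under these assumptions the five examples land in Critical, Developing, Functional, Established, and Exemplary respectively, matching the band placements the report gives for them.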

2.1 The five-band distribution

Band | Composite range | Entities | Share of field
Critical | 0–20 | 177 | 15.3%
Developing | 20–40 | 535 | 46.3%
Functional | 40–60 | 248 | 21.5%
Established | 60–80 | 133 | 11.5%
Exemplary | 80–100 | 63 | 5.4%
Total | | 1,156 | 100%

The shape of the field is the headline. The modal institution is Developing — 535 entities, nearly half the record, sit in the 20–40 band. Add the Functional band and 783 of 1,156 entities (67.7%) live in the middle. The field median composite is 35.9 — a high-Developing score — and the mean is 38.9. The dramatic cases the public hears about — the 23 institutions at the absolute floor, the 63 in the Exemplary band — together account for fewer than one entity in eleven.
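The shares quoted in this section follow directly from the band counts. A minimal sketch of that arithmetic, using only figures stated in this report (variable names are illustrative):

```python
# Band counts as published in the five-band distribution.
counts = {
    "Critical": 177,
    "Developing": 535,
    "Functional": 248,
    "Established": 133,
    "Exemplary": 63,
}
total = sum(counts.values())

# Share of field per band, in percent, to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# The "middle": Developing plus Functional.
middle = counts["Developing"] + counts["Functional"]
middle_share = round(100 * middle / total, 1)

# The "dramatic cases": the 23 floor entities plus the Exemplary band.
floor_entities = 23
tails = floor_entities + counts["Exemplary"]
tails_share = round(100 * tails / total, 1)

print(total, shares, middle, middle_share, tails, tails_share)
```

The middle comes to 783 entities (67.7%), and the floor plus the Exemplary band to 86 entities (7.4%), which is below the one-in-eleven mark of roughly 9.1%.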

This is the state-of-the-field finding, and it is sobering precisely because it is undramatic: the world's institutions, measured on how they treat those they affect, are overwhelmingly mediocre. Not collapsed, not exemplary — partial. A Developing or Functional score means, in the benchmark's own terms, that core practices may exist and meet a basic bar, with significant gaps remaining: systems on paper or in pilot, without the disaggregated data, independent audit, or sustained-under-pressure evidence that the higher bands require. It is the band of "we started, but we cannot prove it worked." The tails are exceptional; the middle is the world.

2.2 The distribution by index

Each index has its own center of gravity, and the differences are themselves a finding.

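Index | n | Critical | Developing | Functional | Established | Exemplary | Mean composite
countries | 193 | 45 | 79 | 29 | 26 | 14 | 36.5
fortune-500 | 448 | 53 | 215 | 117 | 56 | 7 | 39.4
global-cities | 250 | 68 | 105 | 36 | 25 | 16 | 35.2
ai-labs | 50 | 5 | 18 | 15 | 8 | 4 | 43.6
robotics-labs | 50 | 2 | 10 | 10 | 15 | 13 | 56.7
us-states | 21 | 4 | 9 | 0 | 0 | 8 | 46.9
us-cities | 144 | 0 | 99 | 41 | 3 | 1 | 38.1
Total | 1,156 | 177 | 535 | 248 | 133 | 63 | 38.9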

Three structural facts stand out:

  • The Fortune 500 is the largest and the most middling index. 448 corporations cluster heavily in Developing (215) and Functional (117), with only 7 Exemplary (1.6%) and zero at the floor. The corporate field is a dense mediocre middle with very few standouts in either direction.
  • Robotics labs are the highest-scoring index (mean 56.7; 26% Exemplary) and countries and global cities are the lowest-centered (means 35.2–36.5; the deepest Critical concentrations). This is not because robotics labs are "better institutions" than states — it is because a single pro-social product line clears the band on a far narrower evidentiary surface than a whole-of-population sovereign test does (see §4).
  • The bands are an artifact of evidentiary surface as much as of conduct. US states show a barbell (8 Exemplary, 4 Critical, nothing in Functional/Established) reflecting a partial index; US cities have zero Critical entities even though the United States itself is now Critical, because cities are scored on domestic-service delivery insulated from the federal-conduct downgrade. The distribution must be read per-type, not as one league table.
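As a consistency check on the per-index table, the sketch below confirms that each row's band counts sum to its n, that the columns sum to the field totals, and that the n-weighted average of the per-index means reproduces the field mean. All figures come from the table; the variable names are illustrative.

```python
# index: (n, Critical, Developing, Functional, Established, Exemplary, mean)
rows = {
    "countries": (193, 45, 79, 29, 26, 14, 36.5),
    "fortune-500": (448, 53, 215, 117, 56, 7, 39.4),
    "global-cities": (250, 68, 105, 36, 25, 16, 35.2),
    "ai-labs": (50, 5, 18, 15, 8, 4, 43.6),
    "robotics-labs": (50, 2, 10, 10, 15, 13, 56.7),
    "us-states": (21, 4, 9, 0, 0, 8, 46.9),
    "us-cities": (144, 0, 99, 41, 3, 1, 38.1),
}

# Each row's five band counts must add up to that index's n.
for name, (n, *bands, mean_composite) in rows.items():
    assert sum(bands) == n, name

# Column totals across the seven indexes.
band_totals = [sum(col) for col in zip(*(r[1:6] for r in rows.values()))]
field_n = sum(r[0] for r in rows.values())

# Field mean as the n-weighted average of the per-index means.
field_mean = round(sum(r[0] * r[6] for r in rows.values()) / field_n, 1)

# Exemplary share per index, in percent.
exemplary_share = {name: round(100 * r[5] / r[0], 1) for name, r in rows.items()}

print(band_totals, field_n, field_mean, exemplary_share)
```

The column totals come out at 177, 535, 248, 133, and 63, and the weighted mean at 38.9, so the table is internally consistent with the field figures.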

2.3 The thin tails

  • The bottom: 177 entities are Critical, and 23 sit at the absolute 0.0 floor — every one carrying an identical profile, all eight dimensions collapsed to the minimum anchor. The floor is not a low score; it is a discrete state, reserved by ruling for conduct that leaves no remediation surface to credit: active in-territory state perpetration (Russia, Myanmar, North Korea, Eritrea, Turkmenistan, Belarus, Syria), conflict-driven famine (Sudan, South Sudan, Yemen), formally recognized structural atrocity (Afghanistan, Israel), and product-is-the-harm labs (xAI/Grok, Character AI, Palantir AI, Ghost Robotics).
  • The top: 63 entities are Exemplary (composite ≥ 80). They are led by Nordic and Alpine cities (Copenhagen, Helsinki, Oslo, Stockholm, Vienna all at 100), Switzerland (97.5) among countries, assistive-robotics labs (Open Bionics 97.5, Ottobock 95.9), and a handful of corporations (Target 92.8). Exemplary is an integration achievement, not a peak score — it is reached by having no weak dimension, not by excelling at one.

The single most important structural fact about the tails is their symmetry: the floor and the ceiling are produced by the same formula mechanic run in opposite directions (total collapse vs. uniform high conduct), and both are reached on different evidentiary surfaces across types. That makes the cross-type read — "this lab is as good as that country," "this insurer is as bad as that state" — the one thing the framework cannot support, at either end.


3. The cross-cutting pattern — the Equity gap

If §2 is the shape of the field, this is the single thread that runs through all of it. It is the most diagnostic finding the benchmark has produced, and it is the throughline of this report.

The benchmark scores eight dimensions: Awareness, Empathy, Action, Equity, Boundaries, Accountability, Systemic Thinking, and Integrity. Equity (EQU) measures whether care is distributed fairly, with explicit priority for those who have the greatest need and the least power — its subdimensions are Universality, Priority for Vulnerable, Bias Awareness, Access Design, and Historical Harm Acknowledgment. It asks not "does this institution help?" but "does it help the people who are hardest to reach, and can it prove it?"

Equity is the weakest or tied-weakest dimension for 1,046 of 1,156 entities — 90.5% of the field — and the strictly lowest single dimension for 593 (51.3%). No other dimension is the minimum for anything approaching a majority. Recomputed directly from scores{} across all seven indexes:

Index | n | EQU weakest/tied | EQU % | Lowest-mean dimension
countries | 193 | 180 | 93% | EQU
fortune-500 | 448 | 428 | 96% | EQU
global-cities | 250 | 184 | 74% | BND (EQU 2nd)
ai-labs | 50 | 44 | 88% | EQU
robotics-labs | 50 | 50 | 100% | EQU
us-states | 21 | 20 | 95% | EQU
us-cities | 144 | 140 | 97% | EQU
Total | 1,156 | 1,046 | 90.5% | EQU (2.21 global mean)

The global Equity mean is 2.21 on the 1–5 scale — the lowest of all eight dimensions (next-lowest is Integrity at 2.45; Action peaks at 2.69). It is the lowest-mean dimension in six of the seven indexes; the lone exception is global cities, where Boundaries edges it by a sliver and Equity is a close second.
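The percentages in the Equity table can be re-derived from its counts. A short sketch using only the published figures (names are illustrative):

```python
# index: (entities, entities with Equity weakest or tied-weakest)
equ = {
    "countries": (193, 180),
    "fortune-500": (448, 428),
    "global-cities": (250, 184),
    "ai-labs": (50, 44),
    "robotics-labs": (50, 50),
    "us-states": (21, 20),
    "us-cities": (144, 140),
}

# Per-index share, rounded to whole percent as in the table.
pct_by_index = {name: round(100 * weak / n) for name, (n, weak) in equ.items()}

field_n = sum(n for n, _ in equ.values())
field_weak = sum(weak for _, weak in equ.values())
field_pct = round(100 * field_weak / field_n, 1)

# Entities where Equity is the strictly lowest dimension (stated in the text).
strictly_lowest = 593
strict_pct = round(100 * strictly_lowest / field_n, 1)

# The remainder have Equity tied for lowest with at least one other dimension.
tied_only = field_weak - strictly_lowest

print(pct_by_index, field_weak, field_pct, strict_pct, tied_only)
```

By subtraction, 453 entities have Equity tied for lowest rather than strictly lowest.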

Two facts make this the report's signature cross-cutting finding:

  • It is universal across the quality spectrum. The gap between an entity's other-seven-dimension average and its Equity score holds at every level of overall performance. Switzerland (composite 97.5) carries it; Sudan (composite 0.0) carries it. It is not a marker of failed institutions — it is a near-constant of institutional conduct as the benchmark measures it.
  • It unifies the top and the bottom. At the floor, collapsed equity sits alongside collapsed accountability and integrity. At the ceiling, equity is the lone thing still missing: of the 63 Exemplary entities, 61 have Equity as their weakest dimension, and 36 carry Equity as their single below-threshold score. The benchmark's top band is, in practice, "excellent at everything except fairness." From either end of the scale, Equity is the dimension institutions reach last.
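The two tests used in this section, "weakest or tied-weakest" and "strictly lowest", and the gap between Equity and the other seven dimensions can be stated precisely. The sketch below is one plausible reading, not the benchmark's implementation: `equity_position` is a hypothetical helper, and the `strong` profile is invented for illustration and is not any real entity's scores.

```python
from statistics import mean

DIMENSIONS = ("AWR", "EMP", "ACT", "EQU", "BND", "ACC", "SYS", "INT")


def equity_position(scores: dict[str, float]) -> dict[str, object]:
    """Describe where Equity sits within an eight-dimension 1-5 profile."""
    lowest = min(scores[d] for d in DIMENSIONS)
    weakest = [d for d in DIMENSIONS if scores[d] == lowest]
    others = [scores[d] for d in DIMENSIONS if d != "EQU"]
    return {
        "weakest_or_tied": "EQU" in weakest,    # Equity shares the minimum
        "strictly_lowest": weakest == ["EQU"],  # Equity alone is the minimum
        "gap": round(mean(others) - scores["EQU"], 2),
    }


# Hypothetical high-scoring profile: strong everywhere, weakest on Equity.
strong = {"AWR": 4.8, "EMP": 4.6, "ACT": 4.9, "EQU": 3.9,
          "BND": 4.7, "ACC": 4.8, "SYS": 4.6, "INT": 4.7}

# Floor profile as described in the report: every dimension at the 1.0 anchor.
floor = {d: 1.0 for d in DIMENSIONS}

print(equity_position(strong))
print(equity_position(floor))
```

On this reading, a floor profile counts as tied-weakest on Equity with a gap of zero, while the hypothetical strong profile has Equity strictly lowest with a gap of 0.83 points on the 1–5 scale.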

The bar Equity sets is genuinely demanding — its top anchors require disaggregated outcome data, independent audits, and co-design with affected communities — and those are uncommon in the real world and, by at least one external measure, becoming rarer. The unavoidable open question is whether the near-universal equity gap is a finding about how institutions actually behave or an artifact of a bar almost no one meets. The report's view is that it is mostly the former (the gap holds across the entire quality range, including institutions that clear high bars everywhere else), but the question is real and is filed for methodology review.


4. Cross-type comparability — the institution's unique asset, and its sharpest limit

The benchmark's defining feature is that it scores governments, corporations, AI labs, robotics labs, and cities on one framework. This is a genuine and unusual asset. No other benchmark scores robotics labs on harm accountability at all — the purpose-to-score gradient in that index (§6) is a finding only this record can produce. The shared eight-dimension framework lets a reader ask, for the first time, whether a trillion-dollar AI lab and a sovereign state recognize and reduce suffering on comparable terms.

But the year's deep-dives establish, repeatedly, that comparability is within-type, not cross-type — and saying so explicitly is part of what makes the benchmark citable rather than glib.

  • The three dominant types fail differently and are judged on different evidence. States fail by active perpetration or codified impunity and can reach the 0.0 floor. Corporations fail by accumulated, unadjudicated enforcement, which is held not scorable until a merits adjudication; that produces a dense corporate cluster jammed just above the Critical line and zero corporations at the floor. AI and robotics labs fail by governance posture and contracting, and reach the floor only when the product itself is the harm.
  • A shared number is not a shared meaning. UnitedHealth (10.2, a Fortune 500 health insurer) and Turkey (10.3, a sovereign state) sit within a tenth of a point. Turkey's failure concentrates in political boundaries, accountability, and integrity — governance failure distributed across a state serving ~85 million people. UnitedHealth's concentrates in accountability, empathy, and equity in claims handling — stakeholder harm bounded within a commercial relationship. The number is comparable; the kind of suffering is not.
  • The same logic holds at the top. An exemplary robotics lab is judged on whether one assistive product is safe and accessible; an exemplary country on whether a whole state treats millions fairly. Both can score 81–98, but "Exemplary" rests on a far narrower evidentiary surface for a single-product lab than for a sovereign. Robotics is 26% Exemplary, far above any other full index (only the partial US-states index, at 8 of 21, has a higher share), partly because a narrow pro-social mandate satisfies the band easily.

The institution's unique asset is therefore double-edged: the one framework is what makes the comparison possible, and the within-type calibration is what keeps it honest. The report's recommendation, consistent with the year's briefings, is that cross-type "league table" framing be avoided at the extremes specifically, and that band placement be published with a per-type interpretive frame. Filed for methodology review.


5. The year in scoring — what moved, and why

Since the first applied cycle on 2026-04-15, roughly twenty scoring cycles have been logged in research/APPLIED_CHANGES.md. As a mid-2026 snapshot this is the benchmark's first scoring season, not a multi-year trend; but within 2026 the direction is unmistakable. The dominant motion is downward, and the few upgrades are hard-won and tightly disciplined. Two patterns account for most of the movement.

5.1 The downgrades — democratic backsliding and corporate conduct

The deepest and most concentrated movement is democratic backsliding among formerly-Developing states, scored on governance evidence — courts, opposition, franchise, civil society — rather than on conflict or famine:

Entity | Movement (2026) | What drove it
Bolivia | 35.9 → 6.3 across four cycles | Humanitarian-blockade crisis, then a signed state-of-exception law repealing the 60-day emergency cap and deployment restrictions; ACC and INT to the 1.0 floor. The benchmark's steepest sustained single-country descent.
United States | 35.5 → 25.0 → 23.4 → 17.5 (into Critical) | Federal courts ruled fast-track/third-country deportation unlawful; record ICE in-custody deaths; two US citizens fatally shot by federal officers. Scored under ADJUDICATED-UNLAWFUL-CONDUCT-IS-SCORABLE.
UAE | 23.4 → 18.4 (into Critical) | First application of ACTIVE-COMPLICITY-IN-MASS-ATROCITY-BY-PROXY: documented as primary external financier and arms supplier to the RSF after the UN found "hallmarks of genocide" in Darfur. Capped above the floor because external sponsor, not in-territory perpetrator, and the ICJ case is unadjudicated.
Israel | 27.8 → 8.8 (then to the 0.0 floor) | Authorized-resumption-with-systematic-denial; in-territory perpetration places it among the floor states.
India | 34.4 → 30.5 → 15.6 | Removal of self-identification rights for transgender people; online-censorship expansion; deportation without due process. Critical, not floored, because functioning courts still reverse some conduct.

On the corporate side, the year's clearest new doctrine is the layoffs-despite-profits line, which set two anchors in a single window:

  • Procter & Gamble: 86.1 → 79.0 (Exemplary → Established). 7,000 cuts announced while profitable and projecting profits, with a tariff-cost-protection rationale. The downgrade lands on exactly two dimensions — Boundaries and Integrity — and turns on the distinction between distress-driven and profit-protection layoffs, not on headcount.
  • Oracle: 20.6 → 14.7 (Developing → Critical). 30,000 cuts wrapped in a coercive-severance architecture — sign a legal release waiving the right to sue or forfeit all severance. The decisive element was structure, not scale: the first application of a coercive-severance trigger, hitting seven of eight dimensions at once.

Other notable 2026 downgrades, each grounded in a documented event, include State Street (92.5 → 60.2), Abbott (87.4 → 57.8), Microsoft (87.8 → 66.4), CVS Health (50.0 → 25.6), Johnson & Johnson (48.4 → 27.5), and the AI-lab repricings of OpenAI, Anthropic, and Mistral away from inflated first baselines.

5.2 The upgrades — rare, documented, and capped

Against that tide, the upgrades are few and instructive. The benchmark credits genuine reversal — but only when it is proven, self-initiated, and sustained, and it never confers a top score on momentum:

  • Bangladesh: 34.4 → 39.8 (+5.4, rank +35). The clearest 2026 upgrade. Bangladesh exited the ITUC's ten-worst countries for workers' rights for the first time since 2017, after November 2025 labour-law reforms (union formation eased to 20 workers, 120-day maternity leave, anti-discrimination protections) and ratification of ILO Conventions C155, C187, and C190. Tempered: it retains ITUC Rating 5 ("no guarantee of rights"), and SEZ workers remain excluded — so the gain is real but lands just below Functional.
  • Hungary: 28.1 → 50.2 across multiple cycles. A documented, self-initiated institutional recovery (16-minute cabinet, audit ordered, ICC reversal sustained) — the year's control case for what reversal looks like. It reaches only the middle, two full bands below "good," precisely because the benchmark grades sustained conduct, not trajectory.
  • Other documented gains: Venezuela (4.4 → 18.0), Botswana (60.9 → 67.5), Saudi Arabia (4.4 → 9.4), New York City (48.4 → 56.3).
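The size of each upgrade listed above is simple subtraction on the before and after composites. A brief sketch using the figures as stated (names are illustrative):

```python
# entity: (composite before, composite after), as listed in this section
upgrades = {
    "Bangladesh": (34.4, 39.8),
    "Hungary": (28.1, 50.2),
    "Venezuela": (4.4, 18.0),
    "Botswana": (60.9, 67.5),
    "Saudi Arabia": (4.4, 9.4),
    "New York City": (48.4, 56.3),
}

# Point gain per entity, rounded to one decimal to absorb float noise.
deltas = {
    name: round(after - before, 1)
    for name, (before, after) in upgrades.items()
}

# Rank from largest to smallest gain.
ranked = sorted(deltas, key=deltas.get, reverse=True)

print(deltas)
print(ranked)
```

On these figures Hungary's gain is the largest (+22.1) and Saudi Arabia's the smallest (+5.0); Bangladesh's +5.4 matches the delta quoted in the text.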

The asymmetry between §5.1 and §5.2 — many disciplined downgrades, few capped upgrades — is the scoring story of 2026.

5.3 The discipline that makes it citable

What unifies every entry above is a pre-adjudication discipline applied symmetrically. The benchmark prices proof, not accusation. A 42-state attorney-general subpoena did not move OpenAI's score (allegation, not adjudicated); Oracle's documented severance clause did (in-effect conduct). UnitedHealth faces a coordinated multi-state AG investigation, a DOJ criminal probe, and a shareholder suit at once — and is held at 10.2, because density of investigation is not adjudicated harm. The same rule runs in the good-news direction: Microsoft's compelled human-rights remedy is held below a self-initiated one, and Hungary's genuine recovery is credited as conduct but stops at the middle. This discipline — and not any single score — is the load-bearing reason the record is defensible and citable.


6. The year's deep-dives — nine lenses on one record

This report sits atop nine Special Briefings published this year. Each took one pattern and went deep; together they triangulate the field. Read as a set, they are the evidentiary spine of this synthesis.

  1. The Floor and the Critical Band — 177 Critical, 23 at the 0.0 floor; the floor is a discrete "total-collapse" state reached by four entity types on different evidentiary bars, and corporations are exempted from it by a formula mechanic rather than a stated rule.
  2. What Good Looks Like (Exemplars) — 63 Exemplary; "good" is an integration achievement (no weak dimension), the cleanest exemplars are assistive-robotics labs (the mirror of the floor), and even the best are weakest at equity.
  3. The Equity Tax — the foundational dimension-level finding this report elevates to its cross-cutting pattern: Equity is weakest for 90.5% of the field and the lone sub-band score for most exemplars.
  4. AI Governance Under Pressure — a government-mandated shutdown, a 42-state subpoena, and a union vote in one fortnight, none of which moved a composite; external governance is testing AI labs faster than their internal practice is maturing.
  5. Layoffs Despite Profits — the P&G and Oracle anchors; the benchmark now distinguishes distress-driven from profit-protection layoffs and prices coercive-severance structure as scorable harm.
  6. State of Exception — the codified-impunity pattern: governments legislating their way to the bottom; Bolivia's 35.9 → 6.3 arc as the first "predicted-trigger-realized" sequence.
  7. What the Product Is For — sorting the 100 technology labs by end-use yields a single gradient: defense/surveillance/weapons at the floor, healthcare/accessibility at the ceiling; Boston Dynamics scores 65.6 (research) and 20.3 (weaponized demo) — a 45-point gap inside one institution.
  8. Allegation, Indictment, Ruling — the pre-adjudication discipline made explicit across six clean test cases; the benchmark scores proof, discounts allegation, and disciplines good news too.
  9. The Middle of the Scale — the on-ramp: 783 entities (67.7%) in the Developing/Functional middle, where the integration premium is switched off and the composite is the plain dimension average; the median institution is mediocre, not lopsided.

The throughlines across all nine are the three this report foregrounds: the mediocre middle, the universal equity gap, and the within-type discipline that keeps cross-type comparison honest.


Forward view

This is a snapshot, and snapshots have edges. The standing patterns most likely to define the next edition:

  • Democratic backsliding is the live entry pattern into the bottom of the scale, and it is accelerating on legal grounds, not battlefield ones. The "Dismantler" cohort just above the Critical line (Croatia, Slovakia, Bulgaria, Italy) is the next-likely set of entrants; Hungary's upgrade arc is the control case for what reversal looks like. Watch the gap between codified impunity (which floors a score) and contested-but-reversible conduct (which holds it in Critical) — a functioning court is, repeatedly, the difference between Critical and the floor.
  • AI governance is testing labs faster than their practice is maturing. Subpoenas, compelled shutdowns, and union votes are arriving as forward indicators ahead of any scored event. The open question is what converts external pressure — especially worker voice — into a scorable accountability signal, and the first adjudication of a major AI enforcement action (the OpenAI multi-AG probe is the leading candidate) will set the precedent.
  • The corporate boundary cluster is a coiled spring. A dense band of Fortune 500 entities sits just above the Critical line under unadjudicated enforcement; a single merits adjudication could reprice several into Critical at once — and would directly test whether a corporation that crosses into Critical ever faces a floor question, which the framework currently does not answer.
  • The Equity gap is the standing structural finding — the one pattern that will almost certainly survive into every future edition, and the one most worth watching for movement. An institution that closes its equity gap while holding the other seven dimensions is the rarest and most significant possible motion in the record; as of mid-2026, only Open Bionics and Switzerland have cleared it at the top of the scale.

The honest summary of the inaugural state of the field is this: the world's institutions are, on the whole, partial — they have started, and cannot yet prove it worked — and they are most partial, almost universally, at fairness to those with the least power. That is the baseline. Future editions will measure the field against it.


Sources

  • Canonical scores (ground truth): site/src/data/indexes/{countries,fortune-500,global-cities,ai-labs,robotics-labs,us-states,us-cities}.json — the five-band distribution (177/535/248/133/63), the per-index breakdowns, the field median (35.9) and mean (38.9), the Equity diagnostics (1,046/1,156 = 90.5% weakest/tied; 593 strictly lowest; global EQU mean 2.21; 61/63 and 36/63 among Exemplary), and the 23-entity floor roster were all recomputed directly from rankings[] and scores{} and reconcile exactly with the published data (no drift).
  • The year's scoring record: research/APPLIED_CHANGES.md (all ~20 cycles since 2026-04-15: Bolivia ×4, United States, UAE, Israel, India, P&G, Oracle, Bangladesh, Hungary, and the corporate and AI-lab repricings, each with its applied delta, dimensions, and ruling rationale); research/change-proposals/*.json (per-entity proposals).
  • The nine Special Briefings synthesized here: research/special-briefings/{floor-and-critical-2026-06-11, exemplars-2026-06-11, equity-tax-2026-06-16, ai-governance-2026-06-15, layoffs-despite-profits-2026-06-15, state-of-exception-2026-06-16, what-the-product-is-for-2026-06-16, allegation-indictment-ruling-2026-06-16, middle-of-the-scale-2026-06-16}.md — their publicSummaries supply the cited per-pattern findings (the floor taxonomy, the integration-premium mechanic, the purpose-to-score gradient, the pre-adjudication discipline, the middle-of-the-scale field figures).
  • Formula / methodology: site/scripts/lib/scoring.mjs::computeCompositeFromDimensions (canonical composite), site/src/data/dimensions.ts (the 8 dimensions, 40 subdimensions, 5 bands), and the floor/integration-premium logic in the assessor agent definitions.
  • Fresh web evidence (external grounding for cited facts):
      • Bangladesh exiting the ITUC ten-worst list: New Age — "Bangladesh exits world's worst 10 for workers' rights"; ITUC Global Rights Index.
      • UN "hallmarks of genocide" finding in Darfur and UAE arms supply to the RSF (basis for the UAE downgrade): Human Rights Watch — "UN Body Finds 'Hallmarks of Genocide' in Darfur"; allAfrica — "UN Accuses UAE of Funnelling British-Made Arms to Militias in Sudan".
      • Independent corroboration of the top-of-scale geography (Nordic/Alpine clustering): Gallup / World Happiness Report rankings — convergent (Finland, Iceland, Denmark, Sweden, Norway lead; Switzerland top-tier), not circular evidence for the benchmark's distinct conduct-based methodology.

How to read the scores

The 0–100 scale — five bands

Every entity — state, corporation, AI lab, robotics lab, or city — is scored 0–100 across 8 dimensions and 40 subdimensions. The composite score places the entity in one of five bands:

Critical | 0–20 | Foundational compassion practices are absent or documented active harm is present.
Developing | 20–40 | Some practices are emerging but remain inconsistent, reactive, or unevenly applied.
Functional | 40–60 | Core practices exist and meet a basic bar, with significant gaps remaining.
Established | 60–80 | Practices are systematic, documented, and supported by consistent evidence.
Exemplary | 80–100 | Practices are independently verified, consistent, and sustained under pressure.

The 8 dimensions

Each dimension is scored 1–5 across 5 subdimensions (40 subdimensions total), then converted to a 0–100 composite. A score of 1.0 on a subdimension represents the minimum anchor; 5.0 is exemplary conduct.

AWR (Awareness): Does this entity reliably detect when others are in pain or need, before they name it?
EMP (Empathy): Does this entity genuinely connect with the inner experience of those it serves?
ACT (Action): Does compassionate understanding translate into real, proportional, effective help?
EQU (Equity): Is care distributed fairly, especially toward those with greatest need and least power?
BND (Boundaries): Is helping sustainable, ethical, and autonomy-preserving rather than dependency-creating?
ACC (Accountability): Does this entity own its failures, correct course, and make genuine repair?
SYS (Systemic Thinking): Does compassion extend to root causes and structural change, not only symptom relief?
INT (Integrity): Is compassion genuine, consistent, and non-performative, especially when it costs something?
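
The conversion from 1–5 dimension scores to a 0–100 composite can be illustrated with a rough sketch. The canonical calculation lives in `computeCompositeFromDimensions` and includes floor and integration-premium logic that is not reproduced here. The example below assumes equal weighting and a simple linear rescale, so that 1.0 maps to 0 and 5.0 maps to 100. Those assumptions, and every function name in the block, are hypothetical.

```python
from statistics import mean

# The eight dimension codes from the legend above.
DIMENSIONS = ("AWR", "EMP", "ACT", "EQU", "BND", "ACC", "SYS", "INT")


def dimension_score(subscores):
    """Average five subdimension scores (each 1-5) into one dimension score."""
    if len(subscores) != 5:
        raise ValueError("each dimension has exactly 5 subdimensions")
    if any(not 1.0 <= s <= 5.0 for s in subscores):
        raise ValueError("subdimension scores must be within 1-5")
    return mean(subscores)


def composite_score(dimension_scores):
    """ASSUMPTION: equal weights and a linear 1-5 -> 0-100 rescale.

    The benchmark's floor and integration-premium adjustments are omitted.
    """
    missing = [d for d in DIMENSIONS if d not in dimension_scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    average = mean(dimension_scores[d] for d in DIMENSIONS)
    return round((average - 1.0) / 4.0 * 100.0, 1)


def weakest_dimensions(dimension_scores):
    """Return every dimension tied for the lowest score."""
    lowest = min(dimension_scores[d] for d in DIMENSIONS)
    return [d for d in DIMENSIONS if dimension_scores[d] == lowest]
```

For instance, an entity scoring 3.0 on all eight dimensions would land at 50.0 under this rescale, and `weakest_dimensions` would report all eight as tied. A profile of 4.0 everywhere except Equity at 2.0 would report only `EQU`, which is the shape of the pattern the briefing describes.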

Scores are based on public evidence: government reports, regulatory filings, independent audits, judicial findings, and verifiable third-party records. Entities never pay for inclusion, score changes, or suppression of findings.

Continue reading

Companion

June 16, 2026

Allegation, Indictment, Ruling — How the Benchmark Scores Accusations vs Proof

In a single fortnight, OpenAI was hit by a 42-state attorney-general subpoena and its score did not move; Oracle's documented severance terms moved it into the Critical band. That is not inconsistency — it is the discipline that keeps the benchmark citable. This briefing examines six entities to show the exact line the record draws between what is alleged and what is proven, and between conduct an institution chose and conduct a government forced on it.


June 16, 2026

The Equity Tax — The One Dimension That Drags Almost Everyone Down

The benchmark scores eight dimensions of institutional conduct. One of them — Equity, the fair distribution of care toward those with the greatest need and least power — is the weakest score for nine of every ten entities assessed, from authoritarian states to model corporations. This briefing measures that pattern across all 1,156 entities, shows the exact mechanism by which a single weak equity score caps an otherwise strong profile, and asks what it means that the institutions which get everything else right still fail the most vulnerable.


June 16, 2026

The Middle of the Scale — What a 50 Actually Means

The benchmark's two foundational briefings covered the extremes: the 23 at the floor and the 64 at the top, together under 9% of the field. But almost every entity a reader looks up (their employer, their city, their country) lives in the vast Developing and Functional middle. This briefing is the on-ramp: what a middling score actually measures, why a balanced 50 and a spiky 50 are not the same thing, and why the "boring" middle is the hardest band to read.


June 16, 2026

State of Exception — When Governments Codify Impunity

A cluster of governments is not falling to the bottom of the scale through single atrocities. It is legislating its way there — converting emergency powers, "extremist" designations, and election repression into durable, signed-into-law impunity. This briefing tracks that pattern across the Critical-band countries and examines its sharpest case: Bolivia's descent from 28.4 to 6.3 across four scoring cycles, the benchmark's first sequence in which a predicted trigger was named in advance and then realized.


June 16, 2026

What the Product Is For — Robotics and AI at the Harm Frontier

Sort the 50 robotics labs and 50 AI labs not by rank but by what their core product is *for*, and one gradient appears in both indexes at once: defense, surveillance, and weapons cluster at the floor; healthcare, accessibility, and assistive technology cluster at the ceiling. Compassion Benchmark is the only institution that scores robotics labs at all — there is no comparator. This briefing examines what that gradient is actually measuring, and where conduct and purpose come apart.


June 15, 2026

AI Governance Under Pressure — What a Shutdown, a Subpoena, and a Union Vote Actually Tell the Benchmark

In a single fortnight, the US government forced Anthropic to pull its two most powerful models, 42 state attorneys general subpoenaed OpenAI, and Google DeepMind's UK staff voted to unionize over military AI. The benchmark scores how institutions recognize and reduce suffering — not how much external pressure they attract. This briefing examines what each of those events does, and does not, say about an AI lab's compassion score.


June 15, 2026

Layoffs Despite Profits — When a Layoff Becomes a Compassion Failure

A 2026 Fortune 500 restructuring wave is testing a boundary the benchmark is only beginning to price: the difference between a layoff forced by distress and a layoff that protects margin while profits rise. Two cases set the new anchors — Procter & Gamble, downgraded out of the top tier for cutting 7,000 jobs "despite increasing profits," and Oracle, dropped into the Critical band for a 30,000-person cut wrapped in a "sign the release or forfeit your severance" ultimatum. This briefing examines what separates a Boundaries-neutral business decision from a scorable harm.


June 11, 2026

What Good Looks Like — Exemplars Across Entity Types

The same 0–100 scale that judges the worst also names the best. At the top, 64 entities across states, corporations, AI and robotics labs, and cities reach the Exemplary band. This briefing asks what high compassion actually looks like in the record — what dimension profile produces it, whether it is earned the same way across entity types, and why even the best institutions share a single, universal soft spot.


June 11, 2026

The Floor and the Critical Band — How the Benchmark Judges the Worst

A single 0–100 scale ranks states, corporations, AI and robotics labs, and cities together. At the bottom, that shared scale meets four entity types that fail in structurally different ways — and reach the bottom by different mechanics. This briefing examines the 176 entities in the Critical band and the 23 at the absolute floor, and asks what the record actually shows about how the worst are judged.


Related daily briefing

June 16, 2026 — daily benchmark

Cite this briefing

Copy-ready citation string for journalism, research, or academic use.

Compassion Benchmark. "The State of Institutional Compassion — 2026." compassionbenchmark.com/updates/special/state-of-institutional-compassion-2026. Accessed [Month Year]. Independent — entities never pay for inclusion, score changes, or suppression of findings.

For methodology, see compassionbenchmark.com/methodology. Data terms: /data-licenses. Press resources: /media.
