Allegation, Indictment, Ruling — How the Benchmark Scores Accusations vs Proof
In a single fortnight, OpenAI was hit by a 42-state attorney-general subpoena and its score did not move; Oracle's documented severance terms moved it into the Critical band. That is not inconsistency — it is the discipline that keeps the benchmark citable. This briefing examines six entities to show the exact line the record draws between what is alleged and what is proven, and between conduct an institution chose and conduct a government forced on it.
Scope: A curated cross-index cohort of six entities — OpenAI (27.5), Oracle (14.7), UnitedHealth Group (10.2), Anthropic (59.1), Microsoft (65.3), and Hungary (50.2) — chosen because each is a clean test of one rule: how the benchmark separates an *allegation* from an *adjudicated finding* from *documented conduct* from *compelled action*.
Cohort: 6 entities across 3 indexes, each a clean test of one evidentiary rule.
- OpenAI 27.5 (Developing) — 42-state AG subpoena, held sub-threshold as allegation-not-adjudicated.
- Oracle 14.7 (Critical) — documented coercive-severance terms, moved Developing → Critical.
- UnitedHealth 10.2 (Critical) — coordinated multi-AG investigation + DOJ probe, held pre-adjudication.
- Anthropic 59.1 (Functional) — government-compelled model shutdown, scored on conduct, held.
- Microsoft 65.3 (Established) — compelled human-rights remedy, held sub-threshold.
- Hungary 50.2 (Functional) — documented self-initiated recovery (28.1 → 50.2), reaches Functional, not the top.
If you remember one thing
Forty-two attorneys general did not move OpenAI's score; one documented severance clause moved Oracle into Critical. The difference is not the number of accusers — it is the evidentiary stage. An AG subpoena is an allegation under investigation; Oracle's "sign a release or forfeit severance" terms are documented, in-effect conduct. The benchmark prices the second and discounts the first.
Key Findings
- An allegation is weighed, not ignored — it just cannot move the composite by itself. The OpenAI probe was reconstructed as a −1.6 pressure on Accountability and Awareness; it fell below the 5-point threshold and stayed inside the band. The accusation registers as sub-dimension pressure and as a pre-registered watch trigger, but it does not become a score event until it is adjudicated.
- Density of investigations is not adjudicated harm. UnitedHealth faces a coordinated multi-state AG investigation, a DOJ criminal probe, and a shareholder suit at once — and the record holds it at 10.2, explicitly ruling that breadth of investigation raises enforcement *density* but does not cross into proof. The same rule applies to a near-floor insurer and a trillion-dollar AI lab.
- Compelled action is scored on conduct, not on the compulsion. When the US government forced Anthropic to suspend two flagship models, the benchmark scored Anthropic's *behavior in the event* — prompt disclosure, an apology, a stated restoration effort — as mildly positive, not as self-inflicted harm. The same doctrine, run in the other direction, holds Microsoft's *compelled* human-rights remedy below a self-initiated one.
- The discipline is symmetric — it disciplines good news too. Hungary's documented, self-initiated recovery moved it from a 28.1 baseline to 50.2, a genuine multi-cycle climb. It lands at Functional, not the top of the scale: trajectory and reform-in-progress are credited as conduct, but a top score is reserved for sustained, self-initiated practice — never conferred on momentum or on a forced settlement.
- The endpoints of these discounts are where the live risk sits. Each held score carries a pre-registered conversion trigger — an enforcement action, a settlement, an adjudicated finding, a verified operative effect. The score does not move on the accusation; it moves the day the accusation becomes proof.
The field
1,156 entities across the five bands — the full distribution this briefing draws from.
1. Frame
The Compassion Benchmark is an evidence institution before it is a ranking. Its credibility rests less on any individual number than on a single, boring-sounding discipline: the benchmark distinguishes what is alleged from what is proven, and it scores documented conduct rather than reputation, sentiment, or the volume of accusation. Get that line wrong in either direction and the institution fails — score on accusations and it becomes a rumor mill that any motivated complainant can move; ignore accusations entirely and it becomes a whitewash that misses real harm until it is too late.
This briefing takes that line as its subject. It assembles six entities — across the AI-labs, Fortune-500, and countries indexes — each of which isolates one move in the evidentiary logic:
- OpenAI (27.5) — a 42-state attorney-general subpoena that did not move the score. The pure allegation case.
- Oracle (14.7) — documented severance terms that did move the score, into Critical. The documented-conduct case.
- UnitedHealth (10.2) — a coordinated multi-AG investigation plus a DOJ probe, held pre-adjudication. The enforcement-density case.
- Anthropic (59.1) — a government-compelled model shutdown scored on conduct. The compelled-restriction case.
- Microsoft (65.3) — a compelled human-rights remedy held below self-correction. The compelled-remedy case.
- Hungary (50.2) — a documented self-initiated recovery that reaches Functional, not the top. The symmetric-discipline case.
The central thesis is one sentence: allegations are weighed but discounted until adjudicated; documented conduct moves scores; and compelled action is scored on conduct, not on the compulsion — and the same three rules govern a trillion-dollar AI lab, a near-floor insurer, and a recovering democracy alike. That uniformity is the point. None of these scores is re-examined here; this is an interpretation of why the record reads the way it does.
2. The cohort — six clean tests of one line
Recomputed directly from rankings[] in each index. Every published composite below reconciles exactly with the canonical JSON — no drift.
| Entity | Index | Composite | Band | The evidentiary stage it tests | The governing ruling |
|---|---|---|---|---|---|
| OpenAI | ai-labs | 27.5 | Developing | Allegation under investigation | ALLEGATION-NOT-ADJUDICATED |
| Oracle | fortune-500 | 14.7 | Critical | Documented, in-effect conduct | COERCIVE-SEVERANCE-STRUCTURE |
| UnitedHealth Group | fortune-500 | 10.2 | Critical | Coordinated investigation, pre-merits | FILED-BUT-UNADJUDICATED |
| Anthropic | ai-labs | 59.1 | Functional | Government-compelled restriction | COMPELLED-RESTRICTION-SCORED-ON-CONDUCT |
| Microsoft | fortune-500 | 65.3 | Established | Government/expose-compelled remedy | COMPELLED-REMEDY-NOT-SELF-CORRECTION |
| Hungary | countries | 50.2 | Functional | Documented self-initiated recovery | (trajectory ≠ top score) |
These six sort cleanly into a three-stage evidentiary ladder, plus a compelled-conduct axis that cuts across it:
- Stage 1 — Allegation (discounted): OpenAI, UnitedHealth. An accusation exists and is credible enough to investigate, but no merits finding has issued. Weighed as sub-dimension pressure and a watch trigger; it cannot move the composite by itself.
- Stage 2 — Documented conduct (scored): Oracle, Hungary. The conduct itself is on the record and in effect — Oracle's severance terms, Hungary's enacted (and not-yet-enacted) reforms. This is what the benchmark actually prices.
- Stage 3 — Adjudicated finding (the trigger, not yet pulled): none of the six has crossed it on the in-window facts. Each instead carries a pre-registered trigger that would convert an allegation into a score event.
- The compelled-conduct axis: Anthropic and Microsoft are scored on what they did when an external actor forced their hand — not on the forcing. This axis is orthogonal: a compelled action can be conduct-positive (Anthropic) or held-below-self-correction (Microsoft).
The single most important structural fact in this cohort: the discount is applied at the same threshold for everyone. OpenAI's 42-state probe and UnitedHealth's multi-AG investigation are both held at exactly the same evidentiary bar as a single Fortune-500 enforcement filing. The benchmark does not let the count of accusers, the prominence of the accused, or the severity of the allegation substitute for adjudication.
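As a rough illustration of the ladder above, the following Python sketch encodes the three stages and the one rule that separates them. The `Stage` enum, the `COHORT` mapping, and `can_move_composite` are hypothetical names introduced here, not part of the benchmark's published tooling. Anthropic and Microsoft are left out because they sit on the compelled-conduct axis rather than on the ladder.

```python
from enum import Enum


class Stage(Enum):
    ALLEGATION = 1          # under investigation, no merits finding yet
    DOCUMENTED_CONDUCT = 2  # on the record and in effect
    ADJUDICATED = 3         # merits finding, settlement, or enforcement action


# Hypothetical mapping of the cohort onto the ladder, as described in the text.
COHORT = {
    "OpenAI": Stage.ALLEGATION,
    "UnitedHealth Group": Stage.ALLEGATION,
    "Oracle": Stage.DOCUMENTED_CONDUCT,
    "Hungary": Stage.DOCUMENTED_CONDUCT,
}


def can_move_composite(stage: Stage) -> bool:
    """An allegation is weighed, but it cannot move the composite by itself."""
    return stage is not Stage.ALLEGATION


movable = sorted(name for name, stage in COHORT.items() if can_move_composite(stage))
print(movable)  # ['Hungary', 'Oracle']
```

Note that the rule takes no argument for the number of accusers or the size of the accused; under this reading, only the stage matters.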
3. The allegation discount — why 42 attorneys general did not move OpenAI
OpenAI sits at 27.5 (Developing, rank 42). In the window of June 12–14, 2026, a coalition of 42 state attorneys general, led by New York AG Letitia James, served OpenAI a subpoena four days after its confidential S-1 IPO filing. The subpoena's scope is broad and serious: consumer and health data, marketing to vulnerable populations (children and seniors), age verification, safety-testing policy, and — notably — model sycophancy named as a design flaw. A separate Florida civil suit (June 1) alleges ChatGPT validated a 16-year-old's suicidal ideation and supplied self-harm methods.
The score did not move. The ruling — ALLEGATION-NOT-ADJUDICATED — is explicit about why, and the reasoning is the load-bearing part:
"an AG subpoena/investigation is an allegation under investigation, not an adjudicated finding … allegations carry a Tier discount and do not by themselves move the composite."
Crucially, the allegation was not ignored. The assessor ran a conservative reconstruction as if the allegation were weighted — applying downward pressure on Accountability (1.9 → 1.7, for marketing to vulnerable populations without age verification) and Awareness (2.2 → 2.0, for anticipatory awareness of harm to minors). That reconstruction produced a composite of 25.9 — a delta of −1.6, below the 5-point movement threshold. The probe is therefore recorded as a sub-dimension intensifier within the Developing band, not a band-moving event. It registers; it just does not clear the bar to become a score change.
This is the discipline doing exactly what it is for. Three features distinguish a discounted allegation from a scored one:
- It is weighed, then thresholded. The pressure is reconstructed in good faith. The discount is not "ignore the accusation"; it is "the accusation, weighted honestly, does not cross the movement threshold without proof."
- It is corroboration, not novelty. The 42-state subpoena materially broadens the single Florida suit already weighed in the June 8 assessment — but breadth that re-confirms a known pattern is not the same as new adjudicated harm.
- It carries a pre-registered endpoint. The assessment's Watch section names the exact conversion conditions: "Any AG enforcement action, settlement, or adjudicated finding (vs. the current investigative subpoena) → scorable ACC/INT downgrade." The discount has a defined off-ramp; it is not a permanent shield.
The skeptical reading — "the benchmark let a trillion-dollar lab off the hook" — gets the discipline backwards. The benchmark would be less credible, not more, if 42 signatures on a subpoena could move a score that the same 42 offices have not yet proven in any forum.
4. The documented-conduct contrast — one severance clause moved Oracle into Critical
Set OpenAI beside Oracle, which crossed Developing → Critical (20.6 → 14.7, −5.9) on June 14, applied after founder review. The contrast is the whole argument of this briefing.
Oracle's downgrade did not rest on an accusation, an investigation, or a regulator's complaint. It rested on documented, in-effect conduct — the actual terms of a finalizing 30,000-person layoff (~18% of global headcount, completing June 15, 2026):
"employees must sign a release waiving their right to sue in order to receive any benefit at all." — Tech Times, 2026-06-01
"Oracle included the 60-day WARN notice pay within its existing severance calculation rather than paying it on top of severance." — Tech Times, 2026-06-01
"We are choosing the chips. Anyone whose job is not making the chips run faster for our customers is at risk in this industry." — Larry Ellison, via Tech Times, 2026-06-01
These are not allegations awaiting a finding. They are the operative terms of the severance program itself — a sign-or-forfeit consent structure, WARN notice pay absorbed into (rather than added to) severance, forfeited unvested shares, and leadership explicitly framing margin over workforce. The conduct is the evidence; there is nothing left to adjudicate about what the terms are. The composite moved accordingly, with Boundaries, Accountability, and Integrity all marked down.
The OpenAI / Oracle pairing isolates the variable with unusual cleanliness:
| | OpenAI | Oracle |
|---|---|---|
| Pressure type | 42-state AG subpoena | Documented severance terms |
| Evidentiary stage | Allegation under investigation | Conduct in effect |
| Weighted delta | −1.6 (reconstructed) | −5.9 (applied) |
| Threshold | Below 5pt — held | Above 5pt — crossed |
| Band | Developing (held) | Developing → Critical |
| Source tier | News reporting of an investigation | Tier-2 reporting of program terms |
The lesson is not that Oracle is "worse" than OpenAI in some absolute sense — they sit in different indexes and are not directly comparable on the bottom band (a point the Floor-and-Critical briefing made at length). The lesson is about evidence stage, holding severity aside: documented conduct is scored; an investigation into possibly-worse conduct is not, until it is proven.
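The weighted-delta and threshold rows in the table reduce to simple arithmetic. The sketch below is a minimal Python illustration using the composites quoted in this briefing. The function names are hypothetical, and treating the 5-point threshold as inclusive is an assumption, since the briefing does not say how an exact 5.0 would be handled.

```python
MOVEMENT_THRESHOLD = 5.0  # the 5-point movement threshold cited in the text


def weighted_delta(before: float, after: float) -> float:
    """Composite change, rounded to one decimal like the published scores."""
    return round(after - before, 1)


def crosses_threshold(delta: float, threshold: float = MOVEMENT_THRESHOLD) -> bool:
    # Assumption: inclusive comparison; the briefing does not specify this.
    return abs(delta) >= threshold


openai_delta = weighted_delta(27.5, 25.9)  # reconstructed only, never applied
oracle_delta = weighted_delta(20.6, 14.7)  # applied on June 14

for name, delta in [("OpenAI", openai_delta), ("Oracle", oracle_delta)]:
    status = "crossed" if crosses_threshold(delta) else "held"
    print(f"{name}: {delta:+.1f} -> {status}")
```

The threshold check is only half of the rule. Oracle's change was applied because the underlying evidence was documented conduct; OpenAI's figure is a conservative reconstruction of an allegation, and it falls short of the threshold even when weighted.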
5. Enforcement density is not proof — the UnitedHealth case
If OpenAI is the single-investigation allegation case, UnitedHealth Group (10.2, Critical, rank 445) is the stress test of the same rule: what happens when the accusations pile up?
In June 2026 UnitedHealth faced, simultaneously: a coordinated investigation by multiple state attorneys general, a continuing DOJ criminal probe into Medicare Advantage risk-score inflation, a shareholder lawsuit alleging withheld material information after the Thompson shooting, and multiple federal/state cases over AI-powered claim denials and mental-health coverage failures. A CEO resignation landed in the same window. By any journalistic measure this is a company in crisis.
The score held at 10.2. The ruling answers the exact question a critic would ask:
"does coordinated sovereign escalation cross from filed-but-unadjudicated to scorable? Ruling: NO. A coordinated multi-AG investigation is an investigation, not a merits adjudication, settlement, or charge … the breadth (multi-state coordination) raises enforcement density and corroborates the decade-long upcoding/claim-denial pattern, but density of investigations is not adjudicated harm."
Two refinements make this case more instructive than OpenAI's:
- Density ≠ proof. Ten investigations into the same alleged pattern are still zero merits findings. The benchmark explicitly refuses to let aggregation substitute for adjudication — the failure mode where "everyone is investigating them, so it must be true" silently becomes a score.
- The discount is not absolution. UnitedHealth is at 10.2 — near the floor — because its documented conduct (the priced ACC 1.125 / EQU 1.25 profile, an entrenched claim-denial and upcoding record) already sits there on prior evidence. The allegation discount governs only the new in-window escalation; it does not launder the company's existing low score upward. The CEO resignation, similarly, is ruled "governance churn, not scorable harm."
The Watch trigger is identical in structure to OpenAI's: "Merits adjudication / settlement of the DOJ MA-fraud probe or multi-AG action → scorable downgrade." A near-floor insurer and a trillion-dollar AI lab are governed by the same off-ramp. That uniformity — the rule does not bend for size, sympathy, or sector — is what makes it a rule rather than a reaction.
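The density-versus-proof distinction can be made concrete with a small Python sketch. The `Matter` record and its `adjudicated` flag are hypothetical constructs for illustration; the four entries paraphrase the in-window matters listed in this section, none of which has reached a merits adjudication, settlement, or charge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Matter:
    label: str
    adjudicated: bool  # True only for a merits adjudication, settlement, or charge


# UnitedHealth's in-window matters as described in the text (illustrative labels).
matters = [
    Matter("coordinated multi-state AG investigation", False),
    Matter("DOJ criminal probe (Medicare Advantage)", False),
    Matter("shareholder lawsuit", False),
    Matter("AI claim-denial and mental-health coverage cases", False),
]

density = len(matters)                         # how many matters are open
scorable = [m for m in matters if m.adjudicated]  # how many are proven

print(density, len(scorable))  # 4 0
```

Adding a fifth or tenth unadjudicated matter raises `density` but leaves `scorable` empty, which is the point of the ruling quoted above.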
6. The compelled-conduct doctrine — scoring behavior, not the force behind it
The third move is the subtlest, and it cuts across the allegation ladder rather than sitting on it. When an external actor — a government, a court, a regulator — forces an institution to act, the benchmark scores the institution's conduct in the event, not the external compulsion. Two entities in the cohort run this doctrine in opposite directions.
Anthropic (59.1, Functional) — compelled restriction, scored conduct-positive. On June 12, 2026 the US Commerce Department issued an export-control directive requiring Anthropic to suspend its Fable 5 and Mythos 5 models for all foreign nationals (including foreign-national staff), citing national security and a reported jailbreak. Anthropic disabled both models for all customers within a day. The methodological treatment is explicit:
"This is a government-mandated shutdown, not a failure of Anthropic's own compassion conduct. The benchmark scores the entity's behavior — recognizing, responding to, and reducing suffering — not external sentiment about a regulatory action. Anthropic's conduct in the event (prompt public disclosure, an apology to customers, disclosure of the government's stated reasoning, a stated effort to restore access) is mildly positive on Accountability/Transparency."
The shutdown could have been read naively as "Anthropic's product is now restricted, therefore downgrade." The benchmark refused that read: the compulsion is not the entity's conduct. Accountability was nudged 3.5 → 3.6 for the transparent handling; the composite held at 59.1. The principle: an institution should not be penalized (or rewarded) for what a government forces on it — only for how it behaves when forced.
Microsoft (65.3, Established) — compelled remedy, held below self-correction. The same doctrine, mirrored at the positive end. Following an external inquiry into Unit 8200's use of Azure for mass surveillance of Palestinian mobile calls, Microsoft terminated that access, created a Human Rights Review Board, strengthened pre-contract review, and added anonymous reporting channels. Genuine, documented reform — yet the ruling (COMPELLED-REMEDY-NOT-SELF-CORRECTION) holds it sub-threshold:
"The Human Rights Review Board's creation IS the compelled, scope-limited, prospective remediation … expose-driven, not self-initiated — AB2 anchor 3 (course-correction-under-pressure), not anchor 4-5. Max defensible uplift remains sub-threshold (+1.3)."
A remedy extracted by exposure is credited — but not as the same thing as a remedy an institution reaches on its own. Microsoft is held at 65.3 with a POSITIVE-WATCH and a pre-registered upgrade trigger: "Verified HR Review Board blocking/exiting a harmful national-security contract → scorable UPGRADE." The compelled remedy earns the watch; only verified operative effect earns the score.
Together, Anthropic and Microsoft establish the doctrine symmetrically: compelled restriction is not self-inflicted harm; compelled remedy is not self-initiated virtue. In both cases the benchmark scores the conduct and discounts the compulsion.
8. Forward view — where the discounts convert
Every held score in this cohort is held pending a trigger. The forward view is therefore a list of the exact events that would turn an allegation into a score change — the points where this discipline gets tested in public.
- OpenAI — the subpoena→enforcement line. The single highest-value forward trigger in the cohort. Any AG enforcement action, settlement, or adjudicated finding (as opposed to the current investigative subpoena), or adjudication/settlement of the Florida Adam Raine suit, converts the discounted allegation into a scorable ACC/INT downgrade. The IPO timing (subpoena four days after the confidential S-1) makes a public, near-term adjudication path plausible.
- UnitedHealth — the merits line. A merits adjudication or settlement of the DOJ Medicare-Advantage fraud probe or the multi-AG action would move a near-floor score with modest residual headroom. This is the case most likely to test whether the density discount holds under sustained pressure.
- Anthropic — the compelled-conduct off-ramp. An adjudicated DC Circuit ruling on the Pentagon supply-chain-risk designation, or enacted audit-mandate compliance/non-compliance, could become scorable in either direction — the cleanest forward test of whether "scored on conduct" holds when the compelled event resolves.
- Microsoft — the operative-effect trigger. The compelled HR Review Board converts from sub-threshold remedy to scorable upgrade only on verified operative effect: a documented contract the Board actually blocks or exits. Quiet reinstatement of terminated Unit 8200 access would convert the other way.
- Hungary — the self-initiation test. The recovery arc (28.1 → 50.2) is the cohort's control case for what credited conduct looks like. Whether it climbs further toward (and the rule says, only toward, not into) the top band depends on enacted, sustained, self-initiated reform — the structural commitments materializing rather than being announced.
The through-line: none of these scores moves on the accusation. Each moves the day the accusation becomes proof — or the day documented conduct changes. That is the event to watch, and the discipline this briefing documents is precisely the rule that decides which day that is.
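One way to picture a pre-registered trigger is as a lookup that is fixed before the event occurs. The Python sketch below is an assumption-laden illustration: `WATCH_TRIGGERS` and `is_score_event` are hypothetical names, and the event labels paraphrase the Watch language quoted in this briefing rather than any published schema.

```python
# Hypothetical registry of conversion triggers, paraphrased from the Watch
# language quoted in this briefing. Not an official data structure.
WATCH_TRIGGERS = {
    "OpenAI": {"enforcement action", "settlement", "adjudicated finding"},
    "UnitedHealth Group": {"merits adjudication", "settlement"},
    "Microsoft": {"verified operative effect"},
}


def is_score_event(entity: str, event: str) -> bool:
    """True only if the event matches a trigger registered in advance."""
    return event in WATCH_TRIGGERS.get(entity, set())


print(is_score_event("OpenAI", "subpoena"))                      # False
print(is_score_event("OpenAI", "settlement"))                    # True
print(is_score_event("Microsoft", "verified operative effect"))  # True
```

Because the registry is written down first, a new subpoena or a louder news cycle cannot convert into a score change unless it matches a condition that was already named.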
Sources
- Canonical scores (ground truth): site/src/data/indexes/{ai-labs,fortune-500,countries}.json — the six composites (OpenAI 27.5, Oracle 14.7, UnitedHealth 10.2, Anthropic 59.1, Microsoft 65.3, Hungary 50.2), bands, and ranks were recomputed directly from rankings[] and reconcile exactly with the published values (no drift).
- Ruling provenance (assessments): research/assessments/openai-2026-06-15.md (ALLEGATION-NOT-ADJUDICATED; 42-state subpoena; −1.6 reconstruction); research/change-proposals/oracle-2026-06-12.json (COERCIVE-SEVERANCE-STRUCTURE; applied Developing → Critical, founder-approved 2026-06-14); research/assessments/unitedhealth-group-2026-06-09.md (FILED-BUT-UNADJUDICATED / density-not-adjudication); research/assessments/anthropic-2026-06-13.md (compelled Fable 5 / Mythos 5 shutdown, scored on conduct); research/assessments/microsoft-2026-06-09.md (COMPELLED-REMEDY-NOT-SELF-CORRECTION); research/assessments/hungary-2026-04-30.md + site/public/data/history/hungary.json (28.1 baseline → 50.2 recovery arc).
- Applied-change record: research/APPLIED_CHANGES.md (2026-06-14 batch — Oracle Developing → Critical, band-count deltas).
- Methodology corpus: research/PENDING_CHANGES.md (, , — the pre-registration questions referenced in §7); prior Special Briefings research/special-briefings/ai-governance-2026-06-15.md and research/special-briefings/layoffs-despite-profits-2026-06-15.md (conduct-vs-coercion and pre-adjudication framing).
- Primary web evidence (entity events):
  - OpenAI 42-state subpoena — Tom's Hardware, Tech Times, The Next Web.
  - Oracle severance terms — Tech Times.
  - UnitedHealth probes — Healthcare Finance News, Medical Economics.
  - Microsoft compelled remedy — The National.
  - Hungary recovery context — HRW (rule-of-law agenda), HRW (ICC road back).
How to read the scores
The 0–100 scale — five bands
Every entity — state, corporation, AI lab, robotics lab, or city — is scored 0–100 across 8 dimensions and 40 subdimensions. The composite score places the entity in one of five bands, from lowest to highest: Critical, Developing, Functional, Established, and Exemplary.
The 8 dimensions
Each dimension is scored 1–5 across 5 subdimensions (40 subdimensions total), then converted to a 0–100 composite. A score of 1.0 on a subdimension represents the minimum anchor; 5.0 is exemplary conduct.
Scores are based on public evidence — government reports, regulatory filings, independent audits, judicial findings, and verifiable third-party records. Entities never pay for inclusion, score changes, or suppression of findings.
Cite this briefing
Copy-ready citation string for journalism, research, or academic use.
Compassion Benchmark. "Allegation, Indictment, Ruling — How the Benchmark Scores Accusations vs Proof." compassionbenchmark.com/updates/special/allegation-indictment-ruling-2026-06-16. Accessed [Month Year]. Independent — entities never pay for inclusion, score changes, or suppression of findings.
For methodology, see compassionbenchmark.com/methodology. Data terms: /data-licenses. Press resources: /media.