The Middle of the Scale — What a 50 Actually Means
The benchmark's two foundational briefings spent the extremes: the 23 at the floor and the 64 at the top, together under 9% of the field. But almost every entity a reader looks up — their employer, their city, their country — lives in the vast Developing and Functional middle. This briefing is the on-ramp: what a middling score actually measures, why a balanced 50 and a spiky 50 are not the same thing, and why the "boring" middle is the hardest band to read.
Scope: All 7 indexes, 1,156 entities. The 783 entities in the Developing band (composite 20–40) and the Functional band (composite 40–60) — the 67.7% the floor and exemplars briefings deliberately skipped.
Cohort: 1,156 entities assessed across 7 indexes. · 783 sit in the middle — Developing (20–40) or Functional (40–60): 67.7% of the field. · The single most common band is Developing: 535 entities (46.3%). The field median composite is 35.9 — a high-Developing score. The typical assessed institution in the world is Developing, not Functional. · Per band: Critical 177 (15.3%) · Developing 535 (46.3%) · Functional 248 (21.5%) · Established 133 (11.5%) · Exemplary 63 (5.4%).
If you remember one thing
The middle is the benchmark — the extremes are the exception. Two-thirds of all assessed entities (783 of 1,156) sit in the Developing or Functional band, and the field median is 35.9. The dramatic floor and ceiling cases are real, but they describe one entity in five. The score a reader is most likely to encounter is a middling one, and it is the least explained.
Key Findings
- The middle is the benchmark — the extremes are the exception. Two-thirds of all assessed entities (783 of 1,156) sit in the Developing or Functional band, and the field median is 35.9. The dramatic floor and ceiling cases are real, but they describe one entity in five. The score a reader is most likely to encounter is a middling one, and it is the least explained.
- A 50 is not "half compassionate" — it is a profile of partial systems. A Functional score means, in the benchmark's own words, that "core practices exist and meet a basic bar, with significant gaps remaining." It is the band of *we started, but we cannot prove it worked* — systems that exist on paper or in pilot, without the disaggregated data, independent audit, or sustained-under-pressure evidence that the higher bands require.
- In the entire middle of the scale, the consistency bonus is switched off. The composite is `base score + integration premium`. For all 783 mid-band entities, that premium is **0.0** — because the bonus requires strong dimensions, and 778 of the 783 have every one of their eight dimensions scoring below the 4.0 "strong" threshold. In the middle, the composite *is* the plain average of the eight dimensions. The premium that shapes the top of the scale is a top-of-scale mechanic.
- A balanced 50 and a spiky 50 mean different things — but the middle is almost never spiky. Because the premium is dead in the middle, two entities with the same dimension average land at nearly the same composite regardless of shape. In practice mid-band profiles are strikingly flat: the "spikiest" entities near 50 still vary by only a quarter-point across all eight dimensions. The middle is where institutions are *uniformly mediocre*, not lopsided.
- The difference between a 39 and a 41 is a band boundary, not a margin of merit. Alphabet/Google sits at exactly 40.0 — one-tenth of a point inside Developing. Croatia at 40.6 is Functional; Greece at 39.1 is Developing. These neighbors differ by a rounding step, not by a meaningful gap in conduct. The band label is a useful summary, not a verdict on the last decimal.
- A Functional score can describe a country being actively attacked. Ukraine holds at 50.0 while under sustained assault, because the benchmark scores Ukraine's *own* conduct toward its people, not the harm inflicted on it. The middle is full of these contestable, context-heavy judgments — which is exactly why it is harder to read than the extremes.
- The fastest climb in the record still only reached the middle. Hungary recovered across multiple cycles to 50.2 — and remains Functional, two full bands below "good." The benchmark does not grade on trajectory or effort; it grades on proven, sustained conduct. Reaching the middle is not the same as being good, and the middle is where most recovery stories stall.
The field
1,156 entities across the five bands — the full distribution this briefing draws from.
1. Frame
The Compassion Benchmark has, to date, told its story through the extremes. The foundational briefing examined the 23 entities at the 0.0 floor and the 176 in the Critical band; the exemplars briefing examined the 64 at the top. These are the dramatic cases, and they are genuinely instructive. But together the floor-and-ceiling story covers a minority of the field. The two extreme bands — Critical and Exemplary — account for 240 of 1,156 entities (20.8%). The other 79.2% live between them.
And the bulk of that is the middle: the Developing band (composite 20–40) and the Functional band (composite 40–60) together hold 783 entities — 67.7% of everything the benchmark assesses. This is where a reader's own employer, city, and country almost certainly sit. It is the highest-traffic, most-searched region of the record, and it is the least explained.
This briefing is the on-ramp — how to read a score — and it asks three questions of the existing record:
- What does a middling score actually measure — is a 50 "half compassionate," or something else entirely?
- Why is the middle the hardest band to read — and what mechanic of the composite formula explains it?
- When does the difference between two mid-band scores matter, and when is it noise — a 39 vs. a 41, a balanced 50 vs. a spiky 50?
The central thesis: a middle score is not a fraction of goodness — it is a portrait of partial systems, and in the middle the composite collapses to the plain average of the eight dimensions because the consistency bonus that shapes the top of the scale is switched off everywhere below it. Once a reader understands that, the middle stops being "the boring part" and becomes the most legible — and most contestable — region of the benchmark.
2. The cohort
Recomputed directly from rankings[] in each index and reconciled against the canonical composite formula (scoring.mjs::computeCompositeFromDimensions). Mid-band = Developing (composite > 20 and ≤ 40) + Functional (composite > 40 and ≤ 60).
| Index | Total | Developing | Functional | Mid total | Mid % | Median composite |
|---|---|---|---|---|---|---|
| countries | 193 | 79 | 29 | 108 | 56.0% | 35.9 |
| fortune-500 | 448 | 215 | 117 | 332 | 74.1% | 35.9 |
| global-cities | 250 | 105 | 36 | 141 | 56.4% | 31.3 |
| ai-labs | 50 | 18 | 15 | 33 | 66.0% | 46.9 |
| robotics-labs | 50 | 10 | 10 | 20 | 40.0% | 60.9 |
| us-states | 21 | 9 | 0 | 9 | 42.9% | 25.0 |
| us-cities | 144 | 99 | 41 | 140 | 97.2% | 35.9 |
| Total | 1,156 | 535 | 248 | 783 | 67.7% | 35.9 (field) |
The single most important framing fact: the modal band is Developing, not Functional. 535 entities (46.3%) are Developing — nearly half the entire field — and the field-wide median composite is 35.9, a high-Developing score. The intuition that institutions average out somewhere around "Functional / 50" is wrong by a full band. The center of gravity of the assessed world sits at roughly 36.
Two structural notes on the cohort, both load-bearing for §3–§4:
- The middle is heavily quantized. The mid-band composites are not smoothly distributed: 219 entities sit at exactly 35.9 and 127 at exactly 48.4 — the two most common values in the entire middle. These correspond to near-uniform dimension profiles (all dimensions near 2.5 → 35.9; all near 3.0 → 48.4). Only 54 distinct composite values cover all 783 mid-band entities. The middle is a set of plateaus, not a gradient.
- us-cities is almost entirely mid-band (97.2%). US cities are scored on a domestic-service basis that rarely reaches either extreme; they are the purest expression of the "uniformly middling" pattern. By contrast, robotics-labs is the least mid-band index (40.0%) — its scores bunch at the harm-frontier extremes instead (the subject of a separate candidate briefing).
3. What a 50 actually means — the band vocabulary
The benchmark's five bands are defined once, canonically, in dimensions.ts (BANDS / BAND_DESCS). Read in order, they describe an evidentiary ladder — not a moral percentage:
| Band | Range | Canonical one-line definition |
|---|---|---|
| Critical | 0–20 | "Foundational compassion practices are absent or documented active harm is present." |
| Developing | 20–40 | "Some practices are emerging but remain inconsistent, reactive, or unevenly applied." |
| Functional | 40–60 | "Core practices exist and meet a basic bar, with significant gaps remaining." |
| Established | 60–80 | "Practices are systematic, documented, and supported by consistent evidence." |
| Exemplary | 80–100 | "Practices are independently verified, consistent, and sustained under pressure." |
Three things follow that a newcomer almost always gets wrong:
(a) A 50 is not "half compassionate." The scale is not a percentage of goodness. A Functional 50 means the institution has core practices that exist and meet a basic bar — and that those practices have significant gaps. The defining feature of the middle is not "half-effort" but partial systems: a process that exists on paper but is not consistently applied (Developing), or one that genuinely operates but lacks the disaggregated data, independent audit, or under-pressure track record the higher bands demand (Functional).
(b) The middle is the band of "we started, but we can't prove it worked." Look at what moves an entity from the middle to Established. Across the subdimension anchors in dimensions.ts, the jump from the mid-tier anchor (level 3) to the Established anchor (level 4) is almost always a jump from existence to evidence: from "some proactive mechanisms, inconsistent" to "multiple channels, formal pathways, regular review" (Suffering Detection); from "≥1 report disclosing an unflattering finding" to "annual report includes failures, gaps, corrective actions" (Transparency). The middle is where systems exist but the proof of efficacy is thin. That is the single most useful sentence a reader can carry away: a middle score means the systems are present and the evidence is not.
(c) Developing and Functional differ by consistency, not by intent. Developing is "emerging but inconsistent, reactive, unevenly applied." Functional is "core practices meet a basic bar." The boundary between them — the 40-point line — marks the transition from ad-hoc and reactive to reliably meeting a floor. A high-Developing entity has good practices that fire unevenly; a low-Functional entity has adequate practices that fire reliably. That distinction is real, but it is a distinction of reliability, and at the boundary it is fine-grained (see §5).
4. Why the middle is the hardest band to read — the dead premium
This is the mechanical heart of the briefing, and the most under-taught fact in the entire benchmark.
The canonical composite is two terms added together:
composite = baseComposite + integrationPremium
where baseComposite = ((meanof8_dimensions − 1) / 4) × 100 — the plain, rescaled average of the eight dimension scores — and the integration premium is a bonus of up to +10 points for an entity whose dimensions are both strong and even. The premium is 10 × consistencyMult × weaknessFactor, where:
consistencyMultrewards low variance across the eight dimensions, andweaknessFactor = max(0, 1 − weakDims × 0.2), whereweakDimsis the count of dimensions scoring below 4.0.
That weaknessFactor is the key. It hits zero the moment an entity has five or more dimensions below 4.0:
| Dimensions below 4.0 | weaknessFactor | Max possible premium |
|---|---|---|
| 0 | 1.0 | +10.0 |
| 1 | 0.8 | +8.0 |
| 2 | 0.6 | +6.0 |
| 3 | 0.4 | +4.0 |
| 4 | 0.2 | +2.0 |
| 5 or more | 0.0 | 0.0 |
Now the decisive empirical fact, recomputed across the cohort: of the 783 mid-band entities, the integration premium is 0.0 for all 783 — every single one. The reason is direct: 778 of the 783 have all eight dimensions scoring below 4.0 (weakDims = 8, weaknessFactor = 0), and the remaining five have six or seven weak dimensions (weaknessFactor still 0). A dimension at 4.0+ is the benchmark's threshold for systematic, evidenced, "Established"-grade performance — and a mid-band entity, by definition, does not have that on any dimension.
The consequence is the single most important reading rule for the middle:
In the Developing and Functional bands, the composite is exactly the rescaled average of the eight dimensions. The integration premium — the bonus that shapes the top of the scale — is switched off everywhere below ~60.
This is why the middle is the hardest band to read. At the top, the premium does real work: a balanced 70/70/70 profile beats a spiky 90/40 one, and the composite encodes consistency as well as level. A reader at the top must reason about shape. In the middle, there is no premium to reason about — the composite is the mean, full stop — and so the only information in the number is the average level of partial systems. The number is simpler, but it is also flatter and less discriminating: a 35.9 tells you "uniformly Developing," and almost nothing about which dimensions are the problem. To read a mid-band entity at all, you must look past the composite to the dimension profile — which is precisely the move newcomers don't make.
A worked illustration of the formula behavior makes the asymmetry concrete:
| Uniform profile (all 8 dims) | baseComposite | premium | composite | band |
|---|---|---|---|---|
| 2.5 | 37.5 | 0.0 | 37.5 | Developing |
| 3.0 | 50.0 | 0.0 | 50.0 | Functional |
| 3.5 | 62.5 | 0.0* | 62.5 | Established |
| 4.0 | 75.0 | +10.0 | 85.0 | Exemplary |
At 3.5 uniform, all dimensions are still below 4.0, so the premium remains 0 — the premium only ignites once dimensions cross the 4.0 "strong" line, which is why it is effectively an Established-and-above mechanic. The jump from a uniform 3.5 to a uniform 4.0 moves the composite by 22.5 points (12.5 of base + 10 of newly-unlocked premium): the entire architecture of the scale rewards crossing into evidenced* performance, and the middle is everything before that crossing.
5. Balanced vs. spiky, and the band-boundary question
Two practical reading questions follow from the dead premium.
(a) Does a balanced 50 differ from a spiky 50? In principle, yes — but the mechanic that would distinguish them is mostly inert in the middle, and in the record the middle is almost never spiky. Examine every entity sitting at composite 49–51:
- The flattest — Omnicom Group, 1X Technologies, Together AI — are perfectly uniform (
3/3/3/3/3/3/3/3, standard deviation 0.0). - The "spikiest" near 50 — Kyiv, Schunk, SoftBank Robotics, Ståubli Robotics — vary by only a quarter-point across all eight dimensions (standard deviation 0.25), almost always a single Equity dimension pulled down to 2.5 against an otherwise-flat 3.0.
There is no genuinely lopsided entity in the middle of the scale. The middle is where institutions are uniformly mediocre — adequate-ish across the board — rather than excellent-on-some-and-failing-on-others. (Genuine spikiness, where the premium would do real work, is a feature of the top of the scale.) The one subtlety worth flagging: because the premium can flicker positive once an entity gets four or fewer weak dimensions, a profile that pushes several dimensions to 4.0+ while leaving others low can actually outscore a flat profile of the same mean — but no current mid-band entity is shaped that way, which is itself a finding about how institutions improve (broadly, not in spikes).
(b) Is the difference between a 39 and a 41 meaningful? This is the band-boundary question, and the honest answer is: the band label flips, but the conduct barely differs. Twenty entities sit within two points of the Developing/Functional line (38–42). At the line itself:
| Entity | Composite | Band | Note |
|---|---|---|---|
| Greece (country) | 39.1 | Developing | one rounding step below the line |
| Alphabet/Google (F500) | 40.0 | Developing | exactly one-tenth inside Developing |
| Croatia (country) | 40.6 | Functional | one rounding step above the line |
| General Motors (F500) | 40.6 | Functional | — |
Alphabet/Google at exactly 40.0 is the cleanest case: a tenth of a point higher and the public-facing band label would read "Functional" instead of "Developing." The dimension profiles of Greece (39.1) and Croatia (40.6) are near-identical bundles of low-3s and high-2s. The band is a useful summary; the last decimal is not a verdict. A reader should treat a band as a bucket of conduct that looks broadly alike, and should never read a 1–2 point gap across the boundary as a meaningful difference in compassion. (This is the within-band analog of the cross-type incommensurability the floor briefing raised: the number can be more precise than the judgment it represents.)
6. The middle's marquee cases — and why they are contestable
The extremes are dramatic but simple; the middle is undramatic but contestable, and that is what makes it interesting. Three marquee mid-band cases, all verified from the index JSONs, show the kinds of judgment the middle is full of:
| Entity | Composite | Band | Profile (AWR/EMP/ACT/EQU/BND/ACC/SYS/INT) |
|---|---|---|---|
| Ukraine (country) | 50.0 | Functional | 3 / 3 / 3.5 / 2.5 / 3 / 3 / 3 / 3 |
| Hungary (country) | 50.2 | Functional | 3 / 2.9 / 3 / 2.7 / 3.1 / 3.1 / 3.1 / 3.15 |
| Microsoft (F500) | 65.3 | Established | 4 / 3.2 / 3.9 / 3.2 / 3.8 / 3.8 / 4 / 3 |
- Ukraine, 50.0, Functional — under active attack. Ukraine holds a Functional score while being attacked, because the benchmark scores Ukraine's own conduct toward its people, not the harm inflicted on it by an external aggressor. The June 14–15 daily record confirms the hold: a June 2 Russian mass strike killing 22+ civilians is documented, but the delta is 0 — "no structural change to state compassion infrastructure in-window." The naive read ("a country at war must be near the floor") is exactly wrong, and the middle is where that attribution logic is most visible.
- Hungary, 50.2, Functional — the recovery that stalled in the middle. Hungary's tracked history climbs across cycles (from the low-40s in mid-May to 50.2) on documented improvement — and it is still Functional, two bands below "good." The benchmark does not grade on trajectory or effort; it grades on proven, sustained conduct, and the fastest climbs in the record top out in the middle. (Hungary's profile is also the textbook flat mid-band shape: every dimension within a third of a point of 3.0.)
- Microsoft, 65.3, Established — but with a dead premium. Just above the middle, Microsoft is the cusp case that proves the §4 rule. Its premium is 0.0 — six of its eight dimensions are still below 4.0 (EMP 3.2, EQU 3.2, INT 3.0 among them) — so even an Established score here is pure base, carried entirely by two dimensions reaching 4.0 (AWR, SYS) lifting the average. Microsoft shows that the premium does not switch on the instant you leave the middle; it stays off until an entity assembles several evidenced dimensions, not just one or two.
The throughline: every mid-band score is a judgment about partial systems under specific context, and the context (a war, a recovery, a single strong dimension) is doing as much work as the number. That is why the middle is harder to read than the floor — and why reading it well means reading the profile, not the composite.
8. Forward view — what to watch
- The Developing/Functional boundary cluster. The 20 entities within two points of the 40-point line (Greece 39.1, Alphabet/Google 40.0, Croatia 40.6, General Motors 40.6, and the rest) are the entities whose public band label is most fragile. A single documented improvement or deterioration can flip the label without much changing the underlying conduct — the band crossings most likely to generate "why did X change band?" reader questions. This is the cohort to watch for label volatility, not score volatility.
- The first premium-earner in the middle. No mid-band entity currently earns any integration premium. The first entity to assemble four or fewer weak dimensions while remaining below 60 — i.e., to develop several genuinely evidenced (4.0+) dimensions alongside weaker ones — would be the first real test of whether the premium behaves sensibly near the band boundaries. It would also be a signal of uneven improvement (some dimensions racing ahead), a pattern the record does not yet contain.
- The recovery cases. Hungary (50.2) and the broader set of mid-band entities on documented upward trajectories are the live test of "does climbing toward good stall in the middle?" If a recovery case crosses 60 into Established, it will be the first to switch the premium on — and the first chance to watch the mid-to-upper transition mechanic in a real entity rather than a hypothetical.
- Quantization decay. As first-baseline placeholder profiles (35.9, 48.4 clusters) are replaced by measured assessments, the middle should de-quantize — spreading off the plateaus into a smoother distribution. The rate at which the 35.9 cluster (219 entities) shrinks is a direct measure of how much of the middle is genuinely measured vs. still provisional.
Sources
- Canonical scores (ground truth):
site/src/data/indexes/{countries,fortune-500,global-cities,ai-labs,robotics-labs,us-states,us-cities}.json— all cohort counts (783 mid-band; 535 Developing / 248 Functional), the per-index medians, the field median of 35.9, the quantization histogram (219 at 35.9, 127 at 48.4), and every named dimension profile (Ukraine, Hungary, Microsoft, Alphabet/Google, Croatia, Greece) were recomputed directly fromrankings[]and reconcile with the canonical formula (sparse top-band ruling caps aside, the mid-band reconciles within rounding). - Formula / methodology (the mechanic this briefing teaches):
site/scripts/lib/scoring.mjsandsite/src/lib/scoring.ts(computeCompositeFromDimensions:baseComposite + integrationPremium,weaknessFactor = max(0, 1 − weakDims × 0.2), the harm flag);site/src/data/dimensions.ts(the canonicalBANDS/BANDDESCSfive-band vocabulary;INTEGRATIONPREMIUM.short/.detail; the 8 dimensions / 40 subdimension anchors showing the existence-to-evidence jump from level 3 to level 4). - Longitudinal / provenance:
site/public/data/history/{ukraine,hungary,alphabet-google}.json(Ukraine's June 14–15 zero-delta hold under attack; Hungary's multi-cycle climb to 50.2; Alphabet/Google's hold at 40.0); the publishedfloor-and-criticalandexemplarsbriefings (the extremes this briefing complements) inresearch/special-briefings/. - Methodology questions: existing SBQ series in
research/PENDING_CHANGES.md(max ; this draft proposes through but does not file them while unpublished). - Fresh web evidence: none required for this edition. This is a structural/comprehension analysis of the existing record and the canonical formula; no external grounding was needed, and the web-search budget was not drawn down.
How to read the scores
The 0–100 scale — five bands
Every entity — state, corporation, AI lab, robotics lab, or city — is scored 0–100 across 8 dimensions and 40 subdimensions. The composite score places the entity in one of five bands:
The 8 dimensions
Each dimension is scored 1–5 across 5 subdimensions (40 subdimensions total), then converted to a 0–100 composite. A score of 1.0 on a subdimension represents the minimum anchor; 5.0 is exemplary conduct.
Scores are based on public evidence — government reports, regulatory filings, independent audits, judicial findings, and verifiable third-party records. Entities never pay for inclusion, score changes, or suppression of findings. Full methodology
Continue reading
June 16, 2026
Allegation, Indictment, Ruling — How the Benchmark Scores Accusations vs Proof
In a single fortnight, OpenAI was hit by a 42-state attorney-general subpoena and its score did not move; Oracle's documented severance terms moved it into the Critical band. That is not inconsistency — it is the discipline that keeps the benchmark citable. This briefing examines six entities to show the exact line the record draws between what is alleged and what is proven, and between conduct an institution chose and conduct a government forced on it.
Read briefingJune 16, 2026
The Equity Tax — The One Dimension That Drags Almost Everyone Down
The benchmark scores eight dimensions of institutional conduct. One of them — Equity, the fair distribution of care toward those with the greatest need and least power — is the weakest score for nine of every ten entities assessed, from authoritarian states to model corporations. This briefing measures that pattern across all 1,156 entities, shows the exact mechanism by which a single weak equity score caps an otherwise strong profile, and asks what it means that the institutions which get everything else right still fail the most vulnerable.
Read briefingJune 16, 2026
State of Exception — When Governments Codify Impunity
A cluster of governments is not falling to the bottom of the scale through single atrocities. It is legislating its way there — converting emergency powers, "extremist" designations, and election repression into durable, signed-into-law impunity. This briefing tracks that pattern across the Critical-band countries and examines its sharpest case: Bolivia's descent from 28.4 to 6.3 across four scoring cycles, the benchmark's first sequence in which a predicted trigger was named in advance and then realized.
Read briefingJune 16, 2026
The State of Institutional Compassion — 2026
This is the first comprehensive read on how institutions worldwide recognize, respond to, and reduce suffering. Across seven indexes, 1,156 institutions — every kind, from sovereign states to single-product labs — are scored on one shared 0–100 framework. The headline is sobering and consistent: the modal institution is mediocre, the tails are thin, and almost every institution on Earth, from the worst to the very best, is weakest at the same thing — fairness to those with the least power. This is the state of the field as of mid-2026.
Read briefingJune 16, 2026
What the Product Is For — Robotics and AI at the Harm Frontier
Sort the 50 robotics labs and 50 AI labs not by rank but by what their core product is *for*, and one gradient appears in both indexes at once: defense, surveillance, and weapons cluster at the floor; healthcare, accessibility, and assistive technology cluster at the ceiling. Compassion Benchmark is the only institution that scores robotics labs at all — there is no comparator. This briefing examines what that gradient is actually measuring, and where conduct and purpose come apart.
Read briefingJune 15, 2026
AI Governance Under Pressure — What a Shutdown, a Subpoena, and a Union Vote Actually Tell the Benchmark
In a single fortnight, the US government forced Anthropic to pull its two most powerful models, 42 state attorneys general subpoenaed OpenAI, and Google DeepMind's UK staff voted to unionize over military AI. The benchmark scores how institutions recognize and reduce suffering — not how much external pressure they attract. This briefing examines what each of those events does, and does not, say about an AI lab's compassion score.
Read briefingJune 15, 2026
Layoffs Despite Profits — When a Layoff Becomes a Compassion Failure
A 2026 Fortune 500 restructuring wave is testing a boundary the benchmark is only beginning to price: the difference between a layoff forced by distress and a layoff that protects margin while profits rise. Two cases set the new anchors — Procter & Gamble, downgraded out of the top tier for cutting 7,000 jobs "despite increasing profits," and Oracle, dropped into the Critical band for a 30,000-person cut wrapped in a "sign the release or forfeit your severance" ultimatum. This briefing examines what separates a Boundaries-neutral business decision from a scorable harm.
Read briefingJune 11, 2026
What Good Looks Like — Exemplars Across Entity Types
The same 0–100 scale that judges the worst also names the best. At the top, 64 entities across states, corporations, AI and robotics labs, and cities reach the Exemplary band. This briefing asks what high compassion actually looks like in the record — what dimension profile produces it, whether it is earned the same way across entity types, and why even the best institutions share a single, universal soft spot.
Read briefingJune 11, 2026
The Floor and the Critical Band — How the Benchmark Judges the Worst
A single 0–100 scale ranks states, corporations, AI and robotics labs, and cities together. At the bottom, that shared scale meets four entity types that fail in structurally different ways — and reach the bottom by different mechanics. This briefing examines the 176 entities in the Critical band and the 23 at the absolute floor, and asks what the record actually shows about how the worst are judged.
Read briefingRelated daily briefing
June 16, 2026 — daily benchmark
Cite this briefing
Copy-ready citation string for journalism, research, or academic use.
Compassion Benchmark. "The Middle of the Scale — What a 50 Actually Means." compassionbenchmark.com/updates/special/middle-of-the-scale-2026-06-16. Accessed [Month Year]. Independent — entities never pay for inclusion, score changes, or suppression of findings.
For methodology, see compassionbenchmark.com/methodology. Data terms: /data-licenses. Press resources: /media.
You just read a Special Briefing.
Weekly score highlights — institutional compassion findings
The week's top score movements and evidence-linked findings across 1,156 entities, delivered every Friday. Daily briefings publish on the site. Free.
No spam. Unsubscribe anytime. Your email is never shared.