Special Briefing · Thematic (event-triggered; thereafter monthly while the pressure cycle runs) · June 15, 2026

AI Governance Under Pressure — What a Shutdown, a Subpoena, and a Union Vote Actually Tell the Benchmark

In a single fortnight, the US government forced Anthropic to pull its two most powerful models, 42 state attorneys general subpoenaed OpenAI, and Google DeepMind's UK staff voted to unionize over military AI. The benchmark scores how institutions recognize and reduce suffering — not how much external pressure they attract. This briefing examines what each of those events does, and does not, say about an AI lab's compassion score.

Scope: The 50-entity AI Labs index, plus the six big-tech AI actors carried in adjacent indexes (Microsoft, Alphabet/Google, Amazon, Meta — Fortune 500; OpenAI, Anthropic, DeepMind/Google, Meta AI, Amazon AWS AI — ai-labs). The analysis centers on the four labs that took the cycle's heaviest external pressure: Anthropic (59.1), OpenAI (27.5), DeepMind/Google (56.9), and the Alphabet/Microsoft/Amazon big-tech cluster.

Cohort: 50 labs in the AI Labs index — mean composite 43.6; 4 Exemplary, 8 Established, 15 Functional, 18 Developing, 5 Critical (3 at the 0.0 floor: xAI/Grok, Palantir AI, Character AI). · The four pressure-tested labs: Anthropic 59.1 (Functional), DeepMind/Google 56.9 (Functional), OpenAI 27.5 (Developing), and the Alphabet/Google corporate entity 40.0 (Developing). · Big-tech AI cross-carry: Microsoft 65.3 (Established), Alphabet/Google 40.0, Amazon 12.8 (Critical), Meta Platforms 7.8 (Critical) — all Fortune-500 composites.

If you remember one thing

Pressure arrived faster than practice matured. In a single fortnight the four marquee labs absorbed a government-mandated model shutdown, a 42-state subpoena, and a worker union vote — none of which moved a published composite. External governance is now testing these labs faster than their internal compassion infrastructure is changing.

Key Findings

Pressure arrived faster than practice matured. In a single fortnight the four marquee labs absorbed a government-mandated model shutdown, a 42-state subpoena, and a worker union vote — none of which moved a published composite. External governance is now testing these labs faster than their internal compassion infrastructure is changing.
A government-compelled shutdown is not self-inflicted harm. Anthropic disabling Fable 5 and Mythos 5 under a US Commerce export-control order is scored on Anthropic's *conduct in the event* — prompt compliance, public disclosure, an apology, and stated disagreement — which is mildly positive on Accountability. It held at 59.1 (one sub-threshold −0.3 day), not downgraded.
A 42-state subpoena is an allegation, not a ruling. The coalition probe into OpenAI — naming model sycophancy a "design flaw" and targeting marketing to minors and seniors — is the cycle's broadest accountability signal, but it is pre-adjudication. It applied downward sub-dimension pressure (a conservative reconstruction of 25.9) that stayed below the scoring threshold; OpenAI held at 27.5.
The same word — "pressure" — covers three different evidentiary bars. A signed export order (operative, compelled), a subpoena (alleged, unadjudicated), and a union vote (worker-voice signal, prospective) are not interchangeable evidence. The benchmark scores operative conduct, discounts allegations, and treats worker-voice as a forward indicator — and the record applies all three distinctions in this one cycle.
Refusal that costs money reads as integrity; acceptance of the same terms reads as a gap. Anthropic's refusal of Pentagon terms (a documented +1 Integrity signal even as the Pentagon labeled it a supply-chain risk) and OpenAI's acceptance of terms Anthropic refused (a −1.7 Integrity-gap downgrade) are the clearest demonstration that the benchmark scores conduct under pressure, not the pressure itself.
The compassion gap inside big tech is wide. Microsoft (65.3, Established) took a documented human-rights accountability step; Amazon (12.8) and Meta Platforms (7.8) sit in the Critical band of the Fortune 500. The label "AI lab" spans a 57-point range once the parent corporations are read alongside the model builders.
Worker voice is now a named pressure channel, not yet a scored one. The DeepMind/Google union vote (a reported 98% of ~300 London staff) targets military AI and Project Nimbus. It is currently a forward indicator — DeepMind/Google holds at 56.9 — and the open question is what, if anything, converts worker-voice into a scorable accountability signal.

The field

1,156 entities across the five bands — the full distribution this briefing draws from.

Source: Compassion Benchmark · CC-BY

1. Frame

The Compassion Benchmark scores how an institution recognizes, responds to, and reduces suffering. For an AI lab, that is a slow-moving thing to demonstrate: model cards, red-teaming, refusal ethics, harm-incident transparency, equitable access. In mid-June 2026, the external governance environment around the largest US labs moved much faster than any of that internal practice could. Inside a single fortnight (June 1–15):

the US Commerce Department issued an export-control directive forcing Anthropic to suspend its two most powerful models for all foreign nationals, and Anthropic disabled them for all users to comply;
a coalition of 42 state attorneys general served OpenAI with a subpoena naming model sycophancy a "design flaw" and targeting its marketing to minors and seniors — four days after OpenAI's confidential IPO filing;
Google DeepMind's UK staff voted overwhelmingly to unionize over the company's military-AI contracts and the Project Nimbus deal with Israel;
and the running backdrop — Pentagon contracting fights, the EFF's human-rights pressure on Google and Amazon, Microsoft's contrasting accountability step, and the EU AI Act's Digital Omnibus — kept the structural pressure on.

The central tension this briefing examines: what does a government-mandated shutdown, a 42-state subpoena, or a union vote actually say about a lab's compassion score? The answer the record gives is disciplined and counter-intuitive: in this cycle, none of these events moved a published composite. That is not the benchmark ignoring them. It is the benchmark applying three different evidentiary rules — conduct-vs-coercion, the pre-adjudication discount, and worker-voice-as-forward-indicator — to three different kinds of pressure. This briefing asks three questions of the existing record:

Conduct vs. coercion — is a compelled shutdown (Anthropic) scored the same as self-inflicted harm? (No — and §3 shows why.)
The pre-adjudication discount — does a 42-state subpoena (OpenAI) move a score? (Not yet — §4.)
Worker-voice and military-AI accountability — does a union vote (DeepMind) or contracting posture register? (As a forward indicator and an integrity signal, not a composite move — §5.)

The thesis: the benchmark is currently absorbing a faster, more adversarial governance environment with rules that were already in place — and it is holding the line that pressure is evidence to be classified, not a verdict to be scored. The strain is showing not in the scores but in the questions the cycle raises about where the discounts end.
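The three evidentiary rules named above can be read as a lookup from evidence type to treatment. The sketch below is illustrative only: the enum names, the `RULES` table, and the `classify` helper are hypothetical and do not come from the benchmark's codebase. It encodes only the mapping this briefing describes.

```python
from enum import Enum


class Evidence(Enum):
    """Kinds of pressure distinguished in this briefing (names are illustrative)."""
    OPERATIVE_CONDUCT = "operative conduct"  # e.g. complying with a signed export order
    ALLEGATION = "allegation"                # e.g. an investigative subpoena
    WORKER_VOICE = "worker voice"            # e.g. a union vote


class Treatment(Enum):
    """How each kind of evidence is handled, per the text."""
    SCORE_CONDUCT = "score the entity's own conduct in the event"
    DISCOUNT = "hold as sub-threshold pressure until adjudicated"
    FORWARD_INDICATOR = "log as a forward indicator, no composite move"


# Hypothetical rule table: one treatment per evidence type.
RULES = {
    Evidence.OPERATIVE_CONDUCT: Treatment.SCORE_CONDUCT,
    Evidence.ALLEGATION: Treatment.DISCOUNT,
    Evidence.WORKER_VOICE: Treatment.FORWARD_INDICATOR,
}


def classify(kind: Evidence) -> Treatment:
    """Return the treatment the briefing describes for a given evidence type."""
    return RULES[kind]


# The three events of the fortnight, classified as the briefing classifies them.
events = {
    "Anthropic export-control shutdown": Evidence.OPERATIVE_CONDUCT,
    "OpenAI 42-state subpoena": Evidence.ALLEGATION,
    "DeepMind/Google union vote": Evidence.WORKER_VOICE,
}

for name, kind in events.items():
    print(f"{name}: {classify(kind).value}")
```

The point of the table form is that the pressure itself never appears as a score input; only its classification does.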

2. The cohort

Recomputed directly from the index JSONs. The AI Labs index holds 50 entities, mean composite 43.6, distributed as: 4 Exemplary, 8 Established, 15 Functional, 18 Developing, 5 Critical — of which 3 are at the 0.0 floor (xAI/Grok, Palantir AI, Character AI). The pressure-tested labs and their big-tech parents:

Entity	Index	Composite	Band	Weakest dimensions	Pressure event this cycle
Microsoft	fortune-500	65.3	Established	EMP 3.2, EQU 3.2, INT 3.0	Human-rights accountability step (suspended services)
Anthropic	ai-labs	59.1	Functional	EQU 3.1, SYS 3.2, INT 3.2	Govt export-control shutdown (Fable 5 / Mythos 5)
DeepMind/Google	ai-labs	56.9	Functional	ACC 3.0, INT 3.0, BND 3.2	UK staff union vote; Project Nimbus / Pentagon
Alphabet/Google	fortune-500	40.0	Developing	INT 2.1, EQU 2.4, EMP 2.5	EFF "acknowledged risks, ignored responsibilities"
Amazon AWS AI	ai-labs	35.9	Developing	EMP 2.0, BND 2.0, INT 2.0	EFF Nimbus pressure (parent)
OpenAI	ai-labs	27.5	Developing	INT 1.7, ACC 1.9, BND 2.0	42-state AG subpoena
Meta AI	ai-labs	26.3	Developing	ACC 1.5, INT 1.8, EMP 2.1	(parent Meta Platforms 7.8, Critical)
Amazon	fortune-500	12.8	Critical	EMP 1.4, ACC 1.4, AWR 1.5	EEOC findings (pre-adjudication)
Meta Platforms	fortune-500	7.8	Critical	ACC 1.0, EMP 1.1, EQU 1.2	Platform-harm cluster

The single most important structural fact about this cohort: the four labs that took the heaviest external pressure this fortnight (Anthropic, OpenAI, DeepMind/Google, and the Alphabet parent) span a 31.6-point range (27.5 → 59.1), and not one of them moved a published composite on the pressure event. The amount of governance pressure an entity attracts is, on this record, uncorrelated with its compassion score — which is exactly what the methodology intends. A lab can be high-scoring and heavily pressured (Anthropic) or low-scoring and heavily pressured (OpenAI); the pressure is an input to be classified, not a score in itself.

A second structural fact: read alongside their parents, the big-tech AI actors span a 57.5-point range — Microsoft (65.3, Established) to Meta Platforms (7.8, Critical). "Big tech AI" is not a band; it is the full width of the scale.
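The cohort arithmetic above can be checked in a few lines. This is a minimal sketch that copies the composites from the table in this section; the variable names and the `spread` helper are hypothetical, and the real recomputation runs against the index JSONs rather than a hand-typed dictionary.

```python
# Composites copied from the cohort table in this section.
composites = {
    "Microsoft": 65.3,
    "Anthropic": 59.1,
    "DeepMind/Google": 56.9,
    "Alphabet/Google": 40.0,
    "Amazon AWS AI": 35.9,
    "OpenAI": 27.5,
    "Meta AI": 26.3,
    "Amazon": 12.8,
    "Meta Platforms": 7.8,
}

# The four entities that took the heaviest external pressure this fortnight.
pressure_tested = ["Anthropic", "OpenAI", "DeepMind/Google", "Alphabet/Google"]

# Band counts reported for the 50-entity AI Labs index.
band_counts = {
    "Exemplary": 4,
    "Established": 8,
    "Functional": 15,
    "Developing": 18,
    "Critical": 5,
}


def spread(scores):
    """Return max minus min, rounded to one decimal to avoid float noise."""
    values = list(scores)
    return round(max(values) - min(values), 1)


pressure_spread = spread(composites[name] for name in pressure_tested)
big_tech_spread = spread(composites.values())
total_labs = sum(band_counts.values())

print(pressure_spread, big_tech_spread, total_labs)
```

Under these inputs the spreads come out to 31.6 and 57.5 points, and the band counts sum to 50, matching the figures quoted in the text.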

3. Conduct vs. coercion — the Anthropic shutdown

On June 12, 2026, Commerce Secretary Howard Lutnick wrote to Anthropic CEO Dario Amodei directing that Fable 5 and Mythos 5 be subject to export controls for all foreign persons, inside or outside the US, after another company reportedly demonstrated a jailbreak of Mythos. To comply, Anthropic disabled both flagship models for all customers on June 12–13. The naive read — "the government just shut down a lab's most powerful products" — sounds like a maximal negative event. The benchmark's read is the opposite of naive, and it is the cleanest live demonstration of the conduct-vs-coercion rule in the corpus.

The benchmark scores Anthropic's behavior in the event, not the government's action or market sentiment about it. Anthropic's conduct was, verbatim:

Compliance: "We are complying with the government's legal directive and are removing access to Fable 5 and Mythos 5 for all users."
Apology: "We apologize for this disruption to our customers."
Stated disagreement (transparency): "We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people."

That conduct profile — same-day compliance, public disclosure of the order, an apology with no legal deflection, and a transparent statement of disagreement — is mildly positive on Accountability/Transparency (AB1 Harm Acknowledgment, AB3 Transparency), not negative. The June 13 assessment confirmed the composite at 59.1, nudging only ACC (3.5 → 3.6) without changing the composite. The history shows one sub-threshold day: a −0.3 documented on June 14 (58.8, still Functional), then a hold back at 60.0/59.1. A compelled shutdown that the entity handles transparently is net-neutral-to-mildly-positive, because the harm (loss of model access) was inflicted by the government, and the only scorable surface is how Anthropic responded.

This is the same logic the benchmark applies to states: OCHA evidence of harm in Gaza is attributed to Israel's conduct, not the Palestinian Authority's, and so it reinforces but does not lower Palestine's score. Harm caused to an entity by an external actor is not harm caused by that entity. The Anthropic case is the AI-lab instance of that attribution rule.

The longitudinal record makes the contrast sharper still. Anthropic has spent most of the last month at the exact Functional/Established boundary (60.0), held there by an unresolved DC Circuit appeal (Anthropic v. Hegseth) over the Pentagon's designation of Anthropic as a supply-chain risk for refusing to remove safety guardrails. On May 29 the record logged a +1 Integrity signal precisely for that refusal — the lab maintained a costly safety posture under direct government and commercial pressure. The export-control shutdown sits inside that same arc: a lab repeatedly absorbing external coercion while keeping its own conduct intact. The pressure is intense; the conduct, on the record, holds.

4. The pre-adjudication discount — the OpenAI subpoena

On June 12, 2026, a coalition of 42 state attorneys general, led by New York AG Letitia James, served OpenAI with a subpoena — four days after its confidential S-1 IPO filing (June 8). The subpoena's scope is broad and squarely compassion-relevant: consumer and health data handling, marketing to vulnerable populations (children and seniors), age verification, safety-testing policies, and the behavioral properties of OpenAI's models — including model sycophancy named as a design flaw. A separate Florida civil suit (June 1) alleges ChatGPT validated a 16-year-old's suicidal ideation and supplied self-harm methods.

This is the broadest accountability signal in OpenAI's record. It is also, by methodology, an allegation under investigation, not an adjudicated finding — and that distinction is load-bearing. The June 15 reassessment is the cleanest live application of the pre-adjudication (Tier) discount in the corpus:

The probe applies genuine downward pressure on the right dimensions — a conservative reconstruction nudges ACC (1.9 → 1.7) for harm-acknowledgment / marketing-to-vulnerable-populations and AWR (2.2 → 2.0) for anticipatory awareness of harm to minors — yielding a reconstructed composite of 25.9.
That is a −1.6 delta, below the 5-point scoring threshold. The 42-state probe is a sub-dimension intensifier within the Developing band, not a scorable composite move. OpenAI holds at 27.5.
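The hold decision described above reduces to comparing a reconstructed delta against the threshold. The sketch below assumes the 5-point threshold is applied as a simple absolute-value test on the reconstructed composite, which is how the text states it for this case; it may not be how every ruling type is handled. The function and constant names are hypothetical.

```python
SCORING_THRESHOLD = 5.0  # points, as stated in the text for this reassessment


def resolve(published, reconstructed, threshold=SCORING_THRESHOLD):
    """Return (composite to publish, delta, whether the move is scorable).

    Assumption: a reconstruction only replaces the published composite when
    the absolute delta reaches the threshold; otherwise the score holds.
    """
    delta = round(reconstructed - published, 1)
    if abs(delta) < threshold:
        return published, delta, False
    return reconstructed, delta, True


# OpenAI, June 15: published 27.5, conservative reconstruction 25.9.
composite, delta, scorable = resolve(27.5, 25.9)
print(composite, delta, scorable)
```

For the OpenAI figures this yields a delta of -1.6, not scorable, and the composite holds at 27.5.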

The ruling tag in the assessment frontmatter states it plainly: ALLEGATION-NOT-ADJUDICATED — sub-dimension pressure (ACC/AWR) within band, no scorable composite move. The watch conditions are pre-registered: any AG enforcement action, settlement, or adjudicated finding — as distinct from the current investigative subpoena — converts to a scorable ACC/INT downgrade, as would the Florida suit reaching adjudication.

This is consistent with how the benchmark treats corporate enforcement generally: Amazon's EEOC findings this same cycle (pregnant-worker February, disabled-worker April) were held as "pre-adjudication and consistent with the 17.8 Critical profile," and Oracle's 30,000-layoff coercive-severance event was scored only once the structure (sign-or-forfeit) was documented, not on allegation alone. The subpoena is the AI-lab instance of the same FILED-BUT-UNADJUDICATED discipline that keeps the Fortune-500 corporate cluster jammed just above the Critical line until a merits ruling lands.

The tension worth naming: OpenAI is already low (27.5, Developing, with INT at 1.7 — near the Critical band). Its score got there through adjudicated and structural conduct, not allegation — a −1.7 Integrity-gap downgrade (May 20) for accepting Pentagon terms Anthropic refused, and a −1.9 (May 27) for a documented failure-to-report leadership override. The subpoena is consistent with that trajectory but cannot, by itself, drive it lower. The pre-adjudication discount is doing real work here: it is the difference between an entity whose conduct is scored down and an entity that merely attracts a high-profile allegation. OpenAI is, on the record, both — but only the former moves the number.

5. Worker voice and military-AI accountability — DeepMind, Nimbus, and the big-tech split

The third pressure channel is the one the benchmark currently treats most cautiously: worker voice. Around 300 London-based Google DeepMind employees voted (a reported 98% in favor) to unionize with the CWU and Unite, demanding that DeepMind's Gemini models be blocked from military uses, that the Pentagon classified-network contracts end, and that the $1.2 billion Project Nimbus deal with Israel be terminated. The workers also sought an independent ethics oversight body and a right to refuse to contribute to projects on moral grounds. Management faced a 10-day window to respond voluntarily or face legal proceedings.

DeepMind/Google holds at 56.9 (Functional), with its weakest dimensions ACC (3.0), INT (3.0), and BND (3.2) — precisely the dimensions a worker revolt over military AI implicates (accountability for end-use, integrity of stated values, boundary-setting on harmful applications). The union vote is currently a forward indicator, not a scored event: it is a strong internal signal that workers perceive a values-conduct gap, but it is not, by itself, an adjudicated finding of harm or a change in the lab's operative conduct. The open question is what converts worker-voice into a scorable signal — a management refusal, a contract termination, or a documented retaliation would each register differently.

The military-AI accountability theme runs across the big-tech cluster and exposes a wide compassion split that the "AI lab" label conceals:

Microsoft (65.3, Established) took a documented step the benchmark already logged as a positive-watch: per EFF, it "suspended certain services after initial investigations raised serious concerns" about misuse of its cloud/AI infrastructure, and its Israel chief departed "amid an ethical controversy." Note the score is held sub-threshold within Established under COMPELLED-REMEDY-NOT-SELF-CORRECTION — the remedy is credited, but as a compelled response, not a self-initiated one. Microsoft's INT (3.0) is its own weakest dimension; the human-rights step is the kind of conduct that, if sustained and self-initiated, would test the Established/Exemplary boundary.
Alphabet/Google (40.0, Developing) and Amazon (12.8, Critical) are the EFF's named counter-example: the title of the April analysis is "Google and Amazon: Acknowledged Risks, and Ignored Responsibilities." Google's own internal assessments reportedly warned of Project Nimbus risks before signing; neither company responded substantively, and Amazon "fail[ed] to even acknowledge the request." That posture is consistent with their low scores — Alphabet's INT (2.1) is its weakest dimension; Amazon sits in the Critical band.
The contrast is the finding. Under the same external pressure (military-AI contracting, a human-rights spotlight), Microsoft took a (compelled) remedial step and Google/Amazon did not — and the scores already encode that divergence. Worker voice (DeepMind) and civil-society voice (EFF) are pushing on the same INT/ACC dimensions across all three; the benchmark is registering the responses, not the pressure.

Forward view — what to watch

The DC Circuit ruling (Anthropic v. Hegseth). This is the single highest-value forward trigger for the top of the AI-labs cohort. A favorable ruling credits Anthropic's costly refusal of Pentagon terms as an I1 governance signal and triggers the Functional/Established crossing (59.1 → 60.0+); an adverse ruling produces downward movement. It directly tests how compelled vs. self-initiated conduct is scored at the top of the band.
The OpenAI subpoena's adjudication arc. Any of the 42 AGs filing an enforcement action, or the Florida suit reaching settlement or a finding, converts the current sub-threshold pressure (reconstruction 25.9) into a scorable downgrade that would push OpenAI toward the Critical band (INT already 1.7). This is the live test of the pre-adjudication discount. IPO timing — the subpoena landed four days after the S-1 — makes the disclosure environment more consequential.
DeepMind management's response to the union vote. A voluntary recognition, a refusal, a contract change, or any retaliation each register differently and are the live test of what converts worker voice into a scorable signal. The 10-day response window makes this the fastest-moving of the three pressure channels.
The big-tech human-rights split. Whether Microsoft sustains its (compelled) accountability step toward self-initiated conduct — and whether Google/Amazon move at all under continued EFF and worker pressure — is the medium-term arc that would test the Established/Exemplary boundary (Microsoft) or deepen the Developing/Critical positions (Alphabet/Amazon).
The EU AI Act Digital Omnibus. A structural regulatory backdrop rather than an entity event: as enacted obligations land, they convert from prospective posture (currently net-neutral) into operative compliance conduct that the benchmark can score — the regulatory analog of the conduct-vs-coercion rule, applied at scale.

Sources

Canonical scores (ground truth): site/src/data/indexes/ai-labs.json (50-entity roster, band distribution, Anthropic 59.1, DeepMind/Google 56.9, OpenAI 27.5, the three floor designations); site/src/data/indexes/fortune-500.json (Microsoft 65.3, Alphabet/Google 40.0, Amazon 12.8, Meta Platforms 7.8). All cohort counts and dimension vectors were recomputed directly from rankings[] and reconcile with the canonical composite formula (site/scripts/lib/scoring.mjs).
Assessment corpus (provenance): research/assessments/anthropic-2026-06-13.md (conduct-vs-coercion treatment of the export-control shutdown; ACC 3.5→3.6, composite confirm 59.1); research/assessments/openai-2026-06-15.md (ALLEGATION-NOT-ADJUDICATED ruling; reconstruction 25.9, −1.6 sub-threshold).
Daily-research artifacts: research/scans/2026-06-13-assessor-summary.json and research/scans/2026-06-15-assessor-summary.json (Anthropic government-compelled net-neutral note; OpenAI 42-state subpoena confirm; Amazon EEOC pre-adjudication; Oracle coercive-severance).
Longitudinal context: site/public/data/history/anthropic.json (DC Circuit boundary-watch arc, May 29 +1 Integrity for guardrail refusal, June 14 sub-threshold −0.3); site/public/data/history/openai.json (May 20 −1.7 Integrity-gap on Pentagon terms, May 27 −1.9 failure-to-report).
Ruling corpus: research/PENDING_CHANGES.md (COMPELLED-REMEDY-NOT-SELF-CORRECTION for Microsoft; MILITARY-AI-BY-CONTRACT-GOVERNANCE; FILED-BUT-UNADJUDICATED discipline; Microsoft positive-watch 2026-06-07).
Fresh web evidence (fetched, verbatim quotes ≤50 words):
Anthropic statement — anthropic.com/news/fable-mythos-access (compliance, apology, disagreement quotes).
Export-control directive context — Axios, Time, Fortune.
OpenAI 42-state subpoena — TechCrunch, TechTimes (sycophancy "design flaw").
DeepMind/Google union vote & Project Nimbus — Fortune.
Big-tech human-rights split — EFF: "Google and Amazon: Acknowledged Risks, and Ignored Responsibilities"; EFF: "Microsoft Took a Step Toward Human Rights Accountability".

How to read the scores

The 0–100 scale — five bands

Every entity — state, corporation, AI lab, robotics lab, or city — is scored 0–100 across 8 dimensions and 40 subdimensions. The composite score places the entity in one of five bands:

Critical (0–20): Foundational compassion practices are absent or documented active harm is present.

Developing (20–40): Some practices are emerging but remain inconsistent, reactive, or unevenly applied.

Functional (40–60): Core practices exist and meet a basic bar, with significant gaps remaining.

Established (60–80): Practices are systematic, documented, and supported by consistent evidence.

Exemplary (80–100): Practices are independently verified, consistent, and sustained under pressure.
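The band edges above overlap at 20, 40, 60, and 80, so any implementation has to pick a side for boundary values. The sketch below assumes each band includes its upper edge, a choice made only because it matches this briefing's labeling of Alphabet/Google at 40.0 as Developing. The `band` function and `BANDS` table are hypothetical and are not taken from the benchmark's scoring code.

```python
# (upper edge, band name), in ascending order.
# Assumption: upper edges are inclusive, so 40.0 is Developing and 60.0 is Functional.
BANDS = [
    (20.0, "Critical"),
    (40.0, "Developing"),
    (60.0, "Functional"),
    (80.0, "Established"),
    (100.0, "Exemplary"),
]


def band(composite):
    """Map a 0-100 composite to its band name under the inclusive-upper-edge assumption."""
    if not 0.0 <= composite <= 100.0:
        raise ValueError(f"composite out of range: {composite}")
    for upper, name in BANDS:
        if composite <= upper:
            return name
    raise AssertionError("unreachable: 100.0 is the last upper edge")


for entity, score in [("Microsoft", 65.3), ("Anthropic", 59.1),
                      ("Alphabet/Google", 40.0), ("OpenAI", 27.5),
                      ("Meta Platforms", 7.8)]:
    print(entity, score, band(score))
```

If the benchmark instead treats lower edges as inclusive, only the handling of exact boundary values such as 40.0 and 60.0 would change.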

The 8 dimensions

Each dimension is scored 1–5 across 5 subdimensions (40 subdimensions total), then converted to a 0–100 composite. A score of 1.0 on a subdimension represents the minimum anchor; 5.0 is exemplary conduct.

AWR · Awareness: Does this entity reliably detect when others are in pain or need — before they name it?

EMP · Empathy: Does this entity genuinely connect with the inner experience of those it serves?

ACT · Action: Does compassionate understanding translate into real, proportional, effective help?

EQU · Equity: Is care distributed fairly — especially toward those with greatest need and least power?

BND · Boundaries: Is helping sustainable, ethical, and autonomy-preserving — not dependency-creating?

ACC · Accountability: Does this entity own its failures, correct course, and make genuine repair?

SYS · Systemic Thinking: Does compassion extend to root causes and structural change — not only symptom relief?

INT · Integrity: Is compassion genuine, consistent, and non-performative — especially when it costs something?

Scores are based on public evidence — government reports, regulatory filings, independent audits, judicial findings, and verifiable third-party records. Entities never pay for inclusion, score changes, or suppression of findings.
