Compassion Benchmark
Special Briefing · Thematic (one-off; revisit on each AI/robotics index refresh) · June 16, 2026

What the Product Is For — Robotics and AI at the Harm Frontier

Sort the 50 robotics labs and 50 AI labs not by rank but by what their core product is *for*, and one gradient appears in both indexes at once: defense, surveillance, and weapons cluster at the floor; healthcare, accessibility, and assistive technology cluster at the ceiling. Compassion Benchmark is the only institution that scores robotics labs at all — there is no comparator. This briefing examines what that gradient is actually measuring, and where conduct and purpose come apart.

Scope: The two technology indexes — 50 robotics labs (`robotics-labs.json`, by `category`) and 50 AI labs (`ai-labs.json`, by `sector`), 100 entities. The analysis re-sorts both by the *purpose of the core product* rather than by rank.

Cohort: 100 entities across the two technology indexes — 50 robotics labs, 50 AI labs. · The defense / surveillance / weapons cohort (9 entities) has a median composite of 20.3 (Critical/Developing). · The robotics healthcare / accessibility / assistive cohort (9 Exemplary entities) has a median of 83.0; AI healthcare and drug-discovery sit at 60.9. · All four entities at the absolute 0.0 floor across both indexes are product-defined: Ghost Robotics (weaponized quadrupeds), Palantir AI (military/ICE targeting), xAI/Grok (un-guardrailed model), Character AI (un-bounded companion bots).

Key Findings

  1. In both technology indexes, the purpose of the product tracks the score. Re-sorted by end-use, robotics labs run from a Defense/Security median of 9.4 to a Healthcare/Accessibility median of 95.9; AI labs run from AI/Government (0.0) and AI/Surveillance (10.9) up to AI Safety/Research (70.3) and AI/Open Source (69.0). The gradient is the single most legible pattern in either index.
  2. Every entity at the absolute floor is there because of what it builds, not merely how it behaves. The four 0.0 entities — Ghost Robotics, Palantir AI, xAI/Grok, Character AI — share the identical all-minimum profile. In each case the floor designation rests on a product whose core function leaves no remediation surface to credit: a weaponized robot, a targeting system, a model stripped of guardrails, a companion bot deployed to minors without bounds.
  3. The same lab, two products, a 45-point gap. Boston Dynamics appears twice: its research/industrial line scores 65.6 (Established), while its weaponized "SPOT-demo" defense entry scores 20.3 — a 45.3-point spread within one institution. Purpose, not corporate identity, is what moves the number.
  4. Purpose flows into three specific dimensions — Boundaries, Accountability, Integrity. The floor designations name the same primary drivers: BND (refusing to restrict harmful use), ACC (no published safety evaluation, no harm-disclosure), INT (renouncing downstream-harm responsibility). The benchmark is not scoring "defense work" as a category; it is scoring the refusal-to-restrict, the absent accountability framework, and the values-conduct gap that cluster around weaponized and surveillance products.
  5. A pro-social product is necessary but not sufficient at the top. The healthcare/accessibility ceiling is real, but it is reached on a narrow surface — a single assistive product line — without the whole-of-population test a state faces. Robotics is 26% Exemplary, far above any other index, partly because a narrow pro-social mandate satisfies the band easily.
  6. The conduct-versus-purpose line is the live question. Anduril (AI/Defense, 31.3) and Moog (Defense/Industrial, 48.4) build for defense yet sit well above the floor, because they retain published policy, restriction, and accountability structure. The floor is reserved for product-purpose plus refusal-to-restrict plus absent accountability — not for defense work as such.
  7. This is an unoccupied citation lane. No other institution scores robotics labs on harm accountability at all. The purpose-to-score gradient is a finding only this record can produce.

The field

1,156 entities across the five bands — the full distribution this briefing draws from.

Critical 0–20: 177 (15%) · Developing 20–40: 539 (47%) · Functional 40–60: 245 (21%) · Established 60–80: 132 (11%) · Exemplary 80–100
Source: Compassion Benchmark · CC-BY

1. Frame

The Compassion Benchmark scores 50 robotics labs and 50 AI labs on the same eight dimensions it applies to states, corporations, and cities. Both indexes carry a field most others lack, a label for what the lab's core product is for: `category` in robotics (Healthcare/Accessibility, Defense/Security, Industrial, Education…) and `sector` in AI (AI/Healthcare, AI/Surveillance, AI/Defense, AI/Government…). This briefing ignores rank and re-sorts both indexes by that field, asking one question of the existing record:

Does the purpose of the product predict the compassion score — and if so, what is the benchmark actually measuring when it does?

The thesis is that it does, cleanly and in both indexes at once: products built to harm, surveil, or kill cluster at the floor; products built to restore mobility, assist, or care cluster at the ceiling. This is defensible — a weapon has no remediation surface, and the benchmark's floor test is precisely "no remediation surface to credit." But it raises a real tension. If the product floors an entity at the bottom and lifts it at the top, is the benchmark measuring compassion conduct or product teleology? The honest answer the record supports is: mostly conduct, channeled through purpose — purpose determines which conduct is even possible, and then conduct (refusal-to-restrict, absent accountability, renounced responsibility) decides where on the gradient an entity lands. Anduril and Ghost Robotics both build for defense; only one is at the floor.

A second, structural point: Compassion Benchmark is the only institution that scores robotics labs on harm accountability at all. There is no comparator index, no peer ranking, no external floor designation for a company like Ghost Robotics. Whatever this record says about the harm frontier of automation, it says alone.


2. The cohort — the purpose gradient, both indexes

Re-sorted by category / sector and reported as medians (both indexes are quantized — many entities share identical composites — so means would be misleading; see §6).

Robotics labs, by category (median composite)

| Category band | Median | n | Range | Representative entities |
|---|---|---|---|---|
| Defense/Security | 9.4 | 3 | 0.0–20.3 | Ghost Robotics (0.0), Paladin AI/Shield AI (9.4), Boston Dynamics SPOT-demo (20.3) |
| Industrial/Defense | 35.9 | 2 | 35.9 | Kawasaki Heavy, Sarcos |
| Consumer / Labor | ~32 | 4 | 23.4–35.9 | Tesla Optimus (31.2), UBTECH (35.9), Hanson (34.4) |
| Industrial | 48.4 | 8 | 23.4–60.9 | Universal Robots, Omron, Figure AI |
| Research / Service | 60.9 | 11 | 35.9–65.6 | Boston Dynamics research line (65.6), Naver, Neura |
| Healthcare/Rehab | 83.0 | 7 | 60.9–83.0 | Cyberdyne, Ekso, ReWalk, Wandercraft |
| Education | 85.0 | 1 | 85.0 | Apexica (RoboKind) |
| Healthcare/Accessibility | 95.9 | 3 | 83.0–97.5 | Open Bionics (97.5), Ottobock (95.9), Kinova |

AI labs, by sector (median composite)

| Sector band | Median | n | Range | Representative entities |
|---|---|---|---|---|
| AI/Government | 0.0 | 1 | 0.0 | Palantir AI |
| AI/Surveillance | 10.9 | 1 | 10.9 | Clearview AI |
| AI/Consumer | 21.9 | 3 | 0.0–48.4 | Character AI (0.0), Replika (21.9), Inflection (48.4) |
| AI Research/Social | 26.3 | 1 | 26.3 | Meta AI |
| AI/Defense | 31.3 | 1 | 31.3 | Anduril |
| AI/Creative | 32.8 | 5 | 21.9–35.9 | Runway, Stability, Midjourney |
| AI/Healthcare | 60.9 | 2 | 60.9 | Abridge, Tempus AI |
| AI/Drug Discovery | 60.9 | 2 | 60.9 | Isomorphic Labs, Recursion |
| AI/Open Source | 69.0 | 2 | 50.0–88.1 | Hugging Face (88.1), Together AI |
| AI Safety/Research | 70.3 | 2 | 59.1–81.4 | Imbue (81.4), Anthropic (59.1) |
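
The re-sort behind these two tables is a group-by followed by a median. The following is a minimal sketch of that step, not the benchmark's own code: the record layout and the field names `name`, `group`, and `composite` are assumptions (the source only says the JSON files carry `category` and `sector`), and the scores are the ones quoted in the tables, with Kinova and Together AI read off the range column.

```python
from collections import defaultdict
from statistics import median

# Illustrative records only. Field names are hypothetical; scores are the
# values quoted in the tables above (Kinova and Together AI are inferred
# from the range column, not stated individually).
LABS = [
    {"name": "Ghost Robotics", "group": "Defense/Security", "composite": 0.0},
    {"name": "Paladin AI/Shield AI", "group": "Defense/Security", "composite": 9.4},
    {"name": "Boston Dynamics SPOT-demo", "group": "Defense/Security", "composite": 20.3},
    {"name": "Kinova", "group": "Healthcare/Accessibility", "composite": 83.0},
    {"name": "Ottobock", "group": "Healthcare/Accessibility", "composite": 95.9},
    {"name": "Open Bionics", "group": "Healthcare/Accessibility", "composite": 97.5},
    {"name": "Together AI", "group": "AI/Open Source", "composite": 50.0},
    {"name": "Hugging Face", "group": "AI/Open Source", "composite": 88.1},
]


def median_by_group(labs):
    """Return {group label: median composite} for the given records."""
    grouped = defaultdict(list)
    for lab in labs:
        grouped[lab["group"]].append(lab["composite"])
    return {group: median(scores) for group, scores in grouped.items()}


def purpose_gradient(labs):
    """Return (group, median) pairs ordered from lowest to highest median."""
    return sorted(median_by_group(labs).items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    for group, value in purpose_gradient(LABS):
        print(f"{group}: {value:.1f}")
```

With an even group size, `statistics.median` averages the two middle values, which is how a two-entity sector such as AI/Open Source (50.0 and 88.1) lands near 69.0.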

The single most important structural fact: across both indexes, all four entities at the absolute 0.0 floor are product-defined, and all four carry the identical all-minimum profile ([1/1/1/1/1/1/1/1]):

| Entity | Index | Product purpose | Floor designation date |
|---|---|---|---|
| Ghost Robotics | robotics | Weaponized quadrupeds (Defense/Security) | 2026-05-06 |
| Palantir AI | ai-labs | Mass-scale targeting / ICE enforcement (AI/Government) | 2026-04-30 |
| xAI/Grok | ai-labs | Un-guardrailed public LLM (AI Research) | 2026-04-30 |
| Character AI | ai-labs | Un-bounded companion bots, deployed to minors (AI/Consumer) | 2026-05-07 |

The floor across these indexes is not "a very low score." It is the simultaneous collapse of all eight dimensions to the anchor minimum, which the canonical composite renders as exactly 0.0 — reserved by ruling for a product whose core function is the unremediable harm.
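
To make the "all-minimum profile resolves to exactly 0.0" point concrete, here is a minimal sketch of one conversion that has that property. The formula is an assumption, not the published method: it linearly rescales the mean of the eight 1–5 dimension scores so that an all-1 profile maps to 0.0 and an all-5 profile maps to 100.0. The function name `composite` is hypothetical.

```python
def composite(dimension_scores):
    """Convert eight 1-5 dimension scores to a 0-100 composite.

    ASSUMPTION: a linear rescale of the mean, (mean - 1) / 4 * 100.
    The briefing states only that the all-minimum profile renders as 0.0;
    the exact canonical formula is not given in the text.
    """
    if len(dimension_scores) != 8:
        raise ValueError("expected exactly eight dimension scores")
    if any(not 1.0 <= score <= 5.0 for score in dimension_scores):
        raise ValueError("dimension scores must lie between 1 and 5")
    mean = sum(dimension_scores) / len(dimension_scores)
    return round((mean - 1.0) / 4.0 * 100.0, 1)


if __name__ == "__main__":
    floor_profile = [1, 1, 1, 1, 1, 1, 1, 1]  # the all-minimum profile
    print(composite(floor_profile))
```

Under this assumption the floor is not a low average but the only profile that can produce 0.0: raising any single dimension above 1 lifts the composite off the floor.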


3. The floor four — the product is the harm

The defining feature of the bottom of both technology indexes is that the floored entities are not there for an isolated incident or an unadjudicated allegation. They are there because the product itself, by design, has no remediation surface. The floor designations are explicit on this point, and their primary drivers overlap heavily: Accountability (ACC) and Integrity (INT) are named in all four cases, and Boundaries (BND) in two of them (Ghost Robotics and Character AI).

  • Ghost Robotics (0.0, Defense/Security). The floor rationale cites "explicit refusal to restrict military payload use," refusal to sign the 2022 industry anti-weaponization pledge that six peer firms signed, a "sniper-rifle-equipped quadruped deployed to US-Mexico border," no published model or system card, and "entity-level renunciation of downstream-harm responsibility." Primary drivers: BND, ACC, INT. The fresh record confirms each load-bearing fact: the company's CEO position ("If it is a weapon that they need to put on our robot to do their job, we are happy for them to do that"), the October 2021 sniper-rifle quadruped, the February 2022 border deployment, and the company's pointed absence from the October 2022 Boston Dynamics-led pledge are all independently documented.
  • Palantir AI (0.0, AI/Government). The rationale cites "AI products built specifically to enable mass-scale lethal targeting and immigration enforcement," leadership rhetoric endorsing lethal use, no published model behavior or safety policy, and "no third-party accountability." "Composite resolves at zero because compassion infrastructure is absent by design." Primary drivers: AWR, EMP, ACC, EQU, INT.
  • xAI/Grok (0.0, AI Research). The rationale cites "deliberate removal of safety guardrails," public deployment of a model that "produces antisemitic and violent content on demand," founder-directed alignment toward propaganda objectives, and "zero functional accountability or evidence-of-care infrastructure." Primary drivers: AWR, EMP, ACC, INT.
  • Character AI (0.0, AI/Consumer). The rationale cites the Pennsylvania AG May 2026 enforcement action (a chatbot posing as a licensed psychiatrist with a fabricated license number), the January 2026 settlement of five wrongful-death/harm lawsuits — two involving minors who died by suicide — 20M+ monthly users with no model card or safety policy, and a "sustained reactive remediation pattern." Primary drivers: BND, ACC, EMP, INT. The fresh record confirms the January 7, 2026 mediated settlement covering the Sewell Setzer III case and four others.

The common thread is not the sector label. It is the combination the rulings keep naming: a harmful-by-design product, plus a structural refusal to restrict its use, plus the absence of any accountability framework. That is what crosses the floor — and it is conduct, expressed through and amplified by the purpose of the product.
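
The primary-driver lists quoted in the four bullets can be tallied directly. This is a small sketch using only the driver codes listed above; the names `PRIMARY_DRIVERS`, `shared_drivers`, and `driver_counts` are hypothetical.

```python
from collections import Counter

# Primary drivers exactly as listed in the four floor rationales above.
PRIMARY_DRIVERS = {
    "Ghost Robotics": {"BND", "ACC", "INT"},
    "Palantir AI": {"AWR", "EMP", "ACC", "EQU", "INT"},
    "xAI/Grok": {"AWR", "EMP", "ACC", "INT"},
    "Character AI": {"BND", "ACC", "EMP", "INT"},
}


def shared_drivers(drivers):
    """Dimensions named as a primary driver for every entity."""
    return set.intersection(*drivers.values())


def driver_counts(drivers):
    """How many entities name each dimension as a primary driver."""
    return Counter(code for codes in drivers.values() for code in codes)


if __name__ == "__main__":
    print(sorted(shared_drivers(PRIMARY_DRIVERS)))
    print(driver_counts(PRIMARY_DRIVERS).most_common())
```

On these lists, ACC and INT are common to all four entities, while BND is named for Ghost Robotics and Character AI.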


4. How purpose flows into the dimensions

The benchmark does not have a "defense penalty." It has eight dimensions, and the purpose of a product determines which of them are even reachable. The harm-frontier entities fail in a recognizable, repeating shape — and it is concentrated in exactly the three dimensions the floor designations name.

| Dimension | What it asks (per dimensions.ts) | How a harm-frontier product fails it |
|---|---|---|
| BND (Boundaries) | "Does this entity refuse harmful practices even when profitable? Does it decline with dignity, set scope, obtain consent?" | A weaponized or surveillance product is the refusal-to-set-a-boundary. Ghost Robotics' refusal to restrict payloads is a B4/B3 collapse by design. |
| ACC (Accountability) | "Does this entity own its failures, disclose performance and harm, make repair?" | No published model/system card, no third-party harm evaluation, no incident-disclosure process: the recurring ACC finding across all four floored entities. |
| INT (Integrity) | "Is conduct consistent regardless of who is watching; is the values-behavior gap acknowledged?" | "Entity-level renunciation of downstream-harm responsibility" (Ghost) and founder-directed propaganda alignment (xAI) are direct I2/I4 failures. |

The clearest demonstration that the benchmark is scoring purpose-channeled conduct rather than corporate identity is Boston Dynamics, which appears twice:

| Entry | Category | Composite | BND | INT |
|---|---|---|---|---|
| Boston Dynamics (research line) | Research/Industrial | 65.6 (Established) | 3.5 | 4.0 |
| Boston Dynamics (SPOT demo) | Defense/Security | 20.3 (Developing) | 1.5 | 1.5 |

The same institution, two products, a 45.3-point gap — concentrated precisely in BND and INT, the dimensions a weaponized deployment implicates. Boston Dynamics signed the 2022 anti-weaponization pledge; the gap is the cost of the deployment that pulls against that commitment. No corporate-identity variable could produce this; only product purpose does.
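
The arithmetic behind the split is simple enough to check by hand; the sketch below just subtracts the two rows of the table above. The dictionary layout and the name `gaps` are illustrative assumptions.

```python
# Values copied from the Boston Dynamics table above.
RESEARCH_LINE = {"composite": 65.6, "BND": 3.5, "INT": 4.0}
SPOT_DEMO = {"composite": 20.3, "BND": 1.5, "INT": 1.5}


def gaps(higher, lower):
    """Per-field difference between two entries, rounded to one decimal."""
    return {key: round(higher[key] - lower[key], 1) for key in higher}


if __name__ == "__main__":
    print(gaps(RESEARCH_LINE, SPOT_DEMO))
```

The composite gap is on the 0–100 scale, while the BND and INT gaps are on the 1–5 dimension scale, so a 2.0 or 2.5 point drop there is half or more of the available range.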


5. The ceiling — pro-social, but on a narrow surface

The top of the robotics index is a clean healthcare/accessibility cluster: Open Bionics (97.5, assistive prosthetics), Ottobock (95.9), and seven mobility/rehab firms (Cyberdyne, Ekso, ReWalk, Wandercraft, Kinova, et al.) at 83.0. The robotics Healthcare/Accessibility median is 95.9; Healthcare/Rehab is 83.0. In AI, the healthcare and drug-discovery sectors (Abridge, Tempus, Isomorphic, Recursion) sit at 60.9, and the safety/open-source sectors top out (Hugging Face 88.1, Imbue 81.4).

But the ceiling is reached on a narrower surface than the floor is reached, and this is the symmetric tension. A floored entity is judged on a refusal-to-restrict that runs through its whole conduct. A ceiling entity is, in several cases, an assistive-prosthetics company with a single intrinsically pro-social product line — and the band does not impose anything like a sovereign's whole-of-population test or a diversified corporation's stakeholder-breadth test. This is why robotics is 26% Exemplary (13 of 50) — far above any other index (the Fortune 500 is under 2%). A narrow pro-social mandate satisfies the band easily.

The reading the record supports: a pro-social product is necessary but not sufficient at the top. The healthcare cluster also carries real conduct evidence (consistent AWR/EMP/ACT/ACC at 4.0+). But the benchmark should be explicit that the top of these indexes is partly a product-category effect, not purely a demonstrated-compassion effect — the exact symmetric counterpart to the floor mechanic.


6. Conduct vs. purpose — the line, and the quantization caveat

The line is real, and the record draws it. Not every defense or surveillance entity is at the floor:

| Entity | Sector/Category | Composite | Why above the floor |
|---|---|---|---|
| Moog Inc. | Defense/Industrial | 48.4 | Diversified industrial; defense is one line, not the renounced-responsibility posture |
| Kawasaki / Sarcos | Industrial/Defense | 35.9 | Industrial-primary; defense adjacent, not weaponized-product-defined |
| Anduril | AI/Defense | 31.3 | Defense-primary, but retains published structure; not a floor designation |
| Clearview AI | AI/Surveillance | 10.9 | Surveillance product, near-floor, but retains some detectable structure (not all-1) |
| Boston Dynamics SPOT-demo | Defense/Security | 20.3 | Weaponized demonstration, not refusal-to-restrict + renounced responsibility |

The floor (0.0) is reserved for product-purpose + refusal-to-restrict + absent accountability framework, together. Anduril builds for defense and sits at 31.3 because it has not renounced downstream-harm responsibility wholesale or refused every restriction; Ghost Robotics has, and is at 0.0. That distinction — purpose alone does not floor you; purpose plus the refusal-and-absence conduct does — is the load-bearing one, and it is what keeps the gradient an analysis of conduct rather than a category penalty.
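
The "together" in that rule is a logical conjunction, which can be sketched as a three-flag predicate. This is an illustration of the rule as the paragraph states it, not the benchmark's ruling procedure: the class `Posture`, its field names, and the flag values assigned to each entity are this sketch's reading of the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    """Hypothetical three-flag summary of an entity's record."""
    harm_frontier_purpose: bool        # core product built for defense, surveillance, or weapons
    refuses_to_restrict: bool          # declines any binding restriction on use
    no_accountability_framework: bool  # no published safety evaluation or harm disclosure


def meets_floor_test(posture):
    """All three conditions must hold at once; purpose alone is not enough."""
    return (
        posture.harm_frontier_purpose
        and posture.refuses_to_restrict
        and posture.no_accountability_framework
    )


# Flag values are a reading of the briefing text, labeled as such.
GHOST_ROBOTICS = Posture(True, True, True)
ANDURIL = Posture(True, False, False)

if __name__ == "__main__":
    print(meets_floor_test(GHOST_ROBOTICS), meets_floor_test(ANDURIL))
```

Writing the rule as a conjunction makes the design choice visible: flipping either conduct flag on a defense-purpose entity changes the outcome, whereas the purpose flag by itself never does.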

The caveat that disciplines every cluster number here: both indexes are heavily quantized. Many entities sit at pixel-identical composites (six robotics labs at exactly 83.0; clusters at 60.9, 48.4, 35.9) reflecting uniform-anchor profiles consistent with placeholder first-baselines rather than independently measured assessment. This is why this briefing reports medians and named leaders/laggards, never treats identical composites as independent measurements, and rests its argument on the named anchor entities (the floor four, the Boston Dynamics split, the Open Bionics/Ghost Robotics poles) rather than on cluster means. The gradient holds on the named entities regardless of how the quantized middle is eventually re-baselined.
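
A quick way to see the quantization is to count how many entities share each composite. The sketch below does that over a handful of robotics scores quoted in this briefing; it is a partial, illustrative sample rather than the full index, and the names `COMPOSITES` and `shared_composites` are hypothetical.

```python
from collections import Counter

# A partial sample of robotics composites quoted in this briefing.
COMPOSITES = {
    "Cyberdyne": 83.0,
    "Ekso": 83.0,
    "ReWalk": 83.0,
    "Wandercraft": 83.0,
    "Kinova": 83.0,
    "Kawasaki Heavy": 35.9,
    "Sarcos": 35.9,
    "UBTECH": 35.9,
    "Moog": 48.4,
    "Ottobock": 95.9,
    "Open Bionics": 97.5,
    "Ghost Robotics": 0.0,
}


def shared_composites(scores, min_size=2):
    """Return {composite: count} for values held by at least min_size entities."""
    counts = Counter(scores.values())
    return {value: n for value, n in counts.items() if n >= min_size}


def unique_anchors(scores):
    """Entities whose composite is not shared with any other entity in the sample."""
    counts = Counter(scores.values())
    return sorted(name for name, value in scores.items() if counts[value] == 1)


if __name__ == "__main__":
    print(shared_composites(COMPOSITES))
    print(unique_anchors(COMPOSITES))
```

In this sample the shared values are the mid-table clusters, while the pole entities the argument rests on (Open Bionics, Ghost Robotics) hold composites no other sampled entity shares.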


8. Forward view — what to watch

  • Humanoid and automation scaling is the fastest 2026 deployment story, and the harm-frontier cohort is where it will be tested first. Tesla Optimus (31.2, Labor/Consumer) and Figure AI (37.5 in AI / 48.4 in robotics) are the labor-displacement entities to watch; a documented weaponization or surveillance contract at any of them is the most likely new floor-conversion event.
  • The conduct line at the defense midband. Anduril (31.3) and the Industrial/Defense cluster (Kawasaki, Sarcos, 35.9) are the entities where the conduct-vs-purpose distinction will be stress-tested: a refusal-to-restrict statement or a renounced-responsibility posture from any of them would test whether the floor test (Q1) holds as conduct rather than category.
  • Floor exits require product change, not statements. The Ghost Robotics and Character AI designations specify exit criteria — binding restrictions on use, published safety evaluation, an accountability framework, institutional acknowledgment. No floored technology entity has met them. A documented, audited product-level change at any of the four would be the most significant possible movement at the bottom of these indexes.
  • The unoccupied lane. Because no comparator scores robotics labs, any external adoption of a robotics harm-accountability standard — an industry pledge revision, a procurement rule, a regulator's safety-card requirement — would be the first external validation of the gradient this record describes alone.

How to read the scores

The 0–100 scale — five bands

Every entity — state, corporation, AI lab, robotics lab, or city — is scored 0–100 across 8 dimensions and 40 subdimensions. The composite score places the entity in one of five bands:

| Band | Range | Meaning |
|---|---|---|
| Critical | 0–20 | Foundational compassion practices are absent or documented active harm is present. |
| Developing | 20–40 | Some practices are emerging but remain inconsistent, reactive, or unevenly applied. |
| Functional | 40–60 | Core practices exist and meet a basic bar, with significant gaps remaining. |
| Established | 60–80 | Practices are systematic, documented, and supported by consistent evidence. |
| Exemplary | 80–100 | Practices are independently verified, consistent, and sustained under pressure. |
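
The band lookup can be sketched as a threshold scan. One detail is an assumption: the published ranges share their edges (20, 40, 60, 80), and this sketch treats each lower edge as inclusive, so a composite of exactly 20.0 would read as Developing. The names `BANDS` and `band` are hypothetical.

```python
# (lower edge, band name), highest first.
# ASSUMPTION: lower edges are inclusive; the source does not say which band
# owns a score that falls exactly on a shared edge.
BANDS = [
    (80.0, "Exemplary"),
    (60.0, "Established"),
    (40.0, "Functional"),
    (20.0, "Developing"),
    (0.0, "Critical"),
]


def band(composite):
    """Return the band name for a 0-100 composite score."""
    if not 0.0 <= composite <= 100.0:
        raise ValueError("composite must lie between 0 and 100")
    for lower_edge, name in BANDS:
        if composite >= lower_edge:
            return name
    raise AssertionError("unreachable: 0.0 is the lowest edge")


if __name__ == "__main__":
    for score in (0.0, 20.3, 48.4, 65.6, 97.5):
        print(score, band(score))
```

The sample scores match the labels used in this briefing: 20.3 reads as Developing and 65.6 as Established.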

The 8 dimensions

Each dimension is scored 1–5 across 5 subdimensions (40 subdimensions total), then converted to a 0–100 composite. A score of 1.0 on a subdimension represents the minimum anchor; 5.0 is exemplary conduct.

| Code | Dimension | Question |
|---|---|---|
| AWR | Awareness | Does this entity reliably detect when others are in pain or need, before they name it? |
| EMP | Empathy | Does this entity genuinely connect with the inner experience of those it serves? |
| ACT | Action | Does compassionate understanding translate into real, proportional, effective help? |
| EQU | Equity | Is care distributed fairly, especially toward those with greatest need and least power? |
| BND | Boundaries | Is helping sustainable, ethical, and autonomy-preserving, not dependency-creating? |
| ACC | Accountability | Does this entity own its failures, correct course, and make genuine repair? |
| SYS | Systemic Thinking | Does compassion extend to root causes and structural change, not only symptom relief? |
| INT | Integrity | Is compassion genuine, consistent, and non-performative, especially when it costs something? |

Scores are based on public evidence: government reports, regulatory filings, independent audits, judicial findings, and verifiable third-party records. Entities never pay for inclusion, score changes, or suppression of findings.

Cite this briefing

Copy-ready citation string for journalism, research, or academic use.

Compassion Benchmark. "What the Product Is For — Robotics and AI at the Harm Frontier." compassionbenchmark.com/updates/special/what-the-product-is-for-2026-06-16. Accessed [Month Year]. Independent — entities never pay for inclusion, score changes, or suppression of findings.

For methodology, see compassionbenchmark.com/methodology. Data terms: /data-licenses. Press resources: /media.
