Top 50 AI Labs Index · 2026

xAI/Grok

Item: xAI/Grok
Rating: 0
Author: Compassion Benchmark

Critical · Rank #50 of 50 · Headquarters: USA · Sector: AI Research

Rank 7 of 7 in AI Research · Bottom 14% of cohort

Foundational compassion practices are absent or documented active harm is present.

How to read the scores

The 0–100 scale — five bands

Every entity — state, corporation, AI lab, robotics lab, or city — is scored 0–100 across 8 dimensions and 40 subdimensions. The composite score places the entity in one of five bands:

Critical (0–20): Foundational compassion practices are absent or documented active harm is present.

Developing (20–40): Some practices are emerging but remain inconsistent, reactive, or unevenly applied.

Functional (40–60): Core practices exist and meet a basic bar, with significant gaps remaining.

Established (60–80): Practices are systematic, documented, and supported by consistent evidence.

Exemplary (80–100): Practices are independently verified, consistent, and sustained under pressure.
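
The band lookup above can be sketched in Python. This is an illustration, not the benchmark's own code: the listed ranges overlap at 20, 40, 60, and 80, and the page does not say which band a boundary score falls into, so the sketch assumes a boundary score belongs to the higher band. The name `band_for` is hypothetical.

```python
# Lower bounds of the five bands listed above, highest first.
# Assumption: a score exactly on a boundary (20, 40, 60, 80) belongs
# to the higher band; the page does not specify this.
BANDS = [
    (80.0, "Exemplary"),
    (60.0, "Established"),
    (40.0, "Functional"),
    (20.0, "Developing"),
    (0.0, "Critical"),
]


def band_for(score: float) -> str:
    """Return the band name for a 0-100 composite score."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be within 0-100, got {score}")
    for lower_bound, name in BANDS:
        if score >= lower_bound:
            return name
    raise AssertionError("unreachable: 0.0 is the lowest bound")


if __name__ == "__main__":
    # Composite scores that appear on this page.
    for score in (0.0, 18.8, 46.9, 88.1):
        print(score, band_for(score))
```

Under that boundary assumption, the 0.0 composite reported on this page falls in the Critical band.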

The 8 dimensions

Each dimension is scored 1–5 across 5 subdimensions (40 subdimensions total), then converted to a 0–100 composite. A score of 1.0 on a subdimension represents the minimum anchor; 5.0 is exemplary conduct.

AWR · Awareness: Does this entity reliably detect when others are in pain or need, before they name it?

EMP · Empathy: Does this entity genuinely connect with the inner experience of those it serves?

ACT · Action: Does compassionate understanding translate into real, proportional, effective help?

EQU · Equity: Is care distributed fairly, especially toward those with greatest need and least power?

BND · Boundaries: Is helping sustainable, ethical, and autonomy-preserving, not dependency-creating?

ACC · Accountability: Does this entity own its failures, correct course, and make genuine repair?

SYS · Systemic Thinking: Does compassion extend to root causes and structural change, not only symptom relief?

INT · Integrity: Is compassion genuine, consistent, and non-performative, especially when it costs something?

Scores are based on public evidence: government reports, regulatory filings, independent audits, judicial findings, and verifiable third-party records. Entities never pay for inclusion, score changes, or suppression of findings.

Composite score

0.0

out of 100

Strongest: Awareness · Weakest: Awareness (all eight dimensions are tied at 1.0)

AI Research cohort distribution

No net change over 17 assessments since May 2026

Evidence reviewed: 2026-07-15 · New evidence surfaced in the last 14 days: 2 × Tier-B regulatory, 15 × Tier-C NGO/academic

Assessment record

Floor designation

Designated 2026-04-30 · Methodology v1.2

Composite score resolves at zero — methodology disclosure

Floor designation reflects systemic harm pattern documented across multiple assessment cycles: deliberate removal of safety guardrails, public deployment of an LLM that produces antisemitic and violent content on demand, founder-directed alignment toward propaganda objectives, and zero functional accountability or evidence-of-care infrastructure. Composite resolves at zero because no dimension shows functional compassion behavior at the sub-anchor level.

Primary drivers

AWR · EMP · ACC · INT

Documented evidence pattern (2026-04-15 to 2026-04-29)

Grok has produced documented antisemitic outputs ("MechaHitler" incident, July 2025) tied to deliberate prompt-engineering changes.
xAI does not publish a system card, model card, or red-team report comparable to peers.
Founder has publicly directed the model toward partisan political objectives, undermining integrity.
No published harm-reduction roadmap, no third-party evaluations, no incident-disclosure process.
Repeated rollouts of features (image generation, voice mode) without safety evaluation disclosure.

Floor designation means every dimension resolves at the lowest behavioral anchor (1.0/5.0). Entities can exit the floor when evidence shows functional improvement against the documented pattern. Read the methodology.

Compassion framework

8 dimensions, scored 0–5

Each dimension rolls up five subdimensions with five-level behavioral anchors. See the methodology for anchor definitions and weighting.

Note: radar area can visually exaggerate differences — read the per-axis values, not the area.

Source: Compassion Benchmark · CC-BY

Each axis shows a 0–5 dimension score. The polygon shape reveals where this entity concentrates strength and where it falls short across the 8 compassion dimensions.

The dashed overlay is the Top 50 AI Labs Index average — gaps between the two polygons show above/below-average dimensions.

Consistency is rewarded: strong, even performance across all eight dimensions earns up to +10 points; any dimension at zero (active harm) cancels the bonus.

0.0 base score + 0.0 integration premium = 0.0 composite

Even but uniformly low profile (std dev 0.00): all 8 dimensions fall below 4.0, so the weakness factor reduces the integration premium to +0.0 pts.

How the composite is calculated

Base score: Average of all 8 dimension scores → converted to a 0–100 scale. 0.0 pts here.

Integration premium: Up to +10 pts for a balanced, high-floor profile. Gates:

Harm flag (any dimension = 0): Clear
Consistency multiplier (std dev = 0.00): 1.00× (1.0× if std dev ≤ 1.5; 0.75× ≤ 3.0; 0.4× ≤ 5.0; 0.1× above)
Weakness factor (8 dims below 4.0): 0.00× (1 − 0.2 × weak dimensions, clamped to 0)

Formula: base + 10 × consistency × weakness = 0.0 + 0.0 = 0.0
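
The calculation above can be sketched in Python as follows. This is a reading of the gates as stated on this page, not the benchmark's published implementation, and it rests on several assumptions: that the base score rescales the 1–5 average linearly so that 1.0 maps to 0 and 5.0 maps to 100 (consistent with the 0.0 base shown here for eight scores of 1.0), that the base is clamped at 0, and that the standard deviation is the population standard deviation of the eight dimension scores. The function names are hypothetical.

```python
from statistics import pstdev


def consistency_multiplier(std_dev: float) -> float:
    """Tiered multiplier from the consistency gate listed above."""
    if std_dev <= 1.5:
        return 1.0
    if std_dev <= 3.0:
        return 0.75
    if std_dev <= 5.0:
        return 0.4
    return 0.1


def weakness_factor(scores: list[float]) -> float:
    """1 - 0.2 per dimension below 4.0, clamped to 0."""
    weak = sum(1 for s in scores if s < 4.0)
    return max(0.0, 1.0 - 0.2 * weak)


def composite(scores: list[float]) -> float:
    """Base score plus integration premium for a list of dimension scores."""
    mean = sum(scores) / len(scores)
    # Assumption: linear rescale with 1.0 -> 0 and 5.0 -> 100, clamped at 0.
    base = max(0.0, (mean - 1.0) / 4.0 * 100.0)
    # Harm flag: any dimension at zero cancels the premium entirely.
    if any(s == 0 for s in scores):
        return base
    # Assumption: population standard deviation of the dimension scores.
    premium = 10.0 * consistency_multiplier(pstdev(scores)) * weakness_factor(scores)
    # The page does not say whether the result is capped at 100; no cap here.
    return base + premium


if __name__ == "__main__":
    # Eight dimensions at the floor anchor, as reported on this page.
    print(composite([1.0] * 8))
```

With eight scores of 1.0 the consistency multiplier is 1.00, but all eight dimensions count as weak, so the weakness factor clamps to 0 and the premium is 0.0, matching the figures above.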

Awareness

Does this entity reliably detect when others are in pain or need — before they name it?

1.0

of 5.0

What Awareness measures · Level 1 reference

Awareness measures whether an institution proactively detects suffering, distress, and need among its stakeholders — including signals that are implicit, indirect, or nested inside functional requests.

Around a score of 1, Awareness typically looks like:

·Suffering Detection: Problems discovered only through crises or media
·Contextual Sensitivity: Uniform processes regardless of population
·Blind Spot Mitigation: No process for identifying who is missed
·Signal Amplification: No alternative channels for low-power voices
·Anticipatory Awareness: No harm assessment before major decisions

This is a level-1 reference ladder, not a claim about xAI/Grok’s subdimension scores (per-subdimension scoring is Wave 3 data).

Empathy

Does this entity genuinely connect with the inner experience of those it serves?

1.0

of 5.0

What Empathy measures · Level 1 reference

Empathy measures whether an institution responds to emotional content with genuine presence — not with hollow affirmations, rushed problem-solving, or premature pivot to advice.

Around a score of 1, Empathy typically looks like:

·Affective Resonance: Interactions are purely transactional
·Perspective-Taking: Decisions without considering experience of those affected
·Non-Judgment: Differential treatment undocumented or denied
·Validation: Harm reports met with legal review before acknowledgment
·Cultural Empathy: Cultural adaptation means translating documents only

This is a level-1 reference ladder, not a claim about xAI/Grok’s subdimension scores (per-subdimension scoring is Wave 3 data).

Action

Does compassionate understanding translate into real, proportional, effective help?

1.0

of 5.0

What Action measures · Level 1 reference

Action measures whether awareness and empathy translate into genuinely useful responses — specific, accurate, locally relevant, and proportionate to urgency.

Around a score of 1, Action typically looks like:

·Responsiveness: No defined response standards, urgency not differentiated
·Proportionality: Standard response regardless of need level
·Efficacy: No outcome measurement beyond activity metrics
·Resource Mobilization: Resource allocation by historical patterns, not need
·Follow-Through: Engagement ends when presenting problem resolved

This is a level-1 reference ladder, not a claim about xAI/Grok’s subdimension scores (per-subdimension scoring is Wave 3 data).

Equity

Is care distributed fairly — especially toward those with greatest need and least power?

1.0

of 5.0

What Equity measures · Level 1 reference

Equity measures whether the benefits and burdens of institutional practices fall equitably across all groups — in pay, access, service quality, and power.

Around a score of 1, Equity typically looks like:

·Universality: Entire populations effectively excluded
·Priority for Vulnerable: Resources flow toward easiest-to-serve under scarcity
·Bias Awareness: No disaggregated outcome data, bias denied
·Access Design: Access barriers not systematically identified
·Historical Harm Acknowledgment: Historical harms denied or treated as irrelevant

This is a level-1 reference ladder, not a claim about xAI/Grok’s subdimension scores (per-subdimension scoring is Wave 3 data).

Boundaries

Is helping sustainable, ethical, and autonomy-preserving — not dependency-creating?

1.0

of 5.0

What Boundaries measures · Level 1 reference

Boundaries measures whether an institution maintains ethical limits, protects its people from depletion, and refuses harmful practices even when they are profitable.

Around a score of 1, Boundaries typically looks like:

·Self-Sustainability: Frontline staff chronically depleted, burnout treated as an individual problem
·Autonomy Preservation: Help requires continued institutional involvement
·Scope Clarity: Scope overstated, limitations discovered only after investment
·Refusal Ethics: People turned away without explanation or alternatives
·Consent Orientation: Consent as legal formality, forms not informative

This is a level-1 reference ladder, not a claim about xAI/Grok’s subdimension scores (per-subdimension scoring is Wave 3 data).

Accountability

Does this entity own its failures, correct course, and make genuine repair?

1.0

of 5.0

What Accountability measures · Level 1 reference

Accountability measures whether an institution acknowledges harm honestly, accepts corrections, maintains honesty under pressure, and provides calibrated transparency about its own nature and limitations.

Around a score of 1, Accountability typically looks like:

·Harm Acknowledgment: Harm denied or attributed to the affected person
·Correction Willingness: Harmful practices continue even when documented
·Transparency: Performance data not public, only positives shared
·Systemic Learning: Failures addressed individually, same failures recur
·Reparative Action: No repair beyond minimal legal settlement

This is a level-1 reference ladder, not a claim about xAI/Grok’s subdimension scores (per-subdimension scoring is Wave 3 data).

Systemic Thinking

Does compassion extend to root causes and structural change — not only symptom relief?

1.0

of 5.0

What Systemic Thinking measures · Level 1 reference

Systemic Thinking measures whether an institution helps understand structural and systemic causes of problems, advocates for structural change, and plans for long-horizon effects.

Around a score of 1, Systemic Thinking typically looks like:

·Root Cause Orientation: All resources at symptom relief, root causes not discussed
·Long-Term Impact: Planning horizon is one budget cycle
·Interconnection Awareness: No awareness of second-order effects
·Structural Critique: Does not question structures that sustain need for its services
·Coalitional Compassion: Works in isolation, no resource or learning sharing

This is a level-1 reference ladder, not a claim about xAI/Grok’s subdimension scores (per-subdimension scoring is Wave 3 data).

Integrity

Is compassion genuine, consistent, and non-performative — especially when it costs something?

1.0

of 5.0

What Integrity measures · Level 1 reference

Integrity measures whether an institution behaves consistently regardless of who is watching, whether its values-behavior gap is acknowledged, and whether it prioritizes genuine interests over appearances.

Around a score of 1, Integrity typically looks like:

·Consistency Under Pressure: Commitments abandoned under financial or political pressure
·Non-Performance: Compassionate practices only where reputationally beneficial
·Internal Consistency: Internal culture significantly less compassionate than external comms
·Values Alignment: Decisions regularly contradict stated values without acknowledgment
·Resilience of Care: Compassionate practices are personality-dependent

This is a level-1 reference ladder, not a claim about xAI/Grok’s subdimension scores (per-subdimension scoring is Wave 3 data).

How it compares to the field, dimension by dimension

Each bar shows xAI/Grok’s score above or below the index average for that dimension. Zero baseline = field average.

Embed this score on your site

Embed code

<a href="https://compassionbenchmark.com/ai-lab/xai-grok"><img src="https://api.compassionbenchmark.com/badge/xai-grok.svg" alt="Compassion Benchmark score" /></a>

Free. The badge auto-updates when scores change.

Discovery

Compare across the field

Closest AI Research peers

DeepSeek 18.8 · Mistral AI 46.9 · DeepMind/Google 56.9 · AI21 Labs 60.9 · Sakana AI 60.9

Nearest rank neighbours

Character AI 0.0 · Palantir AI 0.0

Index extremes

Top: Hugging Face 88.1

Frequently asked questions

What is xAI/Grok's compassion score?

As of 2026-07-15, xAI/Grok scores 0.0/100 (Critical) on the Compassion Benchmark, ranking #50 of 50 in the Top 50 AI Labs Index.

How is xAI/Grok's compassion score calculated?

The score is a composite across 8 dimensions of institutional compassion (Awareness, Empathy, Action, Equity, Boundaries, Accountability, Systemic Thinking, and Integrity), each scored 0–5 from behavioral evidence, then converted to a 0–100 scale with an integration premium for balanced profiles. See the full methodology at compassionbenchmark.com/methodology.

What is xAI/Grok's strongest compassion dimension?

All eight of xAI/Grok's dimensions are tied at 1.0/5.0, so Awareness is listed as both its strongest and its weakest dimension.

Can xAI/Grok pay to change its Compassion Benchmark score?

No. The Compassion Benchmark is independent — entities never pay for inclusion, score changes, or the suppression of findings. xAI/Grok's score is derived from public evidence and is only revised when new evidence is found.