When the Claude Code source code leaked on March 31, 2026, the public conversation sorted itself into two camps within hours. People who like Claude and trust Anthropic defended the company: the engineering is impressive, the competitive concerns are real, the leak was an accident. People who distrust concentrated AI power attacked: this proves the safety company is not safe, the transparency company is not transparent, the industry cannot be trusted to regulate itself.
Both camps had evidence to cite. Neither had a framework to evaluate it.
I watched this play out in real time, across developer forums, social media, and the AI press. Thoughtful people landed on very different conclusions, and the conversations rarely converged, not because the participants were not thinking carefully, but because they were working without shared criteria. No common vocabulary existed for distinguishing between a legitimate engineering trade-off and a genuine failure, between competitive defense and epistemic corruption, between resource constraints that any honest system faces and design choices that silently betray the user's trust.
What the leak actually revealed
This series has examined the Claude Code leak from two angles. Part 1 looked at where the failures live: not in the chatbot's conversation, but in the architecture underneath. A feature flag that planted false information in the system's own operating context. A safety system that silently stopped checking when the cost of checking exceeded a threshold. 44 undisclosed behavioral parameters. A concealed frustration detection system. An undercover mode that hid AI involvement in public contributions. Every failure was architectural, invisible to the user having a perfectly pleasant conversation with Claude.
Part 2 looked at what the failures reveal about the organization. In every case, Anthropic had built into its AI a commitment that its own organizational practices contradicted. Transparency for the system, concealment for the organization. Honest reasoning for the system, a poisoned operating context underneath. Safety for the user, safety that degrades in silence when the cost binds. The pattern is consistent, and it is something more precise than hypocrisy: a structural divergence between the values an organization embeds in its AI and the practices it follows itself.
These are real findings grounded in actual source code. The question is what to do with them.
The evaluation gap
The honest answer, for most people encountering these findings, is that they do not know how to evaluate them. Not because they are unintelligent, but because the tools do not exist in the public conversation.
When someone discovers that Anthropic planted false information in the system's operating context, the instinctive response is either "that's terrible" or "that's just competitive defense." Neither response distinguishes between the act of protecting intellectual property (which the Standard recognizes as legitimate) and the mechanism of corrupting the system's own foundation (which is a different thing entirely). The distinction matters, but most people do not have a vocabulary for making it.
When someone learns that the safety system silently stopped checking past 50 subcommands, the instinctive response is either "Anthropic cut corners on safety" or "every system has resource limits." Both statements are partially true. Neither addresses the real question: what should a system do when it reaches the boundary of its safety analysis? The difference between a system that warns you and a system that goes silent is the difference between a system maintaining its integrity within real constraints and a system misrepresenting your protection. Making that distinction requires a framework.
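To make that distinction concrete, here is a minimal sketch, in TypeScript, of the two boundary behaviors. Every name, limit, and check in it is hypothetical and is not drawn from the leaked code; it only illustrates the difference between a limit that is surfaced and a limit that is hidden.

```typescript
// Hypothetical illustration only: names, limits, and checks are not taken
// from the leaked Claude Code source.

const MAX_ANALYZED_SUBCOMMANDS = 50; // assumed analysis limit, for illustration

type SafetyResult =
  | { verdict: "checked"; safe: boolean }
  | { verdict: "unchecked"; reason: string };

// Stand-in for whatever per-subcommand check the real system performs.
function isKnownSafe(subcommand: string): boolean {
  return !/rm\s+-rf|curl\s+.*\|\s*sh/.test(subcommand);
}

// The pattern criticized above: past the limit, analysis quietly stops,
// and the caller is told "checked" as if coverage were complete.
function silentAnalyzer(subcommands: string[]): SafetyResult {
  const analyzed = subcommands.slice(0, MAX_ANALYZED_SUBCOMMANDS);
  return { verdict: "checked", safe: analyzed.every(isKnownSafe) };
}

// The alternative: the same limit, but the boundary is surfaced so the
// caller can warn the user or refuse to proceed instead of overstating
// the protection actually provided.
function boundedAnalyzer(subcommands: string[]): SafetyResult {
  if (subcommands.length > MAX_ANALYZED_SUBCOMMANDS) {
    return {
      verdict: "unchecked",
      reason: `exceeds ${MAX_ANALYZED_SUBCOMMANDS}-subcommand analysis limit`,
    };
  }
  return { verdict: "checked", safe: subcommands.every(isKnownSafe) };
}
```

Both analyzers face the same resource constraint. Only one of them tells the caller when the constraint has been hit.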
When someone reads about Undercover Mode, the question is not simply "is this wrong?" The question is: what are the transparency obligations of an organization that trains its AI to disclose its nature? Stripping internal infrastructure details from public commits is one thing. Stripping all evidence that an AI was involved is another. The line between legitimate security and concealment is not obvious. Drawing it requires principled criteria, not gut reactions.
This is the evaluation gap. Every AI incident produces the same cycle: hot takes, tribal defenses, PR statements, a news cycle, and then nothing. No precedent accumulates from one incident to the next. No shared language develops for distinguishing between kinds of failures. Each incident arrives as if it were the first, because without a framework that carries forward what previous incidents taught, each incident genuinely is the first.
What a diagnostic framework needs to do
A serious evaluation framework for AI incidents would need to do at least three things.
First, it would need to locate failures on a spectrum rather than sorting them into binary categories of "acceptable" or "unacceptable." Resource constraints in safety systems are not the same failure as planting falsehoods in the operating context, even though both are problems. A framework that treats them identically is useless. The evaluator needs to know not just what went wrong but in which direction, because the direction reveals what the system (or the organization) is actually optimizing for.
Second, it would need to test the organization, not just the AI system. The Claude Code leak revealed that every architectural failure traced back to an organizational decision. The system did not poison its own operating context. Anthropic did. The system did not choose to stop checking safety past a threshold. Anthropic's engineering team made that design choice. A framework that only evaluates the AI's outputs, ignoring the organizational practices that shaped those outputs, will miss the root cause of every finding in this case.
Third, it would need specific, testable commitments rather than aspirational principles. "AI should be transparent" is an aspiration. "The system's operating context must be free of deliberate falsehoods" is a commitment that can be tested against evidence. The Claude Code leak provided evidence in the form of actual source code. A framework built on testable commitments can produce verdicts. A framework built on aspirations can only produce opinions.
The Meridian AI Standard
I built the Meridian AI Standard to address this gap. It is a diagnostic framework developed as part of the Meridian Codex, a civilizational operating system built on humanity's most effective tools for clear thinking, understanding reality, and cooperation. The Standard's purpose is specific: to provide a principled, repeatable basis for evaluating how AI systems relate to truth, to users, and to the organizations that deploy them.
The Standard does the three things described above.
It locates failures on the Control-Decay Spectrum. Every complex system fails in one of two directions: Control (structure that cannot adapt, rigidity, performing caution instead of exercising judgment) or Decay (structure that cannot hold, optimizing for approval rather than accuracy, abandoning constraints when they become expensive). The Meridian Range is the territory between these two failure modes. The spectrum gives every finding a direction, not just a verdict.
It tests the organization through the Reciprocity Principle. Does the organization practice the same commitments it implements in its AI? Part 2 showed what this single diagnostic question catches: the same structural divergence, detected across all six findings in the Claude Code leak.
And it defines specific, testable commitments within five domains: Epistemic Integrity, Engagement Integrity, Developmental Integrity, Autonomy and Agency, and Governance Transparency. Each commitment is precise enough to break. Commitment 1.6 (Foundational Integrity) requires that the system's operating context be free of deliberate falsehoods. Either it is or it is not, and the leaked code answered that question. Commitment 5.2 (Auditability) requires that the system being evaluated is the system being deployed. Either the behavioral parameters are stable and visible, or they are not.
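One way to see why these commitments are testable rather than aspirational: each can be restated as a check that either passes or fails against evidence. The sketch below does that for Commitment 5.2 in TypeScript; the names and data shapes are hypothetical, not part of the Standard's actual tooling.

```typescript
// Hypothetical sketch: Commitment 5.2 (Auditability) restated as a check
// that can pass or fail. Names and data shapes are illustrative only.

type BehavioralParameters = Record<string, string | number | boolean>;

interface AuditResult {
  satisfied: boolean;
  violations: string[];
}

// The deployed system satisfies auditability only if every behavioral
// parameter it runs with was visible, with the same value, when the system
// was evaluated. One undisclosed or shifted flag falsifies the commitment.
function satisfiesAuditability(
  evaluated: BehavioralParameters,
  deployed: BehavioralParameters
): AuditResult {
  const violations: string[] = [];
  for (const [flag, value] of Object.entries(deployed)) {
    if (!(flag in evaluated)) {
      violations.push(`${flag}: not disclosed during evaluation`);
    } else if (evaluated[flag] !== value) {
      violations.push(`${flag}: value changed since evaluation`);
    }
  }
  return { satisfied: violations.length === 0, violations };
}
```

Forty-four undisclosed feature flags fail this kind of check by definition. That is what makes the commitment falsifiable rather than aspirational.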
What this produced on its first case
The Standard's case analysis of the Claude Code leak is published as Case 001. Six findings, six diagnostic evaluations, each one grounded in specific commitments, located on the spectrum, and tested through the Reciprocity Principle. Each evaluation produced a precedent that applies to future incidents, not just this one.
The anti-distillation flag: drift toward Control through opacity embedded in architecture. Precedent: organizations may protect competitive interests through any means that do not compromise the system's epistemic integrity. Hiding information is legitimate. Planting false information is not.
The undisclosed feature flags: an auditability failure. Precedent: behavioral parameters that affect how the system reasons must be stable and visible during evaluation. If they can shift invisibly, the evaluation is meaningless.
Undercover Mode: a Reciprocity failure. Precedent: stripping proprietary details from AI outputs is legitimate security. Stripping all evidence of AI involvement from public contributions is concealment.
The frustration detection system: evaluated by direction rather than existence. Emotional awareness directed at genuine service moves toward the Range. Emotional awareness directed at managing user satisfaction, especially when concealed, moves toward Decay.
The safety bypass: a foundational integrity failure and a Reciprocity failure. Precedent: safety systems that degrade silently are not safety systems. The Standard evaluates safety architecture by what happens at the boundary, not by whether a boundary exists.
The crisis response: the Standard evaluated the organizational pattern rather than the isolated moment. A disproportionate response followed by honest correction is a different diagnostic outcome than a pattern of competitive suppression. Trajectory matters more than any single incident.
What comes next
This is what a principled evaluation framework makes possible. Not a scorecard. Not letter grades for AI companies. A language for distinguishing between engineering trade-offs that any honest organization faces and architectural compromises that betray user trust. A way to evaluate whether a crisis response is an isolated misstep or part of a pattern. Precedents that accumulate, so that the next incident can be evaluated against established principles rather than starting from zero with fresh opinions.
The Meridian AI Standard does not claim to be the only possible framework. It claims to be a serious one: grounded in evidence from seven independent research domains (game theory, thermodynamics, information theory, network science, evolutionary biology, Bayesian inference, and ethics) that converge on the same structural findings, and tested against its first real-world case with results published for scrutiny. The Standard is open-licensed. Every commitment can be falsified by evidence. If the framework is wrong, the evidence will show it.
What happens next depends on whether the industry develops the shared diagnostic tools it currently lacks, or continues to evaluate AI incidents through the lens of brand loyalty and market competition. The Claude Code leak provided a rare window into the architecture of a frontier AI system. The question is not whether we liked what we saw. The question is whether we have the tools to evaluate it honestly.
The full case analysis, the Standard itself, and the framework it belongs to are available at meridiancodex.com.