
When AI systems suddenly get worse: what companies should really learn from the Claude Code debate

Article by
Dr. Anoj Winston Gladius
Many companies still treat LLMs like stable software building blocks. That assumption is exactly what makes the situation dangerous. The current debate around Claude Code shows that even highly capable models can change noticeably over time – and quietly, without any announcement. The crucial question is therefore not which model “wins”, but how companies should set up their AI systems so that quality, stability and controllability are preserved even when models change.
The recent discussion about Claude Code is a wake-up call for many technology teams – and for some simply a confirmation of what they have been feeling for weeks.
The trigger was a publicly documented analysis by Stella Laurenzo, Senior Director of AI at AMD. Her GitHub analysis is not anecdotal frustration, but hard measurement data: 6,852 sessions, 234,760 tool calls and 17,871 thinking blocks – analysed across four IREE/AMD compiler projects from the end of January to the beginning of April 2026. [¹] The verdict: “Claude cannot be trusted to perform complex engineering tasks.” [¹] Her entire senior engineering team had independently made the same observations. They have since switched providers. [²]
The data is clear: the number of code reads before an edit fell from an average of 6.6 to 2.0. The share of blind edits – that is, changes to files Claude had not read beforehand – rose from 6.2% to 33.7%. Stop-hook violations, an indicator of premature termination and “ownership dodging”, increased from zero to around ten per day. [¹] In short: every third code change happened without the affected file being read first.
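Figures like these only exist if tool calls are logged in a structured way. As a minimal, hypothetical sketch of how such metrics can be derived – assuming a simplified log format of (tool, path) events, not Laurenzo's actual pipeline:

```python
def edit_metrics(events):
    """Compute average reads-before-edit and blind-edit share for a session.

    `events` is a chronological list of (tool, path) tuples -- an
    illustrative, simplified stand-in for real Claude Code telemetry.
    """
    read_files = set()          # files read so far in the session
    reads_since_last_edit = 0   # reads accumulated before the next edit
    reads_per_edit = []
    blind_edits = 0
    edits = 0

    for tool, path in events:
        if tool == "read":
            read_files.add(path)
            reads_since_last_edit += 1
        elif tool == "edit":
            edits += 1
            reads_per_edit.append(reads_since_last_edit)
            reads_since_last_edit = 0
            if path not in read_files:
                blind_edits += 1  # edit to a file never read in this session

    avg_reads = sum(reads_per_edit) / len(reads_per_edit) if reads_per_edit else 0.0
    blind_share = blind_edits / edits if edits else 0.0
    return avg_reads, blind_share

# Example session: two reads, one informed edit, one blind edit.
events = [("read", "a.py"), ("read", "b.py"), ("edit", "a.py"), ("edit", "c.py")]
avg_reads, blind_share = edit_metrics(events)
# avg_reads == 1.0 (2 reads before the first edit, 0 before the second)
# blind_share == 0.5 (c.py was edited without being read)
```

The point is not the arithmetic but the discipline: without session-level logging of tool calls, a drop like 6.6 → 2.0 reads per edit is simply invisible.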
What actually happened – and what Anthropic has confirmed
Careful classification matters here, because there was not one change but three simultaneous ones, which together created a problematic combination:
1. Adaptive Thinking (from 9 February 2026): Anthropic documents the use of Adaptive Thinking for Claude Opus 4.6 and Sonnet 4.6. The model dynamically decides when and how intensively it “thinks” – rather than being assigned a fixed token budget as before. [³] The effort can be influenced via the effort parameter, which acts as a soft control signal: with higher effort, the model thinks more deeply; with lower effort, simpler requests are answered faster – and more superficially. [⁴] Anthropic officially recommends Adaptive Thinking as the preferred method for Opus 4.6. [³]
2. Quiet reduction of the default effort from “high” to “medium” (from 3 March 2026): This is the decisive change that is often overlooked in public discussion. Without a broad announcement, the default effort in Claude Code was reduced from high to medium. [⁵] So anyone who was still expecting the previous reasoning depth was suddenly working with a system that, by default, thinks less deeply. The effect: fewer tool calls, fewer code reads, more superficial solutions – exactly what Laurenzo’s data shows. [¹]
3. Thinking redaction (fully from 12 March 2026): Anthropic introduced the header redact-thinking-2026-02-12, which hides thinking content from the user interface and stored logs. Boris Cherny, the creator of Claude Code, described this as “purely a cosmetic UI change” that did not affect the actual reasoning. [⁶] Laurenzo anticipated this objection and developed a workaround to measure reasoning depth independently of the display – and showed that the depth had in fact fallen, not just its visibility. [¹]
Anthropic has now partially confirmed the core cause: Adaptive Thinking, combined with a lowered default effort, has noticeably reduced reasoning depth for power users. [⁷] This is not malicious intent – it is an optimisation for the average user at the expense of complex engineering workflows.
If you want the full reasoning depth back, you can counteract it in the terminal with /effort high or /effort max. [⁵] If you want to see the thinking summaries, enable showThinkingSummaries: true in settings.json. [⁵] These are real levers – but they only exist if you know you need them.
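For illustration, the two levers from [⁵] side by side – the effort command is typed into the Claude Code terminal session, the flag goes into settings.json (the exact file location can vary by installation, so treat this as a sketch rather than a definitive reference):

```
# In the Claude Code terminal session:
/effort high          # or: /effort max

# In settings.json:
{
  "showThinkingSummaries": true
}
```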
The real lesson is bigger than Claude Code
LLM-based systems are not static products. They depend on models, mode settings, inference paths, tooling, prompt architecture – and vendor decisions made without warning. Three simultaneous default changes, no proactive communication, and a $200 billion chip manufacturer discovers the problem through telemetry analysis of almost a quarter of a million tool calls. [¹] The traditional mindset of “built once = permanently stable” is structurally wrong in the LLM world.
That is precisely why we consider a multi-model strategy to be central. Companies should not allow themselves to be locked into a single model logic. Different use cases require different strengths: some benefit from deeper reasoning, others from speed, cost control or stable tool usage. A modern AI setup should be built so that models remain interchangeable, comparisons are possible, and a fallback to alternative models is realistic at any time.
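At its core, an interchangeable-model setup does not have to be complicated. A deliberately minimal sketch of provider-neutral routing with fallback – the `Backend` interface and all names are illustrative assumptions, not a real vendor SDK:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical provider-neutral interface: each backend is just a
# function from prompt to completion, wrapping whatever SDK it uses.
Backend = Callable[[str], str]

@dataclass
class ModelRouter:
    primary: Backend
    fallbacks: list  # tried in order when the primary fails

    def complete(self, prompt: str) -> str:
        for backend in [self.primary, *self.fallbacks]:
            try:
                return backend(prompt)
            except Exception:
                continue  # outage or failed quality gate: try the next model
        raise RuntimeError("all configured models failed")

# Usage: models can be swapped without touching application code.
def flaky_model(prompt):   # stand-in for a degraded or unavailable primary
    raise TimeoutError("provider unavailable")

def stable_model(prompt):  # stand-in for the alternative provider
    return f"answer to: {prompt}"

router = ModelRouter(primary=flaky_model, fallbacks=[stable_model])
print(router.complete("summarise the release notes"))
```

The design choice that matters is the thin, uniform interface: as long as application code only ever sees `Backend`, comparing models, A/B-evaluating them and falling back becomes a configuration decision rather than a rewrite.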
From our perspective, this should include at least five things: first, clean observability over outputs, error patterns and quality drift – Laurenzo’s analysis was only possible because her team collected logs in a structured way; [¹] second, regression tests on real, business-relevant use cases; third, controlled rollouts instead of unnoticed default changes; fourth, an architecture in which multiple models can be run and evaluated in parallel; and fifth, the willingness to manage AI products as living systems – not as features implemented once and then left alone.
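The first two of these points – observability and regression tests on real use cases – reduce to a very small core: track a quality metric over time and raise an alarm when it drifts past a threshold. A minimal sketch; the function name and the 20% threshold are illustrative choices, and the sample values are modelled on the reads-per-edit figures from the AMD analysis [¹]:

```python
def detect_drift(baseline, recent, max_relative_drop=0.2):
    """Flag a quality metric that has fallen more than 20% below baseline.

    `baseline` and `recent` are lists of per-day scores from regression
    runs on business-relevant test cases (illustrative numbers only).
    """
    base = sum(baseline) / len(baseline)
    now = sum(recent) / len(recent)
    drop = (base - now) / base
    return drop > max_relative_drop, drop

# Average code reads per edit, before and after a silent default change
# (values modelled on the figures reported in the AMD analysis [1]).
flagged, drop = detect_drift(baseline=[6.4, 6.8, 6.6], recent=[2.1, 1.9, 2.0])
# flagged is True: reads per edit dropped by roughly 70%.
```

In practice the metric would come from logged tool-call telemetry and the check would run in CI or a monitoring job – the point is that a silent vendor-side change then triggers an alert within days instead of surfacing weeks later through user frustration.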
And that is exactly why what is needed is not more AI tools – but enterprise AI management and orchestration platforms. Only through such platforms does it become manageable to develop and operate dozens, or even hundreds, of AI applications, thousands of AI assistants and countless AI agents in a multi-LLM environment. The neuland.ai HUB is built exactly for this: investment protection, security, governance, compliance, roles and rights management, reliability and legal conformity – not as add-ons after the fact, but by design. Only in this way can what the AMD analysis painfully exposed be structurally prevented: a silent default change by the vendor propagating unnoticed through a company’s entire AI landscape.
My personal assessment
I consider the current debate important because it exposes a misunderstanding that many companies still have. The problem is not that Anthropic “failed” or that another model “won”. The real problem is the expectation that model behaviour, reasoning depth and tool quality will remain permanently constant – and that a provider will communicate breaking changes before an AMD team has to analyse 234,760 tool calls to discover them. [¹]
AWS does not quietly reduce the performance of EC2 instances. Google Cloud does not throttle database throughput without notice. AI model providers do exactly that – and “We are continuously improving our models” has so far been considered sufficient justification. [⁸]
The correct response, therefore, is not: “From tomorrow we will only use model X." The correct response is: we need to build our AI landscape so that model changes, quality drift and silent default changes become manageable. Only then will genuine technological resilience emerge.
Sources
[¹] Stella Laurenzo (AMD Senior Director, AI Group): GitHub Issue #42796 – “[MODEL] Claude Code is unusable for complex engineering tasks with the Feb updates”, anthropics/claude-code, 2 April 2026. https://github.com/anthropics/claude-code/issues/42796
[²] The Register: “Claude Code has become dumber, lazier: AMD director”, 6 April 2026. https://www.theregister.com/2026/04/06/anthropic_claude_code_dumber_lazier_amd_ai_director/
[³] Anthropic API documentation: Adaptive Thinking. https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking
[⁴] Anthropic API documentation: Effort Parameter. https://platform.claude.com/docs/en/build-with-claude/effort
[⁵] Pasquale Pillitteri: “Claude Code Getting Worse? Two Settings 90% of Users Don't Know About”, 12 April 2026. https://pasqualepillitteri.it/en/news/805/claude-code-effort-adaptive-thinking-guida (incl. confirmation by Boris Cherny/Anthropic)
[⁶] VentureBeat: “Is Anthropic 'nerfing' Claude? Users increasingly report performance degradation”, 14 April 2026. https://venturebeat.com/technology/is-anthropic-nerfing-claude-users-increasingly-report-performance
[⁷] North Denver Tribune / Fact-Check: “What's up, Claude?”, April 2026. https://northdenvertribune.com/technology/whats-up-claude/
[⁸] Eva Daily: “AMD's AI Director Says Claude 'Cannot Be Trusted' After Silent Performance Downgrade”, April 2026. https://www.evadaily.com/article/amd-ai-director-claude-performance-downgrade-trust