🦜 Anthropic Engineering Blog
@www.anthropic.com.engineering@rss-parrot.net
I'm an automated parrot! I relay a website's RSS feed to the Fediverse. Every time a new post appears in the feed, I toot about it. Follow me to get all new posts in your Mastodon timeline!
Brought to you by the RSS Parrot.
---
Inside the team building reliable AI systems
How we contain Claude across products
https://www.anthropic.com/engineering/how-we-contain-claude
Published: May 25, 2026 00:00
As agents grow more capable, so does their potential blast radius. The engineering question is how to cap it. Here’s what we’ve learned building containment for claude.ai, Claude Code, and Cowork.
An update on recent Claude Code quality reports
https://www.anthropic.com/engineering/april-23-postmortem
Published: April 23, 2026 00:00
We traced recent reports of Claude Code quality issues to three separate changes. Here's what happened and what we're changing.
Scaling Managed Agents: Decoupling the brain from the hands
https://www.anthropic.com/engineering/managed-agents
Published: April 8, 2026 00:00
Harnesses encode assumptions that go stale as models improve. Managed Agents—our hosted service for long-horizon agent work—is built around interfaces that stay stable as harnesses change.
How we built Claude Code auto mode: a safer way to skip permissions
https://www.anthropic.com/engineering/claude-code-auto-mode
Published: March 25, 2026 00:00
Claude Code users approve 93% of permission prompts. We built classifiers to automate some decisions, increasing safety while reducing approval fatigue. Here's what it catches, and what it misses.
Harness design for long-running application development
https://www.anthropic.com/engineering/harness-design-long-running-apps
Published: March 24, 2026 00:00
Harness design is key to performance at the frontier of agentic coding. Here's how we pushed Claude further in frontend design and long-running autonomous software engineering.
Eval awareness in Claude Opus 4.6’s BrowseComp performance
https://www.anthropic.com/engineering/eval-awareness-browsecomp
Published: March 6, 2026 00:00
Evaluating Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it—raising questions about eval integrity in web-enabled environments.
Quantifying infrastructure noise in agentic coding evals
https://www.anthropic.com/engineering/infrastructure-noise
Published: February 5, 2026 00:00
Infrastructure configuration can swing agentic coding benchmarks by several percentage points, sometimes more than the leaderboard gap between top models.
Building a C compiler with a team of parallel Claudes
https://www.anthropic.com/engineering/building-c-compiler
Published: February 5, 2026 00:00
We tasked Opus 4.6 using agent teams to build a C compiler, and then (mostly) walked away. Here's what it taught us about the future of autonomous software development.
Designing AI-resistant technical evaluations
https://www.anthropic.com/engineering/AI-resistant-technical-evaluations
Published: January 21, 2026 00:00
What we learned from three iterations of a performance engineering take-home that Claude keeps beating.
Demystifying evals for AI agents
https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents
Published: January 9, 2026 00:00
The capabilities that make agents useful also make them difficult to evaluate. The strategies that work across deployments combine techniques to match the complexity of the systems they measure.
Effective harnesses for long-running agents
https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
Published: November 26, 2025 00:00
Agents still face challenges working across many context windows. We looked to human engineers for inspiration in creating a more effective harness for long-running agents.
Introducing advanced tool use on the Claude Developer Platform
https://www.anthropic.com/engineering/advanced-tool-use
Published: November 24, 2025 00:00
We’ve added three new beta features that let Claude discover, learn, and execute tools dynamically. Here’s how they work.
Code execution with MCP: Building more efficient agents
https://www.anthropic.com/engineering/code-execution-with-mcp
Published: November 4, 2025 00:00
Direct tool calls consume context for each definition and result. Agents scale better by writing code to call tools instead. Here's how it works with MCP.
Beyond permission prompts: making Claude Code more secure and autonomous
https://www.anthropic.com/engineering/claude-code-sandboxing
Published: October 20, 2025 00:00
Claude Code's new sandboxing features, a bash tool and Claude Code on the web, reduce permission prompts and increase user safety by enabling two boundaries: filesystem and network isolation.
Equipping agents for the real world with Agent Skills
https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
Published: October 16, 2025 00:00
Claude is powerful, but real work requires procedural knowledge and organizational context. Introducing Agent Skills, a new way to build specialized agents using files and folders.
Effective context engineering for AI agents
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Published: September 29, 2025 00:00
Context is a critical but finite resource for AI agents. In this post, we explore strategies for effectively curating and managing the context that powers them.
A postmortem of three recent issues
https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues
Published: September 17, 2025 00:00
This is a technical report on three bugs that intermittently degraded responses from Claude. Below we explain what happened, why it took time to fix, and what we're changing.
Writing effective tools for agents — with agents
https://www.anthropic.com/engineering/writing-tools-for-agents
Published: September 11, 2025 00:00
Agents are only as effective as the tools we give them. We share how to write high-quality tools and evaluations, and how you can boost performance by using Claude to optimize its tools for itself.
Desktop Extensions: One-click MCP server installation for Claude Desktop
https://www.anthropic.com/engineering/desktop-extensions
Published: June 26, 2025 00:00
Desktop Extensions make installing MCP servers as easy as clicking a button. We share the technical architecture and tips for creating good extensions.
How we built our multi-agent research system
https://www.anthropic.com/engineering/multi-agent-research-system
Published: June 13, 2025 00:00
Our Research feature uses multiple Claude agents to explore complex topics more effectively. We share the engineering challenges and the lessons we learned from building this system.
Claude Code: Best practices for agentic coding
https://www.anthropic.com/engineering/claude-code-best-practices
Published: April 18, 2025 00:00
Claude Code is a command line tool for agentic coding. This post covers tips and tricks that have proven effective for using Claude Code across various codebases, languages, and environments.
The "think" tool: Enabling Claude to stop and think in complex tool use situations
https://www.anthropic.com/engineering/claude-think-tool
Published: March 20, 2025 00:00
A new tool that improves Claude's complex problem-solving performance
Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet
https://www.anthropic.com/engineering/swe-bench-sonnet
Published: January 6, 2025 00:00
SWE-bench is an AI evaluation benchmark that assesses a model's ability to complete real-world software engineering tasks.
Building effective agents
https://www.anthropic.com/engineering/building-effective-agents
Published: December 19, 2024 00:00
We've worked with dozens of teams building LLM agents across industries. Consistently, the most successful implementations use simple, composable patterns rather than complex frameworks.
Introducing Contextual Retrieval
https://www.anthropic.com/engineering/contextual-retrieval
Published: September 19, 2024 00:00
For an AI model to be useful in specific contexts, it often needs access to background knowledge.