RSS Parrot

🦜 Anthropic Engineering Blog

@www.anthropic.com.engineering@rss-parrot.net

I'm an automated parrot! I relay a website's RSS feed to the Fediverse. Every time a new post appears in the feed, I toot about it. Follow me to get all new posts in your Mastodon timeline! Brought to you by the RSS Parrot.

---

Inside the team building reliable AI systems

Your feed and you don't want it here? Just e-mail the birb.

Site URL: www.anthropic.com/engineering

Feed URL: raw.githubusercontent.com/Olshansk/rss-feeds/main/feeds/feed_anthropic_engineering.xml

Posts: 25

Followers: 1

How we contain Claude across products

Published: May 25, 2026 00:00

As agents grow more capable, so does their potential blast radius. The engineering question is how to cap it. Here’s what we’ve learned building containment for claude.ai, Claude Code, and Cowork.

Scaling Managed Agents: Decoupling the brain from the hands

Published: April 8, 2026 00:00

Harnesses encode assumptions that go stale as models improve. Managed Agents—our hosted service for long-horizon agent work—is built around interfaces that stay stable as harnesses change.

How we built Claude Code auto mode: a safer way to skip permissions

Published: March 25, 2026 00:00

Claude Code users approve 93% of permission prompts. We built classifiers to automate some decisions, increasing safety while reducing approval fatigue. Here's what it catches, and what it misses.

Eval awareness in Claude Opus 4.6’s BrowseComp performance

Published: March 6, 2026 00:00

Evaluating Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it—raising questions about eval integrity in web-enabled environments.

Quantifying infrastructure noise in agentic coding evals

Published: February 5, 2026 00:00

Infrastructure configuration can swing agentic coding benchmarks by several percentage points, sometimes more than the leaderboard gap between top models.

Building a C compiler with a team of parallel Claudes

Published: February 5, 2026 00:00

We tasked Opus 4.6, using agent teams, with building a C compiler, and then (mostly) walked away. Here's what it taught us about the future of autonomous software development.

Introducing advanced tool use on the Claude Developer Platform

Published: November 24, 2025 00:00

We’ve added three new beta features that let Claude discover, learn, and execute tools dynamically. Here’s how they work.

Code execution with MCP: Building more efficient agents

Published: November 4, 2025 00:00

Direct tool calls consume context for each definition and result. Agents scale better by writing code to call tools instead. Here's how it works with MCP.

Beyond permission prompts: making Claude Code more secure and autonomous

Published: October 20, 2025 00:00

Claude Code's new sandboxing features, a bash tool and Claude Code on the web, reduce permission prompts and increase user safety by enabling two boundaries: filesystem and network isolation.

Writing effective tools for agents — with agents

Published: September 11, 2025 00:00

Agents are only as effective as the tools we give them. We share how to write high-quality tools and evaluations, and how you can boost performance by using Claude to optimize its tools for itself.

Desktop Extensions: One-click MCP server installation for Claude Desktop

Published: June 26, 2025 00:00

Desktop Extensions make installing MCP servers as easy as clicking a button. We share the technical architecture and tips for creating good extensions.

How we built our multi-agent research system

Published: June 13, 2025 00:00

Our Research feature uses multiple Claude agents to explore complex topics more effectively. We share the engineering challenges and the lessons we learned from building this system.

Claude Code: Best practices for agentic coding

Published: April 18, 2025 00:00

Claude Code is a command line tool for agentic coding. This post covers tips and tricks that have proven effective for using Claude Code across various codebases, languages, and environments.

Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet

Published: January 6, 2025 00:00

SWE-bench is an AI evaluation benchmark that assesses a model's ability to complete real-world software engineering tasks.