Anthropic has announced a major upgrade to Claude 3.5 Sonnet, introducing significant performance improvements and a revolutionary new computer use capability that allows the AI to interact directly with computer interfaces.
Enhanced Claude 3.5 Sonnet Performance
The updated Claude 3.5 Sonnet delivers substantial improvements across key metrics:
- Coding tasks: 33% improvement on the SWE-bench Verified benchmark, scoring 49.0%
- Tool use: 65.0% accuracy on TAU-bench, an agentic coding benchmark
- Instruction following: Enhanced ability to follow complex, multi-step directions
These improvements make the new Sonnet particularly powerful for software development tasks, from writing initial code to debugging and optimization.
Revolutionary Computer Use Feature
The most groundbreaking announcement is the introduction of “computer use” - a capability that allows Claude to interact with computer interfaces just like a human user. In public beta, this feature enables Claude to:
- View screenshots and understand what’s on screen
- Move the mouse cursor to specific coordinates
- Click buttons and interact with UI elements
- Type text into applications
- Navigate between applications and windows
This opens up possibilities for automating complex workflows that previously required human intervention or custom integrations.
New Claude 3.5 Haiku
Alongside the Sonnet upgrade, Anthropic also announced Claude 3.5 Haiku, the fastest model in the Claude 3.5 family. Despite being optimized for speed and cost-efficiency, Haiku matches or exceeds Claude 3 Opus on many benchmarks while maintaining significantly faster response times.
Real-World Applications
Early testing partners have demonstrated various applications:
- Asana is using computer use for building integrations
- Canva has implemented it for complex design workflows
- Replit is exploring automated coding assistance
- The Browser Company is testing agentic web browsing
Developer Access
The updated Claude 3.5 Sonnet is available immediately through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Computer use is available in public beta with the understanding that it’s still an early-stage capability.
Pricing Remains Unchanged
Despite the significant improvements, Anthropic has maintained the same pricing structure: $3 per million input tokens and $15 per million output tokens for Claude 3.5 Sonnet.
Safety Considerations
Anthropic has implemented several safety measures for computer use, including training Claude to avoid potentially risky actions and recommending that developers implement human-in-the-loop confirmations for sensitive operations.
Looking Ahead
This update positions Anthropic competitively against OpenAI’s recent o1 release, offering a different approach to advanced AI capabilities. While o1 focuses on reasoning, Claude 3.5 Sonnet emphasizes practical application through direct computer interaction.
The combination of improved performance and computer use capabilities represents a significant step toward truly autonomous AI agents capable of handling complex, multi-step tasks across various applications.