Claude 3.5 Sonnet: Master Your Computer with AI

Anthropic just rolled out a big update for its Claude AI models, introducing a standout feature called “Computer Use.” This lets developers guide the updated Claude 3.5 Sonnet to navigate desktop applications, move cursors, click buttons, and type text—basically, act like a person at a computer.

In their blog, Anthropic explained that they’re focused on teaching Claude general computer skills rather than creating individual tools for specific tasks. This way, Claude can use a variety of standard software programs meant for human users.

The Computer Use API translates text prompts into computer commands. For instance, developers can input requests like, “use data from my computer and the internet to fill out this form” or “move the cursor to open a web browser.” This marks the first time an AI model from Anthropic can browse the web.

The new feature analyzes screenshots to determine cursor movements needed to perform tasks. It can handle hundreds of steps in a sequence, self-correcting if it hits a snag. Right now, it’s in public beta and is designed to help automate repetitive tasks, test software, and take on open-ended challenges. Replit, for example, is testing it as a way to navigate user interfaces while building their Replit Agent product.

Even though Claude’s Computer Use is a leap forward, it’s not without flaws. Anthropic acknowledges the feature struggles with scrolling, dragging, and zooming. In a test for booking flights, it succeeded just 46% of the time—though that’s up from 36% in the last version. Since Claude processes screenshots instead of a continuous video, it sometimes misses quick actions or notifications. During a coding demo, it unexpectedly decided to browse photos of Yellowstone National Park.

Claude 3.5 Sonnet: Performance and Limitations

On OSWorld, a platform to measure AI performance, Claude scored 14.9% on screenshot tasks, still way below human skill levels, which are estimated between 70% and 75%. But it’s almost double the score of the next best AI system. Anthropic hopes to enhance this with input from developers.

The Computer Use feature comes with focused safety measures. It’s built to avoid risks related to user privacy—it doesn’t train on user data or access the internet during training. One vulnerability they identified is prompt injection attacks, where malicious commands could make the AI behave unpredictably.

Research indicates that such jailbreak attacks could result in harmful behavior from models that lack Computer Use capabilities. Studies show these attacks are successful in about 20% of cases. To combat this for Claude Sonnet 3.5, Anthropic’s Trust and Safety teams created systems to detect and prevent potential prompt injections, especially since Claude interprets screenshots that might hold harmful content.

Developers also prepared for potential misuse of Claude’s skills. They implemented classifiers to identify and monitor harmful activities, such as spam or misinformation. For safety, Claude can’t post on social media or engage with government websites to limit political risks. Testing included collaboration with U.S. and U.K. safety institutes, and Claude 3.5 Sonnet is rated at AI Safety Level 2, indicating it poses no significant risks that require stricter safety protocols.

Besides Computer Use, Claude 3.5 Sonnet shows improvements in coding capabilities compared to its predecessor, maintaining the same speed and cost. Its performance on a coding benchmark called SWE-bench Verified jumped from 33.4% to 49%, outperforming other reasoning models like OpenAI’s latest.

Businesses are increasingly looking to Generative AI for coding, though the technology isn’t flawless. AI-generated code can lead to system outages, prompting security concerns over its use in software development.

Users have noticed the upgrades, including GitLab, which found that Claude improved reasoning for DevSecOps tasks by up to 10%, with no added latency. Cognition also reported better coding, planning, and problem-solving with this version compared to the previous one.

Claude 3.5 Sonnet is available now through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. There’s a version rolling out without Computer Use for Claude apps.

Anthropic also introduced Claude 3.5 Haiku, a more affordable upgrade to their least expensive model. Haiku offers faster responses and improved instruction accuracy while being useful for user-facing applications. It matches the performance of the larger Claude 3 Opus model but does so at the same cost and similar speed as the previous generation.

Haiku saw success on SWE-bench Verified with a score of 40.6%, surpassing the earlier Claude 3.5 Sonnet and GPT-4o. This model will be available next month as a text-prompt-only version, with image inputs planned for later.

The introduction of Computer Use enhances Claude 3.5 Sonnet’s move toward AI agents—tools capable of handling complex tasks independently. According to Yiannis Antoniou from Lab49, the choice to call it “computer use” makes it more relatable for everyday users, differentiating it from traditional AI copilots that assist rather than operate independently.

Now, major companies like Microsoft, Workday, and Salesforce are integrating agents into their AI strategies. For instance, Salesforce recently launched Agentforce to empower generative AI in areas like customer support and sales.

IBM’s Armand Ruiz pointed out at a recent festival that we’re on the brink of entering an “agentic era,” where tailored AI agents will work alongside people to improve efficiency within organizations.

He emphasized that while progress is being made, achieving reliable, scalable AI solutions remains a journey. The vision could be so advanced that future AI might even create itself. Recently, Meta announced a “Self-Taught Evaluator” AI model capable of autonomously assessing its own performance and that of other systems, showcasing the potential for AI to learn from its experiences.

To order our software development and system administration services, please visit our contact page.