Anthropic has announced the launch of Claude 3.5 Sonnet, the first release in the upcoming Claude 3.5 model family.
Claude 3.5 Sonnet is a significant advancement in AI, surpassing competitor models and the earlier Claude 3 Opus in various benchmarks. It combines impressive intelligence with the speed and cost-efficiency of a mid-tier model, making it a versatile and powerful tool for a range of applications.
Accessing Claude 3.5 Sonnet
Claude 3.5 Sonnet is now available for free on Claude.ai and the Claude iOS app. Users with Claude Pro and Team plan subscriptions can access the model with significantly higher rate limits. Additionally, Claude 3.5 Sonnet is accessible via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.
The pricing for the model is set at $3 per million input tokens and $15 per million output tokens, featuring a 200K token context window, which allows for extensive and complex interactions.
Comprehensive Benchmark Performance
Claude 3.5 Sonnet stands out in several key benchmark areas, setting new industry standards and demonstrating its superior capabilities across multiple dimensions.
1. Graduate-Level Reasoning (GPQA): Claude 3.5 Sonnet excels in GPQA evaluations, showcasing its ability to handle complex reasoning tasks that typically require a deep understanding of various subjects at a graduate level. This makes it highly effective for applications that demand advanced problem-solving and analytical skills.
2. Undergraduate-Level Knowledge (MMLU): In the MMLU benchmark, which tests the model’s knowledge across undergraduate-level topics, Claude 3.5 Sonnet shows significant improvements. This indicates its proficiency in a wide range of academic disciplines, making it a reliable tool for educational and research purposes.
3. Coding Proficiency (HumanEval): Claude 3.5 Sonnet’s coding skills are tested through the HumanEval benchmark, where it demonstrates exceptional ability in writing, understanding, and debugging code. The model solved 64% of coding problems in internal evaluations, a substantial increase from the 38% solved by Claude 3 Opus. This highlights its effectiveness in software development and programming tasks.
4. Visual Reasoning: Claude 3.5 Sonnet surpasses its predecessors in standard vision benchmarks, particularly in tasks requiring visual reasoning. It can interpret charts, graphs, and other visual data accurately, making it invaluable for fields such as data analysis, financial forecasting, and scientific research.
5. Text Transcription from Imperfect Images: The model’s ability to transcribe text from imperfect images has also seen notable enhancements. This capability is crucial for industries like retail and logistics, where accurate data extraction from images can streamline operations and improve efficiency.
Advanced Vision Model
Claude 3.5 Sonnet is Anthropic’s most advanced vision model yet, surpassing Claude 3 Opus on standard vision benchmarks.
These improvements are especially noticeable in tasks requiring visual reasoning, such as interpreting charts and graphs. The model can accurately transcribe text from imperfect images, a critical capability for sectors like retail, logistics, and financial services where AI can extract more insights from images, graphics, or illustrations than from text alone.
Introducing Artifacts on Claude.ai
Alongside the launch of Claude 3.5 Sonnet, Anthropic is introducing a new feature called Artifacts on Claude.ai. This feature expands how users can interact with Claude, creating a dynamic workspace where they can generate content such as code snippets, text documents, or website designs.
These Artifacts appear in a dedicated window alongside the conversation, allowing users to see, edit, and build upon Claude’s creations in real-time. This seamless integration of AI-generated content into projects and workflows marks Claude’s evolution from a conversational AI to a collaborative work environment.
This preview feature is just the beginning of a broader vision for Claude.ai. In the near future, the platform will expand to support team collaboration, enabling teams — and eventually entire organizations — to securely centralize their knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate.
Commitment to Safety and Privacy
Anthropic places a strong emphasis on safety and privacy in AI development.
Claude 3.5 Sonnet has undergone rigorous testing to reduce misuse, and despite its leap in intelligence, it remains at ASL-2, the second level of Anthropic’s AI safety standards. More details can be found in the model card addendum.
To ensure safety and transparency, Anthropic has engaged with external experts to test and refine the safety mechanisms within this latest model.
Recently, Claude 3.5 Sonnet was provided to the UK’s Artificial Intelligence Safety Institute (UK AISI) for pre-deployment safety evaluation. The UK AISI completed tests of 3.5 Sonnet and shared their results with the US AI Safety Institute (US AISI) as part of a Memorandum of Understanding between the US and UK AISIs announced earlier this year.
Anthropic has integrated policy feedback from various external subject matter experts to ensure robust evaluations that consider new trends in abuse.
For example, feedback from child safety experts at Thorn was used to update classifiers and fine-tune the models. This engagement has helped Anthropic scale up its ability to evaluate Claude 3.5 Sonnet against various types of misuse.
Privacy is a core constitutional principle guiding Anthropic’s AI model development. The company does not train its generative models on user-submitted data unless explicit permission is given. To date, no customer or user-submitted data has been used to train their generative models.
Looking Ahead: Future Plans for Claude 3.5 Family
Anthropic aims to continuously improve the balance between intelligence, speed, and cost with each model release.
To complete the Claude 3.5 model family, they plan to introduce Claude 3.5 Haiku and Claude 3.5 Opus later this year. These models will build on the capabilities of Claude 3.5 Sonnet, providing even more options for businesses.
In addition to the next-generation model family, Anthropic is developing new modalities and features to support more business use cases. This includes integrations with enterprise applications and exploring features like Memory, which will enable Claude to remember user preferences and interaction history, making the AI experience even more personalized and efficient.
Anthropic is committed to continuous improvement and values user feedback. Users are encouraged to submit feedback on Claude 3.5 Sonnet directly within the product. This feedback informs the development roadmap and helps the teams enhance the user experience. Anthropic looks forward to seeing the innovative ways users will build, create, and discover with Claude 3.5 Sonnet.
With its superior intelligence, enhanced speed, advanced coding capabilities, and collaborative features, Claude 3.5 Sonnet sets a new standard for what AI can achieve. Whether it’s for customer support, workflow automation, or complex coding tasks, it offers a powerful and versatile solution for businesses.