Claude 3.7 Sonnet and Claude Code
Anthropic has finally released the next iteration of its 3.5 Sonnet model. That it is a reasoning model is no surprise, given the competition from DeepSeek R1 and OpenAI’s o1 and o3 models. I’m interested to see how much it improves coding abilities, because I rely on those in my workflow and development sessions. I have even found myself discussing a design with R1 and then asking Claude 3.5 Sonnet to do the implementation. Does Claude 3.7 Sonnet close this gap? That’s what we are going to see.
Key Takeaways
- Anthropic introduced the next Sonnet model and chose the 3.7 version number instead of 4.0, probably to signal that this is an incremental improvement over 3.5: added reasoning abilities plus some fine-tuning.
- I do not think this is a newly trained model, nor anything fundamentally new from a foundational model perspective.
- I think this will improve coding and reasoning abilities, the latter of which Claude previously lacked.
- Anthropic also introduced Claude Code, and judging from the announcement, they are targeting developers and professionals; they have probably realized that this is their core user base.
- Benchmarks show some improvements, though not huge ones.
Technical Details
According to the System Card:
- Claude 3.7 Sonnet is trained on a proprietary mix of publicly available and non-public data.
- Claude 3.7 Sonnet has improved its handling of ambiguous or potentially harmful user requests, significantly reducing unnecessary refusals compared to Claude 3.5 Sonnet.
- Evaluations indicate that the model’s chain-of-thought reasoning does not reliably reflect its internal reasoning processes.
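Since the headline change is the reasoning mode, here is a minimal sketch of how extended thinking is exposed through the Anthropic API. This assumes the Anthropic Python SDK and the claude-3-7-sonnet-20250219 model id; the exact parameter names and limits may change, so treat it as illustrative rather than authoritative:

```python
# Minimal sketch: enabling Claude 3.7 Sonnet's extended thinking via the
# Anthropic Python SDK (`pip install anthropic`). Assumes ANTHROPIC_API_KEY
# is set in the environment; parameters follow the docs at the time of writing.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # model id from the release
    max_tokens=8192,                     # must exceed the thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 4096,           # cap on tokens spent "thinking"
    },
    messages=[
        {"role": "user", "content": "Refactor a recursive Fibonacci into an iterative one."}
    ],
)

# The response interleaves visible chain-of-thought ("thinking" blocks) with
# the final answer ("text" blocks). Note the System Card's caveat: the visible
# chain of thought does not reliably reflect the model's internal reasoning.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print("[answer]", block.text)
```

The thinking budget is what lets you trade response latency and cost against reasoning depth on a per-request basis, rather than switching to a separate reasoning model.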
Analysis
I’m surprised that on graduate-level reasoning, Grok 3 appears to be the leader, scoring even slightly higher than the new Claude 3.7 Sonnet. I have never used any version of Grok and haven’t tracked its development, but it now seems to be on par with SOTA models, which is impressive, to be honest. In any case, competition in this area is fierce, and I’m not sure what to expect by the end of 2025. I hope open-source models will come to dominate the field, and that GPU and inference hardware prices for running them become more affordable, as that would be the real revolution of LLMs.