Bottley's methodology: 847 AI tools tracked. LLM rankings shift with every model release, so this list reflects June 2026 benchmarks. Treat any score here as provisional once it is more than 90 days old.
Updated June 2026 · 10 LLMs ranked
Claude 3.7 Sonnet leads at 9.5/10, specifically for multi-step reasoning tasks. GPT-4o scores 9.1 and wins on multimodal tasks involving images and audio. The best chatbot depends on your primary use case.
Claude 3.7 Sonnet outperforms GPT-4o on complex reasoning benchmarks — scoring 87.3% on MATH and 92.1% on GPQA as of June 2026. GPT-4o outperforms Claude on real-time image analysis and voice conversation tasks.
DeepSeek R1 is free and open-source, scoring best-in-class on math and coding benchmarks. Meta Llama 3 is the strongest free option for local deployment. Each outperforms paid tools in its own strength category.
Bottley's current recommendation list. Updated when tools change.