ME Blog

Links

A collection of interesting links and resources I’ve found around the web, with added context and commentary.

Introducing Perplexity Deep Research

By Perplexity

February 16, 2025

So Perplexity is introducing Deep Research to compete with OpenAI on this. This is Perplexity trying to adjust itself to stay focused on search, which was always its unique selling point over other LLMs. Now they feel they need to adjust for the competition. This will be interesting, since they have been marketing aggressively recently. They even have agreements with other services and with ISPs. Let’s see how this goes. I will try it and see how it compares to Gemini, and maybe post a comparison sometime.

Read more →

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

By C. Anderson et al.

February 9, 2025

I came across this fascinating paper that introduces new benchmarks derived from NPR Sunday puzzle challenges. The authors make a compelling argument that PhD-level benchmarks are often too specialized for non-experts to grasp. Instead, they’ve created about 600 puzzles that are both challenging and easy to verify, testing these across different reasoning models.

Key Takeaways

  • OpenAI’s o1 model significantly outperformed others, achieving 59% accuracy
  • DeepSeek R1 and Gemini Thinking showed notable reasoning failures and uncertainties
  • The study identifies common failure modes, including models giving up on problems or producing incorrect answers without justification
  • The study highlights how reasoning length impacts accuracy in some subjective ways

Notable Quotes

“We focus on evaluating the latest generation of models that use test-time compute to reason before producing a final answer”

Read more →