Weekly paper roundup: DeepSeek-R1 (1/20/2025)

Overview

The three papers center on advancing reasoning and inference capabilities in large language models (LLMs) through novel reinforcement learning (RL) and evolutionary-search methodologies. Paper 1 introduces DeepSeek-R1 and its predecessor, DeepSeek-R1-Zero, which strengthen reasoning in LLMs via RL, with a multi-stage training pipeline that fixes the readability problems of pure RL training. Paper 2 proposes the “Mind Evolution” strategy, which uses evolutionary search to scale inference-time compute and surpasses traditional methods on natural language planning tasks. Paper 3 presents Kimi k1.5, a multi-modal model that scales RL training of LLMs through long-context scaling and improved policy optimization, showing reasoning gains across several benchmarks. Collectively, these works underscore a trend toward refining LLMs’ reasoning abilities by integrating advanced RL frameworks and evolutionary search.

Spotlight 🔦

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI

🤗 216

This paper introduces DeepSeek-R1-Zero and DeepSeek-R1, two models aimed at boosting reasoning skills in large language models through reinforcement learning. It’s fascinating that DeepSeek-R1-Zero develops strong reasoning through pure RL, with no supervised fine-tuning at all, though its poor readability and language mixing motivated DeepSeek-R1, which adds a multi-stage training pipeline with cold-start data to address those issues. I appreciate that the authors have released both models, along with several smaller distilled versions, which should help advance further research in this area. Overall, this work contributes significantly to improving reasoning in LLMs while maintaining an open approach that invites exploration by other researchers.

Raw notes: bombshell
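
For a concrete feel for the training signal, here is a minimal sketch of the two ingredients the paper combines: rule-based rewards (answer accuracy plus a format bonus for <think> tags) and GRPO’s group-relative advantages, which standardize each sampled output’s reward against the other samples for the same prompt instead of using a learned value model. The answer extraction and reward details below are simplified stand-ins of my own, not the authors’ exact implementation.

```python
import statistics

def extract_answer(output: str) -> str:
    """Toy answer extractor: take whatever follows the last 'Answer:'."""
    return output.rsplit("Answer:", 1)[-1].strip()

def rule_based_reward(output: str, reference: str) -> float:
    """Simplified rule-based reward: exact-match accuracy plus a small
    format bonus for wrapping the chain of thought in <think> tags."""
    accuracy = 1.0 if extract_answer(output) == reference else 0.0
    fmt = 0.1 if "<think>" in output and "</think>" in output else 0.0
    return accuracy + fmt

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: standardize each output's reward against the
    group sampled for the same prompt, so no value model is needed."""
    mean = statistics.mean(rewards)
    std = statistics.stdev(rewards) if len(rewards) > 1 else 0.0
    if std == 0.0:
        return [0.0] * len(rewards)  # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]

# Example: four sampled outputs for one prompt; only two are correct.
outputs = [
    "<think>2 + 2 = 4</think> Answer: 4",
    "Answer: 5",
    "<think>just guessing</think> Answer: 22",
    "Answer: 4",
]
rewards = [rule_based_reward(o, "4") for o in outputs]
print(group_relative_advantages(rewards))
```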


Other papers

Evolving Deeper LLM Thinking

Google DeepMind; UC San Diego; University of Alberta

🤗 98

This paper presents an intriguing approach called “Mind Evolution,” which cleverly applies evolutionary search to scale inference-time compute in large language models. By letting the model generate, recombine, and refine candidate responses, and requiring only a programmatic solution evaluator rather than a formalization of the underlying inference problem, it outperforms existing strategies such as Best-of-N and Sequential Revision when controlling for inference cost, solving more than 98% of instances on the TravelPlanner and Natural Plan benchmarks. I find the idea refreshing and potentially impactful as an alternative way to scale LLM inference.

Raw notes: quite novel idea
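
To make the loop concrete, here is a minimal sketch of evolutionary inference-time search in the spirit of Mind Evolution. The callables llm_propose, llm_refine, and fitness are hypothetical stand-ins for an LLM sampler, an LLM-driven recombination/refinement step, and the programmatic solution evaluator; the elitist selection scheme is a generic choice of mine, not necessarily the paper’s.

```python
import random
from typing import Callable

def mind_evolution_sketch(
    llm_propose: Callable[[], str],          # sample a fresh candidate solution
    llm_refine: Callable[[list[str]], str],  # recombine/refine parent solutions
    fitness: Callable[[str], float],         # programmatic solution evaluator
    population_size: int = 8,
    generations: int = 5,
    elite_frac: float = 0.25,
) -> str:
    """Evolutionary inference-time search: keep a population of natural-
    language candidates, score them with an evaluator, and let the LLM
    breed new candidates from the fittest ones."""
    population = [llm_propose() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[: max(1, int(elite_frac * population_size))]
        children = [
            llm_refine(random.sample(elites, k=min(2, len(elites))))
            for _ in range(population_size - len(elites))
        ]
        population = elites + children
    return max(population, key=fitness)

# Toy usage: candidates are numbers-as-strings; fitness prefers larger values.
best = mind_evolution_sketch(
    llm_propose=lambda: str(random.randint(0, 100)),
    llm_refine=lambda ps: str(sum(int(p) for p in ps) // len(ps) + 1),
    fitness=lambda s: float(s),
)
print(best)
```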


Kimi k1.5: Scaling Reinforcement Learning with LLMs

Kimi Team

🤗 63

This paper presents Kimi k1.5, a multi-modal model whose training recipe scales reinforcement learning for large language models. Its key innovations are long-context scaling and improved policy optimization, and it deliberately avoids heavier machinery such as Monte Carlo tree search, value functions, and process reward models, yet reports strong reasoning performance across numerous benchmarks. The authors also describe long2short techniques that transfer long chain-of-thought behavior to shorter responses, improving both long- and short-context reasoning.

Raw notes: one-two punch with DeepSeek
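
One concrete ingredient here is a length reward that, within a group of sampled responses to the same prompt, favors shorter correct answers and never gives an incorrect answer a positive length bonus. Below is a sketch of that shaping as I understand it; treat the exact coefficients and normalization as my assumptions rather than the authors’ precise formulation.

```python
def length_rewards(lengths: list[int], correct: list[bool]) -> list[float]:
    """Sketch of a Kimi-k1.5-style length penalty: normalize each response's
    length within its sampled group, give shorter responses a bonus and
    longer ones a penalty, and cap incorrect responses at zero length reward.
    Coefficients follow my reading of the paper, not the exact values."""
    min_len, max_len = min(lengths), max(lengths)
    rewards = []
    for n, ok in zip(lengths, correct):
        if max_len == min_len:
            lam = 0.0  # all responses equally long: no length signal
        else:
            lam = 0.5 - (n - min_len) / (max_len - min_len)  # in [-0.5, 0.5]
        rewards.append(lam if ok else min(0.0, lam))
    return rewards

# Example: four sampled responses; the short correct one is rewarded most.
print(length_rewards([120, 480, 300, 900], [True, True, False, True]))
```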


Acknowledgements

Papers are retrieved from Hugging Face.