rasbt/reasoning-from-scratch

Daily Information Dashboard · 2026-03-02

Open-Source Projects

AI Summary

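rasbt/reasoning-from-scratch provides the code and notebooks that accompany *Build a Reasoning Model (From Scratch)*. It implements reasoning enhancements step by step on top of an open-source base LLM (Qwen3) to help readers understand how reasoning models are built and evaluated.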

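Official code repository for the book, organized around the hands-on path of adding reasoning capabilities to a pre-trained base LLM step by step
Chapters cover reasoning model evaluation, inference-time scaling, self-refinement, reinforcement learning training and GRPO improvements, and distillation
Provides many Jupyter notebooks and appendix materials (such as MATH-500 verifier scripts, a LaTeX parser, and MMLU and LLM-as-a-judge evaluation methods)
Emphasizes reproducibility on consumer hardware: chapters 2-4 run on either CPU or GPU, while a GPU is recommended for chapters 5-6 to reproduce the experimental results
Mirrors the approaches behind large-scale reasoning models such as DeepSeek R1 and GPT-5 Thinking, but aims for a "small but functional" educational implementation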

#GitHub #repo #open-source #Qwen3 #inference-time scaling #self-refinement #GRPO


Content Excerpt

Build A Reasoning Model (From Scratch)

This repository contains the code for developing an LLM reasoning model and is the official code repository for the book *Build a Reasoning Model (From Scratch)*.

<br>
<br>

<a href="https://mng.bz/lZ5B"><img src="https://sebastianraschka.com/images/reasoning-from-scratch-images/cover.webp?123" width="250px"></a>

(Printed in color.)

<br>

In *Build a Reasoning Model (From Scratch)*, you will learn and understand how a reasoning large language model (LLM) works.

Reasoning is one of the most exciting and important recent advances in improving LLMs, but it’s also one of the easiest to misunderstand if you only hear the term reasoning and read about it in theory. This is why this book takes a hands-on approach. We will start with a pre-trained base LLM and then add reasoning capabilities ourselves, step by step in code, so you can see exactly how it works.

The methods described in this book walk you through the process of developing your own small-but-functional reasoning model for educational purposes. It mirrors the approaches used in creating large-scale reasoning models such as DeepSeek R1, GPT-5 Thinking, and others. In addition, this book includes code for loading the weights of existing, pretrained models.

- Link to the official source code repository
- Link to the book at Manning (the publisher's website)
- Link to the book page on Amazon.com (TBD)
- ISBN 9781633434677

<br>
<br>

To download a copy of this repository, click on the Download ZIP button or clone it from your terminal with `git clone https://github.com/rasbt/reasoning-from-scratch.git`.

<br>
**Tip:**
Chapter 2 provides additional tips on installing Python, managing Python packages, and setting up your coding environment.

<br>
<br>
Table of Contents (In Progress)

Troubleshooting Guide

| Chapter Title | Main Code |
| ----------------------------------------------------------- | ------------------------------------------------------------ |
| Ch 1: Understanding Reasoning Models | No code |
| Ch 2: Generating Text with a Pre-trained LLM | - ch02_main.ipynb<br/>- ch02_exercise-solutions.ipynb |
| Ch 3: Evaluating Reasoning Models | - ch03_main.ipynb<br/>- ch03_exercise-solutions.ipynb |
| Ch 4: Improving Reasoning with Inference-Time Scaling | - ch04_main.ipynb<br/>- ch04_exercise-solutions.ipynb |
| Ch 5: Inference-Time Scaling via Self-Refinement | - ch05_main.ipynb<br/>- ch05_exercise-solutions.ipynb |
| Ch 6: Training Reasoning Models with Reinforcement Learning | - ch06_main.ipynb<br/>- ch06_exercise-solutions.ipynb |
| Ch 7: Improving GRPO for Reinforcement Learning | - ch07_main.ipynb<br/>- ch07_exercise-solutions.ipynb |
| Ch 8: Distilling Reasoning Models for Efficient Reasoning | TBA |
| Appendix A: References and Further Reading | No code |
| Appendix B: Exercise Solutions | Code and solutions are in each chapter's subfolder |
| Appendix C: Qwen3 LLM Source Code | - chC_main.ipynb |
| Appendix D | TBA |
| Appendix E | TBA |
| Appendix F: Common Approaches to LLM Evaluation | - chF_main.ipynb |

<br>
&nbsp;

The mental model below summarizes the main techniques covered in this book.

<img src="https://sebastianraschka.com/images/reasoning-from-scratch-images/mental-model.webp" width="650px">
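As one concrete instance of the inference-time scaling techniques in this overview, self-consistency samples several responses to the same prompt and keeps the final answer that occurs most often. The snippet below is a minimal sketch under stated assumptions, not the book's implementation: the function name `self_consistency_vote` is hypothetical, and the sampled answers are hard-coded placeholders standing in for answers extracted from real model outputs.

```python
from collections import Counter


def self_consistency_vote(answers):
    """Return the most frequent non-empty answer and its share of the votes."""
    # Strip whitespace so that " 42" and "42" count as the same answer.
    cleaned = [a.strip() for a in answers if a and a.strip()]
    if not cleaned:
        return None, 0.0
    counts = Counter(cleaned)
    # most_common(1) yields the top answer; ties go to the answer seen first.
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(cleaned)


# Hypothetical final answers extracted from five sampled responses
# to one prompt (in practice these would come from the LLM).
sampled_answers = ["42", "42", "41", " 42", "40"]
answer, share = self_consistency_vote(sampled_answers)
print(f"Majority answer: {answer} (vote share: {share:.2f})")
```

The vote share can serve as a rough confidence signal: a low share suggests the samples disagree and the majority answer deserves less trust.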

<br>

&nbsp;
Companion Book

Please note that *Build a Reasoning Model (From Scratch)* is a standalone book focused on methods to improve LLM reasoning.

In this book, we work with a pre-trained open-source base LLM (Qwen3), on top of which we code and apply reasoning methods from scratch. This includes inference-time scaling, reinforcement learning, and distillation.

However, if you are interested in understanding how a conventional base LLM is implemented, you may like my previous book, *Build a Large Language Model (From Scratch)*.

<a href="https://amzn.to/4fqvn0D"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover.jpg?123" width="120px"></a>
- Amazon link
- Manning link
- GitHub repository

<br>
&nbsp;
Hardware Requirements

The code in the main chapters of this book is designed to run mostly on consumer hardware within a reasonable timeframe and does not require specialized server hardware, so that a wide audience can engage with the material. The code automatically uses GPUs if they are available. Chapters 2-4 work well on both CPUs and GPUs. For chapters 5 and 6, a GPU is recommended if you want to replicate the results in the chapters.

(Please see the setup_tips doc for additional recommendations.)
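
To illustrate what automatic GPU use can look like in PyTorch, here is a minimal sketch. The helper name `pick_device` is hypothetical and not taken from this repository, whose own device-selection code may differ. The import is wrapped in a `try` block only so that the sketch also runs where PyTorch is not installed.

```python
def pick_device():
    """Return the name of the best available PyTorch device as a string."""
    try:
        import torch  # optional here so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        # CUDA-capable GPU detected
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch versions, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        # Apple Silicon GPU detected
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can be passed to `torch.device(...)` or to `model.to(...)` when moving a model and its inputs to the selected device.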

&nbsp;
Exercises

Each chapter of the book includes several exercises. The solutions are summarized in Appendix B, and the corresponding code notebooks are available in the main chapter folders of this repository (for example, ch02/01_main-chapter-code/ch02_exercise-solutions.ipynb).

&nbsp;
Bonus Material

Several folders contain optional materials as a bonus for interested readers:
- **Chapter 2: Generating Text with a Pre-trained LLM**
  - Optional Python Setup and Cloud GPU Recommendations
  - Using a GPU-optimized version of the LLM
  - Using `torch.compile()` on Windows
  - Run inference and chat with the model
- **Chapter 3: Evaluating LLMs**
  - MATH-500 Verifier Scripts
  - Advanced Parser (hybrid LaTeX parser)
- **Chapter 4: Improving Reasoning with Inference-Time Scaling**
  - Inference Scaling on MATH-500 (CoT prompting, self-consistency)
- **Chapter 5: Inference-Time Scaling Via Self-Refinement**
  - More Inference Scaling on MATH-500 (Best-of-N, self-refinement)
- **Chapter 6: Training Reasoning Models with Reinforcement Learning**
  - GRPO scripts with a batched mode
- **Chapter 7: Improving GRPO for Reinforcement Learning**
  - Advanced GRPO scripts (including DeepSeek-V3.2-, Olmo3-, and GDPO-style training)
- **Appendix F: Common Approaches to LLM Evaluation**
  - MMLU Evaluation Methods
  - LLM leaderboards
  - LLM-as-a-judge
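
For readers new to GRPO, which the Chapter 6 and 7 materials above build on, the core idea as commonly described is to sample a group of responses for one prompt, score each response with a reward, and normalize the rewards within the group to obtain per-response advantages. The snippet below is a minimal sketch of that normalization step only, under stated assumptions: the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` value are illustrative choices, not taken from the repository's scripts.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled responses."""
    mean = statistics.fmean(rewards)
    # Population standard deviation is an assumption made for this sketch.
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to the same prompt
# (1.0 = verified correct answer, 0.0 = incorrect answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Responses that score above the group mean receive positive advantages and those below receive negative ones. When all rewards in a group are identical, every advantage is zero and the group contributes no learning signal.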

&nbsp;
Questions, Feedback, and Contributing to This Repository

For common problems, please see the Troubleshooting Guide.…