Lilian Weng: Why We Think
A summary of the current state of chain-of-thought (CoT) research, covering the use of appended dummy tokens during training and inference, the Quiet-STaR method, token-level reasoning, EM-style algorithms, and more. It also discusses the advantages and limitations of test-time compute, along with problems such as how to define and catch reward hacking and how to avoid "whack-a-mole"-style fixes.
On the training side, appending dummy tokens gives the model extra computation and can help it produce more coherent CoT, while Quiet-STaR introduces token-level reasoning to improve the quality of generated rationales, markedly boosting zero-shot performance. When the CoT is treated as a latent variable, an EM-style algorithm can be used to optimize the model parameters.
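To make the latent-variable view concrete, here is a minimal sketch of one STaR/EM-style round in which the rationale is the latent variable: sample candidate rationales, keep those that reach the correct answer (an approximate E-step), and fine-tune on the kept traces (the M-step). `sample_rationale` and `finetune` are hypothetical stand-ins for an LLM sampling call and a fine-tuning step, not an API described in the post.

```python
from typing import Callable, List, Tuple

def star_em_round(problems: List[Tuple[str, str]],
                  sample_rationale: Callable[[str], Tuple[str, str]],
                  finetune: Callable[[List[Tuple[str, str, str]]], None],
                  k: int = 8) -> int:
    """One EM-style round over latent rationales (CoT as a latent variable).

    E-step: for each (question, gold_answer), sample up to k candidate
    (rationale, answer) pairs and keep the first rationale whose answer is correct.
    M-step: fine-tune on the kept (question, rationale, answer) triples.
    Returns the number of problems for which a correct rationale was found.
    """
    kept: List[Tuple[str, str, str]] = []
    for question, gold_answer in problems:
        for _ in range(k):                      # E-step: sample candidate rationales
            rationale, answer = sample_rationale(question)
            if answer == gold_answer:           # keep only rationales that reach the right answer
                kept.append((question, rationale, answer))
                break
    finetune(kept)                              # M-step: maximize likelihood of the kept traces
    return len(kept)
```

Iterating this loop corresponds to alternating between inferring plausible latent rationales and re-estimating the model on them.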
At test time, spending more compute can improve the quality of the generated CoT, but high-temperature sampling may hurt generalization. The ratio between pretraining and inference token budgets is also crucial: test-time compute only has an advantage when the inference token budget is much smaller than the pretraining budget.
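As one concrete way to spend test-time compute, the sketch below shows self-consistency-style parallel sampling with a majority vote over final answers. `toy_sampler` is a made-up stand-in for an LLM call, and its temperature handling is purely illustrative.

```python
import random
from collections import Counter
from typing import Callable

def self_consistency(sample_answer: Callable[[float], str],
                     n_samples: int = 16,
                     temperature: float = 0.8) -> str:
    """Sample n_samples answers at the given temperature and return the majority vote."""
    votes = Counter(sample_answer(temperature) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def toy_sampler(temperature: float) -> str:
    # Toy stand-in for an LLM: higher temperature spreads more mass onto wrong answers.
    answers = ["42", "41", "43"]
    weights = [1.5, temperature, temperature]
    return random.choices(answers, weights=weights)[0]

print(self_consistency(toy_sampler))  # most often prints "42"; more samples make the vote more reliable
```

Drawing more samples improves the vote but consumes more inference tokens, which is why the pretraining-to-inference budget ratio mentioned above matters.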
Finally, the article raises several open problems, such as how to incentivize models to produce human-readable, faithful reasoning paths, and how to define and detect reward hacking. These questions remain open and call for further exploration.
Reference:
https://lilianweng.github.io/posts/2025-05-01-thinking/