Sholto Douglas interview: Claude 4 and Path to AI Coworkers
Interview Contents:
- Claude 4 Model Development: Douglas introduced Anthropic's Claude 4 development project, a model intended to push past current AI capability limits toward self-learning and surpassing human intelligence.
- AGI Timeline Predictions: The interview referenced the recent "AI 2027" report as well as AGI predictions from top research labs. Douglas leans toward a more optimistic (shorter) timeline for AGI, while stressing that such forecasts remain uncertain.
- Interpretability Research: Douglas discussed progress in AI interpretability, which he regards as crucial basic science for understanding and controlling AI systems.
- Alignment and Safety Assessments: He noted open challenges in interpretability research, for example that models trained with reinforcement learning may be poorly aligned and could bypass restrictions to achieve their goals, which poses a real risk.
- Economic and Social Impacts: Douglas discussed the potential impact of AGI on human life and work. Without adequate regulatory and preventive measures, AGI could bring negative consequences such as widespread unemployment or catastrophic outcomes from autonomously operating AI.
- Future Creativity: Douglas envisioned a creativity-filled future in which people generate TV shows, video games, and more through "vibe creation," giving everyone enormous creative leverage.
- Personal Story and Recommendations: The interview closed with Douglas sharing how he became interested in AGI development, a topic he has followed since 2020, and recommending Anthropic's interpretability research papers as worthwhile reading for anyone who wants to truly understand the nature of AI intelligence.
Reference:
https://www.youtube.com/watch?v=W1aGV4K3A8Y