Here is the translation of the contents into English:

  1. Utility Problem: Despite AI having surpassed human capabilities, there hasn’t been a significant change in the world from an economic and GDP perspective. This is because the current assessment settings are out of sync with reality.

  2. Limitations of Assessment Settings:
    • Assessments typically require automatic agents to complete tasks independently and receive corresponding rewards. However, in real life, agents often interact with humans throughout the task process.
    • To resolve issues, back-and-forth communication is necessary, which is closer to real-world scenarios compared to traditional evaluation methods.
  3. Independence and Identical Distribution (i.i.d.) Assumption:
    • Assessments often assume independent and identical distribution, meaning a test set of 500 tasks can be run independently, with metrics calculated for each task, and then averaged to obtain an overall indicator.
    • However, in reality, solving tasks is not done in parallel but sequentially.

Yao Shunyu emphasizes that to overcome these challenges and problems, we need to adopt new development patterns, including:

  • Developing novel assessment settings or tasks tailored to real-world practicality and using general methods to solve them.
  • Or enhancing existing methods by adding novel components, then cycling through this process repeatedly.

This indicates that AI development has reached a critical juncture, shifting from training-focused to evaluation-centered. Reconsidering how we evaluate will be the key driver of AI’s continued growth and development.

Translation

姚顺雨关于AI发展“下半场”的观点主要涉及到几个方面:

  1. 效用问题:即使AI已经具备了超出人类的能力,但从经济和GDP的角度来看,世界并没有发生太大的变化。这是因为现有的评估设置与现实世界的实际情况存在差异。

  2. 评估设置的局限性
    • 评估通常要求自动运行一个Agent独立完成任务,然后获得相应的任务奖励。但是在现实生活中,Agent往往需要在整个任务过程中与人类进行互动。
    • 要来回沟通几次,以解决问题,这是与传统的评估方式相比更贴近现实场景。
  3. 独立同分布(i i d)的假设
    • 评估往往要求在独立同分布的情况下进行,即假设有一个包含500个任务的测试集,独立运行每个任务,计算每个任务的指标,然后取平均值得到一个整体指标。
    • 然而,在现实世界中,解决任务的方式并不是并行的,而是顺序进行的。

姚顺雨强调,为了解决这些问题和挑战,我们需要采用新的发展模式。这包括:

  • 为现实世界的实用性去开发新颖的评估设置或任务,然后用通用的方法去解决这些任务。
  • 或者通过添加新颖的组件来增强这些方法,然后再不断循环这个过程。

这表明,AI的发展正处于一个关键的转折点,从注重训练到重视评估,如何重新思考评估的方式,将是推动AI持续发展的关键所在。

Reference:

https://ysymyth.github.io/The-Second-Half/


<
Previous Post
Richard Sutton vs David Silver: Welcome to the Era of Experience
>
Next Post
Aidan Gomez interview: Scaling Limits Emerging, AI Use-cases with PMF & Life After Transformers