World Labs: 3D world generation model

World Labs, the company founded by Fei-Fei Li's team, offers a model example. As far back as her time in her Stanford University lab, Li was trying to teach computers “how to act in 3D worlds”, for example by using large language models to direct a robotic arm through tasks such as opening a door or making a sandwich. She began planning her own company in April of this year and officially founded World Labs in September to explore “spatial intelligence”.

According to Forbes, less than a week after its founding, World Labs had already raised $230 million in venture capital at a valuation of more than $1 billion.

In her TED talk, Fei-Fei Li said that training an AI system able to understand the complex physical world and the relationships between the objects in it is “the key piece of the puzzle in solving artificial intelligence”. As for what “spatial intelligence” means, she described it this way: vision turns into insight, seeing becomes understanding, and understanding leads to action.

Human intelligence takes many forms. One is linguistic intelligence, which lets us communicate and connect with others through language. Perhaps more fundamental is spatial intelligence, which lets us understand and interact with the world around us, bring the images in our minds into the real world, model the world in three-dimensional space and time, and reason about objects, places, and interactions.

Fei-Fei Li has also publicly criticized OpenAI’s Sora model, pointing out that although it can generate video, its core is still two-dimensional and it lacks a deep understanding of three-dimensional space. In her view, 2D is the surface appearance while 3D is the essence, and spatial intelligence is the key that takes AI toward Artificial General Intelligence (AGI).

References:

https://www.worldlabs.ai/blog

NeurIPS best paper award (Tian et al.)

I-JEPA: A Human-Like World Model by Yann LeCun