World Labs: 3D world generation model
Li Feifei’s World Labs offers a working demonstration. Even in her Stanford University lab, she was already trying to teach computers “how to act in a 3D world”, for example by using large language models to direct a robotic arm to perform tasks such as opening a door or making a sandwich. She began planning a startup in April of this year and formally founded World Labs in September to explore “spatial intelligence”.
According to Forbes, less than a week after its founding, World Labs had already raised $230 million in venture capital, at a valuation of over $1 billion.
In her TED talk, Li Feifei said that training an AI system that can understand the complex physical world and the relationships between the objects in it is “a key piece of the puzzle in solving artificial intelligence”. As for what “spatial intelligence” means, she described it this way: vision turns into insight, seeing becomes understanding, and understanding leads to action.
Human intelligence takes many forms. One is linguistic intelligence, which lets us communicate and connect with others through language. Perhaps more fundamental is spatial intelligence, which lets us understand and interact with the world around us, bring the images in our minds into the real world, model the world in three-dimensional space and time, and reason about objects, places, and interactions.
Li Feifei has also publicly criticized OpenAI’s Sora model, pointing out that although it can generate videos, its core is still two-dimensional and it lacks a deep understanding of three-dimensional space. In her view, 2D is the surface appearance while 3D is the essence, and spatial intelligence is the key to taking AI towards Artificial General Intelligence (AGI).
Reference:
https://www.worldlabs.ai/blog