Gemini 2.5 Pro is a Google model that can quickly build an endless runner game from a prompt, generate high-quality images, and write code. It can create the game as a p5.js scene without any HTML, complete with pixel-style dinosaurs and interesting backgrounds. Users can subscribe for $20 per month to get a version with higher rate limits, and exact pricing is yet to be announced.
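
An endless runner like the one described above comes down to a small per-frame update loop. As a rough sketch only (this is not the code Gemini 2.5 Pro produced), the TypeScript below models that loop without any rendering. All names and constants are hypothetical. In a p5.js scene, `step` would typically be called from `draw()` and `jump` from a key handler such as `keyPressed()`.

```typescript
// Renderer-free core of an endless runner: a player that can jump,
// and obstacles that scroll toward the player.
// Every name and constant here is an illustrative assumption.

interface Player { y: number; vy: number; }   // y = height above the ground (0 = standing)
interface Obstacle { x: number; width: number; height: number; }
interface GameState { player: Player; obstacles: Obstacle[]; score: number; over: boolean; }

const GRAVITY = 1;        // downward acceleration per frame
const JUMP_SPEED = 12;    // upward velocity at the start of a jump
const SCROLL_SPEED = 5;   // distance obstacles move left per frame
const PLAYER_X = 50;      // the player stays at a fixed horizontal position
const PLAYER_WIDTH = 20;

function newGame(): GameState {
  return { player: { y: 0, vy: 0 }, obstacles: [], score: 0, over: false };
}

// Start a jump, but only when the player is standing on the ground.
function jump(state: GameState): void {
  if (state.player.y === 0 && !state.over) state.player.vy = JUMP_SPEED;
}

// Advance the game by one frame.
function step(state: GameState): void {
  if (state.over) return;
  const p = state.player;

  // Vertical motion with gravity, clamped to the ground.
  p.y += p.vy;
  p.vy -= GRAVITY;
  if (p.y <= 0) { p.y = 0; p.vy = 0; }

  // Scroll obstacles left and drop the ones that have left the screen.
  for (const o of state.obstacles) o.x -= SCROLL_SPEED;
  state.obstacles = state.obstacles.filter(o => o.x + o.width > 0);

  // Collision: horizontal overlap while the player is below the obstacle's top.
  for (const o of state.obstacles) {
    const overlapsX = PLAYER_X < o.x + o.width && PLAYER_X + PLAYER_WIDTH > o.x;
    if (overlapsX && p.y < o.height) state.over = true;
  }

  // One point per frame survived.
  if (!state.over) state.score += 1;
}
```

Keeping the game state separate from drawing makes the rules easy to check on their own; the pixel-style dinosaur and backgrounds would then just be one way of rendering `GameState` each frame.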

OpenAI recently released image generation in GPT-4o, an impressive capability that produces accurate and realistic images. Compared with its predecessor DALL-E, GPT-4o gives much finer control over the details of a generated image. OpenAI CEO Sam Altman said that when he saw the first images from the model, he found it hard to believe they were really made by AI.

GPT-4o image generation is an autoregressive model built natively into ChatGPT. Through reinforcement learning from human feedback (RLHF), the model has learned to follow human instructions more precisely, and so it generates more accurate and useful images. The new feature supports multi-turn editing, photorealistic and realistic-style output, and precise text rendering.

However, OpenAI also noted that because the model creates more detailed images, rendering takes longer, usually more than a minute per image. The model is also not yet perfect: it can crop images inappropriately and sometimes hallucinates. OpenAI says these limitations will be addressed in later improvements.

In summary, 4o image generation solves several key problems, marks another big step toward practical AI image generation, and once again raises the level of competition in the text-to-image field. Other AI companies will likely respond soon.

Reference:

https://www.youtube.com/watch?v=wMoN5OGqD8g

