Jeff Dean: Trends in ML
Here are the key points from the Q&A session:
On the importance of data quality:
- High-quality training data is essential for improving model performance.
- Having too much low-quality data can actually decrease a model’s ability to perform well on specific tasks.
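Filtering out low-quality documents before training is one common way teams act on this point. The talk doesn't describe a specific pipeline; the sketch below is a toy heuristic filter (all names and thresholds are made up for illustration) that drops short or highly repetitive documents.

```python
def looks_high_quality(text: str, min_words: int = 20, max_repeat_ratio: float = 0.3) -> bool:
    """Toy heuristic: keep documents that are long enough and not
    dominated by a single repeated word. Real pipelines use far
    richer signals (classifiers, dedup, perplexity scores)."""
    words = text.split()
    if len(words) < min_words:
        return False
    most_common = max(words.count(w) for w in set(words))
    return most_common / len(words) <= max_repeat_ratio

corpus = [
    # Low-quality: pure repetition, typical of scraped spam.
    "spam " * 30,
    # Higher-quality: varied vocabulary, complete sentences.
    "Large models trained on diverse, well-curated text tend to "
    "generalize better than models trained on noisy scraped data, "
    "so pipelines often score and filter documents before training.",
]
filtered = [doc for doc in corpus if looks_high_quality(doc)]
```

Here only the second document survives the filter, which is the intended effect: a larger corpus padded with repetitive junk would otherwise dilute the training signal.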
On the future of LLMs (Large Language Models):
- The speaker disagrees that we’ve exhausted all high-quality training data, pointing out that there’s still a vast amount of video data waiting to be used.
- Multimodal models (combining multiple types of data) are an important area of research and may lead to better performance on various tasks.
On the cost of training large models:
- The high cost of training large models makes it hard for small startups to have a significant impact.
- However, there is still a wide range of interesting research that can be done with limited resources (e.g., a single GPU or a handful of GPUs).
On whether the emphasis on LLMs is stifling other work:
- There is a worry that the focus on LLMs might crowd out other innovative ideas in machine learning.
- The speaker encourages exploring alternative models and ideas, even if they don’t seem as fully developed.
On multimodal models:
- Multimodal models are becoming increasingly important.
- The speaker sees a future where we deal with 50-100 different modalities of data, not just language and human-centric data.
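To make the many-modality idea concrete: a common architectural pattern is to give each modality its own encoder that projects into a shared embedding space, then fuse the per-modality embeddings. The talk doesn't specify an architecture; this is a minimal NumPy sketch with made-up feature sizes, using linear projections as stand-ins for learned encoders and averaging as the fusion step.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project raw per-modality features into a shared embedding space.
    A real system would use a learned network per modality."""
    return features @ projection

shared_dim = 16
# Hypothetical raw feature sizes for three modalities; a future system
# might have dozens of entries here (sensor data, code, structured logs, ...).
modalities = {"text": 32, "image": 64, "audio": 48}
projections = {name: rng.normal(size=(dim, shared_dim))
               for name, dim in modalities.items()}

# One example input per modality, fused by averaging in the shared space.
# Concatenation or cross-attention are common alternatives to averaging.
inputs = {name: rng.normal(size=dim) for name, dim in modalities.items()}
embeddings = [encode(inputs[name], projections[name]) for name in modalities]
fused = np.mean(embeddings, axis=0)  # single joint representation
```

The appeal of a shared space is that adding a 51st modality only requires a new encoder, not a redesign of the downstream model.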