- Google is working on Gemini, its next-generation AI foundation model that can combine conversational text with image generation.
- The company has pulled in key DeepMind and Google Brain team members to work on this.
- Gemini could release for developers as soon as this fall and be integrated into several Google products for consumers.
AI has been the buzzword for 2023 as companies race each other to find innovative ways to use AI. We’ve seen Microsoft take the lead with its integration of ChatGPT into Bing Chat. This caused a lot of innovation leaders to scramble to protect their position. Google reacted with the release of Google Bard and the integration of AI within several of its consumer-facing products, but it seems the company has even more in store with what it hopes to do with AI in the form of Gemini.
According to a report published by The Information citing an anonymous source, Google is working on its largest AI project yet in the form of “Gemini” that could launch as early as this fall. Gemini is the company’s next-generation AI foundation model comprising a group of large machine-learning models.
With Gemini, Google hopes to surpass competition that has primarily focused on a singular medium for its large language models. It could combine conversational text capabilities with AI image generation, making it fit more general-purpose use cases.
Gemini would thus not only be able to generate text like ChatGPT but also create contextual images and hopefully even go beyond this. In the future, it could possibly be used to analyze charts, create graphics with text descriptions, and control software with text or voice commands.
Google is also reportedly using YouTube video transcripts to train Gemini. Models trained on YouTube videos can provide advice based on video content, like helping mechanics diagnose a problem based on car repair videos, for example. Using YouTube video content could also help Google develop text-to-video software.
However, the company’s lawyers closely monitor the training materials to avoid training on copyrighted materials. In one instance, the lawyers made researchers remove training data from textbooks over concerns about pushback from copyright holders.
The company could integrate Gemini into its suite of products and services, such as Bard, Google Docs, and Slides. We can expect to see some form of developer release for Gemini before the end of the year, though the company may start using it in some consumer products sooner than that. Developers can expect some cost-gated access to Gemini through the Google Cloud Platform.
To achieve these goals and beat the competition, Google has reportedly brought together several members of its Google Brain and DeepMind teams together to work on Gemini. This includes Google co-founder Sergey Brin, who is said to be instrumental in evaluating and training the Gemini models.