The CEO of Google DeepMind, Demis Hassabis, has introduced the latest version of Google’s Gemini large language model (LLM). The new release, called Gemini 1.5, follows Google’s recent renaming of its Bard chatbot to Gemini, and it aims to shift attention from OpenAI’s ChatGPT to Google’s own AI technology.
In a blog post, Hassabis highlighted Gemini 1.5’s significantly improved performance, describing it as a major breakthrough in Google’s approach to AI development. The Pro version, currently available as a developer preview, is optimized for understanding long-context input. Hassabis shared a video demonstrating Gemini 1.5’s ability to summarize a 402-page transcript of the Apollo 11 Moon landing mission.
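For developers with preview access, querying the model could look roughly like the sketch below, which uses the google-generativeai Python SDK; the model identifier and file name are assumptions for illustration, not details confirmed in the announcement.

```python
# Hypothetical sketch: summarizing a long transcript with Gemini 1.5 Pro
# through the google-generativeai SDK. The model name is an assumption;
# actual access depends on the developer preview program.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Long-context use case: pass an entire transcript in a single prompt.
with open("apollo11_transcript.txt") as f:
    transcript = f.read()

response = model.generate_content(
    "Summarize the key events in this mission transcript:\n\n" + transcript
)
print(response.text)
```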
Another video showed Gemini 1.5 analyzing a 44-minute Buster Keaton movie and successfully pinpointing the scene in which the main character picks up a piece of paper.
A Google engineer tweeted about submitting three JavaScript programs totaling more than 100,000 lines of code to Gemini 1.5. Asked to pick the three most relevant examples of a particular technique from hundreds of candidates, the model surfaced highly relevant matches, the engineer reported. Gemini was also able to locate specific animations in the code and explain how to modify it to achieve desired changes.
Jeff Dean, chief scientist at Google DeepMind, also tweeted about Gemini 1.5’s translation capabilities. Supplied with a 573-page book on Kalamang grammar and a bilingual word list in its context window, the model learned to translate Kalamang, a language absent from its training data, into English. In a quantitative evaluation on a 6-point scale, Gemini 1.5 scored 4.36, compared with 5.52 for a human who learned Kalamang from the same materials.
Hassabis explained that Gemini 1.5 employs a new Mixture-of-Experts (MoE) architecture, which activates only the most relevant expert pathways within the model’s neural network for a given input. According to Hassabis, this specialization significantly enhances the model’s efficiency.
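To make the idea concrete, here is a minimal, hypothetical sketch of top-k expert routing in Python with NumPy. Google has not published Gemini 1.5’s internals, so the expert count, router design, and dimensions below are illustrative assumptions rather than a description of the actual model.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

class MoELayer:
    """Toy Mixture-of-Experts layer: a learned router scores every expert
    for each input, and only the top-k experts actually run."""

    def __init__(self, d_model, d_hidden, n_experts, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Router: produces one score per expert for a given input vector.
        self.w_router = rng.standard_normal((d_model, n_experts)) * 0.02
        # Each expert is a small two-layer feed-forward network.
        self.experts = [
            (rng.standard_normal((d_model, d_hidden)) * 0.02,
             rng.standard_normal((d_hidden, d_model)) * 0.02)
            for _ in range(n_experts)
        ]

    def forward(self, x):
        """x: (d_model,) activation vector for a single token."""
        scores = x @ self.w_router            # one score per expert
        top_k = np.argsort(scores)[-self.k:]  # indices of the best experts
        gates = softmax(scores[top_k])        # weights for the chosen experts
        out = np.zeros_like(x)
        # Only the selected experts compute; the rest stay idle, which is
        # where sparse MoE models save compute relative to dense ones.
        for gate, idx in zip(gates, top_k):
            w1, w2 = self.experts[idx]
            out += gate * (np.maximum(x @ w1, 0.0) @ w2)  # ReLU feed-forward
        return out

layer = MoELayer(d_model=16, d_hidden=32, n_experts=8, k=2)
token = np.random.default_rng(1).standard_normal(16)
print(layer.forward(token).shape)  # -> (16,)
```

Because only k of the n experts run for each token, compute scales with k rather than with the total parameter count, which is the kind of efficiency gain Hassabis describes.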