Google Unveils Gemini: A Highly Anticipated Large Language Model

Gemini, Google’s highly anticipated large language model and a competitor to GPT-4, is now available to consumers through Bard and on the Pixel 8 Pro. Gemini Pro will become available to enterprise customers and developers on Dec. 13. Developers can also sign up for an early preview of Gemini Nano in Android AICore.

Gemini is a generative AI model that can summarize text, create images, and answer questions. Trained on Google’s Tensor Processing Units (TPU) v4 and v5e, it gives Google Bard more advanced reasoning, planning, and understanding capabilities.

Gemini comes in three sizes: Ultra, Pro, and Nano. Ultra is the most capable, Nano is the smallest and most efficient, and Pro sits in between as a general-purpose model. Nano runs on the Pixel 8 Pro, while Bard uses Pro. Google plans to conduct thorough trust and safety checks before releasing Ultra to select groups.

Beyond language tasks, Gemini can write code in popular programming languages such as Python, Java, C++, and Go. Google has also used it to build AlphaCode 2, an upgraded version of its AI-powered code generation system, AlphaCode.

Google plans to integrate Gemini into other products, such as Ads, Chrome, and Duet AI, and eventually into Google Search.

Gemini competes with other large language models and AI assistants, such as OpenAI’s GPT-4, Microsoft’s Copilot, Anthropic’s Claude, and Meta’s Llama 2. Google claims that Gemini Ultra outperforms GPT-4 on several benchmarks, including language understanding and Python code generation.

Starting Dec. 13, enterprise customers and developers will be able to access Gemini Pro through the Gemini API in Google AI Studio or Google Cloud’s Vertex AI. Gemini Nano is expected to become available to developers and enterprise customers in early 2024.

Gemini’s ability to understand and reason about user intent makes it particularly useful for enterprise use cases. It can generate a customized user interface depending on whether the user is looking for images or text, and it can ask clarifying questions when it lacks information. Because Gemini was trained on multimodal content from the start, Google says it can parse written and visual information with equal precision.

Gemini also arrives with advantages over other popular large language models: it was built with multimodal capabilities from the outset, and it runs natively on the Google Pixel 8 Pro. Pixel 8 Pro users can use Gemini Nano features without an internet connection, which sets it apart from ChatGPT, which launched as a browser-based service.