Google's Gemma 2 outshines Llama 3

At its annual Google I/O conference last May, the giant made a myriad of generative AI announcements. The firm notably unveiled Gemma 2, the second generation of its family of large Gemma language models. Google then demonstrated the performance of its model, available in a single version with 27 billion parameters.

Today, it seems that a second version has been developed since the giant has just made Gemma 2 9B and Gemma 2 27B available to researchers and developers. And the least we can say is that the firm does not hide its pride: “Gemma 2 is more efficient and effective in inference than the first generation, with significant advances in security.”

Two versions capable of competing with those of Llama 3

The version with 27 billion parameters is called“competitive alternative to models more than twice the size”a nod to Meta’s Llama 3 70B. Google attributes this to the ability to run these models on a single H100 Tensor Core GPU, an A100 80GB, or a Google Cloud TPU host, which significantly reduces deployment costs. Compared to Llama 3 70B, the 27B version scores 75.2% on the MMLU test (which tests both world knowledge and problem-solving ability) compared to 79.5% for Meta’s version. On the BBH reasoning test, Gemma 2 27B scores 74.9% while Llama 3 70B scores 81.3%.

For its part, the 9B Gemma 2 model also offers the best performance in its class, assures Google, surpassing the Llama 3 8B and other open models in its size category. The summary table of the tests carried out shows that version 9B is better than Llama 3 8B on the MMLU and GSM8K benchmark (test on solving mathematics problems). Both models score roughly the same on the HellaSwag test (assess advanced natural language understanding and common sense reasoning in AI models).

Impressive inference capabilities

Google promises such high-quality inference with Gemma 2 that the model can run on a range of hardware, from gaming laptops to high-end desktops to cloud-based configurations. The company cites computers with a graphics card as examples Nvidia RTX or a GeForce RTX via Hugging Face Transformers.

Like the first-generation Gemma models, Gemma 2 is available under the Gemma Commercial License so that developers and researchers can share and commercialize their output. Note that starting next month, Google Cloud customers will be able to deploy and manage Gemma 2 on Vertex AI.

A compact model to come

The firm intends to continue the development of this family of models and indicates that a “upcoming 2.6 billion parameter Gemma 2 model, designed to bridge the gap between lightweight affordability and powerful performance” will soon see the light of day. This version was trained on 2,000 billion tokens, details the firm in a dedicated technical report. By comparison, the Gemma 2 27B and 9B versions were trained on 13 trillion tokens of mostly English data and 8 trillion tokens, respectively.

Important clarification: these tokens come from various data sources, including web documents, code, and scientific articles. These are not multimodal models and are not specifically trained for state-of-the-art multilingual capabilities.

Selected for you