EvolutionaryScale secures $142 million in funding to advance biology through generative AI


“If we could learn to read and write the code of life, biology would become programmable. Trial and error would be replaced by logic, and laborious experiments by simulation,” says the start-up EvolutionaryScale in the introduction to its presentation of its work. This week, the start-up announced that it had raised $142 million in seed funding, led by Nat Friedman, Daniel Gross and Lux Capital. Amazon Web Services (AWS) and Nvidia's venture capital arm also participated in the round.

Describing the announcement as a “ChatGPT moment for biology”, Josh Wolfe, co-founder and partner of Lux Capital, told Reuters that the company had developed the first large language model for creating proteins and other biological systems. The start-up wants to use its AI model, called ESM3, to strengthen the “ability to program and create using the code of life.”


Pushing back the barriers of biology with AI

Alex Rives, chief scientist at the start-up and formerly of Meta AI, says the company plans to use its AI for a wide range of applications, from accelerating drug discovery to designing microbes capable of breaking down plastic in the environment. Customized versions of the model are also being developed for these uses. With this funding, the start-up intends to continue developing its models, recruit actively and establish partnerships with the biotechnology industry.

To date, the start-up has developed a family of ESM models, the smallest of which is available as open source for non-commercial research. AWS and Nvidia are expected to make the models commercially available, including the larger ESM3 model. The model was trained on Nvidia GPUs, and the start-up describes it as the product of the most compute ever applied to training a biological model, “trained with over 1×10²⁴ FLOPS and 98 billion parameters.”

An evolution that would have taken the equivalent of 500 million years in nature

The company said it used its model “to design a fluorescent protein that deviates from the evolutionary trajectory of natural fluorescent proteins”. As a reminder, fluorescent proteins such as green fluorescent protein (GFP) are responsible for the bright colors of jellyfish and corals, and are important tools in modern biotechnology. The protein created by the start-up, called esmGFP, has a sequence that is only 58% similar to the closest known fluorescent protein.
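To make the 58% figure concrete, here is a minimal Python sketch of how a percent-identity score between two already-aligned protein sequences can be computed. It assumes that "similar" refers to sequence identity over an alignment; the function name `percent_identity` and the short toy sequences are hypothetical and are not the actual esmGFP data. Identity scores also depend on the alignment and on the denominator chosen, so this is one common convention rather than the company's stated method.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns holding the same residue.

    Both inputs must already be aligned (same length, '-' for gaps).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" and res_b == "-":
            continue  # a gap in both sequences carries no information
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy example (hypothetical sequences): 8 of 10 positions match.
print(percent_identity("MSKGEELFTG", "MSKGAELFSG"))  # 80.0
```

In practice the alignment itself would come from a dedicated tool; the sketch only covers the final counting step.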


“Based on the rate of diversification of GFPs found in nature, we estimate that this generation of a new fluorescent protein is equivalent to simulating more than 500 million years of evolution.” To get there, EvolutionaryScale researchers converted the three fundamental biological properties of proteins (sequence, three-dimensional structure and function) into discrete alphabets.


A database representative of the Earth's diversity

The goal of this work is to write each three-dimensional structure as a sequence of letters. “This allows ESM3 to be trained at scale, unlocking emergent generative capabilities. ESM3's vocabulary bridges sequence, structure, and function within a single language model,” says the company.

The model therefore gains an in-depth understanding of the link between sequence, structure and function from varied data representative of the Earth's diversity, from the Amazon rainforest to the depths of the oceans, including extreme environments such as hydrothermal vents and the microbes found in a handful of soil. “At the scale of billions of proteins and billions of parameters, ESM3 learns to simulate evolution,” the company adds.
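The idea of writing sequence, structure and function in one shared vocabulary can be illustrated with a toy Python sketch. Everything here is a hypothetical simplification: the three small alphabets, the track names and the `encode` function are invented for illustration and are not the actual ESM3 tokenizers, which the article does not describe in detail.

```python
# Hypothetical toy alphabets; NOT the real ESM3 vocabularies.
SEQUENCE_ALPHABET = list("ACDEFGHIKLMNPQRSTVWY")       # 20 amino acids
STRUCTURE_ALPHABET = ["H", "E", "C"]                   # helix, strand, coil
FUNCTION_ALPHABET = ["none", "binding", "catalytic"]   # coarse function tags


def build_vocab(tracks):
    """Assign one integer id per (track, symbol) pair in a single shared vocabulary."""
    vocab = {}
    next_id = 0
    for track_name, symbols in tracks.items():
        for symbol in symbols:
            vocab[(track_name, symbol)] = next_id
            next_id += 1
    return vocab


VOCAB = build_vocab({
    "seq": SEQUENCE_ALPHABET,
    "struct": STRUCTURE_ALPHABET,
    "func": FUNCTION_ALPHABET,
})


def encode(sequence, structure, function):
    """Turn three per-residue tracks into one list of (seq, struct, func) token ids."""
    if not (len(sequence) == len(structure) == len(function)):
        raise ValueError("all tracks must describe the same number of residues")
    return [
        (VOCAB[("seq", aa)], VOCAB[("struct", ss)], VOCAB[("func", fn)])
        for aa, ss, fn in zip(sequence, structure, function)
    ]


# Two residues: M in a helix with no annotation, K in a coil at a binding site.
print(encode("MK", "HC", ["none", "binding"]))  # [(10, 20, 23), (8, 22, 24)]
```

Once every property is a token id drawn from the same vocabulary, a single language model can in principle be trained to predict any masked track from the others, which is the kind of bridging the company describes.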

Generate new proteins

The multimodal reasoning power of ESM3 allows scientists to generate new proteins with an unprecedented degree of control, the company believes. As an example, it cites the model's ability to combine structure, sequence and function to “propose a potential scaffold for the active site of PETase, an enzyme that degrades polyethylene terephthalate (PET)”, a target of interest to protein engineers who want to break down plastic waste.

The researchers add that ESM3 improves with feedback, using alignment methods similar to the reinforcement learning from human feedback (RLHF) applied to LLMs. Feedback from laboratory experiments or existing experimental data could also be used to improve the model and its generative capabilities.

AI to support the development of new biological systems

The integration of artificial intelligence, particularly generative AI, in sectors as critical as biology is attracting keen interest. Last May, Sanofi announced its intention to accelerate the development and commercialization of drugs using artificial intelligence. To do so, the pharmaceutical group has teamed up with American companies, starting with OpenAI and Formation Bio.

The multi-year contract “is the first to bring together cross-industry expertise and proprietary data to build and train OpenAI models and custom software designed specifically for drug development,” says a Sanofi spokesperson.

In the case of EvolutionaryScale, the ESM3 model is a tool primarily intended for scientists working on protein design and synthetic biology. “ESM3 is just the first step in our roadmap for programming biology. We believe the future will be increasingly multimodal models that learn from biological data and integrate across all scales of life, from individual molecules to cells”, concludes the company.
