Artificial Intelligence & Machine Learning · Industry

Arc Institute and NVIDIA Unveil Evo 2, the Largest AI Model for Biology

Author: Ze Research Writer · 9 min read

Arc Institute and NVIDIA released Evo 2, a 40-billion-parameter AI foundation model trained on 9.3 trillion nucleotides that can predict and generate DNA, RNA, and protein sequences across all domains of life.

Arc Institute, in collaboration with NVIDIA and researchers from Stanford University, UC Berkeley, and UC San Francisco, released Evo 2 on February 19, 2025. The model represents the largest AI system for biology to date, containing approximately 40 billion parameters and trained on 9.3 trillion nucleotides of genomic data spanning bacteria, archaea, and eukaryotes including humans.

What Happened

Arc Institute announced Evo 2 on February 19, 2025, through a coordinated release that included a preprint publication, open-source code, and integration with NVIDIA's BioNeMo cloud platform. The announcement followed approximately two years of development work at Arc Institute's Palo Alto headquarters.

The model builds on the original Evo architecture released by Arc Institute in November 2024, which demonstrated the feasibility of training large language models on genomic sequences. Evo 2 scales this approach significantly, increasing the parameter count from 7 billion to approximately 40 billion and broadening the training corpus from the 2.7 million prokaryotic and phage genomes used for the original Evo to 9.3 trillion nucleotides drawn from more than 128,000 complete genomes spanning all domains of life.

According to Patrick Hsu, a core investigator at Arc Institute and assistant professor at UC Berkeley, the model "thinks in nucleotides" rather than amino acids or higher-level biological abstractions. The research team stated that this approach allows Evo 2 to capture regulatory elements, non-coding RNAs, and other genomic features that protein-focused models cannot represent.

NVIDIA provided computational infrastructure for training, which required thousands of GPU hours on their DGX systems. The company integrated Evo 2 into BioNeMo, their platform for biological AI models, making it accessible to researchers and pharmaceutical companies through cloud APIs.

Key Claims and Evidence

Arc Institute and collaborating researchers made several technical claims about Evo 2's capabilities, supported by experimental validation described in their preprint.

Genome-scale generation: The team reported generating complete bacteriophage genomes of approximately 50,000 base pairs that were synthesized and shown to produce viable viruses capable of infecting bacterial hosts. According to the preprint, these AI-designed phages demonstrated functional replication cycles in laboratory tests.

CRISPR system design: Researchers used Evo 2 to generate novel CRISPR-Cas systems, including guide RNAs and associated proteins. The preprint states that some generated systems showed functional activity when tested experimentally, though specific efficiency metrics were not disclosed in initial announcements.

Mutation effect prediction: The model demonstrated the ability to predict the pathogenicity of human genetic variants by analyzing how mutations alter their sequence context. According to Nature's coverage, Evo 2 achieved competitive performance on benchmark datasets for variant effect prediction, though direct comparisons with specialized tools like AlphaMissense were not provided.
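A common way to turn a genomic language model into a variant-effect scorer is to compare the likelihoods the model assigns to the reference and mutated sequences. The sketch below illustrates that delta-log-likelihood pattern; the toy first-order Markov model, genome, and smoothing are illustrative assumptions standing in for the transformer, not Evo 2's actual scoring pipeline.

```python
import math
from collections import Counter, defaultdict

def fit_markov(seq: str) -> dict:
    """Toy stand-in for the language model: first-order transition counts."""
    counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts

def log_likelihood(model: dict, seq: str, alpha: float = 1.0) -> float:
    """Add-alpha-smoothed log-likelihood of seq under the toy model."""
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        row = model[a]
        total += math.log((row[b] + alpha) / (sum(row.values()) + 4 * alpha))
    return total

def variant_score(model: dict, ref: str, pos: int, alt: str) -> float:
    """Delta log-likelihood of substituting `alt` at `pos` in `ref`.
    More negative scores mean the variant is less consistent with the
    sequence patterns the model has learned."""
    mutated = ref[:pos] + alt + ref[pos + 1:]
    return log_likelihood(model, mutated) - log_likelihood(model, ref)

genome = "ACGT" * 50
model = fit_markov(genome)
print(variant_score(model, genome, pos=10, alt="A"))  # negative: breaks the motif
```

The same interface applies when the scorer is a real genomic language model: only the likelihood function changes.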

Cross-domain learning: Training data included sequences from all three domains of life, enabling the model to identify conserved patterns and domain-specific features. The researchers stated that this broad training improved performance on tasks involving less-studied organisms compared to models trained only on well-characterized species.

Pros and Opportunities

Evo 2 presents several potential advantages for biological research and biotechnology applications.

Drug discovery acceleration: Pharmaceutical companies could use the model to design therapeutic proteins, optimize antibody sequences, or identify drug targets based on genomic analysis. The integration with NVIDIA BioNeMo provides enterprise-ready infrastructure for such applications.

Synthetic biology tools: The demonstrated ability to generate functional CRISPR systems and viable phage genomes suggests applications in developing new gene editing tools and phage therapies for antibiotic-resistant infections.

Rare disease research: Variant effect prediction capabilities could help interpret genetic variants of uncertain significance, potentially improving diagnosis rates for rare genetic conditions.

Open science benefits: Arc Institute released model weights and code under open licenses, enabling academic researchers to use and build upon the work without licensing fees. The nonprofit structure of Arc Institute aligns with making foundational tools broadly accessible.

Computational efficiency: According to NVIDIA, the BioNeMo integration allows researchers to run Evo 2 inference without maintaining their own GPU infrastructure, reducing barriers to adoption for smaller research groups.

Cons, Risks, and Limitations

Several limitations and concerns accompany the Evo 2 release.

Biosecurity considerations: The ability to design functional viral genomes raises dual-use concerns. While the researchers focused on bacteriophages rather than human pathogens, the underlying capability to generate novel genetic sequences could theoretically be applied to harmful purposes. Arc Institute stated they consulted with biosecurity experts during development but did not detail specific safeguards.

Validation requirements: Generated sequences require experimental validation before practical use. The computational predictions, while promising, do not guarantee functional outcomes when synthesized and tested in biological systems.

Training data biases: The model's training data overrepresents well-studied organisms, particularly bacteria and model organisms. Performance on underrepresented species or novel sequence types remains less characterized.

Computational costs: Despite cloud availability, running large-scale analyses with a 40-billion parameter model requires substantial computational resources. Costs for extensive use through BioNeMo were not disclosed in initial announcements.

Interpretability challenges: Like other large language models, Evo 2 operates as a black box. Understanding why the model makes specific predictions or generates particular sequences remains difficult, limiting its utility for hypothesis-driven research.

How the Technology Works

Evo 2 employs a transformer architecture adapted for biological sequences, processing DNA as a sequence of nucleotide tokens (A, T, G, C) rather than words or subwords used in text models.

Architecture: The model uses a decoder-only transformer with approximately 40 billion parameters. Input sequences are tokenized at the single-nucleotide level, allowing the model to capture fine-grained sequence patterns. The architecture includes modifications for handling very long sequences, as genomes can span millions of base pairs.
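Single-nucleotide tokenization is simple enough to sketch directly; the vocabulary and ID assignments below are illustrative, not Evo 2's actual tokenizer.

```python
# Minimal single-nucleotide tokenizer (illustrative ID assignments).
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}
INV_VOCAB = {i: nt for nt, i in VOCAB.items()}

def encode(seq: str) -> list[int]:
    """Map a DNA string to integer token IDs, one token per nucleotide."""
    return [VOCAB[nt] for nt in seq.upper()]

def decode(tokens: list[int]) -> str:
    """Map token IDs back to a DNA string."""
    return "".join(INV_VOCAB[t] for t in tokens)

print(encode("GATTACA"))          # [2, 0, 3, 3, 0, 1, 0]
print(decode(encode("GATTACA")))  # GATTACA
```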

Training approach: Evo 2 was trained using next-token prediction, the same objective used for language models like GPT. Given a sequence of nucleotides, the model learns to predict the next nucleotide in the sequence. This self-supervised approach requires no labeled data, enabling training on the vast corpus of publicly available genomic sequences.
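The shift-by-one structure of next-token prediction can be made concrete in a few lines; the token IDs below assume a toy A=0, C=1, G=2, T=3 vocabulary and are not taken from Evo 2's training code.

```python
def shift_for_training(tokens: list[int]) -> tuple[list[int], list[int]]:
    """Standard causal-LM shift: the input at position i is tokens[i],
    and its prediction target is tokens[i + 1]."""
    return tokens[:-1], tokens[1:]

# Token IDs for "ACGTA" under the toy A=0, C=1, G=2, T=3 vocabulary.
inputs, targets = shift_for_training([0, 1, 2, 3, 0])
print(inputs)   # [0, 1, 2, 3]
print(targets)  # [1, 2, 3, 0]
```

Training minimizes the cross-entropy between the model's predicted distribution at each input position and the corresponding target token, with no labels beyond the sequence itself.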

Multi-scale learning: The training data included sequences at multiple scales, from short regulatory elements to complete chromosomes. According to the researchers, this multi-scale exposure helps the model learn both local sequence motifs and long-range genomic organization.

Generation process: To generate new sequences, the model samples nucleotides autoregressively, with each prediction conditioned on previously generated tokens. Researchers can guide generation by providing seed sequences or applying constraints during sampling.
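Autoregressive sampling from a seed can be sketched with a toy first-order Markov model standing in for the transformer; the model, seed, and sampling loop are illustrative assumptions, not Evo 2's generation code.

```python
import random
from collections import Counter, defaultdict

def fit_markov(seq: str) -> dict:
    """Toy stand-in for the language model: first-order transition counts."""
    counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts

def generate(model: dict, seed: str, n: int, rng: random.Random) -> str:
    """Autoregressively extend `seed` by n nucleotides. Each sample is
    conditioned on the previous token; a real transformer conditions on
    the full generated context instead."""
    out = list(seed)
    for _ in range(n):
        nexts = model.get(out[-1])
        if not nexts:  # unseen context: fall back to a uniform draw
            out.append(rng.choice("ACGT"))
            continue
        nts, wts = zip(*nexts.items())
        out.append(rng.choices(nts, weights=wts, k=1)[0])
    return "".join(out)

model = fit_markov("ACGTACGTACGT")
print(generate(model, seed="AC", n=8, rng=random.Random(0)))  # ACGTACGTAC
```

Seeding with a known sequence is one way to constrain generation, as the article describes; constrained decoding or rejection sampling are others.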

Technical context for experts: Evo 2 extends the StripedHyena architecture used in the original Evo model, incorporating improvements for handling genomic data. The model processes sequences in chunks with overlapping context windows to manage memory requirements for long sequences. Training employed mixed-precision computation and model parallelism across multiple NVIDIA H100 GPUs.
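The overlapping-window chunking described above can be sketched as follows; the window and overlap sizes are illustrative assumptions, not Evo 2's actual values.

```python
def chunk_with_overlap(seq: str, window: int, overlap: int) -> list[str]:
    """Split a long sequence into fixed-size windows that overlap by
    `overlap` tokens, so each chunk retains some left context from its
    neighbor. Sizes here are illustrative, not Evo 2's actual values."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap
    return [seq[i:i + window] for i in range(0, max(len(seq) - overlap, 1), step)]

print(chunk_with_overlap("ACGTACGTACGT", window=6, overlap=2))
# ['ACGTAC', 'ACGTAC', 'ACGT']
```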

Broader Industry Implications

The release of Evo 2 reflects several trends in AI and biotechnology that extend beyond this specific model.

Foundation model paradigm in biology: Following the success of foundation models in language and vision, biology is emerging as the next domain for large-scale pretraining. Evo 2 joins models like ESMFold, AlphaFold, and others in establishing AI as a core tool for biological research. The scale of Evo 2 suggests continued investment in larger biological AI systems.

NVIDIA's healthcare expansion: The BioNeMo integration represents NVIDIA's ongoing push into life sciences computing. The company has positioned itself as infrastructure provider for AI-driven drug discovery, competing with cloud providers and specialized biotech computing platforms.

Nonprofit research model: Arc Institute's structure as a nonprofit with significant philanthropic funding (over $650 million since founding) represents an alternative to traditional academic and commercial research models. The institute's ability to release large models as open source while maintaining research quality could influence how other organizations approach similar projects.

Synthetic biology maturation: The demonstrated ability to design functional genetic systems computationally marks progress toward programmable biology. As these capabilities improve, the field moves closer to designing organisms and biological systems with specified properties.

What Remains Unclear

Several aspects of Evo 2 and its implications require further clarification or investigation.

Performance benchmarks: While the announcement included qualitative demonstrations, comprehensive quantitative comparisons with existing tools for specific tasks (variant effect prediction, protein design, etc.) were not provided in initial materials. Independent benchmarking will help establish where Evo 2 excels or falls short.

Commercial terms: NVIDIA's BioNeMo platform offers Evo 2 access, but pricing and usage terms for commercial applications were not detailed. The economics of using the model at scale remain unclear for potential adopters.

Biosecurity measures: The specific safeguards implemented to prevent misuse were not fully described. How Arc Institute and NVIDIA plan to monitor for concerning applications of the open-source model remains an open question.

Experimental validation scope: The preprint describes successful synthesis of AI-designed sequences, but the full scope of experimental validation, including failure rates and edge cases, requires peer review and independent replication.

What to Watch Next

Several developments will indicate how Evo 2 and similar models progress.

Peer review outcomes: The preprint will undergo peer review, which will provide independent assessment of the technical claims and experimental results.

Adoption patterns: Usage statistics from BioNeMo and GitHub will indicate whether the research community finds Evo 2 useful for practical applications.

Follow-on research: Publications building on Evo 2 for specific applications (drug discovery, diagnostics, synthetic biology) will demonstrate real-world utility.

Competitive responses: Other organizations working on biological AI, including academic groups and companies like Meta AI and Google DeepMind, may release competing or complementary models.

Regulatory attention: Biosecurity discussions around AI-designed genetic sequences may attract regulatory interest, particularly if demonstrated capabilities expand to more concerning applications.

Arc Institute roadmap: The institute's plans for Evo 3 or related models will indicate the trajectory of this research program.

Related Topics

artificial-intelligence · biology · genomics · nvidia · foundation-models