Rapid advances in generative artificial intelligence have revolutionized biological modeling across domains such as protein science, genomics, and single-cell biology. However, existing surveys often organize applications by molecule type or research task, overlooking methodological convergence and cross-modal innovation. This paper presents a unified methodological perspective that highlights the fundamental technical commonalities across biological modalities. We systematically organize recent advances in generative modeling for biology through the lens of core machine learning paradigms, from language models (LMs) and diffusion models to their emerging hybrid architectures. Our work reveals how techniques initially developed for one molecular type (e.g., protein design) can be effectively transferred to others (e.g., RNA engineering), and identifies a convergence trend in which discrete diffusion models and iterative language models emerge as different facets of a unified generative framework. We trace the evolution from domain-specific models to multi-modal biological foundation models and agent-based systems. By emphasizing methodological connections rather than applications, we aim to accelerate cross-domain innovation and make the field more accessible to the broader machine learning community. We conclude by identifying promising research directions in which techniques that have succeeded in one biological domain remain unexplored in others, offering a roadmap for future advances in generative biology.
Figure 2. An overview of language-modeling pipelines for biological generation, illustrating how diverse biological inputs are tokenized, processed by a Transformer backbone, and trained under either masked bidirectional-context modeling or autoregressive causal modeling objectives.
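To make the two training objectives in this pipeline concrete, here is a minimal PyTorch sketch, assuming a toy amino-acid vocabulary and a small TinyTransformerLM defined only for illustration (not any published model): the same backbone is trained either to reconstruct masked positions from bidirectional context or to predict each token from its left context.

import torch
import torch.nn as nn

# Toy amino-acid vocabulary plus special tokens (illustrative; real models use richer vocabularies).
AA = "ACDEFGHIKLMNPQRSTVWY"
vocab = {tok: i for i, tok in enumerate(["<pad>", "<mask>", "<bos>"] + list(AA))}

def tokenize(seq):
    return torch.tensor([vocab["<bos>"]] + [vocab[a] for a in seq])

class TinyTransformerLM(nn.Module):
    """Minimal Transformer backbone with a token-prediction head (positional encodings omitted for brevity)."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2, causal=False):
        super().__init__()
        self.embed = nn.Embedding(len(vocab), d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, len(vocab))
        self.causal = causal

    def forward(self, tokens):
        attn_mask = None
        if self.causal:  # autoregressive: each position attends only to its left context
            L = tokens.size(1)
            attn_mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(tokens), mask=attn_mask)
        return self.head(h)

seq = tokenize("MKTAYIAKQR").unsqueeze(0)  # shape (1, L)

# Masked bidirectional-context objective: corrupt a few positions, predict the originals.
mlm = TinyTransformerLM(causal=False)
masked_pos = torch.tensor([3, 7])
corrupted = seq.clone()
corrupted[0, masked_pos] = vocab["<mask>"]
mlm_loss = nn.functional.cross_entropy(mlm(corrupted)[0, masked_pos], seq[0, masked_pos])

# Autoregressive causal objective: predict each token from the tokens before it.
ar = TinyTransformerLM(causal=True)
ar_logits = ar(seq[:, :-1])
ar_loss = nn.functional.cross_entropy(ar_logits.reshape(-1, len(vocab)), seq[:, 1:].reshape(-1))
print(float(mlm_loss), float(ar_loss))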
Figure 3. Overview of the forward-reverse diffusion pipeline for biological generation, showing the forward noising process and the learned reverse denoising process, in both discrete (token-level) and continuous (geometry-level) formulations.
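As a minimal numerical sketch of the forward noising in this pipeline, the snippet below assumes a standard Gaussian (DDPM-style) kernel for continuous coordinates and an absorbing-state (masking) kernel for discrete tokens; the linear schedule and tensor shapes are illustrative only.

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

# Continuous (geometry-level) forward noising: q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I).
x0 = torch.randn(8, 3)                          # e.g., 8 atom coordinates in 3D
t = 500
eps = torch.randn_like(x0)
x_t = alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * eps
# A denoiser eps_theta(x_t, t) is trained with loss ||eps - eps_theta(x_t, t)||^2, and the
# learned reverse process removes this noise step by step.

# Discrete (token-level) forward noising: each token is independently replaced by a
# <mask> symbol with probability 1 - a_bar_t (absorbing-state discrete diffusion).
MASK = 0
tokens = torch.randint(1, 21, (16,))            # 16 residues from a 20-letter alphabet
keep = torch.rand(16) < alpha_bar[t]
z_t = torch.where(keep, tokens, torch.full_like(tokens, MASK))
# The reverse model predicts the original tokens at masked positions, recovering the
# full sequence over successive denoising steps.
print(x_t.shape, z_t)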
Figure 4. A unified generative-model view integrating classic architectures, language models, and diffusion models. This illustrates the perspective that iterative language modeling can be viewed as tokenized diffusion, where training and inference correspond to iterative denoising, supported by shared backbones and modality-specific (discrete/continuous) output heads.
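A schematic sketch of this unification, with a hypothetical predict_logits stand-in for any masked-token predictor: starting from an all-masked sequence and committing the most confident positions at each step is simultaneously an iterative language-model decoder and a reverse (denoising) sampler for absorbing-state discrete diffusion.

import torch

def iterative_unmask(predict_logits, length, mask_id=0, steps=8):
    """Generate a token sequence by iteratively denoising from an all-<mask> state.

    predict_logits maps a (length,) token tensor to (length, vocab_size) logits.
    Each step commits the most confident still-masked positions, mirroring one
    reverse step of absorbing-state discrete diffusion.
    """
    seq = torch.full((length,), mask_id, dtype=torch.long)
    for step in range(steps):
        masked = seq == mask_id
        if not masked.any():
            break
        logits = predict_logits(seq)
        logits[:, mask_id] = float("-inf")       # never predict the mask symbol itself
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        # Commit roughly an equal share of the remaining masked positions per step,
        # highest-confidence first.
        n_commit = max(1, int(masked.sum().item() / (steps - step)))
        conf = conf.masked_fill(~masked, float("-inf"))
        commit = conf.topk(n_commit).indices
        seq[commit] = pred[commit]
    return seq

# Usage with a stand-in predictor (random logits) purely to show the control flow:
toy = lambda seq: torch.randn(seq.size(0), 21)
print(iterative_unmask(toy, length=16, mask_id=0))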
Figure 5. A multi-modal foundation model for biomolecules, where heterogeneous inputs are embedded and fused into a backbone to support diverse downstream design tasks such as binder/vaccine design, motif scaffolding, structure generation, and inverse folding.
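A minimal sketch of this fusion pattern, assuming just two illustrative modalities (residue tokens and 3D coordinates) projected into a shared embedding space and processed by one Transformer backbone; the module names and sizes are hypothetical, not those of any specific foundation model.

import torch
import torch.nn as nn

class MultiModalBackbone(nn.Module):
    """Embed heterogeneous biomolecular inputs and fuse them in a shared Transformer."""
    def __init__(self, vocab_size=24, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.seq_embed = nn.Embedding(vocab_size, d_model)   # discrete residue tokens
        self.coord_embed = nn.Linear(3, d_model)             # continuous 3D coordinates
        self.type_embed = nn.Embedding(2, d_model)           # marks which modality a token came from
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # Task-specific heads hang off the shared representation (e.g., inverse folding
        # predicts residue tokens, structure generation predicts coordinates).
        self.token_head = nn.Linear(d_model, vocab_size)
        self.coord_head = nn.Linear(d_model, 3)

    def forward(self, residue_tokens, coords):
        seq = self.seq_embed(residue_tokens) + self.type_embed(torch.zeros_like(residue_tokens))
        geo = self.coord_embed(coords) + self.type_embed(torch.ones(coords.shape[:2], dtype=torch.long))
        fused = self.backbone(torch.cat([seq, geo], dim=1))  # joint attention across modalities
        n = residue_tokens.size(1)
        return self.token_head(fused[:, :n]), self.coord_head(fused[:, n:])

model = MultiModalBackbone()
tokens = torch.randint(0, 24, (1, 16))
coords = torch.randn(1, 16, 3)
token_logits, coord_pred = model(tokens, coords)
print(token_logits.shape, coord_pred.shape)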
Figure 6. The biological agent paradigm that couples hypothesis generation with tool use/creation and laboratory automation, enabling iterative refinement from computational reasoning to experimental validation.
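The loop below is a schematic sketch of this paradigm, with hypothetical propose_hypothesis, run_tool, and run_experiment stand-ins in place of a real planner, computational tool, or laboratory interface; it shows only the control flow from hypothesis to in-silico screening to experimental feedback.

import random

# Hypothetical stand-ins for an LLM planner, computational tools, and a wet-lab interface.
def propose_hypothesis(history):
    """Generate a candidate design conditioned on everything observed so far."""
    return {"candidate": f"variant-{len(history)}", "rationale": "improve predicted binding"}

def run_tool(hypothesis):
    """Score the candidate with an in-silico tool (e.g., a structure or affinity predictor)."""
    return {"predicted_score": random.random()}

def run_experiment(hypothesis):
    """Dispatch the candidate to automated experimental validation and return a measurement."""
    return {"measured_score": random.random()}

history = []
for iteration in range(3):
    hyp = propose_hypothesis(history)
    prediction = run_tool(hyp)
    if prediction["predicted_score"] > 0.5:          # only promising designs go to the lab
        result = run_experiment(hyp)
    else:
        result = {"measured_score": None}
    history.append({**hyp, **prediction, **result})  # feed outcomes back into the next round

for record in history:
    print(record)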
@article{GenAI4bio:Li,
  author  = {Xiner Li and Xingyu Su and Yuchao Lin and Chenyu Wang and Yijia Xiao and Tianyu Liu and Chi Han and Michael Sun and Montgomery Bohde and Anna Hart and Wendi Yu and Masatoshi Uehara and Gabriele Scalia and Xiao Luo and Carl Edwards and Wengong Jin and Jianwen Xie and Ehsan Hajiramezanali and Edward De Brouwer and Qing Sun and Byung-Jun Yoon and Xiaoning Qian and Marinka Zitnik and Heng Ji and Hongyu Zhao and Wei Wang and Shuiwang Ji},
  title   = {Generative Artificial Intelligence for Biology: Toward Unifying Models, Algorithms, and Modalities},
  journal = {ChemRxiv},
  volume  = {2026},
  number  = {0212},
  year    = {2026},
}