ConfRover Logo

An autoregressive model generates protein conformations and dynamics

Overview

Proteins are not static molecules: they constantly move and transition between different conformations. Understanding these dynamics is crucial for explaining how proteins function. Molecular dynamics (MD) simulations capture atomic motion over time by modeling physical interactions and gradually mapping out the conformational space, providing rich data for studying protein conformational ensembles and dynamics. ConfRover learns from MD simulation data to directly generate protein conformations or motion trajectories, providing a fast alternative to costly MD runs.

The key idea is simple: sampling protein conformations or trajectories can be viewed as generating each conformation (frame) either independently or autoregressively, conditioned on the preceding frames, much as a language model generates each token conditioned on the tokens before it. This unified view provides an efficient framework for learning and generating protein conformational dynamics across a variety of tasks.
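The two generation modes described above can be illustrated with a toy sketch. Everything here is hypothetical and is not ConfRover's code: `sample_frame` stands in for a learned conditional sampler, and a "conformation" is reduced to a single float so the control flow stays visible.

```python
import random


def sample_frame(context, rng):
    """Hypothetical stand-in for a learned conditional frame sampler.

    With an empty context it draws a fresh 1-D "conformation"; otherwise it
    perturbs the most recent frame to mimic temporal continuity.
    """
    if not context:
        return rng.gauss(0.0, 1.0)
    return context[-1] + rng.gauss(0.0, 0.1)


def sample_ensemble(n_frames, rng):
    # Independent generation: every frame is drawn with an empty context.
    return [sample_frame([], rng) for _ in range(n_frames)]


def sample_trajectory(n_frames, rng, start=None):
    # Autoregressive generation: each new frame is conditioned on all
    # frames generated (or supplied) before it.
    frames = [] if start is None else list(start)
    while len(frames) < n_frames:
        frames.append(sample_frame(frames, rng))
    return frames


rng = random.Random(0)
ensemble = sample_ensemble(5, rng)                    # independent frames
trajectory = sample_trajectory(5, rng, start=[0.0])   # continues from a given frame
```

Under these assumptions, the only difference between the two modes is the context passed to the sampler, which is why a single model can plausibly serve both ensemble and trajectory generation.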

Tasks illustrated: Distributions, Dynamics, Transition Pathways

Model

ConfRover brings together the strengths of modern protein structure prediction, language-model-like sequence models, and diffusion probabilistic models to capture the complex spatiotemporal dependencies in protein motion and to sample new conformations conditioned on historical context.

  1. Structure-aware context encoding. ConfRover integrates folding modules with a frame encoder to represent context conformations using rich single- and pair-wise structural features, leveraging the best practices from modern protein structure modeling.
  2. Autoregressive trajectory modeling. The trajectory module interleaves structural and temporal update blocks, enabling the model to capture complex spatiotemporal relationships across frames.
  3. Physical time encoding. The timestamp of each frame is encoded with Rotary Positional Encoding (RoPE), allowing ConfRover to represent relative temporal positions and learn from motions at multiple timescales.
  4. Diffusion decoding in continuous SE(3) space. A diffusion-based decoder predicts conformations directly in continuous 3D space, avoiding discretized tokenization and preserving fine-grained geometric details.
  5. Efficient long-trajectory learning and generation. A causal transformer backbone enables efficient parallel training, KV-cached autoregressive generation, and flexible trajectory lengths. The same architecture also supports unconditional conformation generation using a learned mask-token initialization.
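To make the physical time encoding in item 3 concrete, here is a minimal sketch of rotary encoding driven by a real-valued timestamp rather than an integer position index. The frequency schedule, the `base` value, the time units, and the function names are assumptions for illustration; the source does not specify how ConfRover parameterizes them.

```python
import math


def rope_rotate(vec, t, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to time `t`.

    `t` is a physical timestamp in arbitrary units (an assumption here);
    pair j uses the standard RoPE frequency base ** (-2j / d).
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        angle = t * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]   # toy query features for one frame
k = [1.0, 0.4, -0.6, 0.9]   # toy key features for an earlier frame

# Both pairs of frames are separated by the same interval (2.5 time units),
# so the attention scores agree up to floating-point error.
score_a = dot(rope_rotate(q, 12.5), rope_rotate(k, 10.0))
score_b = dot(rope_rotate(q, 102.5), rope_rotate(k, 100.0))
```

Because the rotated dot product depends only on the time difference between frames, frames sampled at irregular or differing intervals can share one encoding, which is consistent with the stated goal of learning from motions at multiple timescales.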

Model illustration

Examples
