A generative model for protein design

Chroma is a generative model that creates new protein molecules based on geometric and functional programming instructions.

Three billion years of evolution have produced a tremendous diversity of protein molecules, but the full potential of this molecular class is likely far greater. Accessing this potential has been challenging for computation and experiments because the space of possible protein molecules is much larger than the space of those likely to host function. 

Here we introduce Chroma, a generative model for proteins and protein complexes that can directly sample novel protein structures and sequences and can be conditioned to steer the generative process towards desired properties and functions. To enable this, we introduce a diffusion process that respects conformational statistics of polymer ensembles, an efficient neural architecture for molecular systems based on random graph neural networks that enables long-range reasoning with sub-quadratic scaling, equivariant layers for efficiently synthesizing 3D structures of proteins from predicted inter-residue geometries, and a general low-temperature sampling algorithm for diffusion models. 

Chroma realizes protein design as Bayesian inference under external constraints, which can involve symmetries, substructure, shape, semantics, and even natural-language prompts. Experimental characterization of 310 proteins shows that sampling from Chroma results in proteins that express, fold, and have favorable biophysical properties. Crystal structures of two designed proteins exhibit atomistic agreement with Chroma samples (backbone RMSD of ~1.0 Å). With this unified approach to protein design, we hope to accelerate the prospect of programming protein matter for human health, materials science, and synthetic biology.
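
The framing of design as Bayesian inference can be illustrated with a toy example. The sketch below is not Chroma's implementation: it assumes a one-dimensional standard normal "prior" standing in for the generative model and a hypothetical Gaussian constraint, and it samples the resulting posterior with unadjusted Langevin dynamics by adding the constraint gradient to the prior score. All names (`prior_score`, `constraint_score`, `sample_posterior`) are illustrative.

```python
import math
import random


def prior_score(x):
    """Gradient of log N(x; 0, 1): stands in for a learned generative prior."""
    return -x


def constraint_score(x, target, sigma):
    """Gradient of log N(target; x, sigma^2) with respect to x."""
    return (target - x) / sigma**2


def sample_posterior(target, sigma, n_chains=500, n_steps=500, step=0.01, seed=0):
    """Draw approximate posterior samples with unadjusted Langevin dynamics."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chains):
        x = rng.gauss(0.0, 1.0)  # initialize each chain from the prior
        for _ in range(n_steps):
            # Bayes' rule in score form: posterior score = prior + constraint.
            score = prior_score(x) + constraint_score(x, target, sigma)
            x += step * score + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


samples = sample_posterior(target=2.0, sigma=1.0)
mean = sum(samples) / len(samples)
print(f"posterior mean estimate: {mean:.2f}")
```

For this toy setup the exact posterior is Gaussian with mean 1.0 and variance 0.5, so the estimate should land close to 1.0. The same additive structure is what allows several constraints to be combined: each one contributes its own gradient term to the score.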

Explore Proteins
View and examine a selection of Chroma-generated proteins.
  • Generate an unconditional protein with a chosen number of residues per chain.
  • Generate a protein with a specified secondary structure and number of residues.
  • Generate a protein from a symmetry group.
  • Generate a protein in the shape of an alphanumeric character.
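
As a rough geometric illustration of the symmetry-group option listed above, the sketch below replicates the coordinates of one subunit under a cyclic group C_n by rotating it about the z-axis. This is not Chroma's API or its sampling procedure; the function name `apply_cyclic_symmetry` and the toy coordinates are hypothetical.

```python
import math


def apply_cyclic_symmetry(coords, n):
    """Return n rotated copies of coords (list of (x, y, z)) about the z-axis."""
    copies = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        c, s = math.cos(angle), math.sin(angle)
        # Rotate every point by the k-th element of the cyclic group C_n.
        copies.append([(c * x - s * y, s * x + c * y, z) for x, y, z in coords])
    return copies


# Toy "subunit": three points placed away from the symmetry axis.
subunit = [(5.0, 0.0, 0.0), (6.0, 1.0, 0.5), (7.0, 0.0, 1.0)]
assembly = apply_cyclic_symmetry(subunit, n=4)
for chain in assembly:
    print([tuple(round(v, 2) for v in atom) for atom in chain])
```

Each copy is related to the first by a rotation of 360/n degrees, so the full assembly is unchanged when any element of the group is applied to it.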


Authors

John B. Ingraham, Max Baranov, Zak Costello, Karl W. Barber, Wujie Wang, Ahmed Ismail, Vincent Frappier, Dana M. Lord, Christopher Ng-Thow-Hing, Erik R. Van Vlack, Shan Tie, Vincent Xue, Sarah C. Cowles, Alan Leung, João V. Rodrigues, Claudio L. Morales-Perez, Alex M. Ayoub, Robin Green, Katherine Puentes, Frank Oplinger, Nishant V. Panwar, Fritz Obermeyer, Adam R. Root, Andrew L. Beam, Frank J. Poelwijk, and Gevorg Grigoryan.


Special Thanks

We thank W. F. DeGrado, R. Kormos and Generate:Biomedicines employees A. Ramos, A. Delhagen, A. Jecrois, B. R. P. Saravanan, B. Hannigan, B. Patuto, B. Vogler, D. Moonan, D. Curran, D. Ferguson, E. Brignole, E. Palovcak, J. Lucas, J. McFarland, J. Huaman-Argandona, J. Garlick, K. Tamang, K. Hopson, M. Pattie, M. Jankowiak, M. Saputo, M. Nally, M. Mathur, M. Gibson, N. Shaban, N. Joh, R. Chaudhary, R. Federman, S. Clancy, S. DeCamp, T. Linsky, Y. Liu, and Z. Harteveld for assistance with experimental and computational methods development, discussions and input on manuscript drafts; and B. Turner and staff at the MIT Biophysical Instrumentation Facility for providing training and access to the CD spectrometer. The study used the resources of the MIT Structural Biology Core Facility and the MIT Biophysical Instrumentation Facility. Special thanks to William Wolfe-McGuire for HPC support and troubleshooting, as well as to Anthony Colangelo, Katie Bumatay, Megan McLaughlin, Ryan Sherwood, Stefanie Gesuero, and Zoe Moutsos for their design and communications work.