What We Do


Our research program sits at the intersection of structural and systems biology, where we aim to build models of biomolecules and their interactions that are sufficiently performant for systems phenomena to emerge from molecular interactions. To accomplish this we use computer science to study biology in two fundamentally different ways: as a tool to analyze and model biological data, where machine learning is a particularly powerful hammer, and as a conceptual framework for reasoning about and formalizing biological phenomena, where programming languages and computer programs serve as useful analogs. To us, the former is bioinformatics; the latter computational and systems biology.

Our focus is computational, but we host and collaborate with experimental colleagues. We also borrow tools from established fields of quantitative modeling, including control theory and dynamical systems, as no one discipline has a monopoly on good ideas.

Machine Learning Molecules

Protein Structure

We develop machine learning models tailored for biomolecular problems, including the prediction of protein structure from sequence. Increasingly, such models combine the latest advances in deep learning with inductive priors about both the geometry (e.g. equivariance with respect to Lie groups) and physics (e.g. energy conservation) of protein folding. This research pushes the boundaries of not only molecular biology but machine learning too, as the complex nature of molecular data (graphs, time series, topologies and geometries) necessitates bespoke computational primitives. Moving forward, the integration of machine learning with physical theory will likely prove crucial.
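As a small illustration of the kind of geometric prior mentioned above, the sketch below checks that the pairwise distances between atoms do not change when a structure is rotated and translated. Invariance of this sort is the simplest special case of equivariance under rigid motions. The coordinates, the helper names (rotate_z, translate, pairwise_distances), and the choice of a rotation about the z-axis are all assumptions made for the example; this is not a description of any particular model.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(p + d for p, d in zip(point, shift))

def pairwise_distances(coords):
    """Return the distances between all unordered pairs of points."""
    return [
        math.dist(a, b)
        for i, a in enumerate(coords)
        for b in coords[i + 1:]
    ]

# Hypothetical coordinates (in angstroms) for four atoms; not real data.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.1, 3.5, 0.0), (8.7, 4.2, 1.3)]

# Apply an arbitrary rigid motion: a rotation followed by a translation.
moved = [translate(rotate_z(p, 0.7), (10.0, -4.0, 2.5)) for p in coords]

before = pairwise_distances(coords)
after = pairwise_distances(moved)

# Distances are a rigid-motion-invariant description of the structure.
distances_match = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)
)
print(distances_match)
```

A model that consumes only such invariant features, or whose layers transform consistently with the input, does not need to relearn the same structure in every orientation.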

Protein-Ligand Interactions

Cellular complexity arises chiefly not from the behavior of individual proteins but from the interactions of proteins with ligands, including other proteins, nucleic acids, and small molecules. We use machine learning to predict protein-ligand interactions using both 'implicit' models that decompose an interaction into an abstract representation (e.g. sequence of amino acids) and 'explicit' models that rely on the atomic coordinates of the three-dimensional structures of interacting molecules. Naturally, the performance of these models depends on our ability to build powerful protein representations and predict accurate protein structures.

Protein and Proteome Representations

A key technology in computational protein science is representation learning, which maps protein sequences to high-dimensional spaces that are organized so that nearby points are functionally related, with different dimensions corresponding to different functional concepts. Crucially, these concepts are not a priori determined but are learned automatically by algorithms without needing expensive 'labelled' data (e.g. protein structures). Representation learning elucidates protein sequence-function relationships without explicit reference to structure, and stands to be a foundational tool for protein design and our understanding of protein biology and the relationship between proteomes and organismal function.
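To make the idea of "nearby points are functionally related" concrete, the sketch below looks up the nearest neighbor of a protein in an embedding space using cosine similarity. The four-dimensional vectors and the protein names are invented purely for illustration; real learned representations typically have hundreds or thousands of dimensions and come from a trained model, which is not shown here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbor(query, embeddings):
    """Return the name of the embedding most similar to `query`."""
    return max(
        (name for name in embeddings if name != query),
        key=lambda name: cosine_similarity(embeddings[query], embeddings[name]),
    )

# Hypothetical embeddings; names and values are made up for this example.
embeddings = {
    "kinase_A": (0.9, 0.1, 0.0, 0.2),
    "kinase_B": (0.8, 0.2, 0.1, 0.1),
    "protease_C": (0.1, 0.9, 0.3, 0.0),
}

closest = nearest_neighbor("kinase_A", embeddings)
print(closest)
```

In this toy space the two hypothetical kinases sit close together, which is the behavior one hopes a learned representation exhibits for functionally related proteins.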

Molecules Systems Biology

Logic & Organization of Signaling Networks

Protein-ligand interactions form networks whose behavior determines cellular function, state, and type. Well-studied examples include transcriptional networks (protein-DNA), metabolic networks (enzyme-metabolite), and signal transduction networks (protein-protein). We focus on the last of these, using our machine-learned molecular models to investigate their organization and logic. Our research spans the microscale (individual molecules), macroscale (topology of the entire network), and most importantly the mesoscale, where ideas of reuse and abstraction from computer science find particular relevance. Our aim is to understand the computations performed and computational paradigms employed by these networks, especially as artifacts of evolutionary processes.

Variability and Misregulation in Signaling

The signaling networks that underlie human health (and disease) are ultimately a consequence of normal (and aberrant) molecular interactions. This makes mechanistic molecular models a particularly powerful tool to study both the natural variability of signaling networks and their dysregulation in disease by linking changes in 'input' (e.g. protein sequence) to changes in 'output' (e.g. binding affinity of mutated protein). In effect, such models provide testable hypotheses of the genotype-phenotype map at the molecular level. Viewed as a complement to purely statistical approaches that are commonly used in genome-wide association studies, molecular models provide a biophysical perspective through which to analyze somatic and germline variation.
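One common way to quantify the 'output' side of this map is the change in binding free energy caused by a mutation, computed from dissociation constants with the standard thermodynamic relation ΔΔG = RT ln(Kd_mutant / Kd_wildtype). The sketch below applies that relation; the function name binding_ddg and the example Kd values are assumptions chosen for illustration, not measurements.

```python
import math

# Gas constant in kcal/(mol*K).
GAS_CONSTANT_KCAL = 1.987204e-3

def binding_ddg(kd_wildtype, kd_mutant, temperature_k=298.15):
    """Change in binding free energy (kcal/mol) due to a mutation.

    Both dissociation constants must be in the same units. A positive
    result means the mutant binds more weakly than the wild type.
    """
    if kd_wildtype <= 0 or kd_mutant <= 0:
        raise ValueError("dissociation constants must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_mutant / kd_wildtype)

# Hypothetical example: a mutation that weakens binding tenfold
# (Kd rises from 1 nM to 10 nM).
ddg_weaker = binding_ddg(1e-9, 1e-8)

# A mutation with no effect on affinity gives a change of zero.
ddg_unchanged = binding_ddg(1e-9, 1e-9)

print(round(ddg_weaker, 2), ddg_unchanged)
```

At room temperature a tenfold loss of affinity corresponds to roughly 1.4 kcal/mol, which gives a sense of the energy scale on which sequence variation can rewire an interaction.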