We develop machine learning models tailored for biomolecular problems, including the prediction of protein structure from sequence. Increasingly, such models combine the latest advances in deep learning with inductive priors about both the geometry (e.g. equivariance with respect to Lie groups) and the physics (e.g. energy conservation) of protein folding. This research pushes the boundaries not only of molecular biology but also of machine learning, as the complex nature of molecular data (graphs, time series, topologies and geometries) necessitates bespoke computational primitives. Moving forward, the integration of machine learning with physical theory will likely prove crucial.
Cellular complexity arises chiefly not from the behavior of individual proteins but from the interactions of proteins with ligands including other proteins, nucleic acids, and small molecules. We use machine learning to predict protein-ligand interactions using both 'implicit' models that decompose an interaction into an abstract representation (e.g. sequence of amino acids) and 'explicit' models that rely on the atomic coordinates of the three-dimensional structures of interacting molecules. Naturally, the performance of these models depends on our ability to build powerful protein representations and predict accurate protein structures.
A key technology in computational protein science is representation learning, which maps protein sequences to high-dimensional spaces organized so that nearby points are functionally related, with different dimensions corresponding to different functional concepts. Crucially, these concepts are not determined a priori but are learned automatically by algorithms without needing expensive 'labelled' data (e.g. protein structures). Representation learning elucidates protein sequence-function relationships without explicit reference to structure, and stands to be a foundational tool for protein design and for our understanding of protein biology and the relationship between proteomes and organismal function.
Protein-ligand interactions form networks whose behavior determines cellular function, state, and type. Well-studied examples include transcriptional networks (protein-DNA), metabolic networks (enzyme-metabolite), and signal transduction networks (protein-protein). We focus on the last of these, using our machine-learned molecular models to investigate their organization and logic. Our research spans the microscale (individual molecules), macroscale (topology of the entire network), and most importantly the mesoscale, where ideas of reuse and abstraction from computer science find particular relevance. Our aim is to understand the computations performed and computational paradigms employed by these networks, especially as artifacts of evolutionary processes.
The signaling networks that underlie human health (and disease) are ultimately a consequence of normal (and aberrant) molecular interactions. This makes mechanistic molecular models a particularly powerful tool to study both the natural variability of signaling networks and their dysregulation in disease by linking changes in 'input' (e.g. protein sequence) to changes in 'output' (e.g. binding affinity of mutated protein). In effect, such models provide testable hypotheses of the genotype-phenotype map at the molecular level. Viewed as a complement to purely statistical approaches that are commonly used in genome-wide association studies, molecular models provide a biophysical perspective through which to analyze somatic and germline variation.