MicroWorkshop | Computational Biology – July 10, 2020

Speakers
Andreas Pfenning | Modeling the Evolution of Cell Types in the Central Nervous System
Project Abstract
The brain is an enormously complex organ – even a tiny piece of human brain tissue can contain dozens of different subtypes of neurons, each of which plays a distinct role in the neural circuits underlying complex behaviors. New technologies for measuring gene expression levels within single cells have provided us with a molecular atlas of these cell types in several mammalian species, but the rapid pace of evolution has made it difficult to reliably trace some neural cell populations across many species over large evolutionary distances. Here, we develop a new computational approach, TMNT (toolkit for modeling nested trees), that jointly models the hierarchy of cell types, species, and brain regions to identify new evolutionary events. When applied to retinal cell types, our tool identifies new patterns of gene expression and subtypes of rod cells that emerge in species with night vision.
Andreas Pfenning’s Bio
Andreas Pfenning is an assistant professor in the Computational Biology Department in the School of Computer Science at Carnegie Mellon University. Additionally, he has a courtesy appointment in the Department of Biological Sciences and is a member of the Center for the Neural Basis of Cognition, a joint venture between Carnegie Mellon and the University of Pittsburgh. The goal of the Pfenning group is to build a set of computational and genomic tools to study how genome sequence influences neural cells, neural circuits, disease, and behavior. The group is conducting research on the genetic mechanisms of Alzheimer’s disease and addiction, the epigenetics of aging, and the evolution of language behavior. Andreas joined Carnegie Mellon University in 2016 after completing training with Dr. Manolis Kellis as a postdoctoral associate in a joint position between the Computer Science and Artificial Intelligence Laboratory of the Massachusetts Institute of Technology and the Genetics Department of Harvard Medical School. He has a Ph.D. in Computational Biology and Bioinformatics from Duke University and a BS in Computer Science from Carnegie Mellon (2006).
Gerald Quon | Generating Interpretable Visualizations of Single Cell Genomic Data
Project Abstract
Non-linear dimensionality reduction methods such as t-SNE and UMAP are standard tools for visualizing and exploring genomic datasets. Their principal limitation is that they are not interpretable; it is very difficult to infer how variation along the axes of their plots relates to variation in the original features, such as gene expression patterns. In this talk, I will discuss the development of a scalable, interpretable variational autoencoder (siVAE), a non-linear dimensionality reduction method that generates embeddings that are qualitatively similar to those of t-SNE and UMAP but interpretable by default. That is, siVAE infers a loading matrix during training that maps contributions of input features (genes) to each axis of the visualization. I will illustrate how siVAE enables fast identification of the genes that principally distinguish different cell types and states in different genomic datasets.
Gerald Quon’s Bio
Gerald Quon is an assistant professor in the Department of Molecular and Cellular Biology at UC Davis, and a member of the Genome Center and UC Davis Comprehensive Cancer Center. He obtained his Ph.D. in Computer Science from the University of Toronto, and completed postdoctoral training at MIT under the guidance of Manolis Kellis. His lab focuses on the development of AI and machine learning-based approaches to building quantitative models of cell state and gene regulation. Broad areas of research he is currently pursuing include (1) integrating transcriptomic and cellular phenotypic data to better understand how gene regulation impacts cellular phenotype; (2) finding recurring spatial patterns of gene expression and cellular organization from spatial transcriptomes; and (3) identifying rewiring of gene regulatory networks across species. His work is currently supported by grants from the Chan Zuckerberg Initiative and NSF.
Sushmita Roy | Inference of Regulatory Network Dynamics on Developmental Lineages
Project Abstract
Regulatory networks connect regulatory proteins (e.g., transcription factors and signaling proteins) to target genes and control which genes are expressed as the information encoded in an organism’s genome is translated into context-specific responses. Identification of these networks is important to advance our understanding of many biological processes such as development, disease, response to stress, and evolution. In this talk, I will present computational methods to tackle a few key problems in understanding network dynamics on developmental lineages. Using these approaches, we have derived useful insights about mammalian gene regulation, including the identification of key regulators in host response and chromatin state dynamics during cell state transitions.
Sushmita Roy’s Bio
Sushmita Roy is an associate professor in the Biostatistics and Medical Informatics Department and a faculty member at the Wisconsin Institute for Discovery, University of Wisconsin, Madison. Her research lies at the intersection of machine learning and network-based methods for tackling problems in regulatory genomics. Her group develops and applies computational methods for identifying regulatory networks that exist in living cells, examines their dynamics across different biological contexts, and uses these networks to build network-based predictive models of global phenotypes. These approaches harness the increasingly available repertoires of high-throughput molecular measurements and are applicable to diverse yeast, plant, and mammalian systems. She works closely with experimentalists who study a variety of biological processes, including infectious disease, cell fate specification, host-microbe interactions, and the evolution of tissue-specific gene expression, all of which share the goal of understanding the underlying regulatory network. Dr. Roy is a UW Vilas Foundation Fellow and a recipient of an Alfred P. Sloan Foundation Fellowship, an NSF CAREER award, and a James McDonnell Foundation scholar award.
Anshul Kundaje | Deep Learning the Regulatory Code of the Human Genome
Project Abstract
The human genome contains the fundamental code that defines the identity and function of all the cell types and tissues in the human body. Genes are functional sequence units that encode proteins, but they account for only about 2% of the 3-billion-nucleotide human genome sequence. What does the rest of the genome encode? How is gene activity controlled in each cell type? Where do the regulatory control elements lie and what is their sequence composition? How do variants and mutations in the genome sequence affect cellular function and disease? These are fundamental questions that remain largely unanswered. The regulatory code that controls gene activity consists of DNA words with complex syntax and grammar (akin to natural language) encoded within hierarchically organized units of regulatory elements. These syntactic units of functional DNA words are sparsely distributed across billions of nucleotides of genomic sequence and remain largely elusive. Deep learning has revolutionized our understanding of natural language, speech, and vision. We strongly believe it has the potential to revolutionize our understanding of the regulatory language of the genome. We have developed integrative supervised deep learning frameworks to learn how genomic sequence encodes millions of experimentally measured regulatory genomic events across hundreds of cell types and tissues. We have developed novel methods to interpret our models and extract local and global predictive patterns, revealing many insights into the regulatory code. We will demonstrate how we can use deep learning models as oracles and perform millions of in-silico experiments to reveal the regulatory code. Our models also allow us to predict the effects of natural and disease-associated genetic variation, i.e., how differences in DNA sequence across healthy and diseased individuals are likely to affect molecular mechanisms associated with complex traits and diseases.
Anshul Kundaje’s Bio
Anshul Kundaje is an assistant professor of Genetics and Computer Science at Stanford University. The Kundaje lab develops statistical and machine learning methods for large-scale integrative analysis of functional genomic data to decode regulatory elements and pathways across diverse cell types and tissues and understand their role in cellular function and disease. Anshul completed his Ph.D. in Computer Science at Columbia University in 2008. As a postdoc at Stanford University from 2008 to 2012 and a research scientist at MIT and the Broad Institute from 2012 to 2014, he led the integrative analysis efforts for two of the largest functional genomics consortia – the Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project. Dr. Kundaje is a recipient of the 2019 Chen Award of Excellence from the Human Genome Organization, the 2016 NIH Director’s New Innovator Award, and the 2014 Alfred Sloan Foundation Fellowship. Anshul is also a member of the NIH Director’s Advisory Committee for Artificial Intelligence in Biomedical Research.