Research focus of the institute
Overview
The Chung lab uses experimental and computational approaches to study fundamental aspects of biological processes at the molecular level. We combine experimental methods, such as RNA-seq, ChIP-seq and HiC, with tailored computational analyses to uncover mechanisms that enable tumor cells to evade the immune response. In addition to these biological questions, we are interested in novel computational approaches for data integration, network reconstruction and dynamic modeling with limited data.
Projects
Chromosome conformation capture: EPI-seq
We developed a novel HiC approach, Enhancer-Promoter-Interaction sequencing (EPI-seq), to specifically probe enhancer-promoter interactions. It combines the DNase hypersensitivity assay with the in-situ HiC protocol. In this way, we obtain chromosome contact maps at close to base pair resolution while keeping sequencing costs moderate.
Analysis of EPI-seq data
The ligation events uncovered by EPI-seq are not restricted to restriction enzyme cut sites but are concentrated at open chromatin regions (e.g. enhancers and promoters), in principle at base pair resolution. This new characteristic of EPI-seq data requires novel analysis approaches to profit from the enhanced resolution. We estimate the contact density in these maps via “Voronoi tessellation”, an approach widely used in astronomy. We further developed approaches to account for the uneven distribution of contacts along the genome that results from the enrichment at DNase hypersensitive sites. In this way we obtain contacts between regions as small as 100 to 500 base pairs that are separated by distances >500 base pairs.
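To illustrate the idea behind a Voronoi-based density estimate, here is a minimal sketch in plain Python. It is not the lab's implementation: the function name `voronoi_density`, the unit-grid approximation of the cells, and the toy coordinates are all assumptions made for illustration, and the correction for DNase hypersensitive site enrichment is not included. Each point's density is taken as the inverse of the area of its Voronoi cell, so points in crowded regions get small cells and high densities.

```python
from collections import Counter


def voronoi_density(points, width, height):
    """Return one density estimate per point: 1 / (area of its Voronoi cell).

    Cells are approximated on a unit grid over a width x height window:
    each grid cell is assigned to the nearest point (ties go to the point
    listed first), so a Voronoi cell's area is the number of grid cells it owns.
    """
    owned = Counter()
    for gx in range(width):
        for gy in range(height):
            cx, cy = gx + 0.5, gy + 0.5  # centre of this grid cell
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2,
            )
            owned[nearest] += 1
    # A point that owns no grid cell lies closer to a neighbour than the grid
    # spacing; report its density as infinite rather than dividing by zero.
    return [1.0 / owned[i] if owned[i] else float("inf") for i in range(len(points))]


# Hypothetical toy contacts: three clustered points and one isolated point.
contacts = [(2, 2), (3, 2), (2, 3), (15, 15)]
densities = voronoi_density(contacts, width=20, height=20)
```

The grid approximation keeps the sketch free of dependencies and scales with the window size; for real contact maps an exact tessellation from a computational geometry library would be the more natural choice.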
Single-cell RNA-seq
We are building a microfluidic device for droplet-based single-cell RNA-sequencing. We build on existing designs and have made substantial improvements to maximize the number of cell-bead droplets.
Analysis of RNA-seq data
State-of-the-art approaches for the analysis of RNA-seq data use the negative binomial distribution to model the counts and their overdispersion relative to the expected Poisson variance. Here, we propose to use the Dirichlet-Multinomial distribution instead. We derived models for differential gene expression analysis and blind deconvolution of cell types in samples with mixtures of cells (e.g. PBMCs). We implemented efficient parameter estimation for these models in TensorFlow, which allows the algorithms to run on CPUs, GPUs and TPUs.
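As a small illustration of the distribution itself, the following sketch evaluates the Dirichlet-Multinomial log-probability of a vector of gene counts. It is written in plain Python rather than TensorFlow and is not the lab's model: the function name `dirichlet_multinomial_logpmf` and the toy numbers are assumptions for illustration. Here `counts` holds the read counts per gene and `alpha` the positive concentration parameters; smaller values of `alpha` give more overdispersion relative to a plain multinomial.

```python
from math import exp, lgamma


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of integer `counts` under a Dirichlet-Multinomial.

    log P(x | alpha) = log n! + log G(A) - log G(n + A)
                       + sum_k [log G(x_k + alpha_k) - log x_k! - log G(alpha_k)]
    with n = sum(x), A = sum(alpha) and G the gamma function.
    """
    n = sum(counts)
    a = sum(alpha)
    log_p = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x_k, alpha_k in zip(counts, alpha):
        log_p += lgamma(x_k + alpha_k) - lgamma(x_k + 1) - lgamma(alpha_k)
    return log_p


# Hypothetical toy example: two genes, five reads, flat concentration.
# With alpha = (1, 1) every split of the five reads is equally likely.
p = exp(dirichlet_multinomial_logpmf((3, 2), (1.0, 1.0)))
```

Working with `lgamma` keeps the computation in log space, which avoids overflow for the large counts typical of sequencing data. Fitting `alpha` to data would additionally need an optimizer, which is omitted here.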
Deep Learning 2.0
Deep Learning has revolutionized the field of machine learning. However, its parameter-rich models require vast amounts of training data to generalize to test data. In the biomedical sciences we are often confronted with a situation in which we measure many features per sample (e.g. RNA-seq typically yields expression measurements for 20,000 to 30,000 genes) but have only a few samples. Thus, instead of “big data” we have “broad data”. This “broad data” situation requires special attention because it increases the uncertainty about the underlying processes that generated the data. In the future, we want to combine probability theory with algorithms and approaches from Deep Learning to learn internally consistent models, which exchange meaningful information about their model parameters.