Prof. Kim, Jong Kyoung
Homepage: Laboratory of Single-cell Genomics
Our research interests are in using and developing computational methods for the analysis of single-cell genomic data to elucidate cell-to-cell heterogeneity within a population of cells, with the ultimate goal of understanding how gene expression levels are regulated. Recent advances in single-cell DNA and RNA sequencing, combined with microfluidic and combinatorial barcoding approaches, now allow the genomes and transcriptomes of tens of thousands of single cells to be assayed. Despite the exponential increase in the amount of single-cell data, the computational tools needed to draw robust biological conclusions are either still undeveloped or in their infancy.
Our ultimate research goal is to understand how genetic, epigenetic, environmental, and stochastic variation regulates the relationship between genotype and phenotype at the single-cell level. Our group, with strong statistical and computational backgrounds, collaborates with outstanding empirical groups to provide novel solutions to complex biological problems. We help design experiments that properly isolate the biological variables of interest, use and develop computational methods for analyzing high-throughput genomic data, and assess whether a biological question of interest and its associated hypothesis can sensibly be addressed with the data.
Our group will provide a framework to predict phenotype from genotype by understanding the roles of genetic, epigenetic, environmental, and stochastic variation in single cells, thus facilitating insights into the etiology and development of cancer. We believe that synergistic collaborations with empirical laboratories could provide novel solutions to complex problems across many areas of biology: development, immunology, neurobiology, cancer, gene regulation, and epigenetics.
1. Developing a comprehensive suite of computational tools for analyzing single-cell sequencing data
Single-cell sequencing methods now enable tens of thousands of cells to be profiled in parallel, providing an unbiased view of cell-to-cell variability in multiple molecular readouts within a population. The key experimental challenge in single-cell sequencing is to combine transcriptomic data with other genomic information measured in parallel from the same cell. Because the spatial position of a cell in a tissue and its microenvironmental niche are critical in determining cellular function and identity, methods that preserve the spatial context of cells in tissues are developing rapidly. Furthermore, combining the transcriptome with other single-cell genomic and epigenomic data will provide great insight into the regulation of gene expression.
We will develop a comprehensive suite of computational tools to infer the kinetics of stochastic gene expression, to classify cell types and identify rare cell types within heterogeneous populations of cells, to integrate diverse genomic information from the same cell, and to study the spatio-temporal molecular networks of various biological systems at the single-cell level. To facilitate such analyses, it is critical to separate biological variability from the high level of technical noise that affects all single-cell sequencing protocols. Probabilistic approaches to machine learning will provide the main framework for combining the many forms of uncertainty in single-cell data, encompassing the modelling of technical noise using extrinsically spiked-in molecules, the integration of multiple types of genomic information from the same cell, and the identification of hidden structure underlying the data.
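As an illustration of how spiked-in molecules can be used to model technical noise, the following is a minimal sketch, not the group's implementation. It assumes that technical noise follows a mean-variance relationship of the form CV² ≈ a0 + a1/μ, estimated from spike-in transcripts, which is the kind of model described in Brennecke et al. (2013). For simplicity it fits that curve by ordinary least squares rather than by the generalized linear model and statistical test used in that paper. The function names `cv2`, `fit_technical_noise` and `excess_noise_ratio` are hypothetical.

```python
from statistics import mean, pvariance


def cv2(counts):
    """Squared coefficient of variation (variance / mean^2) of one transcript across cells.

    Assumes the mean is non-zero; all-zero transcripts should be filtered out first.
    """
    m = mean(counts)
    return pvariance(counts, m) / (m * m)


def fit_technical_noise(spike_counts):
    """Fit CV^2 = a0 + a1 / mu across spike-ins by ordinary least squares.

    spike_counts: one list of per-cell counts for each spike-in transcript.
    Simplification (assumption): OLS on 1/mu instead of a gamma GLM.
    Returns (a0, a1).
    """
    xs = [1.0 / mean(c) for c in spike_counts]  # predictor: 1 / mean expression
    ys = [cv2(c) for c in spike_counts]         # response: squared CV
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


def excess_noise_ratio(gene_counts, a0, a1):
    """Observed CV^2 of a gene divided by the CV^2 expected from technical noise alone.

    Values well above 1 hint at biological variability beyond technical noise.
    """
    expected = a0 + a1 / mean(gene_counts)
    return cv2(gene_counts) / expected
```

For example, four hypothetical spike-ins with Poisson-like counts `[2, 6]`, `[6, 12]`, `[12, 20]` and `[0, 2]` give a0 = 0 and a1 = 1 (that is, CV² = 1/μ), and a gene with counts `[0, 20]` then has an excess-noise ratio of about 10. A real analysis would use many more cells, normalize for sequencing depth, and attach a significance test to this ratio.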
2. Dissecting intra-tumour heterogeneity
We plan to integrate the methods outlined above with high-throughput screening of CRISPR/Cas9 libraries. Combining single-cell RNA-sequencing with such screens will allow the effects of genetic mutations on variability in gene expression levels to be determined. This approach has numerous practical applications. In the context of cancer, for example, it can be used to 1) understand non-genetic anti-cancer drug resistance that arises from epigenetic heterogeneity between cancer cells, and 2) characterize molecular signatures of cancer stem cells from cancer organoids and develop diagnostic tools for early cancer detection.
- T. Ilicic*, J. K. Kim*, F. O. Bagger, D. McCarthy, A. A. Kolodziejczyk, J. C. Marioni, S. A. Teichmann, Classification of low quality cells from single cell RNA-seq data, Genome Biology, 17:29, 2016.
- J. K. Kim, A. A. Kolodziejczyk, T. Ilicic, S. A. Teichmann, J. C. Marioni, Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression, Nature Communications, 6:8687, 2015.
- A. A. Kolodziejczyk*, J. K. Kim*, J. C. Tsang, T. Ilicic, J. Henriksson, K. N. Natarajan, A. C. Tuck, X. Gao, M. Bühler, P. Liu, J. C. Marioni, S. A. Teichmann, Single cell RNA-Sequencing of pluripotent states unlocks modular transcriptional variation, Cell Stem Cell, 17:471-485, 2015.
- P. Brennecke*, S. Anders*, J. K. Kim*, A. A. Kolodziejczyk, X. Zhang, V. Proserpio, B. Baying, V. Benes, S. A. Teichmann, J. C. Marioni, and M. G. Heisler, Accounting for technical noise in single-cell RNA-seq experiments, Nature Methods, 10(11):1093-1095, 2013.
- J. K. Kim and J. C. Marioni, Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data, Genome Biology, 14(1):R7, 2013.
- J. K. Kim and S. Choi, Clustering sequence sets for motif discovery, in Advances in Neural Information Processing Systems (NIPS-2009), Vancouver, Canada, December 7-10, 2009.
- D. W. Lee*, J. K. Kim*, S. Lee, S. Choi, S. Kim, and I. Hwang, Arabidopsis nuclear-encoded plastid transit peptides contain multiple sequence subgroups with distinctive chloroplast-targeting sequence motifs, Plant Cell, 20(6):1603-1622, 2008.