Single-cell transcriptome sequencing (scRNA-Seq), which combines high-throughput single-cell extraction and sequencing capabilities, enables the transcriptomes of large numbers of individual cells to be assayed efficiently. Profiling of gene expression at the single-cell level is crucial for addressing many biologically relevant questions, such as the investigation of rare cell types or primary cells (e.g., early development, where each of a small number of cells may have a distinct function) and the identification of subpopulations of cells from a larger heterogeneous population (e.g., discovering cell types in brain tissues). scRNA-Seq assays generate large datasets and involve inference for high-dimensional multivariate distributions with complex and unknown dependence structures among variables.
I will discuss some of the statistical analysis issues that have arisen in the context of a collaboration funded by the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative, with the aim of classifying neuronal cells in the mouse somatosensory cortex. These issues, ranging from so-called low-level to high-level analyses, include exploratory data analysis (EDA) for quality assessment/control (QA/QC) of scRNA-Seq reads, normalization to account for nuisance technical effects, cluster analysis to identify novel cell types, and differential expression analysis to derive gene expression signatures for the cell types.
Sandrine Dudoit is professor of biostatistics and statistics and chair of the graduate group in biostatistics at the University of California, Berkeley. Professor Dudoit's methodological research interests concern high-dimensional inference and include exploratory data analysis, visualization, loss-based estimation with cross-validation (e.g., density estimation, regression, model selection), and multiple hypothesis testing. Much of her methodological work is motivated by statistical inference questions arising in biological research and, in particular, the design and analysis of high-throughput microarray and sequencing gene expression experiments, for example, mRNA-Seq for transcriptome analysis and genome annotation and ChIP-Seq for DNA-protein interaction profiling (e.g., transcription factor binding). Her contributions include exploratory data analysis, normalization and expression quantitation, differential expression analysis, class discovery, prediction, and integration of biological annotation metadata (e.g., gene ontology annotation). She is also interested in statistical computing and, in particular, reproducible research. She is a founding core developer of the Bioconductor Project, an open source and open development software project for the analysis of biomedical and genomic data.
Professor Dudoit is a coauthor of the book Multiple Testing Procedures with Applications to Genomics and a coeditor of the book Bioinformatics and Computational Biology Solutions Using R and Bioconductor. She is associate editor of three journals, including The Annals of Applied Statistics and IEEE/ACM Transactions on Computational Biology and Bioinformatics. Professor Dudoit was named a fellow of the American Statistical Association in 2010 and an elected member of the International Statistical Institute in 2014.