Professor Sandrine Dudoit presents her work “Single-Cell Transcriptomics: Questions and Learning from Data”

February 26, 2024

During a recent BIDS seminar, Professor Sandrine Dudoit (BIDS Faculty Council Member and CDSS Associate Dean for Faculty and Research) surveyed the statistical questions that arise in analyzing single-cell RNA-Seq data to investigate the differentiation of stem cells in the brain, including exploratory data analysis, dimensionality reduction, normalization, expression quantitation, cluster analysis, and the inference of cellular lineages.

She presented analyses from two studies, conducted in collaboration with the Ngai Lab, that examined the differentiation of horizontal basal cells (HBC) in the mouse olfactory epithelium. Potential applications of this work include the prevention and treatment of neural tissue damage and degeneration (e.g., Alzheimer’s disease). Each study used single-cell RNA-Seq to measure genome-wide expression levels at the resolution of individual cells: roughly 700 cells in the first study and 25,000 in the second, with thousands of genes measured in each.


A significant part of the project involved exploratory data analysis (EDA) and normalization. Professor Dudoit stressed that EDA is needed to understand both the “good” features of the data (biological signal) and the “bad” ones (unwanted technical effects), and that normalization is needed so that observed differences in expression measures between genes or samples reflect the biological effects of interest rather than technical artifacts. She presented scone, a normalization framework that implements a range of normalization procedures together with performance measures for evaluating them, to guide the choice of an appropriate normalization for a given dataset.
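To make the idea of normalization concrete, here is a minimal Python sketch of one of the simplest procedures of the kind a framework like scone compares: total-count (library-size) scaling followed by a log transform. It is an illustration under stated assumptions, not scone's R interface. The function name `total_count_normalize` and the toy count matrix are hypothetical, and the matrix is assumed to be laid out as genes × cells.

```python
import numpy as np

def total_count_normalize(counts, scale=None):
    """Hypothetical helper (not part of scone): rescale each cell (column)
    of a genes x cells count matrix to a common total count, then log1p."""
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)           # total counts per cell
    if np.any(library_sizes == 0):
        raise ValueError("every cell must have at least one count")
    if scale is None:
        scale = np.median(library_sizes)         # common target depth
    scaled = counts / library_sizes * scale      # broadcasts across columns
    return np.log1p(scaled), scaled

# Toy data (3 genes x 3 cells); cell 2 has exactly twice the depth of cell 1.
counts = np.array([[10, 20, 0],
                   [30, 60, 5],
                   [60, 120, 45]])
log_norm, scaled = total_count_normalize(counts)
print(scaled.sum(axis=0))  # → [100. 100. 100.]
```

Because the first two toy cells differ only in sequencing depth, they become identical after scaling. A technical effect like this is easy to remove; real datasets carry subtler ones (batch, quality metrics), which is why scone scores several candidate normalizations with performance measures rather than prescribing a single one.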


Professor Dudoit discussed how the Slingshot lineage inference method revealed that sustentacular cells are produced via direct conversion of horizontal basal cells, whereas microvillous and neuronal cells require an intermediate, proliferative state. She also described how tradeSeq exploits the continuous nature of cellular trajectories to identify genes that are differentially expressed either within a lineage or between lineages. Addressing the use of AI in this space, she pointed out that no matter how clever the tool, bad input data will lead to poor results: “Garbage in, garbage out.”
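Slingshot's first step is to connect cell clusters with a minimum spanning tree (MST) and read lineages off as paths from a root cluster to the leaves of that tree. The Python sketch below illustrates only that step, under simplifying assumptions: it uses plain Euclidean distances between cluster centroids (Slingshot's own distance may differ) and omits the principal-curve fitting that Slingshot uses to assign pseudotimes. The functions `centroid_mst` and `lineages` and the toy centroids are hypothetical and do not come from the olfactory epithelium studies.

```python
import numpy as np

def centroid_mst(centroids):
    """Prim's algorithm on Euclidean distances between cluster centroids.
    Returns an adjacency dict {cluster: set of neighboring clusters}."""
    centroids = np.asarray(centroids, dtype=float)
    k = len(centroids)
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    in_tree = {0}
    adj = {c: set() for c in range(k)}
    while len(in_tree) < k:
        best = None
        # Find the shortest edge joining the tree to a cluster outside it.
        for i in in_tree:
            for j in range(k):
                if j not in in_tree and (best is None or dist[i, j] < best[0]):
                    best = (dist[i, j], i, j)
        _, i, j = best
        adj[i].add(j)
        adj[j].add(i)
        in_tree.add(j)
    return adj

def lineages(adj, root):
    """Each path from the root cluster to a leaf of the tree is one lineage."""
    paths = []

    def walk(node, parent, path):
        children = sorted(n for n in adj[node] if n != parent)
        if not children:
            paths.append(path)
        for child in children:
            walk(child, node, path + [child])

    walk(root, None, [root])
    return paths

# Toy 2-D centroids (e.g., in a reduced-dimensional space); cluster 0 is the root.
centroids = [(0, 0), (1, 0), (2, 1), (3, 2), (2, -1)]
adj = centroid_mst(centroids)
print(lineages(adj, root=0))  # → [[0, 1, 2, 3], [0, 1, 4]]
```

In this toy example cluster 1 is a branch point, so the tree splits into two lineages that share their early clusters. Methods such as tradeSeq then work along lineages like these, comparing expression within a lineage or between lineages.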

The methods developed in Professor Dudoit’s group are implemented in open-source R software packages released through the Bioconductor Project.