|2019-06-04||Cell lineage inference from SNP and scRNA-Seq data.||Ding J, Lin C, Bar-Joseph Z.||HIVE TC-CMU||Several recent studies focus on the inference of developmental and response trajectories from single cell RNA-Seq (scRNA-Seq) data. A number of computational methods, often referred to as pseudo-time ordering, have been developed for this task. Recently, CRISPR has also been used to reconstruct lineage trees by inserting random mutations. However, both approaches suffer from drawbacks that limit their use. Here, we develop a method to detect significant, cell-type-specific sequence mutations from scRNA-Seq data. We show that only a few mutations are enough for reconstructing good branching models. Integrating these mutations with expression data further improves the accuracy of the reconstructed models. As we show, the majority of mutations we identify are likely RNA editing events, indicating that such information can be used to distinguish cell types.|
|2019-05-20||SABER amplifies FISH: enhanced multiplexed imaging of RNA and DNA in cells and tissues.||Kishi JY, Lapan SW, Beliveau BJ, West ER, Zhu A, Sasaki HM, Saka SK, Wang Y, Cepko CL, Yin P.||TTD-Harvard||Fluorescence in situ hybridization (FISH) reveals the abundance and positioning of nucleic acid sequences in fixed samples. Despite recent advances in multiplexed amplification of FISH signals, it remains challenging to achieve high levels of simultaneous amplification and sequential detection with high sampling efficiency and simple workflows. Here we introduce signal amplification by exchange reaction (SABER), which endows oligonucleotide-based FISH probes with long, single-stranded DNA concatemers that aggregate a multitude of short complementary fluorescent imager strands. We show that SABER amplified RNA and DNA FISH signals (5- to 450-fold) in fixed cells and tissues. We also applied 17 orthogonal amplifiers against chromosomal targets simultaneously and detected mRNAs with high efficiency. We then used 10-plex SABER-FISH to identify in vivo introduced enhancers with cell-type-specific activity in the mouse retina. SABER represents a simple and versatile molecular toolkit for rapid and cost-effective multiplexed imaging of nucleic acid targets.|
|2019-04-30||Continuous State HMMs for Modeling Time Series Single Cell RNA-Seq Data.||Lin C, Bar-Joseph Z.||HIVE TC-CMU||MOTIVATION:
Methods for reconstructing developmental trajectories from time series single cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods, are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact its accuracy.
We developed a new method based on continuous state HMMs (CSHMMs) for representing and modeling time series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms, which allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single cell datasets, we show that the CSHMM method accurately infers branching topology and correctly and continuously assigns cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types.
Software and Supporting website: www.andrew.cmu.edu/user/chiehl1/CSHMM/.
Supplementary data are available at Bioinformatics online.|
|2019-04-06||Imaging mass spectrometry enables molecular profiling of mouse and human pancreatic tissue.||Prentice BM, Hart NJ, Phillips N, Haliyur R, Judd A, Armandala R, Spraggins JM, Lowe CL, Boyd KL, Stein RW, Wright CV, Norris JL, Powers AC, Brissova M, Caprioli RM.||TMC-Vanderbilt||AIMS/HYPOTHESIS:
The molecular response and function of pancreatic islet cells during metabolic stress is a complex process. The anatomical location and small size of pancreatic islets coupled with current methodological limitations have prevented the achievement of a complete, coherent picture of the role that lipids and proteins play in cellular processes under normal conditions and in diseased states. Herein, we describe the development of untargeted tissue imaging mass spectrometry (IMS) technologies for the study of in situ protein and, more specifically, lipid distributions in murine and human pancreases.
We developed matrix-assisted laser desorption/ionisation (MALDI) IMS protocols to study metabolite, lipid and protein distributions in mouse (wild-type and ob/ob mouse models) and human pancreases. IMS allows for the facile discrimination of chemically similar lipid and metabolite isoforms that cannot be distinguished using standard immunohistochemical techniques. Co-registration of MS images with immunofluorescence images acquired from serial tissue sections allowed accurate cross-registration of cell types. By acquiring immunofluorescence images first, this serial section approach guides targeted high spatial resolution IMS analyses (down to 15 μm) of regions of interest and leads to reduced time requirements for data acquisition.
MALDI IMS enabled the molecular identification of specific phospholipid and glycolipid isoforms in pancreatic islets with intra-islet spatial resolution. This technology shows that subtle differences in the chemical structure of phospholipids can dramatically affect their distribution patterns and, presumably, cellular function within the islet and exocrine compartments of the pancreas (e.g. 18:1 vs 18:2 fatty acyl groups in phosphatidylcholine lipids). We also observed the localisation of specific GM3 ganglioside lipids [GM3(d34:1), GM3(d36:1), GM3(d38:1) and GM3(d40:1)] within murine islet cells that were correlated with a higher level of GM3 synthase as verified by immunostaining. However, in human pancreas, GM3 gangliosides were equally distributed in both the endocrine and exocrine tissue, with only one GM3 isoform showing islet-specific localisation.
The development of more complete molecular profiles of pancreatic tissue will provide important insight into the molecular state of the pancreas during islet development, normal function, and diseased states. For example, this study demonstrates that these results can provide novel insight into the potential signalling mechanisms involving phospholipids and glycolipids that would be difficult to detect by targeted methods, and can help raise new hypotheses about the types of physiological control exerted on endocrine hormone-producing cells in islets. Importantly, the in situ measurements afforded by IMS do not require a priori knowledge of molecules of interest and are not susceptible to the limitations of immunohistochemistry, providing the opportunity for novel biomarker discovery. Notably, the presence of multiple GM3 isoforms in mouse islets and the differential localisation of lipids in human tissue underscore the important role these molecules play in regulating insulin modulation and suggest species, organ, and cell specificity. This approach demonstrates the importance of both high spatial resolution and high molecular specificity to accurately survey the molecular composition of complex, multi-functional tissues such as the pancreas.|
|2019-03-25||Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH.||Eng CL, Lawson M, Zhu Q, Dries R, Koulena N, Takei Y, Yun J, Cronin C, Karp C, Yuan GC, Cai L.||TTD-Cal Tech||Imaging the transcriptome in situ with high accuracy has been a major challenge in single-cell biology, which is particularly hindered by the limits of optical resolution and the density of transcripts in single cells. Here we demonstrate an evolution of sequential fluorescence in situ hybridization (seqFISH+). We show that seqFISH+ can image mRNAs for 10,000 genes in single cells, with high accuracy and sub-diffraction-limit resolution, in the cortex, subventricular zone and olfactory bulb of mouse brain, using a standard confocal microscope. The transcriptome-level profiling of seqFISH+ allows unbiased identification of cell classes and their spatial organization in tissues. In addition, seqFISH+ reveals subcellular mRNA localization patterns in cells and ligand-receptor pairs across neighbouring cells. This technology demonstrates the ability to generate spatial cell atlases and to perform discovery-driven studies of biological processes in situ.|
|2019-02-20||The single-cell transcriptional landscape of mammalian organogenesis||Junyue Cao, Malte Spielmann, Xiaojie Qiu, Xingfan Huang, Daniel M. Ibrahim, Andrew J. Hill, Fan Zhang, Stefan Mundlos, Lena Christiansen, Frank J. Steemers, Cole Trapnell & Jay Shendure||TMC-Cal Tech||Mammalian organogenesis is a remarkable process. Within a short timeframe, the cells of the three germ layers transform into an embryo that includes most of the major internal and external organs. Here we investigate the transcriptional dynamics of mouse organogenesis at single-cell resolution. Using single-cell combinatorial indexing, we profiled the transcriptomes of around 2 million cells derived from 61 embryos staged between 9.5 and 13.5 days of gestation, in a single experiment. The resulting ‘mouse organogenesis cell atlas’ (MOCA) provides a global view of developmental processes during this critical window. We use Monocle 3 to identify hundreds of cell types and 56 trajectories, many of which are detected only because of the depth of cellular coverage, and collectively define thousands of corresponding marker genes. We explore the dynamics of gene expression within cell types and trajectories over time, including focused analyses of the apical ectodermal ridge, limb mesenchyme and skeletal muscle.|
|2019-02-15||Dhaka: Variational Autoencoder for Unmasking Tumor Heterogeneity from Single Cell Genomic Data.||Rashid S, Shah S, Bar-Joseph Z, Pandya R.||HIVE TC-CMU||MOTIVATION:
Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells exhibit variations in their gene expression, copy numbers, and mutation even when originating from a single progenitor cell. Single cell sequencing of tumor cells has recently emerged as a viable option for unmasking the underlying tumor heterogeneity. However, extracting features from single cell genomic data in order to infer their evolutionary trajectory remains computationally challenging due to the extremely noisy and sparse nature of the data.
Here we describe 'Dhaka', a variational autoencoder method which transforms single cell genomic data to a reduced dimension feature space that is more efficient in differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data, including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic data and six single cell cancer datasets where the number of cells ranges from 250 to 6000 for each sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features are also able to infer the lineage and/or differentiation trajectory between cells, greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data.
AVAILABILITY AND IMPLEMENTATION:
All the datasets used in the paper are publicly available, and the developed software package and supporting information are available on GitHub: https://github.com/MicrosoftGenomics/Dhaka.|
|2019-02-05||Data visualization literacy: Definitions, conceptual frameworks, exercises, and assessments||Katy Börner, Andreas Bueckle, and Michael Ginda||HIVE MC-IU||In the information age, the ability to read and construct data visualizations becomes as important as the ability to read and write text. However, while standard definitions and theoretical frameworks to teach and assess textual, mathematical, and visual literacy exist, current data visualization literacy (DVL) definitions and frameworks are not comprehensive enough to guide the design of DVL teaching and assessment. This paper introduces a data visualization literacy framework (DVL-FW) that was specifically developed to define, teach, and assess DVL. The holistic DVL-FW promotes both the reading and construction of data visualizations, a pairing analogous to that of both reading and writing in textual literacy and understanding and applying in mathematical literacy. Specifically, the DVL-FW defines a hierarchical typology of core concepts and details the process steps that are required to extract insights from data. Advancing the state of the art, the DVL-FW interlinks theoretical and procedural knowledge and showcases how both can be combined to design curricula and assessment measures for DVL. Earlier versions of the DVL-FW have been used to teach DVL to more than 8,500 residential and online students, and results from this effort have helped revise and validate the DVL-FW presented here.|
|2018-12-19||Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics||Marlon Stoeckius, Shiwei Zheng, Brian Houck-Loomis, Stephanie Hao, Bertrand Z. Yeung, William M. Mauck III, Peter Smibert, and Rahul Satija||HIVE MC-NYGC||Despite rapid developments in single cell sequencing, sample-specific batch effects, detection of cell multiplets, and experimental costs remain outstanding challenges. Here, we introduce Cell Hashing, where oligo-tagged antibodies against ubiquitously expressed surface proteins uniquely label cells from distinct samples, which can be subsequently pooled. By sequencing these tags alongside the cellular transcriptome, we can assign each cell to its original sample, robustly identify cross-sample multiplets, and “super-load” commercial droplet-based systems for significant cost reduction. We validate our approach using a complementary genetic approach and demonstrate how hashing can generalize the benefits of single cell multiplexing to diverse samples and experimental designs.|
|2018-12-10||Forecasting innovations in science, technology, and education||Katy Börner, William B. Rouse, Paul Trunfio, and H. Eugene Stanley||HIVE MC-IU||Human survival depends on our ability to predict future outcomes so that we can make informed decisions. Human cognition and perception are optimized for local, short-term decision-making, such as deciding when to fight or flee, with whom to mate, or what to eat. For more elaborate decisions (e.g., when to harvest, whether to go to war, and whom to marry), people used to consult oracles, prophetic predictions of the future inspired by the gods. Over time, oracles were replaced by models of the structure and dynamics of natural, technological, and social systems. In the 21st century, computational models and visualizations of model results inform much of our decision-making: near real-time weather forecasts help us decide when to take an umbrella, plant, or harvest; where to ground airplanes; or when to evacuate inhabitants in the path of a hurricane, tornado, or flood. Long-term weather and climate forecasts predict a future with increasing torrential rains, stronger winds, and more frequent drought, landslides, and forest fires as well as rising sea levels, enabling decision makers to prepare for these changes by building dikes, moving cities and roads, and building larger water reservoirs and better storm sewers.|
|2018-11-23||Protein identification strategies in MALDI imaging mass spectrometry: a brief review.||Ryan DJ, Spraggins JM, Caprioli RM.||TMC-Vanderbilt||Matrix assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS) is a powerful technology used to investigate the spatial distributions of thousands of molecules throughout a tissue section from a single experiment. As proteins represent an important group of functional molecules in tissue and cells, the imaging of proteins has been an important point of focus in the development of IMS technologies and methods. Protein identification is crucial for the biological contextualization of molecular imaging data. However, gas-phase fragmentation efficiency of MALDI generated proteins presents significant challenges, making protein identification directly from tissue difficult. This review highlights methods and technologies specifically related to protein identification that have been developed to overcome these challenges in MALDI IMS experiments.|
|2018-10-29||Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data.||Zhu Q, Shah S, Dries R, Cai L, Yuan GC.||TTD-Cal Tech||How intrinsic gene-regulatory networks interact with a cell's spatial environment to define its identity remains poorly understood. We developed an approach to distinguish between intrinsic and extrinsic effects on global gene expression by integrating analysis of sequencing-based and imaging-based single-cell transcriptomic profiles, using cross-platform cell type mapping combined with a hidden Markov random field model. We applied this approach to dissect the cell-type- and spatial-domain-associated heterogeneity in the mouse visual cortex region. Our analysis identified distinct spatially associated, cell-type-independent signatures in the glutamatergic and astrocyte cell compartments. Using these signatures to analyze single-cell RNA sequencing data, we identified previously unknown spatially associated subpopulations, which were validated by comparison with anatomical structures and Allen Brain Atlas images.|
|2018-06-28||Multiple TOF/TOF Events in a Single Laser Shot for Multiplexed Lipid Identifications in MALDI Imaging Mass Spectrometry.||Prentice BM, McMillen JC, Caprioli RM.||TMC-Vanderbilt||Tandem mass spectrometry (MS/MS) is often used to identify lipids in matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI IMS) workflows. The molecular specificity afforded by MS/MS is crucial on MALDI time-of-flight (TOF) platforms that generally lack high resolution accurate mass measurement capabilities. Unfortunately, imaging MS/MS workflows generally only monitor a single precursor ion over the imaged area, limiting the throughput of this methodology. Herein, we demonstrate that multiple TOF/TOF events performed in each laser shot can be used to improve the throughput of imaging MS/MS. This is shown to enable the simultaneous identification of multiple phosphatidylcholine lipids in rat brain tissue. Uniquely, the separation in time achieved for the precursor ions in the TOF-1 region of the instrument is maintained for the fragment ions as they are analyzed in TOF-2, allowing for the differentiation of fragment ions of the exact same m/z derived from different precursor ions (e.g., the m/z 163 fragment ion from precursor ion m/z 772.5 is easily distinguished from the m/z 163 fragment ion from precursor ion m/z 826.5). This multiplexed imaging MS/MS approach allows for the acquisition of complete fragment ion spectra for multiple precursor ions per laser shot.|