Preprocessing: pp
Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.
Preprocessing covers any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; instead, they perform a basic transformation on the data matrix.
Basic Preprocessing
For visual quality control, see highest_expr_genes() and
filter_genes_dispersion() in scanpy.pl.
| pp.calculate_qc_metrics() | Calculate quality control metrics. |
| pp.filter_cells() | Filter cell outliers based on counts and numbers of genes expressed. |
| pp.filter_genes() | Filter genes based on number of cells or counts. |
| pp.highly_variable_genes() | Annotate highly variable genes [Satija et al., 2015, Stuart et al., 2019, Zheng et al., 2017]. |
| pp.log1p() | Logarithmize the data matrix. |
| pp.pca() | Principal component analysis [Pedregosa et al., 2011]. |
| pp.normalize_total() | Normalize counts per cell. |
| pp.regress_out() | Regress out (mostly) unwanted sources of variation. |
| pp.scale() | Scale data to unit variance and zero mean. |
| pp.sample() | Sample observations or variables with or without replacement. |
| pp.downsample_counts() | Downsample counts from count matrix. |
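To make the per-cell normalization, log-transform, and scaling steps listed above more concrete, here is a minimal NumPy sketch of what such transformations do to a cells-by-genes count matrix. It is an illustration only, not Scanpy's implementation: the helper names `normalize_per_cell` and `scale_genes`, the target sum of 10,000, and the toy Poisson data are all assumptions made for this example.

```python
import numpy as np


def normalize_per_cell(counts, target_sum=1e4):
    """Rescale each cell (row) so that its counts sum to target_sum.

    Hypothetical helper for illustration; not a Scanpy function.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # cells without counts stay all-zero
    return counts / totals * target_sum


def scale_genes(x):
    """Center each gene (column) to zero mean and scale it to unit variance.

    Hypothetical helper for illustration; not a Scanpy function.
    """
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant genes become all-zero after centering
    return (x - mean) / std


# Toy data: 50 cells x 20 genes of Poisson-distributed counts.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(50, 20))

normalized = normalize_per_cell(counts)  # every cell now has the same total
logged = np.log1p(normalized)            # log(1 + x), keeps zeros at zero
scaled = scale_genes(logged)             # per-gene zero mean, unit variance
```

The order used here (normalize, then logarithmize, then scale) is one common choice; which steps are appropriate depends on the downstream analysis.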
Recipes
| pp.recipe_zheng17() | Normalize and filter as of Zheng et al. [2017]. |
| pp.recipe_weinreb17() | Normalize and filter as of [Weinreb et al., 2017]. |
| pp.recipe_seurat() | Normalize and filter as of Seurat [Satija et al., 2015]. |
Batch effect correction
Also see Data integration. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.
| pp.combat() | ComBat function for batch effect correction [Johnson et al., 2006, Leek et al., 2017, Pedersen, 2012]. |
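The idea behind using regression as a simple batch correction, as mentioned for pp.regress_out() above, can be sketched in NumPy: fit each gene on batch membership by least squares and keep the residuals. This is a conceptual sketch under stated assumptions, not Scanpy's or ComBat's implementation; the helper name `regress_out_batch`, the batch labels, and the toy data are hypothetical.

```python
import numpy as np


def regress_out_batch(x, batch):
    """Return the residuals of each gene after a linear fit on batch labels.

    Hypothetical helper for illustration; not a Scanpy function.
    """
    x = np.asarray(x, dtype=float)
    labels, codes = np.unique(batch, return_inverse=True)
    design = np.eye(len(labels))[codes]  # one-hot matrix, cells x batches
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef             # what the batch labels cannot explain


# Toy data: 60 cells x 5 genes, with an additive shift in batch "b".
rng = np.random.default_rng(0)
batch = np.array(["a"] * 30 + ["b"] * 30)
x = rng.normal(size=(60, 5))
x[batch == "b"] += 3.0

corrected = regress_out_batch(x, batch)
```

With a one-hot design, the fitted values are the per-batch gene means, so the residuals have zero mean within every batch. Any biological signal that is confounded with batch is removed along with the batch effect, which is one reason to consider this a simple method.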
Doublet detection
| pp.scrublet() | Predict doublets using Scrublet [Wolock et al., 2019]. |
| pp.scrublet_simulate_doublets() | Simulate doublets by adding the counts of random observed transcriptome pairs. |
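The simulation step described above, adding the counts of random observed transcriptome pairs, can be sketched in a few lines of NumPy. This is a minimal illustration, not the Scrublet or Scanpy implementation; the helper name `simulate_doublets`, its parameters, and the toy data are assumptions.

```python
import numpy as np


def simulate_doublets(counts, n_doublets, seed=0):
    """Create synthetic doublets by summing the counts of random cell pairs.

    Hypothetical helper for illustration; not a Scanpy function.
    Returns the doublet count matrix and the parent indices of each doublet.
    """
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    # Each row holds the indices of the two parent cells. In this simple
    # sketch a cell may occasionally be paired with itself.
    pairs = rng.integers(0, n_cells, size=(n_doublets, 2))
    doublets = counts[pairs[:, 0]] + counts[pairs[:, 1]]
    return doublets, pairs


# Toy data: 20 observed cells x 8 genes.
rng = np.random.default_rng(1)
counts = rng.poisson(lam=1.5, size=(20, 8))

doublets, pairs = simulate_doublets(counts, n_doublets=5)
```

Each simulated doublet has a total count equal to the sum of its two parents' totals, which is why simulated doublets tend to have higher library sizes than the observed cells they were built from.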
Neighbors
| pp.neighbors() | Compute the nearest neighbors distance matrix and a neighborhood graph of observations [McInnes et al., 2018]. |
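As a rough picture of the nearest-neighbors distance part of this step, the following NumPy sketch finds the k nearest neighbors of each observation by brute force with Euclidean distances. It is an assumption-laden illustration, not how Scanpy computes neighbors, and it does not build the neighborhood graph; the helper name `knn_brute_force` and the toy points are hypothetical.

```python
import numpy as np


def knn_brute_force(x, k):
    """Return indices and distances of the k nearest neighbors of each row.

    Hypothetical helper for illustration; not a Scanpy function.
    Uses exact Euclidean distances, which scales quadratically with the
    number of observations.
    """
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]       # all pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # n x n distance matrix
    np.fill_diagonal(dist, np.inf)             # exclude each point itself
    idx = np.argsort(dist, axis=1)[:, :k]      # k closest per observation
    knn_dist = np.take_along_axis(dist, idx, axis=1)
    return idx, knn_dist


# Toy data: four observations in two dimensions.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
idx, knn_dist = knn_brute_force(points, k=2)
```

For realistic numbers of cells, an exact all-pairs computation like this becomes expensive, which is why approximate nearest-neighbor searches are commonly used in practice.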