scanpy.pp.filter_genes_dispersion
- scanpy.pp.filter_genes_dispersion(data, *, flavor='seurat', min_disp=None, max_disp=None, min_mean=None, max_mean=None, n_bins=20, n_top_genes=None, log=True, subset=True, copy=False)
Extract highly variable genes [Satija et al., 2015, Zheng et al., 2017].
Deprecated since version 1.3.6: Use highly_variable_genes() instead. The new function is equivalent to the present function, except that
- the new function always expects logarithmized data
- subset=False in the new function; it suffices to merely annotate the genes, and tools like pp.pca will detect the annotation
- you can now call: sc.pl.highly_variable_genes(adata)
- copy is replaced by inplace
If trying out parameters, pass the data matrix instead of AnnData.
Depending on flavor, this reproduces the R implementations of Seurat [Satija et al., 2015] and Cell Ranger [Zheng et al., 2017].
The normalized dispersion is obtained by scaling with the mean and standard deviation of the dispersions for genes falling into a given bin for mean expression of genes. This means that for each bin of mean expression, highly variable genes are selected.
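The per-bin scaling described above can be sketched in plain Python. This is a simplified illustration, not the library's code: the helper name normalized_dispersions, the equal-width binning scheme and the use of the sample standard deviation are assumptions made for the example. The special case for single-gene bins follows the n_bins parameter description.

```python
import statistics
from collections import defaultdict


def normalized_dispersions(means, dispersions, n_bins=20):
    """Z-score each gene's dispersion within its bin of mean expression."""
    lo, hi = min(means), max(means)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal means

    # Assumed scheme: equal-width bins over the range of the means.
    bin_of = [min(int((m - lo) / width), n_bins - 1) for m in means]

    # Collect the dispersions of all genes that share a bin.
    members = defaultdict(list)
    for gene, b in enumerate(bin_of):
        members[b].append(dispersions[gene])

    normalized = []
    for gene, b in enumerate(bin_of):
        group = members[b]
        if len(group) == 1:
            # A bin with a single gene has no spread; use 1 by convention.
            normalized.append(1.0)
            continue
        center = statistics.mean(group)
        spread = statistics.stdev(group)  # sample standard deviation (assumed)
        score = (dispersions[gene] - center) / spread if spread > 0 else 0.0
        normalized.append(score)
    return normalized


# Three low-expression genes share a bin; the fourth gene sits alone in its bin.
example = normalized_dispersions(
    means=[0.1, 0.2, 0.3, 5.0], dispersions=[1.0, 2.0, 3.0, 9.0], n_bins=2
)
print(example)
```

Scoring within bins means a gene is compared only with genes of similar mean expression, so a high raw dispersion at low expression does not automatically dominate the ranking.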
Use flavor='cell_ranger' with care and in the same way as in recipe_zheng17().
- Parameters:
- data
AnnData | csr_array | csc_array | csr_matrix | csc_matrix | ndarray
The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
- flavor
Literal['seurat', 'cell_ranger'] (default: 'seurat')
Choose the flavor for computing normalized dispersion. If choosing ‘seurat’, this expects non-logarithmized data: the logarithm of mean and dispersion is taken internally when log is at its default value True. For ‘cell_ranger’, this is usually called for logarithmized data; in this case you should set log to False. In their default workflows, Seurat passes the cutoffs whereas Cell Ranger passes n_top_genes.
- min_mean
float | None (default: None)
- max_mean
float | None (default: None)
- min_disp
float | None (default: None)
- max_disp
float | None (default: None)
If n_top_genes is not None, these cutoffs for the means and the normalized dispersions are ignored.
- n_bins
int (default: 20)
Number of bins for binning the mean gene expression. Normalization is done with respect to each bin. If just a single gene falls into a bin, the normalized dispersion is artificially set to 1. You’ll be informed about this if you set settings.verbosity = 4.
- n_top_genes
int | None (default: None)
Number of highly-variable genes to keep.
- log
bool (default: True)
Use the logarithm of the mean to variance ratio.
- subset
bool (default: True)
If True, keep highly-variable genes only; otherwise write a bool array marking the highly-variable genes while keeping all genes.
- copy
bool (default: False)
If an AnnData is passed, determines whether a copy is returned.
- Return type:
- Returns:
If an AnnData adata is passed, returns or updates adata depending on copy. It filters the adata and adds the annotations
- means
adata.var
Means per gene. Logarithmized when log is True.
- dispersions
adata.var
Dispersions per gene. Logarithmized when log is True.
- dispersions_norm
adata.var
Normalized dispersions per gene. Logarithmized when log is True.
If a data matrix X is passed, the annotation is returned as np.recarray with the same information stored in fields: gene_subset, means, dispersions, dispersion_norm.
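The interplay between n_top_genes and the four cutoff parameters can be sketched as follows. This is a minimal illustration of the documented rule that the cutoffs are ignored once n_top_genes is given; the helper name select_genes, its default bounds and the handling of ties are assumptions for the example and are not taken from the library.

```python
def select_genes(means, disp_norm, *, n_top_genes=None,
                 min_mean=0.0, max_mean=float("inf"),
                 min_disp=0.0, max_disp=float("inf")):
    """Return a boolean mask marking the selected genes."""
    n_genes = len(disp_norm)
    if n_top_genes is not None:
        # Cutoffs are ignored: rank by normalized dispersion and keep the top n.
        ranked = sorted(range(n_genes), key=lambda i: disp_norm[i], reverse=True)
        keep = set(ranked[:n_top_genes])
        return [i in keep for i in range(n_genes)]
    # Otherwise a gene must fall inside both the mean and the dispersion window.
    return [
        min_mean < m < max_mean and min_disp < d < max_disp
        for m, d in zip(means, disp_norm)
    ]


means = [0.05, 1.0, 2.0, 4.0]
disp_norm = [2.0, 0.2, 1.5, 3.0]

by_rank = select_genes(means, disp_norm, n_top_genes=2)
by_cutoff = select_genes(means, disp_norm, min_mean=0.1, max_mean=3.0, min_disp=0.5)
# Passing cutoffs together with n_top_genes changes nothing: the rank wins.
both = select_genes(means, disp_norm, n_top_genes=2, min_mean=0.1, max_mean=3.0)
print(by_rank, by_cutoff, both)
```

The returned mask plays the same role as the gene_subset field: with subset=True it would be used to drop columns, and otherwise it is kept as an annotation.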