scanpy.tl.rank_genes_groups
- scanpy.tl.rank_genes_groups(adata, groupby, *, mask_var=None, use_raw=None, groups='all', reference='rest', n_genes=None, rankby_abs=False, pts=False, key_added=None, copy=False, method=None, corr_method='benjamini-hochberg', tie_correct=False, layer=None, **kwds)
- Rank genes for characterizing groups.
- Expects logarithmized data.
- Parameters:
- adata AnnData
- Annotated data matrix. 
- groupby str
- The key of the observations grouping to consider. 
- mask_var ndarray[tuple[int, ...], dtype[bool]] | str | None (default: None)
- Select subset of genes to use in statistical tests.
- use_raw bool | None (default: None)
- Use the raw attribute of adata if present. The default behavior is to use raw if present.
- layer str | None (default: None)
- Key from adata.layers whose value will be used to perform tests on.
- groups Union[Literal['all'], Iterable[str]] (default: 'all')
- Subset of groups, e.g. ['g1', 'g2', 'g3'], to which comparison shall be restricted, or 'all' (default), for all groups. Note that if reference='rest', all groups will still be used as the reference, not just those specified in groups.
- reference str (default: 'rest')
- If 'rest', compare each group to the union of the rest of the groups. If a group identifier, compare with respect to this group.
- n_genes int | None (default: None)
- The number of genes that appear in the returned tables. Defaults to all genes. 
- method Optional[Literal['logreg', 't-test', 'wilcoxon', 't-test_overestim_var']] (default: None)
- The default method is 't-test'. 't-test_overestim_var' overestimates the variance of each group, 'wilcoxon' uses the Wilcoxon rank-sum test, and 'logreg' uses logistic regression. See Ntranos et al. [2019] for why this is meaningful.
- corr_method Literal['benjamini-hochberg', 'bonferroni'] (default: 'benjamini-hochberg')
- p-value correction method. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'.
- tie_correct bool (default: False)
- Use tie correction for 'wilcoxon' scores. Used only for 'wilcoxon'.
- rankby_abs bool (default: False)
- Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
- pts bool (default: False)
- Compute the fraction of cells expressing the genes. 
- key_added str | None (default: None)
- The key in adata.uns under which the information is saved.
- copy bool (default: False)
- Whether to copy adata or modify it in place.
- kwds
- Passed to the test methods. Currently this affects only parameters that are passed to sklearn.linear_model.LogisticRegression. For instance, you can pass penalty='l1' to try to come up with a minimal set of genes that are good predictors (a sparse solution, meaning few non-zero fitted coefficients).
 
- Return type: AnnData | None
- Returns:
- Returns None if copy=False, else returns an AnnData object. Sets the following fields:
- adata.uns['rank_genes_groups' | key_added]['names'] structured numpy.ndarray (dtype object)
- Structured array to be indexed by group id storing the gene names. Ordered according to scores.
- adata.uns['rank_genes_groups' | key_added]['scores'] structured numpy.ndarray (dtype object)
- Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.
- adata.uns['rank_genes_groups' | key_added]['logfoldchanges'] structured numpy.ndarray (dtype object)
- Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is 't-test'-like. Note: this is an approximation calculated from mean-log values.
- adata.uns['rank_genes_groups' | key_added]['pvals'] structured numpy.ndarray (dtype float)
- p-values.
- adata.uns['rank_genes_groups' | key_added]['pvals_adj'] structured numpy.ndarray (dtype float)
- Corrected p-values.
- adata.uns['rank_genes_groups' | key_added]['pts'] pandas.DataFrame (dtype float)
- Fraction of cells expressing the genes for each group.
- adata.uns['rank_genes_groups' | key_added]['pts_rest'] pandas.DataFrame (dtype float)
- Only if reference is set to 'rest'. Fraction of cells from the union of the rest of each group expressing the genes.
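The note that logfoldchanges is "an approximation calculated from mean-log values" can be illustrated with plain NumPy. The sketch below assumes the data were transformed with log1p and that the fold change is formed by back-transforming the per-group means of the logged values with expm1, plus a small pseudocount. The helper approx_log2fc and the value 1e-9 are hypothetical and not part of the scanpy API; the exact internal computation may differ.

```python
import numpy as np


def approx_log2fc(log_group, log_rest, eps=1e-9):
    """Hypothetical helper: log2 fold change from means of log1p values."""
    # Back-transform the mean of the logged values (an assumption about the method).
    mean_group = np.expm1(np.mean(log_group))
    mean_rest = np.expm1(np.mean(log_rest))
    return np.log2((mean_group + eps) / (mean_rest + eps))


# Made-up raw expression of one gene in two sets of cells.
raw_group = np.array([0.0, 0.0, 30.0, 30.0])  # heterogeneous group
raw_rest = np.array([5.0, 5.0, 5.0, 5.0])     # homogeneous reference

approx = approx_log2fc(np.log1p(raw_group), np.log1p(raw_rest))
exact = np.log2(raw_group.mean() / raw_rest.mean())

print(f"from mean-log values: {approx:.3f}")
print(f"from raw means:       {exact:.3f}")
```

Because the mean of logged values is never larger than the log of the mean, a heterogeneous group is shrunk more than a homogeneous one, so the two numbers can differ noticeably. Under these assumptions, logfoldchanges is best read as a ranking aid, not an exact effect size.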
 
- Notes
- There are slight inconsistencies depending on whether sparse or dense data are passed.
- Examples
>>> import scanpy as sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.rank_genes_groups(adata, "bulk_labels", method="wilcoxon")
>>> # to visualize the results
>>> sc.pl.rank_genes_groups(adata)