scanpy.experimental.pp.normalize_pearson_residuals_pca
- scanpy.experimental.pp.normalize_pearson_residuals_pca(adata, *, theta=100, clip=None, n_comps=50, random_state=0, kwargs_pca=mappingproxy({}), mask_var=_empty, use_highly_variable=None, check_values=True, inplace=True)
- Apply analytic Pearson residual normalization and PCA, based on Lause et al. [2021].
- The residuals are based on a negative binomial offset model with overdispersion theta shared across genes. By default, residuals are clipped to sqrt(n_obs), overdispersion theta=100 is used, and PCA is run with 50 components.
- Operates on the subset of highly variable genes in adata.var['highly_variable'] by default. Expects raw count input.
- Parameters:
- adata AnnData
- The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
- theta float(default:100)
- The negative binomial overdispersion parameter theta for Pearson residuals. Higher values correspond to less overdispersion (var = mean + mean^2/theta), and theta=np.inf corresponds to a Poisson model.
- clip float|None(default:None)
- Determines if and how residuals are clipped:
- If None, residuals are clipped to the interval [-sqrt(n_obs), sqrt(n_obs)], where n_obs is the number of cells in the dataset (default behavior).
- If any scalar c, residuals are clipped to the interval [-c, c]. Set clip=np.inf for no clipping.
 
- n_comps int|None(default:50)
- Number of principal components to compute in the PCA step. 
- random_state float(default:0)
- Random seed for setting the initial states for the optimization in the PCA step. 
- kwargs_pca Mapping[str,Any] (default:mappingproxy({}))
- Dictionary of further keyword arguments passed on to scanpy.pp.pca().
- mask_var ndarray|str|None|Empty(default:_empty)
- Restricts the run to a set of genes, given as a boolean array or as a string referring to an array in var. By default, uses .var['highly_variable'] if available, else everything.
- use_highly_variable bool|None(default:None)
- Whether to use highly variable genes only, stored in .var['highly_variable']. By default uses them if they have been determined beforehand.
- Deprecated since version 1.10.0: Use mask_var instead.
- check_values bool(default:True)
- If True, checks whether counts in the selected layer are integers, as expected by this function, and issues a warning if non-integers are found. Otherwise, proceeds without checking. Setting this to False can speed up code for large datasets.
- inplace bool(default:True)
- If True, update adata with results. Otherwise, return results. See below for details of what is returned.
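
The roles of theta and clip can be illustrated with a minimal pure-Python sketch. It is not scanpy's implementation: the helper name pearson_residuals is hypothetical, and the expected count mu (cell total times gene total, divided by the grand total) is an assumption taken from the offset model described by Lause et al. The variance formula and the default clipping threshold follow the parameter descriptions above.

```python
import math


def pearson_residuals(counts, theta=100.0, clip=None):
    """Illustrative Pearson residuals for a cells x genes matrix of raw counts.

    Hypothetical helper for explanation only, not part of scanpy.
    """
    n_obs = len(counts)
    n_vars = len(counts[0])
    cell_sums = [sum(row) for row in counts]
    gene_sums = [sum(counts[i][j] for i in range(n_obs)) for j in range(n_vars)]
    total = sum(cell_sums)

    # Default clipping threshold: sqrt(n_obs), as described for clip=None.
    if clip is None:
        clip = math.sqrt(n_obs)

    residuals = []
    for i in range(n_obs):
        row = []
        for j in range(n_vars):
            # Assumed offset-model expectation for cell i and gene j.
            mu = cell_sums[i] * gene_sums[j] / total
            # Negative binomial variance; theta=inf reduces this to mu (Poisson).
            var = mu + mu * mu / theta
            # Guard against empty cells or genes, where mu and var are zero.
            r = (counts[i][j] - mu) / math.sqrt(var) if var > 0 else 0.0
            # Clip to the interval [-clip, clip]; clip=inf leaves r unchanged.
            row.append(max(-clip, min(clip, r)))
        residuals.append(row)
    return residuals


toy_counts = [[1, 0], [0, 1]]
default_res = pearson_residuals(toy_counts)
poisson_res = pearson_residuals(toy_counts, theta=math.inf)
unclipped_res = pearson_residuals(toy_counts, clip=math.inf)
```

In this sketch a larger theta shrinks the mu^2/theta term, so the residuals approach the Poisson case, and a smaller clip caps the influence of extreme counts on the subsequent PCA.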
 
- adata 
- Return type:
- Returns:
- If inplace=False, returns the Pearson residual-based PCA results (as an AnnData object). If inplace=True, updates adata with the following fields:
- .uns['pearson_residuals_normalization']['pearson_residuals_df']
- The subset of highly variable genes, normalized by Pearson residuals. 
- .uns['pearson_residuals_normalization']['theta']
- The used value of the overdispersion parameter theta.
- .uns['pearson_residuals_normalization']['clip']
- The used value of the clipping parameter. 
- .obsm['X_pca']
- PCA representation of data after gene selection (if applicable) and Pearson residual normalization. 
- .varm['PCs']
- The principal components containing the loadings. When inplace=True and use_highly_variable=True, this will contain empty rows for the genes not selected.
- .uns['pca']['variance_ratio']
- Ratio of explained variance. 
- .uns['pca']['variance']
- Explained variance, equivalent to the eigenvalues of the covariance matrix.
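
The relationship between the variance and variance_ratio fields can be sketched for a toy two-feature dataset, where the eigenvalues of the covariance matrix have a closed form. This is an illustration only: the helper names are hypothetical, the n - 1 denominator for the sample covariance is an assumption, and scanpy delegates the actual computation to its PCA backend.

```python
import math


def covariance_2d(points):
    """Entries (a, b, c) of the sample covariance matrix [[a, b], [b, c]].

    Assumes at least two 2-D points; uses an n - 1 denominator (assumption).
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    a = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return a, b, c


def explained_variance_2d(points):
    """Return (variance, ratio) for both principal components.

    Assumes the points are not all identical, so total variance is nonzero.
    """
    a, b, c = covariance_2d(points)
    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    variance = [mid + half_gap, mid - half_gap]
    # Total variance is the trace of the covariance matrix.
    total = a + c
    ratio = [v / total for v in variance]
    return variance, ratio


variance, ratio = explained_variance_2d([(0, 0), (2, 0), (0, 1), (2, 1)])
```

When all components are kept, as in this sketch, the ratios sum to 1. With fewer components than features, the retained ratios sum to less than 1.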