scanpy.external.pp.mnn_correct
- scanpy.external.pp.mnn_correct(*datas, var_index=None, var_subset=None, batch_key='batch', index_unique='-', batch_categories=None, k=20, sigma=1.0, cos_norm_in=True, cos_norm_out=True, svd_dim=None, var_adj=True, compute_angle=False, mnn_order=None, svd_mode='rsvd', do_concatenate=True, save_raw=False, n_jobs=None, **kwargs)
- Correct batch effects by matching mutual nearest neighbors [Haghverdi et al., 2018] [Kang, 2018].
- This uses the implementation of mnnpy [Kang, 2018].
- Depending on do_concatenate, returns either matrices or AnnData objects in the original order containing corrected expression values, or a single concatenated matrix or AnnData object.
- Be reminded that it is not advised to use the corrected data matrices for differential expression testing.
- For more information and to report bugs, see the mnnpy project [Kang, 2018].
- Parameters:
- datas AnnData | ndarray
- Expression matrices or AnnData objects. Matrices should be shaped n_obs × n_vars (n_cell × n_gene) and have a consistent number of columns. AnnData objects should have the same number of variables.
- var_index Collection[str] | None (default: None)
- The index (list of str) of vars (genes). Necessary when using only a subset of vars to perform MNN correction, and should be supplied with var_subset. When datas are AnnData objects, var_index is ignored.
- var_subset Collection[str] | None (default: None)
- The subset of vars (list of str) to be used when performing MNN correction. Typically, a list of highly variable genes (HVGs). When set to None, uses all vars.
- batch_key str (default: 'batch')
- The batch_key for concatenate(). Only used when do_concatenate is True and AnnData objects are supplied.
- index_unique str (default: '-')
- The index_unique for concatenate(). Only used when do_concatenate is True and AnnData objects are supplied.
- batch_categories Collection[Any] | None (default: None)
- The batch_categories for concatenate(). Only used when do_concatenate is True and AnnData objects are supplied.
- k int (default: 20)
- Number of mutual nearest neighbors.
- sigma float (default: 1.0)
- The bandwidth of the Gaussian smoothing kernel used to compute the correction vectors. Default is 1.
- cos_norm_in bool (default: True)
- Whether cosine normalization should be performed on the input data prior to calculating distances between cells.
- cos_norm_out bool (default: True)
- Whether cosine normalization should be performed prior to computing corrected expression values.
- svd_dim int | None (default: None)
- The number of dimensions to use for summarizing biological substructure within each batch. If None, biological components will not be removed from the correction vectors.
- var_adj bool (default: True)
- Whether to adjust the variance of the correction vectors. Note that this step takes most of the computing time.
- compute_angle bool (default: False)
- Whether to compute the angle between each cell's correction vector and the biological subspace of the reference batch.
- mnn_order Sequence[int] | None (default: None)
- The order in which batches are to be corrected. When set to None, datas are corrected sequentially.
- svd_mode Literal['svd', 'rsvd', 'irlb'] (default: 'rsvd')
- 'svd' computes SVD using a non-randomized SVD-via-ID algorithm, while 'rsvd' uses a randomized version. 'irlb' performs truncated SVD by implicitly restarted Lanczos bidiagonalization (forked from airysen/irlbpy).
- do_concatenate bool (default: True)
- Whether to concatenate the corrected matrices or AnnData objects. Default is True.
- save_raw bool (default: False)
- Whether to save the original expression data in the raw attribute.
- n_jobs int | None (default: None)
- The number of jobs. When set to None, automatically uses scanpy.settings.n_jobs.
- kwargs
- Optional keyword arguments for irlb.
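
To make the k and cos_norm_in parameters more concrete, here is a minimal sketch of cosine normalization followed by a brute-force mutual nearest neighbor search between two batches. It uses only the Python standard library. The helper names (cosine_normalize, k_nearest, mutual_nearest_pairs) and the toy data are hypothetical, and this is an illustration of the idea rather than mnnpy's implementation.

```python
import math


def cosine_normalize(rows):
    """Scale each row (cell) to unit L2 norm."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # Leave all-zero rows untouched to avoid dividing by zero.
        normalized.append([x / norm for x in row] if norm > 0 else list(row))
    return normalized


def k_nearest(queries, references, k):
    """For each query row, return the index set of its k nearest reference rows."""
    result = []
    for q in queries:
        order = sorted(
            range(len(references)),
            key=lambda j: math.dist(q, references[j]),  # Euclidean distance
        )
        result.append(set(order[:k]))
    return result


def mutual_nearest_pairs(batch_a, batch_b, k):
    """Return sorted (i, j) pairs where a[i] and b[j] are in each other's k-nearest sets."""
    a_to_b = k_nearest(batch_a, batch_b, k)
    b_to_a = k_nearest(batch_b, batch_a, k)
    return sorted(
        (i, j)
        for i, neighbours in enumerate(a_to_b)
        for j in neighbours
        if i in b_to_a[j]
    )


# Toy data (hypothetical): 3 cells x 2 genes per batch.
# batch_b is roughly batch_a at twice the scale, mimicking a batch effect.
batch_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
batch_b = [[2.0, 0.1], [0.1, 2.0], [2.0, 2.1]]

pairs = mutual_nearest_pairs(
    cosine_normalize(batch_a), cosine_normalize(batch_b), k=1
)
print(pairs)  # [(0, 0), (1, 1), (2, 2)]
```

After cosine normalization the difference in scale between the two toy batches disappears, so each cell pairs with its counterpart. A larger k admits more, and looser, pairs.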
 
- Return type:
- tuple[ndarray | AnnData, list[DataFrame], list[tuple[float | None, int]] | None]
- Returns:
- datas ndarray | AnnData
- Corrected matrix/matrices or AnnData object/objects, depending on the input type and do_concatenate.
- mnn_list list[DataFrame]
- A list containing MNN pairing information as DataFrames in each iteration step.
- angle_list list[tuple[float | None, int]] | None
- A list containing angles of each batch. 
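
A call might look like the following sketch, which unpacks the three-element return value described above. The wrapper name run_mnn_correct and its arguments adatas and hvgs are hypothetical. It assumes scanpy and the mnnpy package are installed, and that adatas is a list of AnnData objects sharing the same variables.

```python
def run_mnn_correct(adatas, hvgs=None, k=20):
    """Apply MNN correction to a list of AnnData batches (hypothetical wrapper)."""
    # Imported lazily so that defining this helper does not require scanpy or mnnpy.
    import scanpy.external as sce

    corrected, mnn_list, angle_list = sce.pp.mnn_correct(
        *adatas,                # one positional argument per batch
        var_subset=hvgs,        # e.g. highly variable genes; None uses all vars
        batch_key="batch",      # passed on to concatenate()
        k=k,                    # number of mutual nearest neighbors
        do_concatenate=True,    # return a single concatenated object
    )
    return corrected, mnn_list, angle_list
```

With do_concatenate=True and AnnData inputs, corrected is a single concatenated AnnData object. As noted above, the corrected values are not advised for differential expression testing.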
 