scanpy.pp.filter_cells
- scanpy.pp.filter_cells(data, *, min_counts=None, min_genes=None, max_counts=None, max_genes=None, inplace=True, copy=False)
Filter cell outliers based on counts and numbers of genes expressed.
For instance, only keep cells with at least min_counts counts or min_genes genes expressed. This is to filter measurement outliers, i.e. “unreliable” observations.
Unless you use a scanpy.settings.preset, only provide one of the optional parameters min_counts, min_genes, max_counts, max_genes per call.
- Parameters:
- data
AnnData | csr_array | csc_array | csr_matrix | csc_matrix | ndarray | Array The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
- min_counts
int | None (default: None) Minimum number of counts required for a cell to pass filtering.
- min_genes
int | None (default: None) Minimum number of expressed genes required for a cell to pass filtering.
- max_counts
int | None (default: None) Maximum number of counts allowed for a cell to pass filtering.
- max_genes
int | None (default: None) Maximum number of expressed genes allowed for a cell to pass filtering.
- inplace
bool (default: True) Perform the computation in place or return the result.
- Return type:
AnnData | tuple[ndarray, ndarray] | None
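- Returns:
Depending on inplace, returns the following arrays or directly subsets and annotates the data matrix:
- cells_subset
ndarray Boolean index mask that does the filtering. True means that the cell is kept; False means that the cell is removed.
- number_per_cell
ndarray Depending on what was thresholded (counts or genes), the array stores n_counts or n_genes per cell.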
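A minimal NumPy sketch of the filtering rule and the two return modes described above. The function filter_cells_sketch and the obs dict are hypothetical stand-ins, not scanpy API. The sketch assumes a dense cells × genes array and ignores the scanpy.settings.preset case. A gene counts as expressed when its value is > 0, and the bounds are inclusive. This matches the Examples, where filtering manually with n_genes >= 3 and calling filter_cells with min_genes=3 both keep 554 cells.

```python
import numpy as np


def filter_cells_sketch(X, *, min_counts=None, min_genes=None,
                        max_counts=None, max_genes=None):
    """Hypothetical, simplified stand-in for filter_cells on a dense array.

    Returns (cells_subset, number_per_cell), like the inplace=False mode.
    """
    options = (min_counts, min_genes, max_counts, max_genes)
    # Enforce the "one optional parameter per call" rule.
    if sum(opt is not None for opt in options) != 1:
        raise ValueError(
            "provide exactly one of min_counts, min_genes, max_counts, max_genes"
        )
    X = np.asarray(X)
    if min_genes is not None or max_genes is not None:
        number_per_cell = (X > 0).sum(axis=1)  # genes expressed per cell
    else:
        number_per_cell = X.sum(axis=1)        # total counts per cell
    if min_counts is not None or min_genes is not None:
        lower = min_counts if min_counts is not None else min_genes
        cells_subset = number_per_cell >= lower  # inclusive lower bound
    else:
        upper = max_counts if max_counts is not None else max_genes
        cells_subset = number_per_cell <= upper  # inclusive upper bound
    return cells_subset, number_per_cell


X = np.array([
    [0, 2, 5],  # 2 genes expressed, 7 counts
    [1, 0, 0],  # 1 gene expressed, 1 count
    [3, 3, 3],  # 3 genes expressed, 9 counts
    [0, 0, 4],  # 1 gene expressed, 4 counts
])
obs = {"cell_id": np.array(["c0", "c1", "c2", "c3"])}  # hypothetical stand-in for adata.obs

# inplace=False analogue: receive the mask and the per-cell numbers.
cells_subset, number_per_cell = filter_cells_sketch(X, min_genes=2)

# inplace=True analogue: annotate every cell, then drop the filtered rows.
obs["n_genes"] = number_per_cell
X = X[cells_subset]
obs = {key: values[cells_subset] for key, values in obs.items()}

# A second call applies an upper bound to the already-filtered data.
cells_subset2, n_counts = filter_cells_sketch(X, max_counts=8)
obs["n_counts"] = n_counts
X = X[cells_subset2]
obs = {key: values[cells_subset2] for key, values in obs.items()}

print(obs["cell_id"].tolist())
```

n_genes and n_counts for a cell depend only on that cell's own row. So applying min_genes=2 and max_counts=8 in two separate calls keeps the same cells in either order, and the same cells as a single combined mask.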
Examples
```python
>>> import scanpy as sc
>>> adata = sc.datasets.krumsiek11()
UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
    utils.warn_names_duplicates("obs")
>>> adata.obs_names_make_unique()
>>> adata.n_obs
640
>>> adata.var_names.tolist()
['Gata2', 'Gata1', 'Fog1', 'EKLF', 'Fli1', 'SCL', 'Cebpa', 'Pu.1', 'cJun', 'EgrNab', 'Gfi1']
>>> # add some true zeros
>>> adata.X[adata.X < 0.3] = 0
>>> # simply compute the number of genes per cell
>>> sc.pp.filter_cells(adata, min_genes=0)
>>> adata.n_obs
640
>>> int(adata.obs["n_genes"].min())
1
>>> # filter manually
>>> adata_copy = adata[adata.obs["n_genes"] >= 3]
>>> adata_copy.n_obs
554
>>> int(adata_copy.obs["n_genes"].min())
3
>>> # actually do some filtering
>>> sc.pp.filter_cells(adata, min_genes=3)
>>> adata.n_obs
554
>>> int(adata.obs["n_genes"].min())
3
```
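The data parameter also accepts SciPy sparse matrices such as csr_matrix. Below is a minimal sketch of computing the same per-cell statistics without densifying. It assumes SciPy is installed and is independent of scanpy's own code path. One subtlety: an explicitly stored 0.0 is a stored entry but not an expressed gene. Counting entries that are > 0 therefore differs from counting stored entries per row (np.diff(X.indptr)).

```python
import numpy as np
from scipy.sparse import csr_matrix

# 3 cells x 3 genes in CSR form; row 1 stores an explicit 0.0 at column 2.
data = np.array([2.0, 5.0, 1.0, 0.0, 3.0, 3.0, 3.0])
indices = np.array([1, 2, 0, 2, 0, 1, 2])
indptr = np.array([0, 2, 4, 7])
X = csr_matrix((data, indices, indptr), shape=(3, 3))

# Genes expressed per cell: only strictly positive entries count.
n_genes = np.asarray((X > 0).astype(np.int64).sum(axis=1)).ravel()
# Stored entries per row, which would also count the explicit zero.
n_stored = np.diff(X.indptr)
# Total counts per cell.
n_counts = np.asarray(X.sum(axis=1)).ravel()

# Keep cells with at least 2 expressed genes; the result stays sparse.
X_kept = X[n_genes >= 2]
print(n_genes.tolist(), n_stored.tolist(), X_kept.shape)
```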