peach.tl.gene_associations

peach.tl.gene_associations(adata, *, bin_prop=0.1, obsm_key='archetype_distances', obs_key='archetypes', use_layer=None, test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)

Test gene expression associations with archetypal assignments.

Performs Mann-Whitney U tests to identify genes whose expression differs significantly between each archetype and a comparison group. By default the comparison group is all other cells (1-vs-all testing); see comparison_group.

Parameters:

adata (AnnData) –
Annotated data object with:
- obsm[obsm_key] : Archetype distance matrix [n_cells, n_archetypes]
- obs[obs_key] : Archetype assignments (from bin_cells_by_archetype)
- X or layers[use_layer] : Gene expression data
bin_prop (float, default: 0.1) – Proportion of cells closest to each archetype to use for binning.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
use_layer (str | None, default: None) – Layer for gene expression. If None, a ‘logcounts’ or ‘log1p’ layer is auto-selected when available; otherwise adata.X is used.
test_method (str, default: "mannwhitneyu") – Statistical test method. Currently supports ‘mannwhitneyu’.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method: ‘benjamini_hochberg’ or ‘bonferroni’.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') –
Scope of FDR correction:
- 'global' : Correct across all tests (most stringent)
- 'per_archetype' : Correct within each archetype
- 'none' : No FDR correction (raw p-values)
test_direction (str, default: "two-sided") – Direction of statistical test: ‘two-sided’, ‘greater’, or ‘less’.
min_logfc (float, default: 0.01) – Minimum absolute log fold change threshold for filtering.
min_cells (int, default: 10) – Minimum cells required per archetype for testing.
comparison_group (str, default: 'all') –
Comparison group for statistical tests:
- 'all' : Compare archetype cells vs ALL other cells
- 'archetypes_only' : Compare vs cells in other archetypes only (excludes archetype_0 and unassigned cells)
verbose (bool, default: True) – Whether to print progress messages.
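
The difference between fdr_scope='global' and fdr_scope='per_archetype' can be illustrated with a minimal, pure-Python sketch of Benjamini-Hochberg adjustment. This is not the library's implementation: the helper bh_adjust, the archetype labels, and the p-values are all hypothetical and exist only for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the original ranking.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical (archetype, raw p-value) pairs, one per gene test.
tests = [
    ("archetype_1", 0.01),
    ("archetype_1", 0.04),
    ("archetype_2", 0.03),
    ("archetype_2", 0.20),
]

# 'global' scope: one correction over every test at once.
global_fdr = bh_adjust([p for _, p in tests])

# 'per_archetype' scope: a separate correction within each archetype.
per_archetype_fdr = [None] * len(tests)
for arch in sorted({a for a, _ in tests}):
    idxs = [i for i, (a, _) in enumerate(tests) if a == arch]
    group_adjusted = bh_adjust([tests[j][1] for j in idxs])
    for i, q in zip(idxs, group_adjusted):
        per_archetype_fdr[i] = q

print("global:       ", global_fdr)
print("per_archetype:", per_archetype_fdr)
```

In this toy input the second test stays below 0.05 under the per-archetype correction but rises above it under the global correction, because the global correction accounts for twice as many tests.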

Returns:

Results with columns:

gene : str - Gene symbol/identifier
archetype : str - Archetype identifier
n_archetype_cells : int - Cells in archetype
n_other_cells : int - Cells in comparison group
mean_archetype : float - Mean expression in archetype
mean_other : float - Mean expression in others
log_fold_change : float - Log fold change
statistic : float - Mann-Whitney U statistic
pvalue : float - Raw p-value
fdr_pvalue : float - FDR-corrected p-value
significant : bool - Whether fdr_pvalue < 0.05
direction : str - ‘higher’ or ‘lower’ in archetype
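
To make the columns above concrete, here is a hedged sketch of how one result row could be assembled for a single gene and archetype with scipy.stats.mannwhitneyu. The expression values are synthetic, and the log fold change is assumed to be the difference of group means on log-scale data; the library's actual computation may differ.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic log-scale expression of one gene (illustrative values only).
expr_archetype = rng.normal(loc=2.0, scale=0.5, size=50)  # cells in the archetype
expr_other = rng.normal(loc=1.0, scale=0.5, size=200)     # comparison group

# Two-sided Mann-Whitney U test, matching test_direction='two-sided'.
statistic, pvalue = mannwhitneyu(
    expr_archetype, expr_other, alternative="two-sided"
)

mean_archetype = float(expr_archetype.mean())
mean_other = float(expr_other.mean())

# Assumption: on log-transformed data, log fold change = difference of means.
log_fold_change = mean_archetype - mean_other

row = {
    "gene": "GENE_A",            # hypothetical gene name
    "archetype": "archetype_1",  # hypothetical archetype label
    "n_archetype_cells": expr_archetype.size,
    "n_other_cells": expr_other.size,
    "mean_archetype": mean_archetype,
    "mean_other": mean_other,
    "log_fold_change": log_fold_change,
    "statistic": float(statistic),
    "pvalue": float(pvalue),
    "direction": "higher" if log_fold_change > 0 else "lower",
}
print(row)
```

The fdr_pvalue and significant columns are omitted here because they depend on correcting across many such rows rather than on a single test.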

Return type:

pd.DataFrame

Raises:

ValueError – If required keys not found in adata.

Examples

>>> import peach as pc
>>> # Basic usage
>>> results = pc.tl.gene_associations(adata)
>>> sig_genes = results[results.significant]
>>> # Per-archetype FDR correction (less stringent)
>>> results = pc.tl.gene_associations(adata, fdr_scope="per_archetype")
>>> # Top markers per archetype
>>> for arch in results["archetype"].unique():
...     arch_genes = results[
...         (results["archetype"] == arch) & (results["significant"]) & (results["direction"] == "higher")
...     ].nlargest(10, "log_fold_change")
...     print(f"{arch}: {arch_genes['gene'].tolist()}")
