peach.tl.gene_associations


peach.tl.gene_associations(adata, *, bin_prop=0.1, obsm_key='archetype_distances', obs_key='archetypes', use_layer=None, test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)

Test gene expression associations with archetypal assignments.

Performs Mann-Whitney U tests to identify genes whose expression differs significantly between the cells assigned to each archetype and a comparison group. By default the comparison group is all other cells (1-vs-all testing paradigm); see comparison_group.
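The 1-vs-all idea can be sketched outside the library as follows. This is a minimal illustration, assuming SciPy's mannwhitneyu as the test; the toy expression matrix, the labels, and the helper one_vs_all are hypothetical and not part of peach, whose implementation may differ.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Toy data: 200 cells x 3 genes. Labels are made up, not produced by peach.
n_cells = 200
labels = np.array(["archetype_1"] * 40 + ["archetype_2"] * 40 + ["archetype_0"] * 120)
expr = rng.normal(loc=1.0, scale=0.5, size=(n_cells, 3))
expr[labels == "archetype_1", 0] += 2.0  # gene 0 is elevated in archetype_1


def one_vs_all(expr, labels, archetype, alternative="two-sided"):
    """Per gene, test cells in `archetype` against all other cells."""
    in_group = labels == archetype
    rows = []
    for g in range(expr.shape[1]):
        stat, p = mannwhitneyu(
            expr[in_group, g], expr[~in_group, g], alternative=alternative
        )
        rows.append((g, float(stat), float(p)))
    return rows


# One (gene index, U statistic, raw p-value) tuple per gene.
results = one_vs_all(expr, labels, "archetype_1")
```

Passing alternative="greater" or alternative="less" to the helper gives the one-sided variants, which correspond in spirit to the test_direction options.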

Parameters:
  • adata (AnnData) –

    Annotated data object with:

    • obsm[obsm_key] : Archetype distance matrix [n_cells, n_archetypes]

    • obs[obs_key] : Archetype assignments (from bin_cells_by_archetype)

    • X or layers[use_layer] : Gene expression data

  • bin_prop (float, default: 0.1) – Proportion of cells closest to each archetype to use for binning.

  • obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.

  • obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.

  • use_layer (str | None, default: None) – Layer for gene expression. If None, uses adata.X. Auto-selects ‘logcounts’ or ‘log1p’ if available.

  • test_method (str, default: "mannwhitneyu") – Statistical test method. Currently supports ‘mannwhitneyu’.

  • fdr_method (str, default: "benjamini_hochberg") – Multiple-testing correction method: ‘benjamini_hochberg’ or ‘bonferroni’.

  • fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') –

    Scope of FDR correction:

    • 'global' : Correct across all tests (typically the most stringent)

    • 'per_archetype' : Correct within each archetype

    • 'none' : No FDR correction (raw p-values)

  • test_direction (str, default: "two-sided") – Direction of statistical test: ‘two-sided’, ‘greater’, or ‘less’.

  • min_logfc (float, default: 0.01) – Minimum absolute log fold change threshold for filtering.

  • min_cells (int, default: 10) – Minimum cells required per archetype for testing.

  • comparison_group (str, default: 'all') –

    Comparison group for statistical tests:

    • 'all' : Compare archetype cells vs ALL other cells

    • 'archetypes_only' : Compare vs cells in other archetypes only (excludes archetype_0 and unassigned cells)

  • verbose (bool, default: True) – Whether to print progress messages.
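To make the fdr_scope options concrete, here is a minimal sketch of Benjamini-Hochberg adjustment applied either to all tests pooled together or within each archetype. The helper benjamini_hochberg and the raw p-values are illustrative assumptions, not the library's code or real results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values keyed by archetype (illustrative numbers only).
raw = {
    "archetype_1": [0.001, 0.02, 0.04],
    "archetype_2": [0.03, 0.5, 0.9],
}

# 'global': pool every test, correct once, then split back per archetype.
pooled = [p for ps in raw.values() for p in ps]
pooled_adj = benjamini_hochberg(pooled)
global_adj, start = {}, 0
for arch, ps in raw.items():
    global_adj[arch] = pooled_adj[start:start + len(ps)]
    start += len(ps)

# 'per_archetype': correct each archetype's tests on their own.
per_arch_adj = {arch: benjamini_hochberg(ps) for arch, ps in raw.items()}

# Count tests passing a 0.05 cutoff for archetype_1 under each scope.
n_global = sum(p < 0.05 for p in global_adj["archetype_1"])
n_per_arch = sum(p < 0.05 for p in per_arch_adj["archetype_1"])
```

With these toy inputs, archetype_1 keeps one test below 0.05 under the pooled correction and three under the per-archetype correction, which shows how the choice of scope can change which genes are flagged.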

Returns:

Results with columns:

  • gene : str - Gene symbol/identifier

  • archetype : str - Archetype identifier

  • n_archetype_cells : int - Cells in archetype

  • n_other_cells : int - Cells in comparison group

  • mean_archetype : float - Mean expression in archetype

  • mean_other : float - Mean expression in others

  • log_fold_change : float - Log fold change

  • statistic : float - Mann-Whitney U statistic

  • pvalue : float - Raw p-value

  • fdr_pvalue : float - FDR-corrected p-value

  • significant : bool - Whether fdr_pvalue < 0.05

  • direction : str - ‘higher’ or ‘lower’ in archetype

Return type:

pd.DataFrame
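Because the result is a long-format table, it can be reshaped with ordinary pandas operations. The sketch below uses a small hand-written stand-in that carries only a few of the documented columns; the gene names and values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the returned table; values are invented for illustration.
results = pd.DataFrame({
    "gene": ["G1", "G2", "G1", "G2"],
    "archetype": ["archetype_1", "archetype_1", "archetype_2", "archetype_2"],
    "log_fold_change": [1.5, -0.2, -0.8, 0.9],
    "fdr_pvalue": [0.001, 0.40, 0.03, 0.02],
    "significant": [True, False, True, True],
})

# Keep significant rows, then reshape to one row per gene and
# one column per archetype; non-significant pairs become NaN.
lfc_matrix = results[results["significant"]].pivot(
    index="gene", columns="archetype", values="log_fold_change"
)
```

A gene-by-archetype matrix of this kind is one convenient input for heatmaps or for comparing markers across archetypes.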

Raises:

ValueError – If required keys not found in adata.
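One way to avoid this error is to check for the documented inputs before calling the function. The helper missing_inputs below is a hypothetical convenience, not part of peach; it relies only on membership tests against .obsm and .obs, which AnnData objects support, and is demonstrated on a simple stand-in object.

```python
from types import SimpleNamespace


def missing_inputs(adata, obsm_key="archetype_distances", obs_key="archetypes"):
    """List the documented inputs that are absent from an AnnData-like object."""
    missing = []
    if obsm_key not in adata.obsm:
        missing.append(f"obsm['{obsm_key}']")
    if obs_key not in adata.obs:
        missing.append(f"obs['{obs_key}']")
    return missing


# Stand-in with dict-like .obsm and .obs, used here instead of a real AnnData.
fake = SimpleNamespace(obsm={"archetype_distances": [[0.1, 0.9]]}, obs={})
problems = missing_inputs(fake)
```

An empty list means both keys are present. In this stand-in the assignments column is missing, which per the parameter description above would normally be produced by bin_cells_by_archetype.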

Examples

>>> import peach as pc
>>> # Basic usage
>>> results = pc.tl.gene_associations(adata)
>>> sig_genes = results[results["significant"]]
>>> # Per-archetype FDR correction (corrects within each archetype)
>>> results = pc.tl.gene_associations(adata, fdr_scope="per_archetype")
>>> # Top 10 significant genes that are higher in each archetype
>>> for arch in results["archetype"].unique():
...     arch_genes = results[
...         (results["archetype"] == arch) & (results["significant"]) & (results["direction"] == "higher")
...     ].nlargest(10, "log_fold_change")
...     print(f"{arch}: {arch_genes['gene'].tolist()}")

See also

peach.tl.pathway_associations

Pathway-level testing

peach.tl.pattern_analysis

Comprehensive pattern analysis

peach._core.types.GeneAssociationResult

Result row structure