peach.tl.statistical

Statistical Testing for Archetypal Analysis

User-facing API for statistical analysis of archetype-feature associations.

This module provides statistical tests for characterizing archetypes by their gene expression, pathway activity, and metadata associations. Feature-level tests use the Mann-Whitney U test, metadata enrichment uses the hypergeometric test, and p-values are corrected for multiple testing (Benjamini-Hochberg by default).

Main Functions

gene_associations

Mann-Whitney U tests for gene-archetype associations

pathway_associations

Pathway activity testing for archetype characterization

conditional_associations

Hypergeometric tests for metadata enrichment

pattern_analysis

Comprehensive archetypal pattern analysis

archetype_exclusive_patterns

Identify features exclusively high in single archetypes

specialization_patterns

Compare archetypes to centroid (archetype_0)

tradeoff_patterns

Identify mutual exclusivity between archetypes

Type Definitions

See peach._core.types for Pydantic models:

  • GeneAssociationResult : gene_associations() row structure

  • PathwayAssociationResult : pathway_associations() row structure

  • ConditionalAssociationResult : conditional_associations() row structure

  • PatternAssociationResult : Pattern test result structure

  • ExclusivePatternResult : Exclusive pattern result structure

  • ComprehensivePatternResults : pattern_analysis() return structure

Examples

>>> import peach as pc
>>> # Gene associations
>>> gene_results = pc.tl.gene_associations(adata)
>>> sig_genes = gene_results[gene_results.significant]
>>> # Pathway associations
>>> pathway_results = pc.tl.pathway_associations(adata)
>>> # Comprehensive pattern analysis
>>> patterns = pc.tl.pattern_analysis(adata)
>>> specialists = patterns["patterns"][patterns["patterns"]["pattern_type"] == "specialization"]

See also

peach._core.utils.statistical_tests

Core implementation

peach._core.types

Type definitions

Functions

archetype_exclusive_patterns(adata, *[, ...])

Identify features exclusively high in single archetypes.

conditional_associations(adata, *, obs_column)

Test associations between archetypes and categorical metadata.

gene_associations(adata, *[, bin_prop, ...])

Test gene expression associations with archetypal assignments.

pathway_associations(adata, *[, ...])

Test pathway activity associations with archetypal assignments.

pattern_analysis(adata, *[, data_obsm_key, ...])

Comprehensive archetypal pattern analysis.

specialization_patterns(adata, *[, ...])

Identify specialization features relative to centroid archetype.

tradeoff_patterns(adata, *[, data_obsm_key, ...])

Identify mutual exclusivity and tradeoff patterns.

peach.tl.statistical.gene_associations(adata, *, bin_prop=0.1, obsm_key='archetype_distances', obs_key='archetypes', use_layer=None, test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)

Test gene expression associations with archetypal assignments.

Performs Mann-Whitney U tests to identify genes with significantly different expression between each archetype and all other cells (1-vs-all testing paradigm).
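The 1-vs-all paradigm can be illustrated with a minimal, standard-library sketch. This is not peach's implementation: the helper names (average_ranks, mann_whitney_u, one_vs_all) and the toy data are hypothetical, and the p-value uses a plain normal approximation without tie or continuity correction.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


def one_vs_all(values, labels, archetype):
    """Compare cells assigned to one archetype against all remaining cells."""
    inside = [v for v, lab in zip(values, labels) if lab == archetype]
    outside = [v for v, lab in zip(values, labels) if lab != archetype]
    statistic, pvalue = mann_whitney_u(inside, outside)
    return {
        "archetype": archetype,
        "n_archetype_cells": len(inside),
        "n_other_cells": len(outside),
        "mean_archetype": sum(inside) / len(inside),
        "mean_other": sum(outside) / len(outside),
        "statistic": statistic,
        "pvalue": pvalue,
    }


# Toy expression values for a single gene (hypothetical data).
expression = [5.1, 4.8, 5.5, 5.0, 1.2, 0.9, 1.5, 1.1, 1.0, 1.3]
labels = ["archetype_1"] * 4 + ["archetype_2"] * 3 + ["archetype_0"] * 3

result = one_vs_all(expression, labels, "archetype_1")
print(result)
```

For real analyses, scipy.stats.mannwhitneyu handles ties and small samples more carefully than this approximation.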

Parameters:
  • adata (AnnData) –

    Annotated data object with:

    • obsm[obsm_key] : Archetype distance matrix [n_cells, n_archetypes]

    • obs[obs_key] : Archetype assignments (from bin_cells_by_archetype)

    • X or layers[use_layer] : Gene expression data

  • bin_prop (float, default: 0.1) – Proportion of cells closest to each archetype to use for binning.

  • obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.

  • obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.

  • use_layer (str | None, default: None) – Layer for gene expression. If None, uses adata.X. Auto-selects ‘logcounts’ or ‘log1p’ if available.

  • test_method (str, default: "mannwhitneyu") – Statistical test method. Currently supports ‘mannwhitneyu’.

  • fdr_method (str, default: "benjamini_hochberg") – FDR correction method: ‘benjamini_hochberg’ or ‘bonferroni’.

  • fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') –

    Scope of FDR correction:

    • 'global' : Correct across all tests (most stringent)

    • 'per_archetype' : Correct within each archetype

    • 'none' : No FDR correction (raw p-values)

  • test_direction (str, default: "two-sided") – Direction of statistical test: ‘two-sided’, ‘greater’, or ‘less’.

  • min_logfc (float, default: 0.01) – Minimum absolute log fold change threshold for filtering.

  • min_cells (int, default: 10) – Minimum cells required per archetype for testing.

  • comparison_group (str, default: 'all') –

    Comparison group for statistical tests:

    • 'all' : Compare archetype cells vs ALL other cells

    • 'archetypes_only' : Compare vs cells in other archetypes only (excludes archetype_0 and unassigned cells)

  • verbose (bool, default: True) – Whether to print progress messages.

Returns:

Results with columns:

  • gene : str - Gene symbol/identifier

  • archetype : str - Archetype identifier

  • n_archetype_cells : int - Cells in archetype

  • n_other_cells : int - Cells in comparison group

  • mean_archetype : float - Mean expression in archetype

  • mean_other : float - Mean expression in others

  • log_fold_change : float - Log fold change

  • statistic : float - Mann-Whitney U statistic

  • pvalue : float - Raw p-value

  • fdr_pvalue : float - FDR-corrected p-value

  • significant : bool - Whether fdr_pvalue < 0.05

  • direction : str - ‘higher’ or ‘lower’ in archetype

Return type:

pd.DataFrame

Raises:

ValueError – If required keys are not found in adata.

Examples

>>> # Basic usage
>>> results = pc.tl.gene_associations(adata)
>>> sig_genes = results[results.significant]
>>> # Per-archetype FDR correction (less stringent)
>>> results = pc.tl.gene_associations(adata, fdr_scope="per_archetype")
>>> # Top markers per archetype
>>> for arch in results["archetype"].unique():
...     arch_genes = results[
...         (results["archetype"] == arch) & (results["significant"]) & (results["direction"] == "higher")
...     ].nlargest(10, "log_fold_change")
...     print(f"{arch}: {arch_genes['gene'].tolist()}")
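The three fdr_scope options differ only in which p-values are corrected together. The sketch below shows that idea with a hand-written Benjamini-Hochberg adjustment; the function names (benjamini_hochberg, correct) and the toy p-values are assumptions for illustration, not peach internals.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def correct(tests, fdr_scope="global"):
    """Adjust p-values for (archetype, gene, pvalue) tuples under a given scope."""
    pvals = [p for _, _, p in tests]
    if fdr_scope == "none":
        return pvals  # raw p-values, no correction
    if fdr_scope == "global":
        return benjamini_hochberg(pvals)  # one family containing every test
    if fdr_scope == "per_archetype":
        adjusted = [None] * len(tests)
        for arch in sorted({a for a, _, _ in tests}):
            idx = [i for i, t in enumerate(tests) if t[0] == arch]
            family = benjamini_hochberg([pvals[i] for i in idx])
            for i, q in zip(idx, family):
                adjusted[i] = q
        return adjusted
    raise ValueError(f"unknown fdr_scope: {fdr_scope!r}")


# Hypothetical raw p-values from four tests.
tests = [
    ("archetype_1", "GENE_A", 0.001),
    ("archetype_1", "GENE_B", 0.020),
    ("archetype_2", "GENE_A", 0.030),
    ("archetype_2", "GENE_B", 0.400),
]

for scope in ("none", "global", "per_archetype"):
    print(scope, correct(tests, scope))
```

Because each scope defines a different family of tests, the same raw p-value can receive a different adjusted value under each setting.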

See also

peach.tl.pathway_associations

Pathway-level testing

peach.tl.pattern_analysis

Comprehensive pattern analysis

peach._core.types.GeneAssociationResult

Result row structure

peach.tl.statistical.pathway_associations(adata, *, pathway_obsm_key='pathway_scores', obsm_key='archetype_distances', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)

Test pathway activity associations with archetypal assignments.

Performs Mann-Whitney U tests to identify pathways with significantly different activity between each archetype and all other cells.

Parameters:
  • adata (AnnData) –

    Annotated data object with:

    • obsm[pathway_obsm_key] : Pathway scores [n_cells, n_pathways]

    • obsm[obsm_key] : Archetype distance matrix

    • obs[obs_key] : Archetype assignments

    • uns[pathway_obsm_key + '_pathways'] : Pathway names (optional)

  • pathway_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing pathway activity scores.

  • obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.

  • obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.

  • test_method (str, default: "mannwhitneyu") – Statistical test method.

  • fdr_method (str, default: "benjamini_hochberg") – FDR correction method.

  • fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.

  • test_direction (str, default: "two-sided") – Direction of statistical test.

  • min_logfc (float, default: 0.01) – Minimum effect size threshold (mean_diff for pathways).

  • min_cells (int, default: 10) – Minimum cells required per archetype.

  • comparison_group (str, default: 'all') – Comparison group: ‘all’ or ‘archetypes_only’.

  • verbose (bool, default: True) – Whether to print progress.

Returns:

Results with columns:

  • pathway : str - Pathway name

  • archetype : str - Archetype identifier

  • n_archetype_cells : int - Cells in archetype

  • n_other_cells : int - Cells in comparison

  • mean_archetype : float - Mean score in archetype

  • mean_other : float - Mean score in others

  • mean_diff : float - Mean difference (primary effect size)

  • log_fold_change : float - Alias for mean_diff

  • statistic : float - Test statistic

  • pvalue : float - Raw p-value

  • fdr_pvalue : float - FDR-corrected p-value

  • significant : bool - Whether significant

  • direction : str - ‘higher’ or ‘lower’

Return type:

pd.DataFrame

Notes

Pathway scores (from AUCell, pySCENIC, etc.) represent activity levels, not expression counts. Mean difference is more interpretable than log fold change for these scores.
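A small numeric sketch of why mean difference can be easier to read than a ratio for bounded activity scores. The effect_sizes helper, the pseudocount, and the scores are hypothetical; the exact log fold change formula peach uses for genes is not stated here.

```python
import math


def effect_sizes(archetype_scores, other_scores, pseudocount=1e-9):
    """Return both effect sizes for one pathway (illustrative only)."""
    mean_a = sum(archetype_scores) / len(archetype_scores)
    mean_o = sum(other_scores) / len(other_scores)
    return {
        "mean_diff": mean_a - mean_o,
        "log2_ratio": math.log2((mean_a + pseudocount) / (mean_o + pseudocount)),
    }


# Scores near zero: a tiny absolute shift produces a large ratio.
near_zero = effect_sizes([0.004, 0.006], [0.001, 0.001])

# Higher baseline: a larger absolute shift produces a small ratio.
mid_range = effect_sizes([0.40, 0.44], [0.30, 0.34])

print(near_zero)
print(mid_range)
```

Ranking by the ratio would favor the near-zero pathway, while ranking by mean_diff favors the pathway with the larger absolute change in activity.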

Examples

>>> # Basic usage
>>> results = pc.tl.pathway_associations(adata)
>>> # Filter for specific pathway categories
>>> metabolism = results[results["pathway"].str.contains("METABOLISM", case=False)]
>>> # Top pathways per archetype
>>> for arch in results["archetype"].unique():
...     top = results[(results["archetype"] == arch) & (results["significant"])].nlargest(5, "mean_diff")
...     print(f"{arch}: {top['pathway'].tolist()}")

See also

peach.tl.gene_associations

Gene-level testing

peach._core.types.PathwayAssociationResult

Result row structure

peach.tl.statistical.conditional_associations(adata, *, obs_column, archetype_assignments=None, obs_key='archetypes', test_method='hypergeometric', fdr_method='benjamini_hochberg', min_cells=5, verbose=True, **kwargs)

Test associations between archetypes and categorical metadata.

Performs hypergeometric tests to identify significant enrichment of archetypes within different categorical conditions (samples, treatments, cell types, etc.).

Parameters:
  • adata (AnnData) –

    Annotated data object with:

    • obs[obs_key] : Archetype assignments

    • obs[obs_column] : Categorical variable to test

  • obs_column (str) – Column name in adata.obs containing categorical variable.

  • archetype_assignments (None, optional) – Deprecated. Archetype assignments are now read from adata.obs[obs_key].

  • obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.

  • test_method (str, default: "hypergeometric") – Statistical test method (currently only ‘hypergeometric’).

  • fdr_method (str, default: "benjamini_hochberg") – FDR correction method.

  • min_cells (int, default: 5) – Minimum cells required per archetype-condition combination.

  • verbose (bool, default: True) – Whether to print progress.

Returns:

Results with columns:

  • archetype : str - Archetype identifier

  • condition : str - Condition value from obs_column

  • observed : int - Observed count in overlap

  • expected : float - Expected count under null

  • total_archetype : int - Total cells in archetype

  • total_condition : int - Total cells in condition

  • odds_ratio : float - Enrichment measure (>1 = enriched)

  • ci_lower : float - Lower 95% CI for odds ratio

  • ci_upper : float - Upper 95% CI for odds ratio

  • pvalue : float - Hypergeometric p-value

  • fdr_pvalue : float - FDR-corrected p-value

  • significant : bool - Whether significant

Return type:

pd.DataFrame

Examples

>>> # Test sample associations
>>> results = pc.tl.conditional_associations(adata, obs_column="sample")
>>> # Find enriched archetypes per condition
>>> enriched = results[(results["significant"]) & (results["odds_ratio"] > 2)]
>>> # Test treatment effects
>>> treatment_results = pc.tl.conditional_associations(adata, obs_column="treatment")
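For intuition about the observed, expected, odds_ratio, and pvalue columns, here is a standard-library sketch of an upper-tail hypergeometric enrichment test for one archetype-condition pair. The function name and the counts are hypothetical, and peach's exact definitions (for example, how the odds ratio and its confidence interval are computed) may differ.

```python
from math import comb


def hypergeom_enrichment(observed, total_archetype, total_condition, n_cells):
    """Upper-tail hypergeometric test for archetype/condition overlap."""
    expected = total_archetype * total_condition / n_cells
    max_overlap = min(total_archetype, total_condition)
    # P(X >= observed) when drawing total_archetype cells from n_cells,
    # of which total_condition belong to the condition.
    tail = sum(
        comb(total_condition, k) * comb(n_cells - total_condition, total_archetype - k)
        for k in range(observed, max_overlap + 1)
    )
    pvalue = tail / comb(n_cells, total_archetype)
    # Odds ratio from the 2x2 contingency table.
    a = observed
    b = total_archetype - observed
    c = total_condition - observed
    d = n_cells - total_archetype - total_condition + observed
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return {
        "observed": observed,
        "expected": expected,
        "odds_ratio": odds_ratio,
        "pvalue": pvalue,
    }


# Hypothetical counts: 100 cells, 20 in the archetype, 30 in the condition,
# and 12 cells in both.
row = hypergeom_enrichment(observed=12, total_archetype=20, total_condition=30, n_cells=100)
print(row)
```

An odds ratio above 1 together with a small p-value indicates that the archetype is over-represented in that condition relative to chance.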

See also

peach._core.types.ConditionalAssociationResult

Result row structure

peach.tl.statistical.pattern_analysis(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', include_individual_tests=True, include_pattern_tests=True, include_exclusivity_analysis=True, verbose=True, **kwargs)

Comprehensive archetypal pattern analysis.

Performs systematic analysis combining three complementary approaches:

  1. Individual tests: Standard 1-vs-all archetype characterization

  2. Pattern tests: Systematic archetype combination testing (specialists, binary tradeoffs, complex patterns)

  3. Exclusivity analysis: Features with opposing patterns

Parameters:
  • adata (AnnData) – Annotated data object with archetypal assignments and scores.

  • data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing scores for pattern analysis.

  • obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.

  • include_individual_tests (bool, default: True) – Run individual archetype 1-vs-all tests.

  • include_pattern_tests (bool, default: True) – Run systematic pattern tests (specialists, tradeoffs).

  • include_exclusivity_analysis (bool, default: True) – Analyze mutual exclusivity patterns.

  • verbose (bool, default: True) – Print analysis progress.

Returns:

Dictionary with keys:

  • 'individual' : Individual archetype results

  • 'patterns' : Pattern-based test results

  • 'exclusivity' : Mutual exclusivity results

Return type:

dict[str, pd.DataFrame]

Notes

Pattern Types in ‘patterns’ DataFrame:

  • specialization : Archetype vs archetype_0 (centroid)

  • tradeoff : Multi-archetype high vs low groups

Pattern Code Format: “12xxx_xx345”

  • Position = archetype number (0, 1, 2…)

  • Numbers = high archetypes

  • ‘x’ = low archetypes

  • Underscore separates high from low group

Examples

>>> # Run comprehensive analysis
>>> results = pc.tl.pattern_analysis(adata)
>>> # Access individual results
>>> individual = results["individual"]
>>> # Find specialization patterns (archetype vs. archetype_0)
>>> patterns = results["patterns"]
>>> specialists = patterns[patterns["pattern_type"] == "specialization"]
>>> # Find mutual exclusivity patterns
>>> if not results["exclusivity"].empty:
...     exclusive = results["exclusivity"]
...     top_tradeoffs = exclusive.nlargest(10, "effect_range")

See also

peach.tl.archetype_exclusive_patterns

Focused exclusive pattern analysis

peach.tl.specialization_patterns

Centroid comparison analysis

peach.tl.tradeoff_patterns

Mutual exclusivity analysis

peach._core.types.ComprehensivePatternResults

Return type structure

peach.tl.statistical.archetype_exclusive_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_effect_size=0.05, min_cells=10, use_pairwise=True, verbose=True, **kwargs)

Identify features exclusively high in single archetypes.

Finds genes or pathways specifically elevated in only one archetype compared to all others. Supports two methods:

  1. Pairwise (default): Tests each archetype vs every other archetype individually. Feature is exclusive if significantly higher vs ALL others. More stringent.

  2. 1-vs-all filtering: Tests each archetype vs all other cells. Feature is exclusive if significant in only ONE archetype’s test. More permissive, higher statistical power.

Parameters:
  • adata (AnnData) – Annotated data object with archetypal assignments.

  • data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores. Use None for gene expression.

  • obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.

  • test_method (str, default: "mannwhitneyu") – Statistical test method.

  • fdr_method (str, default: "benjamini_hochberg") – FDR correction method.

  • fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.

  • min_effect_size (float, default: 0.05) – Minimum effect size (mean_diff for pathways, log_fc for genes).

  • min_cells (int, default: 10) – Minimum cells per archetype.

  • use_pairwise (bool, default: True) – If True, use rigorous pairwise comparisons. If False, use 1-vs-all filtering.

  • verbose (bool, default: True) – Print progress.

Returns:

Results with columns:

  • pathway/gene : Feature identifier

  • archetype : Exclusive archetype

  • mean_archetype : Mean in exclusive archetype

  • mean_other : Mean in other archetypes

  • mean_diff/log_fold_change : Effect size

  • exclusivity_score : Ratio vs max other archetype

  • pvalue, fdr_pvalue, significant

  • pattern_type : ‘exclusive’ or ‘exclusive_pairwise’

Return type:

pd.DataFrame

Examples

>>> # Pairwise method (more stringent)
>>> exclusive = pc.tl.archetype_exclusive_patterns(adata)
>>> # 1-vs-all method (more permissive)
>>> exclusive = pc.tl.archetype_exclusive_patterns(adata, use_pairwise=False)
>>> # Find markers for specific archetype
>>> arch3_markers = exclusive[exclusive["archetype"] == "archetype_3"]
>>> top_markers = arch3_markers.nlargest(10, "exclusivity_score")
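The difference between the two exclusivity criteria can be sketched with plain booleans standing in for test outcomes. The data structures and function names below are hypothetical and only mirror the decision rules described above, not the underlying statistics.

```python
def exclusive_pairwise(higher_than):
    """Archetypes that are significantly higher than EVERY other archetype.

    higher_than[a][b] is True when archetype a tested significantly
    higher than archetype b for the feature of interest.
    """
    archetypes = list(higher_than)
    return [
        a
        for a in archetypes
        if all(higher_than[a][b] for b in archetypes if b != a)
    ]


def exclusive_one_vs_all(significant_higher):
    """Return the archetype only if exactly ONE 1-vs-all test is significant."""
    hits = [a for a, sig in significant_higher.items() if sig]
    return hits if len(hits) == 1 else []


# Hypothetical outcomes for one feature: archetype_1 beats archetype_2,
# but is not significantly higher than archetype_3.
pairwise_outcomes = {
    "archetype_1": {"archetype_2": True, "archetype_3": False},
    "archetype_2": {"archetype_1": False, "archetype_3": False},
    "archetype_3": {"archetype_1": False, "archetype_2": False},
}
one_vs_all_outcomes = {"archetype_1": True, "archetype_2": False, "archetype_3": False}

print(exclusive_pairwise(pairwise_outcomes))
print(exclusive_one_vs_all(one_vs_all_outcomes))
```

In this toy case the feature passes the 1-vs-all filter but fails the pairwise criterion, which is the sense in which the pairwise method is more stringent.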

See also

peach.tl.pattern_analysis

Comprehensive pattern analysis

peach._core.types.ExclusivePatternResult

Result row structure

peach.tl.statistical.specialization_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, verbose=True, **kwargs)

Identify specialization features relative to centroid archetype.

Compares each archetype to archetype_0 (centroid/generalist) to find features representing specialized states or differentiation away from the central cellular state.

Parameters:
  • adata (AnnData) – Annotated data object with archetypal assignments.

  • data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.

  • obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.

  • test_method (str, default: "mannwhitneyu") – Statistical test method.

  • fdr_method (str, default: "benjamini_hochberg") – FDR correction method.

  • fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.

  • min_cells (int, default: 10) – Minimum cells per archetype.

  • verbose (bool, default: True) – Print progress.

Returns:

Results describing each archetype's specialization relative to archetype_0.

Return type:

pd.DataFrame

Notes

Archetype_0 typically represents the centroid or generalist state where cells have balanced contributions from all archetypes. Features elevated in other archetypes relative to archetype_0 represent specialized cellular programs.

Examples

>>> spec = pc.tl.specialization_patterns(adata)
>>> # Find archetype_4 specialization features
>>> arch4_spec = spec[(spec["archetype"] == "archetype_4") & (spec["significant"])]

See also

peach.tl.archetype_exclusive_patterns

Exclusive pattern analysis

peach.tl.statistical.tradeoff_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', tradeoffs='pairs', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, min_effect_size=0.1, verbose=True, **kwargs)

Identify mutual exclusivity and tradeoff patterns.

Finds features showing opposing patterns between archetypes, indicating biological tradeoffs or mutually exclusive states.

Parameters:
  • adata (AnnData) – Annotated data object with archetypal assignments.

  • data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.

  • obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.

  • tradeoffs ({'pairs', 'patterns'}, default: 'pairs') –

    Type of tradeoff analysis:

    • 'pairs' : Binary pairwise (A high, B low)

    • 'patterns' : Complex multi-archetype (AB high, CD low)

  • test_method (str, default: "mannwhitneyu") – Statistical test method.

  • fdr_method (str, default: "benjamini_hochberg") – FDR correction method.

  • fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.

  • min_cells (int, default: 10) – Minimum cells per group.

  • min_effect_size (float, default: 0.1) – Minimum effect size for tradeoffs.

  • verbose (bool, default: True) – Print progress.

  • **kwargs

    Additional parameters:

    • max_pattern_size (int, default: 2) – Maximum archetypes per group for complex patterns.

    • exclude_archetype_0 (bool, default: True) – Exclude archetype_0 from tradeoff patterns.

    • specific_patterns (List[str], optional) – Test only specific patterns (e.g., [‘2v3’, ‘1v45’]).

Returns:

Results with tradeoff patterns:

  • pattern_code : Visual pattern code

  • high_archetypes, low_archetypes : Groups

  • mean_high, mean_low : Group means

  • log_fold_change : Effect size

  • pattern_complexity : Number of archetypes involved

Return type:

pd.DataFrame

Examples

>>> # Find pairwise tradeoffs
>>> pairs = pc.tl.tradeoff_patterns(adata, tradeoffs="pairs")
>>> # Find complex patterns
>>> patterns = pc.tl.tradeoff_patterns(adata, tradeoffs="patterns", max_pattern_size=3)
>>> # Test specific hypothesis
>>> specific = pc.tl.tradeoff_patterns(adata, specific_patterns=["2v3", "1v4"])
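To get a feel for how the search space grows between 'pairs' and 'patterns', the sketch below enumerates candidate (high group, low group) combinations. How peach actually enumerates and orders its candidates is an assumption here; candidate_tradeoffs is a hypothetical helper that only mirrors the parameter names documented above.

```python
from itertools import combinations


def candidate_tradeoffs(archetypes, tradeoffs="pairs", max_pattern_size=2,
                        exclude_archetype_0=True):
    """Enumerate disjoint (high_group, low_group) candidates (illustrative)."""
    if tradeoffs not in ("pairs", "patterns"):
        raise ValueError(f"unknown tradeoffs mode: {tradeoffs!r}")
    pool = [a for a in archetypes if not (exclude_archetype_0 and a == 0)]
    # 'pairs' restricts each group to a single archetype.
    max_size = 1 if tradeoffs == "pairs" else max_pattern_size
    groups = [
        group
        for size in range(1, max_size + 1)
        for group in combinations(pool, size)
    ]
    # Keep ordered pairs of groups that share no archetype.
    return [
        (high, low)
        for high in groups
        for low in groups
        if not set(high) & set(low)
    ]


archetype_ids = [0, 1, 2, 3]  # hypothetical archetype numbers

pairs = candidate_tradeoffs(archetype_ids, tradeoffs="pairs")
patterns = candidate_tradeoffs(archetype_ids, tradeoffs="patterns", max_pattern_size=2)

print(len(pairs), pairs[:3])
print(len(patterns))
```

Under this reading, each additional archetype or larger max_pattern_size adds many more tests, which matters for multiple-testing correction and runtime.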

See also

peach.tl.archetype_exclusive_patterns

Exclusive pattern analysis

peach._core.types.PatternAssociationResult

Result row structure