peach.tl.statistical

Statistical Testing for Archetypal Analysis

User-facing API for statistical analysis of archetype-feature associations.

This module provides statistical tests for characterizing archetypes by their gene expression, pathway activity, and metadata associations. Functions that accept fdr_method apply Benjamini-Hochberg multiple-testing correction by default.

Main Functions

gene_associations: Mann-Whitney U tests for gene-archetype associations
pathway_associations: Pathway activity testing for archetype characterization
conditional_associations: Hypergeometric tests for metadata enrichment
pattern_analysis: Comprehensive archetypal pattern analysis
archetype_exclusive_patterns: Identify features exclusively high in single archetypes
specialization_patterns: Compare archetypes to centroid (archetype_0)
tradeoff_patterns: Identify mutual exclusivity between archetypes

Type Definitions

See peach._core.types for Pydantic models:

GeneAssociationResult : gene_associations() row structure
PathwayAssociationResult : pathway_associations() row structure
ConditionalAssociationResult : conditional_associations() row structure
PatternAssociationResult : Pattern test result structure
ExclusivePatternResult : Exclusive pattern result structure
ComprehensivePatternResults : pattern_analysis() return structure

Examples

>>> import peach as pc
>>> # Gene associations
>>> gene_results = pc.tl.gene_associations(adata)
>>> sig_genes = gene_results[gene_results.significant]
>>> # Pathway associations
>>> pathway_results = pc.tl.pathway_associations(adata)
>>> # Comprehensive pattern analysis
>>> patterns = pc.tl.pattern_analysis(adata)
>>> specialists = patterns["patterns"][patterns["patterns"]["pattern_type"] == "specialization"]

See also

peach._core.utils.statistical_tests: Core implementation
peach._core.types: Type definitions

Functions

`archetype_exclusive_patterns`(adata, *[, ...])	Identify features exclusively high in single archetypes.
`conditional_associations`(adata, *, obs_column)	Test associations between archetypes and categorical metadata.
`gene_associations`(adata, *[, bin_prop, ...])	Test gene expression associations with archetypal assignments.
`pathway_associations`(adata, *[, ...])	Test pathway activity associations with archetypal assignments.
`pattern_analysis`(adata, *[, data_obsm_key, ...])	Comprehensive archetypal pattern analysis.
`specialization_patterns`(adata, *[, ...])	Identify specialization features relative to centroid archetype.
`tradeoff_patterns`(adata, *[, data_obsm_key, ...])	Identify mutual exclusivity and tradeoff patterns.

peach.tl.statistical.gene_associations(adata, *, bin_prop=0.1, obsm_key='archetype_distances', obs_key='archetypes', use_layer=None, test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)

Test gene expression associations with archetypal assignments.

Performs Mann-Whitney U tests to identify genes whose expression differs significantly between each archetype and, by default, all other cells (1-vs-all testing; see comparison_group for the alternative).

Parameters:

adata (AnnData) –
Annotated data object with:
- obsm[obsm_key] : Archetype distance matrix [n_cells, n_archetypes]
- obs[obs_key] : Archetype assignments (from bin_cells_by_archetype)
- X or layers[use_layer] : Gene expression data
bin_prop (float, default: 0.1) – Proportion of cells closest to each archetype to use for binning.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
use_layer (str | None, default: None) – Layer for gene expression. If None, uses adata.X. Auto-selects ‘logcounts’ or ‘log1p’ if available.
test_method (str, default: "mannwhitneyu") – Statistical test method. Currently supports ‘mannwhitneyu’.
fdr_method (str, default: "benjamini_hochberg") – Multiple-testing correction method: ‘benjamini_hochberg’ (controls the false discovery rate) or ‘bonferroni’ (controls the family-wise error rate).
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') –
Scope of FDR correction:
- 'global' : Correct across all tests (most stringent)
- 'per_archetype' : Correct within each archetype
- 'none' : No FDR correction (raw p-values)
test_direction (str, default: "two-sided") – Direction of statistical test: ‘two-sided’, ‘greater’, or ‘less’.
min_logfc (float, default: 0.01) – Minimum absolute log fold change threshold for filtering.
min_cells (int, default: 10) – Minimum cells required per archetype for testing.
comparison_group (str, default: 'all') –
Comparison group for statistical tests:
- 'all' : Compare archetype cells vs ALL other cells
- 'archetypes_only' : Compare vs cells in other archetypes only (excludes archetype_0 and unassigned cells)
verbose (bool, default: True) – Whether to print progress messages.

Returns:

Results with columns:

gene : str - Gene symbol/identifier
archetype : str - Archetype identifier
n_archetype_cells : int - Cells in archetype
n_other_cells : int - Cells in comparison group
mean_archetype : float - Mean expression in archetype
mean_other : float - Mean expression in others
log_fold_change : float - Log fold change
statistic : float - Mann-Whitney U statistic
pvalue : float - Raw p-value
fdr_pvalue : float - FDR-corrected p-value
significant : bool - Whether fdr_pvalue < 0.05
direction : str - ‘higher’ or ‘lower’ in archetype

Return type:

pd.DataFrame

Raises:

ValueError – If required keys not found in adata.

Examples

>>> # Basic usage
>>> results = pc.tl.gene_associations(adata)
>>> sig_genes = results[results.significant]
>>> # Per-archetype FDR correction (less stringent)
>>> results = pc.tl.gene_associations(adata, fdr_scope="per_archetype")
>>> # Top markers per archetype
>>> for arch in results["archetype"].unique():
...     arch_genes = results[
...         (results["archetype"] == arch) & (results["significant"]) & (results["direction"] == "higher")
...     ].nlargest(10, "log_fold_change")
...     print(f"{arch}: {arch_genes['gene'].tolist()}")
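
The 1-vs-all comparison behind these results can be sketched with the standard library alone. This is an illustration of the test statistic, not peach's implementation: `mann_whitney_u` is a hypothetical helper, its p-value uses the normal approximation without tie or continuity correction, and the expression values are made up.

```python
import math


def mann_whitney_u(x, y, alternative="two-sided"):
    """Return (U, p) for sample x versus sample y.

    U counts pairs (xi, yj) with xi > yj; ties count as 0.5.
    The p-value uses the normal approximation with no tie or
    continuity correction, so it is only a rough guide for small samples.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    if alternative == "greater":
        p = 0.5 * math.erfc(z / math.sqrt(2))   # upper tail
    elif alternative == "less":
        p = 0.5 * math.erfc(-z / math.sqrt(2))  # lower tail
    else:
        p = math.erfc(abs(z) / math.sqrt(2))    # both tails
    return u, p


# Made-up expression of one gene: cells binned to one archetype vs all other cells.
archetype_cells = [5.1, 6.2, 7.3, 8.4]
other_cells = [1.0, 2.0, 3.0, 4.5, 4.9]

u_stat, p_two_sided = mann_whitney_u(archetype_cells, other_cells)
_, p_greater = mann_whitney_u(archetype_cells, other_cells, alternative="greater")
print(f"U={u_stat}, two-sided p={p_two_sided:.4f}, greater p={p_greater:.4f}")
```

The three branches mirror the ‘two-sided’, ‘greater’, and ‘less’ values of test_direction. Because the statistic depends only on ranks, it makes no normality assumption about the expression values.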

See also

peach.tl.pathway_associations: Pathway-level testing
peach.tl.pattern_analysis: Comprehensive pattern analysis
peach._core.types.GeneAssociationResult: Result row structure

peach.tl.statistical.pathway_associations(adata, *, pathway_obsm_key='pathway_scores', obsm_key='archetype_distances', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)

Test pathway activity associations with archetypal assignments.

Performs Mann-Whitney U tests to identify pathways with significantly different activity between each archetype and all other cells.

Parameters:

adata (AnnData) –
Annotated data object with:
- obsm[pathway_obsm_key] : Pathway scores [n_cells, n_pathways]
- obsm[obsm_key] : Archetype distance matrix
- obs[obs_key] : Archetype assignments
- uns[pathway_obsm_key + '_pathways'] : Pathway names (optional)
pathway_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing pathway activity scores.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
test_direction (str, default: "two-sided") – Direction of statistical test.
min_logfc (float, default: 0.01) – Minimum effect size threshold (mean_diff for pathways).
min_cells (int, default: 10) – Minimum cells required per archetype.
comparison_group (str, default: 'all') – Comparison group: ‘all’ or ‘archetypes_only’.
verbose (bool, default: True) – Whether to print progress.

Returns:

Results with columns:

pathway : str - Pathway name
archetype : str - Archetype identifier
n_archetype_cells : int - Cells in archetype
n_other_cells : int - Cells in comparison
mean_archetype : float - Mean score in archetype
mean_other : float - Mean score in others
mean_diff : float - Mean difference (primary effect size)
log_fold_change : float - Alias for mean_diff
statistic : float - Test statistic
pvalue : float - Raw p-value
fdr_pvalue : float - FDR-corrected p-value
significant : bool - Whether significant
direction : str - ‘higher’ or ‘lower’

Return type:

pd.DataFrame

Notes

Pathway scores (from AUCell, pySCENIC, etc.) represent activity levels, not expression counts. Mean difference is more interpretable than log fold change for these scores.
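
A small sketch with made-up scores in [0, 1] shows why. The plain log2 ratio of group means used here is an assumption chosen for illustration; it is not necessarily how peach derives an effect size.

```python
import math


def mean(values):
    return sum(values) / len(values)


# Made-up pathway scores: archetype cells vs other cells, for two pathways.
pathways = {
    "low_activity": {"archetype": [0.004, 0.006, 0.005], "other": [0.001, 0.001, 0.001]},
    "high_activity": {"archetype": [0.60, 0.62, 0.58], "other": [0.30, 0.31, 0.29]},
}

summary = {}
for name, groups in pathways.items():
    m_arch, m_other = mean(groups["archetype"]), mean(groups["other"])
    mean_diff = m_arch - m_other
    log2_ratio = math.log2(m_arch / m_other)
    summary[name] = (mean_diff, log2_ratio)
    print(f"{name}: mean_diff={mean_diff:.3f}, log2_ratio={log2_ratio:.2f}")
```

The near-zero pathway has the larger ratio even though its absolute shift is negligible, while the mean difference ranks the two pathways by the size of the change in score.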

Examples

>>> # Basic usage
>>> results = pc.tl.pathway_associations(adata)
>>> # Filter for specific pathway categories
>>> metabolism = results[results["pathway"].str.contains("METABOLISM", case=False)]
>>> # Top pathways per archetype
>>> for arch in results["archetype"].unique():
...     top = results[(results["archetype"] == arch) & (results["significant"])].nlargest(5, "mean_diff")
...     print(f"{arch}: {top['pathway'].tolist()}")
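
The effect of fdr_scope can be sketched with a hand-written Benjamini-Hochberg adjustment. `benjamini_hochberg` below is a hypothetical helper written for this illustration, and the p-values are made up; peach's own correction code may differ in detail.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values, keyed by archetype.
raw = {
    "archetype_1": [0.001, 0.020, 0.040],
    "archetype_2": [0.030, 0.500, 0.900],
}

# 'global': one correction over all tests pooled together.
pooled = [p for pvals in raw.values() for p in pvals]
global_adj = benjamini_hochberg(pooled)

# 'per_archetype': a separate correction within each archetype.
per_arch_adj = {arch: benjamini_hochberg(pvals) for arch, pvals in raw.items()}

n_global = sum(p < 0.05 for p in global_adj)
n_per_arch = sum(p < 0.05 for pvals in per_arch_adj.values() for p in pvals)
print(f"significant: global={n_global}, per_archetype={n_per_arch}")
```

In this toy example the pooled correction keeps fewer tests below 0.05 than the per-archetype one, because every p-value is scaled by the total number of tests rather than by the number of tests for its own archetype.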

See also

peach.tl.gene_associations: Gene-level testing
peach._core.types.PathwayAssociationResult: Result row structure

peach.tl.statistical.conditional_associations(adata, *, obs_column, archetype_assignments=None, obs_key='archetypes', test_method='hypergeometric', fdr_method='benjamini_hochberg', min_cells=5, verbose=True, **kwargs)

Test associations between archetypes and categorical metadata.

Performs hypergeometric tests to identify significant enrichment of archetypes within different categorical conditions (samples, treatments, cell types, etc.).

Parameters:

adata (AnnData) –
Annotated data object with:
- obs[obs_key] : Archetype assignments
- obs[obs_column] : Categorical variable to test
obs_column (str) – Column name in adata.obs containing categorical variable.
archetype_assignments (None, optional) – Deprecated. Archetype assignments are now read from adata.obs[obs_key].
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
test_method (str, default: "hypergeometric") – Statistical test method (currently only ‘hypergeometric’).
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
min_cells (int, default: 5) – Minimum cells required per archetype-condition combination.
verbose (bool, default: True) – Whether to print progress.

Returns:

Results with columns:

archetype : str - Archetype identifier
condition : str - Condition value from obs_column
observed : int - Observed count in overlap
expected : float - Expected count under null
total_archetype : int - Total cells in archetype
total_condition : int - Total cells in condition
odds_ratio : float - Enrichment measure (>1 = enriched)
ci_lower : float - Lower 95% CI for odds ratio
ci_upper : float - Upper 95% CI for odds ratio
pvalue : float - Hypergeometric p-value
fdr_pvalue : float - FDR-corrected p-value
significant : bool - Whether significant

Return type:

pd.DataFrame

Examples

>>> # Test sample associations
>>> results = pc.tl.conditional_associations(adata, obs_column="sample")
>>> # Find enriched archetypes per condition
>>> enriched = results[(results["significant"]) & (results["odds_ratio"] > 2)]
>>> # Test treatment effects
>>> treatment_results = pc.tl.conditional_associations(adata, obs_column="treatment")
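
How the observed, expected, pvalue, and odds_ratio columns relate to one another can be sketched for a single archetype-condition pair. `hypergeom_enrichment` is a hypothetical helper and the counts are made up; the upper-tail test and the uncorrected sample odds ratio are assumptions for illustration, and peach may compute these quantities (and the confidence interval) differently.

```python
from math import comb


def hypergeom_enrichment(observed, total_archetype, total_condition, n_cells):
    """Upper-tail hypergeometric test for archetype/condition overlap.

    Returns (expected, p_value), where p_value = P(X >= observed) when
    total_archetype cells are drawn without replacement from n_cells,
    of which total_condition belong to the condition.
    """
    expected = total_archetype * total_condition / n_cells
    max_overlap = min(total_archetype, total_condition)
    tail = sum(
        comb(total_condition, k) * comb(n_cells - total_condition, total_archetype - k)
        for k in range(observed, max_overlap + 1)
    )
    return expected, tail / comb(n_cells, total_archetype)


# Made-up counts: 20 cells, 8 in the condition, 6 in the archetype, 5 in both.
observed, total_archetype, total_condition, n_cells = 5, 6, 8, 20
expected, p_value = hypergeom_enrichment(observed, total_archetype, total_condition, n_cells)

# Sample odds ratio from the 2x2 table (no continuity correction).
a = observed                    # in archetype, in condition
b = total_archetype - observed  # in archetype, not in condition
c = total_condition - observed  # not in archetype, in condition
d = n_cells - a - b - c         # in neither
odds_ratio = (a * d) / (b * c)
print(f"expected={expected:.1f}, p={p_value:.4f}, odds_ratio={odds_ratio:.2f}")
```

An observed count well above the expected count gives an odds ratio above 1 and a small upper-tail p-value, which is the pattern the `odds_ratio > 2` filter in the example above selects for.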

See also

peach._core.types.ConditionalAssociationResult: Result row structure

peach.tl.statistical.pattern_analysis(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', include_individual_tests=True, include_pattern_tests=True, include_exclusivity_analysis=True, verbose=True, **kwargs)

Comprehensive archetypal pattern analysis.

Performs systematic analysis combining three complementary approaches:

Individual tests: Standard 1-vs-all archetype characterization
Pattern tests: Systematic archetype combination testing (specialists, binary tradeoffs, complex patterns)
Exclusivity analysis: Features with opposing patterns

Parameters:

adata (AnnData) – Annotated data object with archetypal assignments and scores.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing scores for pattern analysis.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
include_individual_tests (bool, default: True) – Run individual archetype 1-vs-all tests.
include_pattern_tests (bool, default: True) – Run systematic pattern tests (specialists, tradeoffs).
include_exclusivity_analysis (bool, default: True) – Analyze mutual exclusivity patterns.
verbose (bool, default: True) – Print analysis progress.

Returns:

Dictionary with keys:

'individual' : Individual archetype results
'patterns' : Pattern-based test results
'exclusivity' : Mutual exclusivity results

Return type:

dict[str, pd.DataFrame]

Notes

Pattern Types in ‘patterns’ DataFrame:

specialization : Archetype vs archetype_0 (centroid)
tradeoff : Multi-archetype high vs low groups

Pattern Code Format: “12xxx_xx345”

Position = archetype number (0, 1, 2…)
Numbers = high archetypes
‘x’ = low archetypes
Underscore separates high from low group

Examples

>>> # Run comprehensive analysis
>>> results = pc.tl.pattern_analysis(adata)
>>> # Access individual results
>>> individual = results["individual"]
>>> # Find specialization patterns (archetype vs. archetype_0 centroid)
>>> patterns = results["patterns"]
>>> specialists = patterns[patterns["pattern_type"] == "specialization"]
>>> # Find mutual exclusivity patterns
>>> if not results["exclusivity"].empty:
...     exclusive = results["exclusivity"]
...     top_tradeoffs = exclusive.nlargest(10, "effect_range")

See also

peach.tl.archetype_exclusive_patterns: Focused exclusive pattern analysis
peach.tl.specialization_patterns: Centroid comparison analysis
peach.tl.tradeoff_patterns: Mutual exclusivity analysis
peach._core.types.ComprehensivePatternResults: Return type structure

peach.tl.statistical.archetype_exclusive_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_effect_size=0.05, min_cells=10, use_pairwise=True, verbose=True, **kwargs)

Identify features exclusively high in single archetypes.

Finds genes or pathways specifically elevated in only one archetype compared to all others. Supports two methods:

Pairwise (default): Tests each archetype vs every other archetype individually. Feature is exclusive if significantly higher vs ALL others. More stringent.
1-vs-all filtering: Tests each archetype vs all other cells. Feature is exclusive if significant in only ONE archetype’s test. More permissive, higher statistical power.

Parameters:

adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores. Use None for gene expression.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_effect_size (float, default: 0.05) – Minimum effect size (mean_diff for pathways, log_fc for genes).
min_cells (int, default: 10) – Minimum cells per archetype.
use_pairwise (bool, default: True) – If True, use rigorous pairwise comparisons. If False, use 1-vs-all filtering.
verbose (bool, default: True) – Print progress.

Returns:

Results with columns:

pathway/gene : Feature identifier
archetype : Exclusive archetype
mean_archetype : Mean in exclusive archetype
mean_other : Mean in other archetypes
mean_diff/log_fold_change : Effect size
exclusivity_score : Ratio vs max other archetype
pvalue, fdr_pvalue, significant
pattern_type : ‘exclusive’ or ‘exclusive_pairwise’

Return type:

pd.DataFrame

Examples

>>> # Pairwise method (more stringent)
>>> exclusive = pc.tl.archetype_exclusive_patterns(adata)
>>> # 1-vs-all method (more permissive)
>>> exclusive = pc.tl.archetype_exclusive_patterns(adata, use_pairwise=False)
>>> # Find markers for specific archetype
>>> arch3_markers = exclusive[exclusive["archetype"] == "archetype_3"]
>>> top_markers = arch3_markers.nlargest(10, "exclusivity_score")
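
The difference between the two exclusivity rules can be sketched as set logic over test outcomes. The outcome tables and both helper functions are hypothetical and exist only to illustrate the definitions given in the description; they are not peach internals.

```python
archetypes = ["archetype_1", "archetype_2", "archetype_3"]

# Made-up 1-vs-all outcomes: archetypes in which each feature is significantly higher.
one_vs_all_hits = {
    "pathway_A": {"archetype_1"},
    "pathway_B": {"archetype_1", "archetype_2"},
}

# Made-up pairwise outcomes: (higher, lower) archetype pairs that reached significance.
pairwise_hits = {
    "pathway_A": {("archetype_1", "archetype_2")},  # not significant vs archetype_3
    "pathway_C": {("archetype_3", "archetype_1"), ("archetype_3", "archetype_2")},
}


def exclusive_one_vs_all(hits):
    """Exclusive if exactly one archetype's 1-vs-all test is significant."""
    return {feature: next(iter(archs)) for feature, archs in hits.items() if len(archs) == 1}


def exclusive_pairwise(hits, archetypes):
    """Exclusive to A if A is significantly higher than every other archetype."""
    result = {}
    for feature, wins in hits.items():
        for a in archetypes:
            if all((a, b) in wins for b in archetypes if b != a):
                result[feature] = a
    return result


permissive = exclusive_one_vs_all(one_vs_all_hits)
stringent = exclusive_pairwise(pairwise_hits, archetypes)
print(permissive)
print(stringent)
```

Here pathway_A passes the 1-vs-all rule but fails the pairwise rule because it does not beat archetype_3, which is the sense in which the pairwise method is more stringent.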

See also

peach.tl.pattern_analysis: Comprehensive pattern analysis
peach._core.types.ExclusivePatternResult: Result row structure

peach.tl.statistical.specialization_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, verbose=True, **kwargs)

Identify specialization features relative to centroid archetype.

Compares each archetype to archetype_0 (centroid/generalist) to find features representing specialized states or differentiation away from the central cellular state.

Parameters:

adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_cells (int, default: 10) – Minimum cells per archetype.
verbose (bool, default: True) – Print progress.

Returns:

Results showing specialization from archetype_0.

Return type:

pd.DataFrame

Notes

Archetype_0 typically represents the centroid or generalist state where cells have balanced contributions from all archetypes. Features elevated in other archetypes relative to archetype_0 represent specialized cellular programs.

Examples

>>> spec = pc.tl.specialization_patterns(adata)
>>> # Find archetype_4 specialization features
>>> arch4_spec = spec[(spec["archetype"] == "archetype_4") & (spec["significant"])]
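
The centroid comparison differs from 1-vs-all testing only in its reference group. A minimal sketch with made-up scores for one pathway follows; it computes the effect size only and omits the significance test, and the dictionary layout is an assumption for illustration.

```python
from statistics import mean

# Made-up scores for one pathway, grouped by archetype label.
scores_by_archetype = {
    "archetype_0": [0.20, 0.22, 0.18, 0.20],  # centroid / generalist cells
    "archetype_1": [0.55, 0.60, 0.65],
    "archetype_2": [0.19, 0.21, 0.20],
}

centroid_mean = mean(scores_by_archetype["archetype_0"])

# Every other archetype is compared against archetype_0, not against all other cells.
mean_diff_vs_centroid = {
    arch: mean(values) - centroid_mean
    for arch, values in scores_by_archetype.items()
    if arch != "archetype_0"
}
print(mean_diff_vs_centroid)
```

In this toy data archetype_1 is shifted well away from the centroid while archetype_2 is not, so only the former would be a candidate specialization feature.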

See also

peach.tl.archetype_exclusive_patterns: Exclusive pattern analysis

peach.tl.statistical.tradeoff_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', tradeoffs='pairs', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, min_effect_size=0.1, verbose=True, **kwargs)

Identify mutual exclusivity and tradeoff patterns.

Finds features showing opposing patterns between archetypes, indicating biological tradeoffs or mutually exclusive states.

Parameters:

adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
tradeoffs ({'pairs', 'patterns'}, default: 'pairs') –
Type of tradeoff analysis:
- 'pairs' : Binary pairwise (A high, B low)
- 'patterns' : Complex multi-archetype (AB high, CD low)
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_cells (int, default: 10) – Minimum cells per group.
min_effect_size (float, default: 0.1) – Minimum effect size for tradeoffs.
verbose (bool, default: True) – Print progress.
**kwargs –
Additional parameters:
- max_pattern_size : int, default: 2
  Maximum archetypes per group for complex patterns.
- exclude_archetype_0 : bool, default: True
  Exclude archetype_0 from tradeoff patterns.
- specific_patterns : List[str], optional
  Test only specific patterns (e.g., [‘2v3’, ‘1v45’]).

Returns:

Results with tradeoff patterns:

pattern_code : Visual pattern code
high_archetypes, low_archetypes : Groups
mean_high, mean_low : Group means
log_fold_change : Effect size
pattern_complexity : Number of archetypes involved

Return type:

pd.DataFrame

Examples

>>> # Find pairwise tradeoffs
>>> pairs = pc.tl.tradeoff_patterns(adata, tradeoffs="pairs")
>>> # Find complex patterns
>>> patterns = pc.tl.tradeoff_patterns(adata, tradeoffs="patterns", max_pattern_size=3)
>>> # Test specific hypothesis
>>> specific = pc.tl.tradeoff_patterns(adata, specific_patterns=["2v3", "1v4"])
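
The number of high-vs-low groupings grows quickly with max_pattern_size, which matters for multiple-testing correction. The sketch below enumerates candidate groupings and parses the ‘1v45’ style strings used for specific_patterns. Both helpers are hypothetical; the sketch assumes single-digit archetype indices and treats (A high, B low) and (B high, A low) as distinct patterns, which may not match how peach counts them.

```python
from itertools import combinations


def parse_pattern(spec):
    """Split a 'HIGHvLOW' string such as '1v45' into (high, low) index tuples.

    Assumes single-digit archetype indices.
    """
    high, low = spec.split("v")
    return tuple(int(ch) for ch in high), tuple(int(ch) for ch in low)


def enumerate_patterns(archetypes, max_pattern_size=2):
    """Yield (high, low) groups of disjoint archetypes, each with at most max_pattern_size members."""
    for n_high in range(1, max_pattern_size + 1):
        for high in combinations(archetypes, n_high):
            rest = [a for a in archetypes if a not in high]
            for n_low in range(1, max_pattern_size + 1):
                for low in combinations(rest, n_low):
                    yield high, low


archetypes = [1, 2, 3, 4]  # archetype_0 left out, mirroring exclude_archetype_0=True
pair_patterns = list(enumerate_patterns(archetypes, max_pattern_size=1))
complex_patterns = list(enumerate_patterns(archetypes, max_pattern_size=2))

print(len(pair_patterns), len(complex_patterns))
print(parse_pattern("1v45"))
```

When only a few tradeoffs are of interest, restricting the analysis with specific_patterns keeps the number of tests, and therefore the correction burden, small.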

See also

peach.tl.archetype_exclusive_patterns: Exclusive pattern analysis
peach._core.types.PatternAssociationResult: Result row structure
