peach.tl.statistical
Statistical Testing for Archetypal Analysis
User-facing API for statistical analysis of archetype-feature associations.
This module provides statistical tests for characterizing archetypes by their gene expression, pathway activity, and metadata associations. All functions apply multiple-testing correction to the reported p-values.
Main Functions
- gene_associations
Mann-Whitney U tests for gene-archetype associations
- pathway_associations
Pathway activity testing for archetype characterization
- conditional_associations
Hypergeometric tests for metadata enrichment
- pattern_analysis
Comprehensive archetypal pattern analysis
- archetype_exclusive_patterns
Identify features exclusively high in single archetypes
- specialization_patterns
Compare archetypes to centroid (archetype_0)
- tradeoff_patterns
Identify mutual exclusivity between archetypes
Type Definitions
See peach._core.types for Pydantic models:
- GeneAssociationResult: gene_associations() row structure
- PathwayAssociationResult: pathway_associations() row structure
- ConditionalAssociationResult: conditional_associations() row structure
- PatternAssociationResult: Pattern test result structure
- ExclusivePatternResult: Exclusive pattern result structure
- ComprehensivePatternResults: pattern_analysis() return structure
Examples
>>> import peach as pc
>>> # Gene associations
>>> gene_results = pc.tl.gene_associations(adata)
>>> sig_genes = gene_results[gene_results.significant]
>>> # Pathway associations
>>> pathway_results = pc.tl.pathway_associations(adata)
>>> # Comprehensive pattern analysis
>>> patterns = pc.tl.pattern_analysis(adata)
>>> specialists = patterns["patterns"][patterns["patterns"]["pattern_type"] == "specialization"]
See also
peach._core.utils.statistical_tests: Core implementation
peach._core.types: Type definitions
Functions
- archetype_exclusive_patterns
Identify features exclusively high in single archetypes.
- conditional_associations
Test associations between archetypes and categorical metadata.
- gene_associations
Test gene expression associations with archetypal assignments.
- pathway_associations
Test pathway activity associations with archetypal assignments.
- pattern_analysis
Comprehensive archetypal pattern analysis.
- specialization_patterns
Identify specialization features relative to centroid archetype.
- tradeoff_patterns
Identify mutual exclusivity and tradeoff patterns.
- peach.tl.statistical.gene_associations(adata, *, bin_prop=0.1, obsm_key='archetype_distances', obs_key='archetypes', use_layer=None, test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)
Test gene expression associations with archetypal assignments.
Performs Mann-Whitney U tests to identify genes with significantly different expression between each archetype and all other cells (1-vs-all testing paradigm).
- Parameters:
adata (AnnData) –
Annotated data object with:
- obsm[obsm_key]: Archetype distance matrix [n_cells, n_archetypes]
- obs[obs_key]: Archetype assignments (from bin_cells_by_archetype)
- X or layers[use_layer]: Gene expression data
bin_prop (float, default: 0.1) – Proportion of cells closest to each archetype to use for binning.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
use_layer (str | None, default: None) – Layer holding gene expression. If None, a ‘logcounts’ or ‘log1p’ layer is auto-selected when available; otherwise adata.X is used.
test_method (str, default: "mannwhitneyu") – Statistical test method. Currently supports ‘mannwhitneyu’.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method: ‘benjamini_hochberg’ or ‘bonferroni’.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') –
Scope of FDR correction:
- 'global': Correct across all tests (most stringent)
- 'per_archetype': Correct within each archetype
- 'none': No FDR correction (raw p-values)
test_direction (str, default: "two-sided") – Direction of statistical test: ‘two-sided’, ‘greater’, or ‘less’.
min_logfc (float, default: 0.01) – Minimum absolute log fold change threshold for filtering.
min_cells (int, default: 10) – Minimum cells required per archetype for testing.
comparison_group (str, default: 'all') –
Comparison group for statistical tests:
- 'all': Compare archetype cells vs ALL other cells
- 'archetypes_only': Compare vs cells in other archetypes only (excludes archetype_0 and unassigned cells)
verbose (bool, default: True) – Whether to print progress messages.
- Returns:
Results with columns:
- gene: str - Gene symbol/identifier
- archetype: str - Archetype identifier
- n_archetype_cells: int - Cells in archetype
- n_other_cells: int - Cells in comparison group
- mean_archetype: float - Mean expression in archetype
- mean_other: float - Mean expression in others
- log_fold_change: float - Log fold change
- statistic: float - Mann-Whitney U statistic
- pvalue: float - Raw p-value
- fdr_pvalue: float - FDR-corrected p-value
- significant: bool - Whether FDR < 0.05
- direction: str - ‘higher’ or ‘lower’ in archetype
- Return type:
pd.DataFrame
- Raises:
ValueError – If required keys not found in adata.
Examples
>>> # Basic usage
>>> results = pc.tl.gene_associations(adata)
>>> sig_genes = results[results.significant]
>>> # Per-archetype FDR correction (less stringent)
>>> results = pc.tl.gene_associations(adata, fdr_scope="per_archetype")
>>> # Top markers per archetype
>>> for arch in results["archetype"].unique():
...     arch_genes = results[
...         (results["archetype"] == arch) & (results["significant"]) & (results["direction"] == "higher")
...     ].nlargest(10, "log_fold_change")
...     print(f"{arch}: {arch_genes['gene'].tolist()}")
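To illustrate why per-archetype correction is less stringent than global correction, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment applied under each fdr_scope value. It is not peach's implementation: the helper names benjamini_hochberg and adjust_by_scope and the (archetype, pvalue) row layout are assumptions made for this example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def adjust_by_scope(rows, scope="global"):
    """Adjust p-values for rows of (archetype, pvalue) under a given scope.

    Hypothetical helper: mirrors the documented 'global', 'per_archetype',
    and 'none' options, but is not the library's code.
    """
    pvals = [p for _, p in rows]
    if scope == "none":
        return pvals
    if scope == "global":
        return benjamini_hochberg(pvals)
    if scope == "per_archetype":
        groups = {}
        for i, (archetype, _) in enumerate(rows):
            groups.setdefault(archetype, []).append(i)
        adjusted = [0.0] * len(rows)
        for indices in groups.values():
            corrected = benjamini_hochberg([pvals[i] for i in indices])
            for i, value in zip(indices, corrected):
                adjusted[i] = value
        return adjusted
    raise ValueError(f"unknown scope: {scope!r}")


rows = [
    ("archetype_1", 0.01),
    ("archetype_1", 0.04),
    ("archetype_2", 0.03),
    ("archetype_2", 0.005),
]
global_adj = adjust_by_scope(rows, "global")
per_arch_adj = adjust_by_scope(rows, "per_archetype")
```

With these four tests, every per-archetype adjusted value is less than or equal to its global counterpart, because each archetype's p-values are corrected for two tests instead of four.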
See also
peach.tl.pathway_associations: Pathway-level testing
peach.tl.pattern_analysis: Comprehensive pattern analysis
peach._core.types.GeneAssociationResult: Result row structure
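The 1-vs-all Mann-Whitney paradigm used by gene_associations can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, the p-value uses the two-sided normal approximation with tie correction and no continuity correction, and results will therefore differ slightly from scipy.stats.mannwhitneyu, which applies a continuity correction or an exact method by default.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks


def mann_whitney_u(group, rest):
    """Return (U statistic for `group`, two-sided p-value, normal approximation)."""
    n1, n2 = len(group), len(rest)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    combined = list(group) + list(rest)
    ranks = average_ranks(combined)
    u_stat = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_counts = {}
    for value in combined:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical: no evidence of a difference
        return u_stat, 1.0
    z = (u_stat - n1 * n2 / 2) / math.sqrt(variance)
    return u_stat, math.erfc(abs(z) / math.sqrt(2))


def one_vs_all(values, labels):
    """Test each label's cells against all remaining cells for one feature."""
    results = {}
    for archetype in sorted(set(labels)):
        inside = [v for v, lab in zip(values, labels) if lab == archetype]
        outside = [v for v, lab in zip(values, labels) if lab != archetype]
        results[archetype] = mann_whitney_u(inside, outside)
    return results


expression = [5, 6, 7, 8, 1, 2, 3, 4]  # one gene across eight cells
labels = ["archetype_1"] * 4 + ["archetype_2"] * 4
per_archetype = one_vs_all(expression, labels)
```

In the real function this comparison is repeated for every gene, and the resulting p-values are then passed through the chosen FDR correction.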
- peach.tl.statistical.pathway_associations(adata, *, pathway_obsm_key='pathway_scores', obsm_key='archetype_distances', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)
Test pathway activity associations with archetypal assignments.
Performs Mann-Whitney U tests to identify pathways with significantly different activity between each archetype and all other cells.
- Parameters:
adata (AnnData) –
Annotated data object with:
- obsm[pathway_obsm_key]: Pathway scores [n_cells, n_pathways]
- obsm[obsm_key]: Archetype distance matrix
- obs[obs_key]: Archetype assignments
- uns[pathway_obsm_key + '_pathways']: Pathway names (optional)
pathway_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing pathway activity scores.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
test_direction (str, default: "two-sided") – Direction of statistical test.
min_logfc (float, default: 0.01) – Minimum effect size threshold (mean_diff for pathways).
min_cells (int, default: 10) – Minimum cells required per archetype.
comparison_group (str, default: 'all') – Comparison group: ‘all’ or ‘archetypes_only’.
verbose (bool, default: True) – Whether to print progress.
- Returns:
Results with columns:
- pathway: str - Pathway name
- archetype: str - Archetype identifier
- n_archetype_cells: int - Cells in archetype
- n_other_cells: int - Cells in comparison
- mean_archetype: float - Mean score in archetype
- mean_other: float - Mean score in others
- mean_diff: float - Mean difference (primary effect size)
- log_fold_change: float - Alias for mean_diff
- statistic: float - Test statistic
- pvalue: float - Raw p-value
- fdr_pvalue: float - FDR-corrected p-value
- significant: bool - Whether significant
- direction: str - ‘higher’ or ‘lower’
- Return type:
pd.DataFrame
Notes
Pathway scores (from AUCell, pySCENIC, etc.) represent activity levels, not expression counts. Mean difference is more interpretable than log fold change for these scores.
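A small numeric sketch may help show why mean difference is the more interpretable effect size for bounded activity scores. The function name effect_sizes, the pseudocount, and the toy scores below are assumptions for illustration only; the sketch assumes non-negative scores such as AUCell values.

```python
import math


def effect_sizes(archetype_scores, other_scores, pseudocount=1e-9):
    """Return (mean_diff, log2 fold change) for two groups of scores.

    Hypothetical helper; the pseudocount only guards against division by zero.
    """
    mean_arch = sum(archetype_scores) / len(archetype_scores)
    mean_other = sum(other_scores) / len(other_scores)
    mean_diff = mean_arch - mean_other
    log_fc = math.log2((mean_arch + pseudocount) / (mean_other + pseudocount))
    return mean_diff, log_fc


# A nearly silent pathway: tiny absolute change, large ratio.
quiet_diff, quiet_lfc = effect_sizes([0.004, 0.004], [0.001, 0.001])

# An active pathway: substantial absolute change, modest ratio.
active_diff, active_lfc = effect_sizes([0.45, 0.45], [0.30, 0.30])
```

The nearly silent pathway has a log2 fold change close to 2 despite a mean difference of only 0.003, while the active pathway shifts by 0.15 with a log2 fold change below 0.6. Ranking by fold change would favor the first pathway even though its activity barely moves.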
Examples
>>> # Basic usage
>>> results = pc.tl.pathway_associations(adata)
>>> # Filter for specific pathway categories
>>> metabolism = results[results["pathway"].str.contains("METABOLISM", case=False)]
>>> # Top pathways per archetype
>>> for arch in results["archetype"].unique():
...     top = results[(results["archetype"] == arch) & (results["significant"])].nlargest(5, "mean_diff")
...     print(f"{arch}: {top['pathway'].tolist()}")
See also
peach.tl.gene_associations: Gene-level testing
peach._core.types.PathwayAssociationResult: Result row structure
- peach.tl.statistical.conditional_associations(adata, *, obs_column, archetype_assignments=None, obs_key='archetypes', test_method='hypergeometric', fdr_method='benjamini_hochberg', min_cells=5, verbose=True, **kwargs)
Test associations between archetypes and categorical metadata.
Performs hypergeometric tests to identify significant enrichment of archetypes within different categorical conditions (samples, treatments, cell types, etc.).
- Parameters:
adata (AnnData) –
Annotated data object with:
- obs[obs_key]: Archetype assignments
- obs[obs_column]: Categorical variable to test
obs_column (str) – Column name in adata.obs containing categorical variable.
archetype_assignments (None, optional) – Deprecated. Archetype assignments are now read from adata.obs[obs_key].
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
test_method (str, default: "hypergeometric") – Statistical test method (currently only ‘hypergeometric’).
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
min_cells (int, default: 5) – Minimum cells required per archetype-condition combination.
verbose (bool, default: True) – Whether to print progress.
- Returns:
Results with columns:
- archetype: str - Archetype identifier
- condition: str - Condition value from obs_column
- observed: int - Observed count in overlap
- expected: float - Expected count under null
- total_archetype: int - Total cells in archetype
- total_condition: int - Total cells in condition
- odds_ratio: float - Enrichment measure (>1 = enriched)
- ci_lower: float - Lower 95% CI for odds ratio
- ci_upper: float - Upper 95% CI for odds ratio
- pvalue: float - Hypergeometric p-value
- fdr_pvalue: float - FDR-corrected p-value
- significant: bool - Whether significant
- Return type:
pd.DataFrame
Examples
>>> # Test sample associations
>>> results = pc.tl.conditional_associations(adata, obs_column="sample")
>>> # Find enriched archetypes per condition
>>> enriched = results[(results["significant"]) & (results["odds_ratio"] > 2)]
>>> # Test treatment effects
>>> treatment_results = pc.tl.conditional_associations(adata, obs_column="treatment")
See also
peach._core.types.ConditionalAssociationResult: Result row structure
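For a single archetype-condition pair, the hypergeometric enrichment test and the observed, expected, and odds_ratio columns can be sketched as below. This is a minimal illustration, not peach's implementation: the function name is hypothetical, the p-value is the one-sided upper tail, and the odds ratio is the plain sample odds ratio of the 2x2 table, which may differ from how the library estimates it and its confidence interval.

```python
import math
from math import comb


def hypergeom_enrichment(observed, total_archetype, total_condition, n_cells):
    """Return (expected, odds_ratio, pvalue) for one archetype-condition pair.

    pvalue is P(X >= observed), where X counts condition cells among
    `total_archetype` cells drawn without replacement from `n_cells`.
    """
    expected = total_archetype * total_condition / n_cells

    # Upper tail of the hypergeometric distribution, computed exactly.
    upper = min(total_archetype, total_condition)
    tail = sum(
        comb(total_condition, k) * comb(n_cells - total_condition, total_archetype - k)
        for k in range(observed, upper + 1)
    )
    pvalue = tail / comb(n_cells, total_archetype)

    # Sample odds ratio from the 2x2 table (archetype x condition).
    a = observed
    b = total_archetype - observed
    c = total_condition - observed
    d = n_cells - total_archetype - total_condition + observed
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    return expected, odds_ratio, pvalue


# 100 cells, 20 in the archetype, 30 in the condition, 12 in both.
expected, odds_ratio, pvalue = hypergeom_enrichment(12, 20, 30, 100)
```

Here 12 overlapping cells are observed where 6 would be expected under independence, so the odds ratio is well above 1 and the upper-tail p-value is small.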
- peach.tl.statistical.pattern_analysis(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', include_individual_tests=True, include_pattern_tests=True, include_exclusivity_analysis=True, verbose=True, **kwargs)
Comprehensive archetypal pattern analysis.
Performs systematic analysis combining three complementary approaches:
Individual tests: Standard 1-vs-all archetype characterization
Pattern tests: Systematic archetype combination testing (specialists, binary tradeoffs, complex patterns)
Exclusivity analysis: Features with opposing patterns
- Parameters:
adata (AnnData) – Annotated data object with archetypal assignments and scores.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing scores for pattern analysis.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
include_individual_tests (bool, default: True) – Run individual archetype 1-vs-all tests.
include_pattern_tests (bool, default: True) – Run systematic pattern tests (specialists, tradeoffs).
include_exclusivity_analysis (bool, default: True) – Analyze mutual exclusivity patterns.
verbose (bool, default: True) – Print analysis progress.
- Returns:
Dictionary with keys:
- 'individual': Individual archetype results
- 'patterns': Pattern-based test results
- 'exclusivity': Mutual exclusivity results
- Return type:
Notes
Pattern Types in ‘patterns’ DataFrame:
- specialization: Archetype vs archetype_0 (centroid)
- tradeoff: Multi-archetype high vs low groups
Pattern Code Format: “12xxx_xx345”
Position = archetype number (0, 1, 2…)
Numbers = high archetypes
‘x’ = low archetypes
Underscore separates high from low group
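One possible reading of the pattern code format is sketched below: each half of the code is a position-wise mask in which a digit marks membership in that group and 'x' marks every other archetype. The helpers pattern_code and decode_pattern_code are hypothetical, and the sketch assumes single-digit archetype numbers starting at 1, as in the "12xxx_xx345" example; the library's actual encoding may differ.

```python
def pattern_code(high, low, n_archetypes):
    """Build a code such as '12xxx_xx345' from high and low archetype groups.

    Assumption for this sketch: archetypes are numbered 1..n_archetypes and
    each half of the code marks the members of one group by their digit.
    """
    high, low = set(high), set(low)
    if high & low:
        raise ValueError("an archetype cannot be in both groups")

    def mask(members):
        return "".join(
            str(i) if i in members else "x" for i in range(1, n_archetypes + 1)
        )

    return f"{mask(high)}_{mask(low)}"


def decode_pattern_code(code):
    """Recover (high, low) archetype lists from a pattern code."""
    high_mask, low_mask = code.split("_")
    high = [int(ch) for ch in high_mask if ch != "x"]
    low = [int(ch) for ch in low_mask if ch != "x"]
    return high, low


code = pattern_code(high=[1, 2], low=[3, 4, 5], n_archetypes=5)
groups = decode_pattern_code(code)
```

Under this reading, a binary tradeoff between archetypes 2 and 3 among five archetypes would be written "x2xxx_xx3xx".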
Examples
>>> # Run comprehensive analysis
>>> results = pc.tl.pattern_analysis(adata)
>>> # Access individual results
>>> individual = results["individual"]
>>> # Find specialists (exclusive to one archetype)
>>> patterns = results["patterns"]
>>> specialists = patterns[patterns["pattern_type"] == "specialization"]
>>> # Find mutual exclusivity patterns
>>> if not results["exclusivity"].empty:
...     exclusive = results["exclusivity"]
...     top_tradeoffs = exclusive.nlargest(10, "effect_range")
See also
peach.tl.archetype_exclusive_patterns: Focused exclusive pattern analysis
peach.tl.specialization_patterns: Centroid comparison analysis
peach.tl.tradeoff_patterns: Mutual exclusivity analysis
peach._core.types.ComprehensivePatternResults: Return type structure
- peach.tl.statistical.archetype_exclusive_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_effect_size=0.05, min_cells=10, use_pairwise=True, verbose=True, **kwargs)
Identify features exclusively high in single archetypes.
Finds genes or pathways specifically elevated in only one archetype compared to all others. Supports two methods:
Pairwise (default): Tests each archetype vs every other archetype individually. Feature is exclusive if significantly higher vs ALL others. More stringent.
1-vs-all filtering: Tests each archetype vs all other cells. Feature is exclusive if significant in only ONE archetype’s test. More permissive, higher statistical power.
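The difference between the two decision rules can be sketched with plain Python, taking the significance calls as given inputs. The helper names and input layouts below are assumptions for illustration, and the exclusivity score follows the "ratio vs max other archetype" description in the results; how the library treats zero or negative means is not specified in this reference.

```python
def exclusive_pairwise(higher_than, archetypes):
    """Archetypes in which the feature is significantly higher than in EVERY other.

    higher_than[(a, b)] is True when the feature tested significantly
    higher in archetype a than in archetype b.
    """
    return [
        a for a in archetypes
        if all(higher_than.get((a, b), False) for b in archetypes if b != a)
    ]


def exclusive_one_vs_all(significant_vs_rest):
    """Return the archetype if exactly ONE 1-vs-all test is significant."""
    hits = [a for a, is_sig in significant_vs_rest.items() if is_sig]
    return hits if len(hits) == 1 else []


def exclusivity_score(means, archetype):
    """Ratio of the archetype's mean to the largest mean among the others."""
    top_other = max(m for a, m in means.items() if a != archetype)
    return means[archetype] / top_other if top_other > 0 else float("inf")


archetypes = ["archetype_1", "archetype_2", "archetype_3"]

# The feature is clearly higher in archetype_1 than archetype_2, but the
# comparison against archetype_3 did not reach significance.
higher_than = {("archetype_1", "archetype_2"): True,
               ("archetype_1", "archetype_3"): False}
significant_vs_rest = {"archetype_1": True, "archetype_2": False, "archetype_3": False}
means = {"archetype_1": 0.6, "archetype_2": 0.2, "archetype_3": 0.4}

pairwise_hits = exclusive_pairwise(higher_than, archetypes)
one_vs_all_hits = exclusive_one_vs_all(significant_vs_rest)
score = exclusivity_score(means, "archetype_1")
```

In this toy case the 1-vs-all rule reports archetype_1 as exclusive, while the pairwise rule reports nothing because one pairwise comparison failed, which is the sense in which the pairwise method is more stringent.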
- Parameters:
adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores. Use None for gene expression.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_effect_size (float, default: 0.05) – Minimum effect size (mean_diff for pathways, log_fc for genes).
min_cells (int, default: 10) – Minimum cells per archetype.
use_pairwise (bool, default: True) – If True, use rigorous pairwise comparisons. If False, use 1-vs-all filtering.
verbose (bool, default: True) – Print progress.
- Returns:
Results with columns:
- pathway/gene: Feature identifier
- archetype: Exclusive archetype
- mean_archetype: Mean in exclusive archetype
- mean_other: Mean in other archetypes
- mean_diff/log_fold_change: Effect size
- exclusivity_score: Ratio vs max other archetype
- pvalue, fdr_pvalue, significant
- pattern_type: ‘exclusive’ or ‘exclusive_pairwise’
- Return type:
pd.DataFrame
Examples
>>> # Pairwise method (more stringent)
>>> exclusive = pc.tl.archetype_exclusive_patterns(adata)
>>> # 1-vs-all method (more permissive)
>>> exclusive = pc.tl.archetype_exclusive_patterns(adata, use_pairwise=False)
>>> # Find markers for specific archetype
>>> arch3_markers = exclusive[exclusive["archetype"] == "archetype_3"]
>>> top_markers = arch3_markers.nlargest(10, "exclusivity_score")
See also
peach.tl.pattern_analysis: Comprehensive pattern analysis
peach._core.types.ExclusivePatternResult: Result row structure
- peach.tl.statistical.specialization_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, verbose=True, **kwargs)
Identify specialization features relative to centroid archetype.
Compares each archetype to archetype_0 (centroid/generalist) to find features representing specialized states or differentiation away from the central cellular state.
- Parameters:
adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_cells (int, default: 10) – Minimum cells per archetype.
verbose (bool, default: True) – Print progress.
- Returns:
Results showing specialization from archetype_0.
- Return type:
pd.DataFrame
Notes
Archetype_0 typically represents the centroid or generalist state where cells have balanced contributions from all archetypes. Features elevated in other archetypes relative to archetype_0 represent specialized cellular programs.
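The comparison against the centroid can be pictured as a per-archetype effect size measured relative to archetype_0 instead of relative to all other cells. The sketch below computes only that mean difference for one feature; the function name and the choice to include every non-reference label are assumptions, and the significance testing performed by the real function is omitted.

```python
from collections import defaultdict


def specialization_mean_diffs(scores, labels, reference="archetype_0"):
    """Mean score of each archetype minus the mean score of the reference group.

    Hypothetical helper: every label other than `reference` is compared.
    """
    groups = defaultdict(list)
    for score, label in zip(scores, labels):
        groups[label].append(score)
    if reference not in groups:
        raise ValueError(f"no cells labelled {reference!r}")
    reference_mean = sum(groups[reference]) / len(groups[reference])
    return {
        label: sum(values) / len(values) - reference_mean
        for label, values in groups.items()
        if label != reference
    }


scores = [0.2, 0.2, 0.5, 0.5, 0.1, 0.1]  # one pathway across six cells
labels = ["archetype_0", "archetype_0",
          "archetype_1", "archetype_1",
          "archetype_2", "archetype_2"]
diffs = specialization_mean_diffs(scores, labels)
```

A positive value, as for archetype_1 here, marks a feature elevated relative to the generalist state; a negative value, as for archetype_2, marks one that is reduced.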
Examples
>>> spec = pc.tl.specialization_patterns(adata)
>>> # Find archetype_4 specialization features
>>> arch4_spec = spec[(spec["archetype"] == "archetype_4") & (spec["significant"])]
See also
peach.tl.archetype_exclusive_patterns: Exclusive pattern analysis
- peach.tl.statistical.tradeoff_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', tradeoffs='pairs', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, min_effect_size=0.1, verbose=True, **kwargs)
Identify mutual exclusivity and tradeoff patterns.
Finds features showing opposing patterns between archetypes, indicating biological tradeoffs or mutually exclusive states.
- Parameters:
adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
tradeoffs ({'pairs', 'patterns'}, default: 'pairs') –
Type of tradeoff analysis:
- 'pairs': Binary pairwise (A high, B low)
- 'patterns': Complex multi-archetype (AB high, CD low)
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_cells (int, default: 10) – Minimum cells per group.
min_effect_size (float, default: 0.1) – Minimum effect size for tradeoffs.
verbose (bool, default: True) – Print progress.
**kwargs –
Additional parameters:
- max_pattern_size (int, default: 2) – Maximum archetypes per group for complex patterns.
- exclude_archetype_0 (bool, default: True) – Exclude archetype_0 from tradeoff patterns.
- specific_patterns (List[str], optional) – Test only specific patterns (e.g., ['2v3', '1v45']).
- Returns:
Results with tradeoff patterns:
- pattern_code: Visual pattern code
- high_archetypes, low_archetypes: Groups
- mean_high, mean_low: Group means
- log_fold_change: Effect size
- pattern_complexity: Number of archetypes involved
- Return type:
pd.DataFrame
Examples
>>> # Find pairwise tradeoffs
>>> pairs = pc.tl.tradeoff_patterns(adata, tradeoffs="pairs")
>>> # Find complex patterns
>>> patterns = pc.tl.tradeoff_patterns(adata, tradeoffs="patterns", max_pattern_size=3)
>>> # Test specific hypothesis
>>> specific = pc.tl.tradeoff_patterns(adata, specific_patterns=["2v3", "1v4"])
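The shorthand used by specific_patterns and the growth of the search space with max_pattern_size can be sketched as follows. Both helpers are hypothetical: the parser assumes single-digit archetype numbers with the high group before the "v" and the low group after it, and the enumeration treats "A high, B low" and "B high, A low" as separate candidates, which may not match how the library counts them.

```python
from itertools import combinations


def parse_pattern(spec):
    """Parse a string such as '1v45' into (high, low) archetype lists."""
    high, sep, low = spec.partition("v")
    if not sep or not high.isdecimal() or not low.isdecimal():
        raise ValueError(f"expected '<high>v<low>' with digits, got {spec!r}")
    return [int(ch) for ch in high], [int(ch) for ch in low]


def enumerate_tradeoffs(archetypes, max_pattern_size=1):
    """List every (high, low) pair of disjoint groups up to the given size."""
    candidates = []
    for high_size in range(1, max_pattern_size + 1):
        for high in combinations(archetypes, high_size):
            remaining = [a for a in archetypes if a not in high]
            for low_size in range(1, max_pattern_size + 1):
                for low in combinations(remaining, low_size):
                    candidates.append((high, low))
    return candidates


hypothesis = parse_pattern("1v45")                     # archetype 1 high; 4 and 5 low
pairs = enumerate_tradeoffs([1, 2, 3, 4], max_pattern_size=1)
patterns = enumerate_tradeoffs([1, 2, 3, 4], max_pattern_size=2)
```

With four archetypes, allowing groups of up to two members already yields several times as many candidate patterns as the pairwise case, so passing specific_patterns to test a small set of hypotheses keeps the multiple-testing burden down.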
See also
peach.tl.archetype_exclusive_patterns: Exclusive pattern analysis
peach._core.types.PatternAssociationResult: Result row structure