peach.tl.gene_associations
- peach.tl.gene_associations(adata, *, bin_prop=0.1, obsm_key='archetype_distances', obs_key='archetypes', use_layer=None, test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)
Test gene expression associations with archetypal assignments.
Performs Mann-Whitney U tests to identify genes with significantly different expression between each archetype and all other cells (1-vs-all testing paradigm).
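The 1-vs-all paradigm described above can be sketched in plain Python for a single gene. This is only an illustration under stated assumptions, not peach's implementation: the helper names `mann_whitney_u` and `one_vs_rest` and the toy values are hypothetical, and only the U statistic is computed. For p-values, a routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
def mann_whitney_u(x, y):
    """U statistic of sample x against sample y.

    Counts the (a, b) pairs with a > b; ties contribute one half.
    """
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


def one_vs_rest(values, labels):
    """Compare each group's cells against all remaining cells.

    values: expression of one gene, one entry per cell (toy data).
    labels: archetype assignment of each cell (hypothetical labels).
    Returns {label: (n_inside, n_outside, U)}.
    """
    results = {}
    for arch in sorted(set(labels)):
        inside = [v for v, lab in zip(values, labels) if lab == arch]
        outside = [v for v, lab in zip(values, labels) if lab != arch]
        results[arch] = (len(inside), len(outside), mann_whitney_u(inside, outside))
    return results


# Toy expression values for one gene across eight cells.
values = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0, 1.5, 2.5]
labels = ["archetype_1"] * 3 + ["archetype_2"] * 3 + ["archetype_0"] * 2

u_by_archetype = one_vs_rest(values, labels)
for arch, (n_in, n_out, u) in u_by_archetype.items():
    print(f"{arch}: n_in={n_in}, n_out={n_out}, U={u}")
```

A U value equal to `n_in * n_out` means every cell in the archetype has higher expression than every cell in the comparison group, which is the case for `archetype_1` in this toy data.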
- Parameters:
adata (AnnData) –
Annotated data object with:
obsm[obsm_key]: Archetype distance matrix [n_cells, n_archetypes]
obs[obs_key]: Archetype assignments (from bin_cells_by_archetype)
X or layers[use_layer]: Gene expression data
bin_prop (float, default: 0.1) – Proportion of cells closest to each archetype to use for binning.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
use_layer (str | None, default: None) – Layer for gene expression. If None, uses adata.X. Auto-selects ‘logcounts’ or ‘log1p’ if available.
test_method (str, default: "mannwhitneyu") – Statistical test method. Currently supports ‘mannwhitneyu’.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method: ‘benjamini_hochberg’ or ‘bonferroni’.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') –
Scope of FDR correction:
'global': Correct across all tests (most stringent)
'per_archetype': Correct within each archetype
'none': No FDR correction (raw p-values)
test_direction (str, default: "two-sided") – Direction of statistical test: ‘two-sided’, ‘greater’, or ‘less’.
min_logfc (float, default: 0.01) – Minimum absolute log fold change threshold for filtering.
min_cells (int, default: 10) – Minimum cells required per archetype for testing.
comparison_group (str, default: 'all') –
Comparison group for statistical tests:
'all': Compare archetype cells vs ALL other cells
'archetypes_only': Compare vs cells in other archetypes only (excludes archetype_0 and unassigned cells)
verbose (bool, default: True) – Whether to print progress messages.
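To make the bin_prop parameter concrete, the sketch below selects the fraction of cells nearest to each archetype from a distance matrix shaped like obsm[obsm_key]. It is an assumption-laden illustration rather than peach's binning code: the function name `closest_cells`, the rounding rule (truncate, keep at least one cell), and the tie handling are all hypothetical.

```python
def closest_cells(distances, bin_prop=0.1):
    """Pick the cells nearest to each archetype.

    distances: list of rows, one per cell, one column per archetype
               (same layout as an [n_cells, n_archetypes] matrix).
    bin_prop:  fraction of all cells to keep per archetype.
    Returns {archetype_index: sorted list of cell indices}.
    """
    n_cells = len(distances)
    n_archetypes = len(distances[0])
    # Assumed rounding rule: truncate, but always keep at least one cell.
    n_keep = max(1, int(n_cells * bin_prop))

    bins = {}
    for k in range(n_archetypes):
        # Rank cells by their distance to archetype k, nearest first.
        ranked = sorted(range(n_cells), key=lambda i: distances[i][k])
        bins[k] = sorted(ranked[:n_keep])
    return bins


# Ten toy cells on a line between two archetypes:
# cell i is at distance i from archetype 0 and 9 - i from archetype 1.
toy_distances = [[float(i), float(9 - i)] for i in range(10)]

bins = closest_cells(toy_distances, bin_prop=0.2)
print(bins)
```

With a small bin_prop, most cells fall in no bin at all, which is why the choice of comparison_group changes which cells count as "other".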
- Returns:
Results with columns:
gene: str - Gene symbol/identifier
archetype: str - Archetype identifier
n_archetype_cells: int - Cells in archetype
n_other_cells: int - Cells in comparison group
mean_archetype: float - Mean expression in archetype
mean_other: float - Mean expression in others
log_fold_change: float - Log fold change
statistic: float - Mann-Whitney U statistic
pvalue: float - Raw p-value
fdr_pvalue: float - FDR-corrected p-value
significant: bool - Whether FDR < 0.05
direction: str - ‘higher’ or ‘lower’ in archetype
- Return type:
pd.DataFrame
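How the fdr_pvalue and significant columns can depend on fdr_scope is easiest to see on a handful of raw p-values. The following is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment applied globally or within each archetype. The helper names `benjamini_hochberg` and `adjust` and the toy p-values are hypothetical, and peach's own implementation may differ in detail.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def adjust(tests, fdr_scope="global"):
    """tests: list of (archetype, gene, raw_pvalue) tuples."""
    pvals = [p for _, _, p in tests]
    if fdr_scope == "none":
        return list(pvals)
    if fdr_scope == "global":
        return benjamini_hochberg(pvals)
    if fdr_scope == "per_archetype":
        adjusted = [0.0] * len(tests)
        for arch in {a for a, _, _ in tests}:
            rows = [i for i, (a, _, _) in enumerate(tests) if a == arch]
            group_adjusted = benjamini_hochberg([pvals[i] for i in rows])
            for row, value in zip(rows, group_adjusted):
                adjusted[row] = value
        return adjusted
    raise ValueError(f"unknown fdr_scope: {fdr_scope!r}")


# Toy raw p-values: two archetypes, two genes each.
tests = [
    ("archetype_1", "geneA", 0.01),
    ("archetype_1", "geneB", 0.04),
    ("archetype_2", "geneA", 0.02),
    ("archetype_2", "geneB", 0.20),
]

global_adj = adjust(tests, "global")
per_arch_adj = adjust(tests, "per_archetype")
n_sig_global = sum(p < 0.05 for p in global_adj)
n_sig_per_arch = sum(p < 0.05 for p in per_arch_adj)
print(global_adj, n_sig_global)
print(per_arch_adj, n_sig_per_arch)
```

In this toy data, geneB in archetype_1 passes the 0.05 threshold under per-archetype correction but not under global correction, because the global scope adjusts over four tests instead of two.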
- Raises:
ValueError – If required keys not found in adata.
Examples
>>> import peach as pc
>>> # Basic usage
>>> results = pc.tl.gene_associations(adata)
>>> sig_genes = results[results.significant]
>>> # Per-archetype FDR correction (less stringent)
>>> results = pc.tl.gene_associations(adata, fdr_scope="per_archetype")
>>> # Top markers per archetype
>>> for arch in results["archetype"].unique():
...     arch_genes = results[
...         (results["archetype"] == arch) & (results["significant"]) & (results["direction"] == "higher")
...     ].nlargest(10, "log_fold_change")
...     print(f"{arch}: {arch_genes['gene'].tolist()}")
See also
peach.tl.pathway_associations: Pathway-level testing
peach.tl.pattern_analysis: Comprehensive pattern analysis
peach._core.types.GeneAssociationResult: Result row structure