peach.tl

peach.tl#

Tools for archetypal analysis.

peach.tl.train_archetypal(adata, n_archetypes=5, n_epochs=50, *, layer=None, pca_key='X_pca', hidden_dims=None, inflation_factor=1.5, model_config=None, optimizer_config=None, device='cpu', save_path=None, archetypal_weight=None, kld_weight=None, reconstruction_weight=0.0, vae_recon_weight=0.0, diversity_weight=0.0, activation_func='relu', track_stability=True, validate_constraints=True, lr_factor=0.1, lr_patience=10, seed=42, constraint_tolerance=0.001, stability_history_size=20, store_coords_key='archetype_coordinates', early_stopping=False, early_stopping_patience=10, early_stopping_metric='archetype_r2', min_improvement=0.0001, validation_check_interval=5, validation_data_loader=None, **kwargs)[source]#

Train Deep Archetypal Analysis model to discover cellular archetypes.

This function performs archetypal analysis using a variational autoencoder architecture to identify extreme cellular states (archetypes) that capture the main axes of biological variation. Each cell is represented as a convex combination of the learned archetypes.

The model uses PCHA initialization with inflation factors for optimal archetype positioning and achieves state-of-the-art performance (R² > 0.89) on real single-cell datasets.

Parameters:

adata (AnnData) – Annotated data object containing single-cell expression data. Must have PCA coordinates in adata.obsm[pca_key]. Typically generated using scanpy.pp.pca(adata).
n_archetypes (int, default: 5) – Number of archetypal patterns to learn. Should be chosen based on biological knowledge or using hyperparameter optimization. Common values: 3-10 for most datasets.
n_epochs (int, default: 50) – Number of training epochs. Larger datasets may require more epochs. For datasets >5K cells, consider 100-200 epochs.
layer (str | None, default: None) – AnnData layer to use for training. If None, uses PCA coordinates from adata.obsm[pca_key] (recommended).
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates. The model works best with 5-50 PCA components. Auto-detects: ‘X_pca’, ‘X_PCA’, ‘PCA’.
hidden_dims (list[int] | None, default: None) – Encoder/decoder layer dimensions. If None, uses [256, 128, 64]. Smaller architectures like [128, 64] train faster but may underfit. Larger architectures like [512, 256, 128] may overfit on small datasets.
inflation_factor (float, default: 1.5) – PCHA inflation factor for archetype initialization. Values > 1.0 push initial archetypes further from the data centroid, improving separation. Recommended range: 1.2-2.0. Higher values for more distinct archetypes.
model_config (dict | None, default: None) –
Additional model configuration parameters (for advanced users):
- archetypal_weight : float - Archetypal loss weight, default 0.9
- kld_weight : float - KL divergence weight, default 0.1
- diversity_weight : float - Archetype diversity weight, default 0.05
- use_barycentric : bool - Use softmax constraints, default True
optimizer_config (dict | None, default: None) –
Optimizer configuration parameters:
- lr : float - Learning rate, default 1e-3
- weight_decay : float - L2 regularization, default 0.0
- betas : tuple[float, float] - Adam momentum parameters
device (str, default: "cpu") – Compute device for training. One of “cpu”, “cuda”, “mps”.
save_path (str | None, default: None) – Path to save model checkpoints during training.
archetypal_weight (float | None, default: None) – Weight for archetypal loss component. Uses model’s configured value if None.
kld_weight (float | None, default: None) – Weight for KL divergence loss. Uses model’s configured value if None.
reconstruction_weight (float, default: 0.0) – Legacy reconstruction weight parameter.
vae_recon_weight (float, default: 0.0) – VAE reconstruction weight.
diversity_weight (float, default: 0.0) – Weight for archetype diversity loss.
activation_func (str, default: "relu") – Activation function for the model.
track_stability (bool, default: True) – Whether to monitor archetype drift during training. Adds stability metrics to history: archetype_drift_mean, archetype_variance_mean, etc.
validate_constraints (bool, default: True) – Whether to validate archetypal constraints during training. Adds constraint metrics to history: constraints_satisfied, A_sum_error, etc.
lr_factor (float, default: 0.1) – Factor for learning rate reduction on plateau.
lr_patience (int, default: 10) – Number of epochs with no improvement before reducing learning rate.
seed (int, default: 42) – Random seed for reproducibility.
constraint_tolerance (float, default: 1e-3) – Tolerance for constraint validation.
stability_history_size (int, default: 20) – Number of epochs to track for stability analysis.
store_coords_key (str, default: "archetype_coordinates") – Key to store learned archetype positions in adata.uns.
early_stopping (bool, default: False) – Whether to use early stopping based on validation metrics.
early_stopping_patience (int, default: 10) – Patience for early stopping (number of checks without improvement).
early_stopping_metric (str, default: "archetype_r2") – Metric to monitor for early stopping. One of: ‘archetype_r2’, ‘loss’, ‘rmse’.
min_improvement (float, default: 1e-4) – Minimum improvement required to reset patience counter.
validation_check_interval (int, default: 5) – How often to check validation metrics (in epochs).
validation_data_loader (DataLoader | None, default: None) – Validation data loader for early stopping. Uses training data if None.
**kwargs – Additional arguments passed to the core training function.

Returns:

Training results dictionary with the following structure:

Guaranteed keys (always present):

historydict
Training metrics per epoch. Keys depend on tracking options:
- Core metrics (always): loss, archetypal_loss, archetype_r2, rmse
- KLD metrics: kld_loss, KLD
- Stability metrics (if track_stability=True): archetype_drift_mean, archetype_drift_max, archetype_variance_mean, etc.
- Constraint metrics (if validate_constraints=True): constraints_satisfied, A_sum_error, B_sum_error, constraint_violation_rate
- Validation metrics (if early_stopping=True): val_loss, val_archetype_r2
final_modeltorch.nn.Module
Trained Deep_AA model instance.
modeltorch.nn.Module
Alias for final_model (same object, for compatibility).
final_optimizertorch.optim.Optimizer
Final optimizer state.
final_analysisdict
Final training analysis containing:
- final_constraint_validation : dict with constraint metrics
- archetypal_weights : dict with A_matrix and B_matrix analysis
- final_coordinates : dict with ‘A’, ‘B’, ‘Y’ tensors
- error : str (only if analysis failed)
epoch_archetype_positionslist[torch.Tensor]
Archetype positions at each epoch, shape (n_archetypes, input_dim).
training_configdict
Training configuration with keys: n_epochs, actual_epochs, early_stop_triggered, archetypal_weight, kld_weight, reconstruction_weight, activation_func, seed, constraint_tolerance, stability_history_size, early_stopping, early_stopping_patience, early_stopping_metric.

Convenience keys (conditional - use .get() to access safely):

final_archetype_r2float | None
Last value of history['archetype_r2'] if tracked.
final_rmsefloat | None
Last value of history['rmse'] if tracked.
final_maefloat | None
Last value of history['mae'] if tracked.
final_lossfloat | None
Last value of history['loss'] if tracked.
convergence_epochint | None
Equals training_config['actual_epochs'].

Return type:

dict

Raises:

ValueError – If adata.obsm[pca_key] is not found. Run scanpy.pp.pca() first. If n_archetypes exceeds PCA dimensions.
RuntimeError – If CUDA device is requested but not available.
Stores –
------ –
The function stores the following in AnnData: –

:raises - adata.uns[store_coords_key] : np.ndarray: Archetype positions in PCA space, shape (n_archetypes, n_pcs). Default key: ‘archetype_coordinates’.

Notes

Archetypal Analysis Theory: Archetypal analysis represents each data point as a convex combination of extreme points (archetypes). Unlike clustering, which partitions data, archetypal analysis allows cells to have partial membership in multiple archetypes, better reflecting biological continuity.

Model Architecture: Uses a variational autoencoder where the latent space directly represents archetypal coordinates (A matrix). Archetypes are learned as model parameters (Y matrix) rather than constructed from data points.

Accessing Results Safely:

For guaranteed keys, direct access works: results['history']
For convenience keys, use .get(): results.get('final_archetype_r2')
Or access from history: results['history']['archetype_r2'][-1]

Type Validation: For IDE autocomplete and runtime validation:

from peach._core.types import TrainingResults, validate_training_results

validated = validate_training_results(results)

Examples

Basic usage with default parameters:

>>> import scanpy as sc
>>> import peach as pc
>>> # Prepare data with PCA
>>> sc.pp.pca(adata, n_comps=30)
>>> # Train archetypal model
>>> results = pc.tl.train_archetypal(adata, n_archetypes=5, n_epochs=100)
>>> # Access final R² safely (may be None if not tracked)
>>> r2 = results.get("final_archetype_r2")
>>> if r2 is not None:
...     print(f"Final R²: {r2:.3f}")
>>> # Or access from history (guaranteed if metric was tracked)
>>> if results["history"].get("archetype_r2"):
...     r2 = results["history"]["archetype_r2"][-1]
...     print(f"Final R²: {r2:.3f}")

Advanced usage with custom configuration:

>>> model_config = {"hidden_dims": [512, 256, 128], "inflation_factor": 2.0}
>>> results = pc.tl.train_archetypal(
...     adata,
...     n_archetypes=4,
...     n_epochs=150,
...     model_config=model_config,
...     early_stopping=True,
...     early_stopping_patience=15,
...     device="cuda",
... )
>>> # Check if early stopping triggered
>>> config = results["training_config"]
>>> if config["early_stop_triggered"]:
...     print(f"Converged at epoch {config['actual_epochs']}")

Accessing archetype coordinates stored in AnnData:

>>> # After training, coordinates are in adata.uns
>>> archetype_coords = adata.uns["archetype_coordinates"]
>>> print(f"Learned {archetype_coords.shape[0]} archetypes")
>>> print(f"Each archetype has {archetype_coords.shape[1]} PCA dimensions")

See also

peach.tl.archetypal_coordinates: Extract cell-archetype distances
peach.tl.assign_archetypes: Assign cells to discovered archetypes
peach.tl.extract_archetype_weights: Get barycentric coordinates for cells
peach.pl.training_metrics: Visualize training curves
peach._core.types.TrainingResults: Type definition for return structure

peach.tl.archetypal_coordinates(adata, *, pca_key='X_pca', archetype_coords_key='archetype_coordinates', obsm_key='archetype_distances', uns_prefix='archetype', verbose=True, **kwargs)[source]#

Extract archetypal coordinates for all cells.

Parameters:

adata (AnnData) – Annotated data object with trained model coordinates
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates
archetype_coords_key (str, default: "archetype_coordinates") – Key in adata.uns containing archetype coordinates
obsm_key (str, default: "archetype_distances") – Key to store distance matrix in adata.obsm
uns_prefix (str, default: "archetype") – Prefix for keys stored in adata.uns
verbose (bool, default: True) – Whether to print progress messages
**kwargs – Additional arguments passed to compute_archetype_distances

Returns:

Dictionary with archetypal coordinates and distances

Return type:

dict

peach.tl.assign_archetypes(adata, *, percentage_per_archetype=0.1, obsm_key='archetype_distances', obs_key='archetypes', include_central_archetype=True, verbose=True, **kwargs)[source]#

Assign cells to archetypes based on distances.

Parameters:

adata (AnnData) – Annotated data object with archetype distances
percentage_per_archetype (float, default: 0.1) – Percentage of cells to assign to each archetype
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing distance matrix
obs_key (str, default: "archetypes") – Key to store assignments in adata.obs
include_central_archetype (bool, default: True) – Whether to include a central archetype (cells far from all extreme archetypes)
verbose (bool, default: True) – Whether to print progress messages
**kwargs – Additional arguments passed to bin_cells_by_archetype

Return type:

None

peach.tl.extract_archetype_weights(adata, model=None, *, pca_key='X_pca', weights_key='cell_archetype_weights', batch_size=256, device='cpu', verbose=True)[source]#

Extract cell archetype weights from trained Deep_AA model.

This function computes the barycentric coordinates (weights) for each cell that describe how it’s composed of the learned archetypes.

Parameters:

adata (AnnData) – Annotated data object with PCA coordinates
model (Deep_AA model, optional) – Trained model. If None, will look for model in adata.uns[‘trained_model’]
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates
weights_key (str, default: "cell_archetype_weights") – Key to store weights in adata.obsm
batch_size (int, default: 256) – Batch size for processing
device (str, default: "cpu") – Device for computation (‘cpu’, ‘cuda’, or ‘mps’)
verbose (bool, default: True) – Whether to print progress

Returns:

Cell archetype weights of shape (n_cells, n_archetypes) Also stores weights in adata.obsm[weights_key]

Return type:

np.ndarray

Examples

>>> # After training
>>> results = pc.tl.train_archetypal(adata, n_archetypes=5)
>>>
>>> # Extract weights
>>> weights = pc.tl.extract_archetype_weights(adata, results["model"])
>>>
>>> # Weights are now in adata.obsm['cell_archetype_weights']
>>> print(adata.obsm["cell_archetype_weights"].shape)

peach.tl.compute_conditional_centroids(adata, condition_column, *, pca_key='X_pca', store_key='conditional_centroids', exclude_archetypes=None, groupby=None, verbose=True)[source]#

Compute centroid positions in PCA space for each level of a categorical condition.

This function calculates the mean position (centroid) in PCA space for cells belonging to each level of a categorical variable. Useful for visualizing how different conditions (e.g., treatment phases, timepoints) relate to the archetypal structure.

Following R template patterns: - Uses ALL PCs for centroid calculation (equivalent to R’s colMeans) - Stores full PC centroid but extracts first 3 for 3D visualization - Excludes ‘no_archetype’ and ‘archetype_0’ cells by default

Parameters:

adata (AnnData) – Annotated data object with PCA coordinates in adata.obsm[pca_key].
condition_column (str) – Name of categorical column in adata.obs to group by. Examples: ‘treatment_phase’, ‘timepoint’, ‘batch’.
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates.
store_key (str, default: "conditional_centroids") – Key in adata.uns to store results.
exclude_archetypes (list, optional) – Archetype labels to exclude from centroid calculation. Default: [‘no_archetype’, ‘archetype_0’] (following R template). Set to empty list [] to include all cells.
groupby (str, optional) – Second categorical column for multi-group trajectories. If provided, centroids are computed for each (group, level) combination. Example: groupby=’response_group’ to get separate trajectories per response.
verbose (bool, default: True) – Whether to print progress messages.

Returns:

Dictionary with keys:

condition_column : str - name of the condition column
n_levels : int - number of unique levels
levels : List[str] - list of level names
centroids : Dict[str, List[float]] - level → full PCA coordinates
centroids_3d : Dict[str, List[float]] - level → [x, y, z] first 3 PCs
cell_counts : Dict[str, int] - level → cell count
pca_key : str - PCA key used
exclude_archetypes : List[str] - archetypes excluded
groupby : Optional[str] - groupby column if used
group_centroids : Optional[Dict] - if groupby: {group: {level: coords}}
group_centroids_3d : Optional[Dict] - if groupby: {group: {level: [x,y,z]}}
group_cell_counts : Optional[Dict] - if groupby: {group: {level: count}}

Return type:

dict

Raises:

ValueError – If condition_column not in adata.obs or PCA coordinates not found.
Stores –
------ –
The function stores results in AnnData: –

:raises - adata.uns[store_key][condition_column] : dict: Full results dictionary as returned.

Examples

>>> # Compute centroids for treatment phase
>>> result = pc.tl.compute_conditional_centroids(adata, "treatment_phase")
>>> print(result["centroids_3d"])
{'chemo-naive': [1.2, 0.5, -0.3], 'IDS': [0.8, 1.1, 0.2]}

>>> # Then visualize with trajectory
>>> fig = pc.pl.archetypal_space(
...     adata, show_centroids=True, centroid_condition="treatment_phase", centroid_order=["chemo-naive", "IDS"]
... )

>>> # Multi-group centroids for trajectory comparison
>>> result = pc.tl.compute_conditional_centroids(adata, "treatment_phase", groupby="response_group")
>>> fig = pc.pl.archetypal_space(
...     adata,
...     show_centroids=True,
...     centroid_condition="treatment_phase",
...     centroid_groupby="response_group",
...     centroid_order=["chemo-naive", "IDS"],
...     centroid_colors={"long": "magenta", "short": "cyan"},
... )

See also

peach.pl.archetypal_space: Visualize with centroid trajectory overlay

peach.tl.assign_to_centroids(adata, condition_column, *, pca_key='X_pca', centroid_key='conditional_centroids', bin_prop=0.15, obs_key='centroid_assignments', exclude_archetypes=None, verbose=True)[source]#

Assign cells to nearest centroid based on distance (top bin_prop% closest).

This function mirrors assign_archetypes but for condition-based centroids. It enables using treatment phase centroids as trajectory endpoints in single_trajectory_analysis by creating categorical assignments that CellRank can use as terminal states.

Parameters:

adata (AnnData) – Annotated data object. Must have: - PCA coordinates in adata.obsm[pca_key] - Centroids computed via compute_conditional_centroids in adata.uns[centroid_key]
condition_column (str) – Name of the condition column used in compute_conditional_centroids. This identifies which centroid set to use.
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates.
centroid_key (str, default: "conditional_centroids") – Key in adata.uns containing centroid results from compute_conditional_centroids.
bin_prop (float, default: 0.15) – Proportion of cells to assign to each centroid (top 15% closest). Similar to percentage_per_archetype in assign_archetypes.
obs_key (str, default: "centroid_assignments") – Key in adata.obs to store assignments.
exclude_archetypes (list, optional) – Archetype labels to exclude from assignment. Default: [‘no_archetype’] - these cells get ‘unassigned’.
verbose (bool, default: True) – Whether to print progress messages.

Returns:

Modifies adata.obs[obs_key] with Categorical assignments. Values are condition levels (e.g., ‘chemo_naive’, ‘IDS’) or ‘unassigned’.

Return type:

None

Examples

>>> # First compute centroids for treatment phases
>>> pc.tl.compute_conditional_centroids(adata, "treatment_stage")
>>>
>>> # Then assign cells to nearest centroid (top 15% closest)
>>> pc.tl.assign_to_centroids(adata, "treatment_stage", bin_prop=0.15)
>>>
>>> # Check assignments
>>> print(adata.obs["centroid_assignments"].value_counts())
>>>
>>> # Now can use with CellRank for trajectory analysis
>>> # (setup_cellrank can use centroid_assignments as terminal states)

See also

compute_conditional_centroids: Compute centroids for condition levels
assign_archetypes: Similar function for archetype assignments
single_trajectory_analysis: Uses centroid assignments for trajectory analysis

peach.tl.gene_associations(adata, *, bin_prop=0.1, obsm_key='archetype_distances', obs_key='archetypes', use_layer=None, test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)[source]#

Test gene expression associations with archetypal assignments.

Performs Mann-Whitney U tests to identify genes with significantly different expression between each archetype and all other cells (1-vs-all testing paradigm).

Parameters:

adata (AnnData) –
Annotated data object with:
- obsm[obsm_key] : Archetype distance matrix [n_cells, n_archetypes]
- obs[obs_key] : Archetype assignments (from bin_cells_by_archetype)
- X or layers[use_layer] : Gene expression data
bin_prop (float, default: 0.1) – Proportion of cells closest to each archetype to use for binning.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
use_layer (str | None, default: None) – Layer for gene expression. If None, uses adata.X. Auto-selects ‘logcounts’ or ‘log1p’ if available.
test_method (str, default: "mannwhitneyu") – Statistical test method. Currently supports ‘mannwhitneyu’.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method: ‘benjamini_hochberg’ or ‘bonferroni’.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') –
Scope of FDR correction:
- 'global' : Correct across all tests (most stringent)
- 'per_archetype' : Correct within each archetype
- 'none' : No FDR correction (raw p-values)
test_direction (str, default: "two-sided") – Direction of statistical test: ‘two-sided’, ‘greater’, or ‘less’.
min_logfc (float, default: 0.01) – Minimum absolute log fold change threshold for filtering.
min_cells (int, default: 10) – Minimum cells required per archetype for testing.
comparison_group (str, default: 'all') –
Comparison group for statistical tests:
- 'all' : Compare archetype cells vs ALL other cells
- 'archetypes_only' : Compare vs cells in other archetypes only (excludes archetype_0 and unassigned cells)
verbose (bool, default: True) – Whether to print progress messages.

Returns:

Results with columns:

gene : str - Gene symbol/identifier
archetype : str - Archetype identifier
n_archetype_cells : int - Cells in archetype
n_other_cells : int - Cells in comparison group
mean_archetype : float - Mean expression in archetype
mean_other : float - Mean expression in others
log_fold_change : float - Log fold change
statistic : float - Mann-Whitney U statistic
pvalue : float - Raw p-value
fdr_pvalue : float - FDR-corrected p-value
significant : bool - Whether FDR < 0.05
direction : str - ‘higher’ or ‘lower’ in archetype

Return type:

pd.DataFrame

Raises:

ValueError – If required keys not found in adata.

Examples

>>> # Basic usage
>>> results = pc.tl.gene_associations(adata)
>>> sig_genes = results[results.significant]
>>> # Per-archetype FDR correction (less stringent)
>>> results = pc.tl.gene_associations(adata, fdr_scope="per_archetype")
>>> # Top markers per archetype
>>> for arch in results["archetype"].unique():
...     arch_genes = results[
...         (results["archetype"] == arch) & (results["significant"]) & (results["direction"] == "higher")
...     ].nlargest(10, "log_fold_change")
...     print(f"{arch}: {arch_genes['gene'].tolist()}")

See also

peach.tl.pathway_associations: Pathway-level testing
peach.tl.pattern_analysis: Comprehensive pattern analysis
peach._core.types.GeneAssociationResult: Result row structure

peach.tl.pathway_associations(adata, *, pathway_obsm_key='pathway_scores', obsm_key='archetype_distances', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)[source]#

Test pathway activity associations with archetypal assignments.

Performs Mann-Whitney U tests to identify pathways with significantly different activity between each archetype and all other cells.

Parameters:

adata (AnnData) –
Annotated data object with:
- obsm[pathway_obsm_key] : Pathway scores [n_cells, n_pathways]
- obsm[obsm_key] : Archetype distance matrix
- obs[obs_key] : Archetype assignments
- uns[pathway_obsm_key + '_pathways'] : Pathway names (optional)
pathway_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing pathway activity scores.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
test_direction (str, default: "two-sided") – Direction of statistical test.
min_logfc (float, default: 0.01) – Minimum effect size threshold (mean_diff for pathways).
min_cells (int, default: 10) – Minimum cells required per archetype.
comparison_group (str, default: 'all') – Comparison group: ‘all’ or ‘archetypes_only’.
verbose (bool, default: True) – Whether to print progress.

Returns:

Results with columns:

pathway : str - Pathway name
archetype : str - Archetype identifier
n_archetype_cells : int - Cells in archetype
n_other_cells : int - Cells in comparison
mean_archetype : float - Mean score in archetype
mean_other : float - Mean score in others
mean_diff : float - Mean difference (primary effect size)
log_fold_change : float - Alias for mean_diff
statistic : float - Test statistic
pvalue : float - Raw p-value
fdr_pvalue : float - FDR-corrected p-value
significant : bool - Whether significant
direction : str - ‘higher’ or ‘lower’

Return type:

pd.DataFrame

Notes

Pathway scores (from AUCell, pySCENIC, etc.) represent activity levels, not expression counts. Mean difference is more interpretable than log fold change for these scores.

Examples

>>> # Basic usage
>>> results = pc.tl.pathway_associations(adata)
>>> # Filter for specific pathway categories
>>> metabolism = results[results["pathway"].str.contains("METABOLISM", case=False)]
>>> # Top pathways per archetype
>>> for arch in results["archetype"].unique():
...     top = results[(results["archetype"] == arch) & (results["significant"])].nlargest(5, "mean_diff")
...     print(f"{arch}: {top['pathway'].tolist()}")

See also

peach.tl.gene_associations: Gene-level testing
peach._core.types.PathwayAssociationResult: Result row structure

peach.tl.pattern_analysis(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', include_individual_tests=True, include_pattern_tests=True, include_exclusivity_analysis=True, verbose=True, **kwargs)[source]#

Comprehensive archetypal pattern analysis.

Performs systematic analysis combining three complementary approaches:

Individual tests: Standard 1-vs-all archetype characterization
Pattern tests: Systematic archetype combination testing (specialists, binary tradeoffs, complex patterns)
Exclusivity analysis: Features with opposing patterns

Parameters:

adata (AnnData) – Annotated data object with archetypal assignments and scores.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing scores for pattern analysis.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
include_individual_tests (bool, default: True) – Run individual archetype 1-vs-all tests.
include_pattern_tests (bool, default: True) – Run systematic pattern tests (specialists, tradeoffs).
include_exclusivity_analysis (bool, default: True) – Analyze mutual exclusivity patterns.
verbose (bool, default: True) – Print analysis progress.

Returns:

Dictionary with keys:

'individual' : Individual archetype results
'patterns' : Pattern-based test results
'exclusivity' : Mutual exclusivity results

Return type:

dict[str, pd.DataFrame]

Notes

Pattern Types in ‘patterns’ DataFrame:

specialization : Archetype vs archetype_0 (centroid)
tradeoff : Multi-archetype high vs low groups

Pattern Code Format: “12xxx_xx345”

Position = archetype number (0, 1, 2…)
Numbers = high archetypes
‘x’ = low archetypes
Underscore separates high from low group

Examples

>>> # Run comprehensive analysis
>>> results = pc.tl.pattern_analysis(adata)
>>> # Access individual results
>>> individual = results["individual"]
>>> # Find specialists (exclusive to one archetype)
>>> patterns = results["patterns"]
>>> specialists = patterns[patterns["pattern_type"] == "specialization"]
>>> # Find mutual exclusivity patterns
>>> if not results["exclusivity"].empty:
...     exclusive = results["exclusivity"]
...     top_tradeoffs = exclusive.nlargest(10, "effect_range")

See also

peach.tl.archetype_exclusive_patterns: Focused exclusive pattern analysis
peach.tl.specialization_patterns: Centroid comparison analysis
peach.tl.tradeoff_patterns: Mutual exclusivity analysis
peach._core.types.ComprehensivePatternResults: Return type structure

peach.tl.conditional_associations(adata, *, obs_column, archetype_assignments=None, obs_key='archetypes', test_method='hypergeometric', fdr_method='benjamini_hochberg', min_cells=5, verbose=True, **kwargs)[source]#

Test associations between archetypes and categorical metadata.

Performs hypergeometric tests to identify significant enrichment of archetypes within different categorical conditions (samples, treatments, cell types, etc.).

Parameters:

adata (AnnData) –
Annotated data object with:
- obs[obs_key] : Archetype assignments
- obs[obs_column] : Categorical variable to test
obs_column (str) – Column name in adata.obs containing categorical variable.
archetype_assignments (None, optional) – Deprecated. Archetype assignments now read from adata.obs[obs_key].
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
test_method (str, default: "hypergeometric") – Statistical test method (currently only ‘hypergeometric’).
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
min_cells (int, default: 5) – Minimum cells required per archetype-condition combination.
verbose (bool, default: True) – Whether to print progress.

Returns:

Results with columns:

archetype : str - Archetype identifier
condition : str - Condition value from obs_column
observed : int - Observed count in overlap
expected : float - Expected count under null
total_archetype : int - Total cells in archetype
total_condition : int - Total cells in condition
odds_ratio : float - Enrichment measure (>1 = enriched)
ci_lower : float - Lower 95% CI for odds ratio
ci_upper : float - Upper 95% CI for odds ratio
pvalue : float - Hypergeometric p-value
fdr_pvalue : float - FDR-corrected p-value
significant : bool - Whether significant

Return type:

pd.DataFrame

Examples

>>> # Test sample associations
>>> results = pc.tl.conditional_associations(adata, obs_column="sample")
>>> # Find enriched archetypes per condition
>>> enriched = results[(results["significant"]) & (results["odds_ratio"] > 2)]
>>> # Test treatment effects
>>> treatment_results = pc.tl.conditional_associations(adata, obs_column="treatment")

See also

peach._core.types.ConditionalAssociationResult: Result row structure

peach.tl.archetype_exclusive_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_effect_size=0.05, min_cells=10, use_pairwise=True, verbose=True, **kwargs)[source]#

Identify features exclusively high in single archetypes.

Finds genes or pathways specifically elevated in only one archetype compared to all others. Supports two methods:

Pairwise (default): Tests each archetype vs every other archetype individually. Feature is exclusive if significantly higher vs ALL others. More stringent.
1-vs-all filtering: Tests each archetype vs all other cells. Feature is exclusive if significant in only ONE archetype’s test. More permissive, higher statistical power.

Parameters:

adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores. Use None for gene expression.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_effect_size (float, default: 0.05) – Minimum effect size (mean_diff for pathways, log_fc for genes).
min_cells (int, default: 10) – Minimum cells per archetype.
use_pairwise (bool, default: True) – If True, use rigorous pairwise comparisons. If False, use 1-vs-all filtering.
verbose (bool, default: True) – Print progress.

Returns:

Results with columns:

pathway/gene : Feature identifier
archetype : Exclusive archetype
mean_archetype : Mean in exclusive archetype
mean_other : Mean in other archetypes
mean_diff/log_fold_change : Effect size
exclusivity_score : Ratio vs max other archetype
pvalue, fdr_pvalue, significant
pattern_type : ‘exclusive’ or ‘exclusive_pairwise’

Return type:

pd.DataFrame

Examples

>>> # Pairwise method (more stringent)
>>> exclusive = pc.tl.archetype_exclusive_patterns(adata)
>>> # 1-vs-all method (more permissive)
>>> exclusive = pc.tl.archetype_exclusive_patterns(adata, use_pairwise=False)
>>> # Find markers for specific archetype
>>> arch3_markers = exclusive[exclusive["archetype"] == "archetype_3"]
>>> top_markers = arch3_markers.nlargest(10, "exclusivity_score")

See also

peach.tl.pattern_analysis: Comprehensive pattern analysis
peach._core.types.ExclusivePatternResult: Result row structure

peach.tl.specialization_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, verbose=True, **kwargs)[source]#

Identify specialization features relative to centroid archetype.

Compares each archetype to archetype_0 (centroid/generalist) to find features representing specialized states or differentiation away from the central cellular state.

Parameters:

adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_cells (int, default: 10) – Minimum cells per archetype.
verbose (bool, default: True) – Print progress.

Returns:

Results showing specialization from archetype_0.

Return type:

pd.DataFrame

Notes

Archetype_0 typically represents the centroid or generalist state where cells have balanced contributions from all archetypes. Features elevated in other archetypes relative to archetype_0 represent specialized cellular programs.

Examples

>>> spec = pc.tl.specialization_patterns(adata)
>>> # Find archetype_4 specialization features
>>> arch4_spec = spec[(spec["archetype"] == "archetype_4") & (spec["significant"])]

See also

peach.tl.archetype_exclusive_patterns: Exclusive pattern analysis

peach.tl.tradeoff_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', tradeoffs='pairs', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, min_effect_size=0.1, verbose=True, **kwargs)[source]#

Identify mutual exclusivity and tradeoff patterns.

Finds features showing opposing patterns between archetypes, indicating biological tradeoffs or mutually exclusive states.

Parameters:

adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
tradeoffs ({'pairs', 'patterns'}, default: 'pairs') –
Type of tradeoff analysis:
- 'pairs' : Binary pairwise (A high, B low)
- 'patterns' : Complex multi-archetype (AB high, CD low)
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_cells (int, default: 10) – Minimum cells per group.
min_effect_size (float, default: 0.1) – Minimum effect size for tradeoffs.
verbose (bool, default: True) – Print progress.
**kwargs –
Additional parameters:
- max_pattern_sizeint, default: 2
  Maximum archetypes per group for complex patterns.
- exclude_archetype_0bool, default: True
  Exclude archetype_0 from tradeoff patterns.
- specific_patternsList[str], optional
  Test only specific patterns (e.g., [‘2v3’, ‘1v45’]).

Returns:

Results with tradeoff patterns:

pattern_code : Visual pattern code
high_archetypes, low_archetypes : Groups
mean_high, mean_low : Group means
log_fold_change : Effect size
pattern_complexity : Number of archetypes involved

Return type:

pd.DataFrame

Examples

>>> # Find pairwise tradeoffs
>>> pairs = pc.tl.tradeoff_patterns(adata, tradeoffs="pairs")
>>> # Find complex patterns
>>> patterns = pc.tl.tradeoff_patterns(adata, tradeoffs="patterns", max_pattern_size=3)
>>> # Test specific hypothesis
>>> specific = pc.tl.tradeoff_patterns(adata, specific_patterns=["2v3", "1v4"])

See also

peach.tl.archetype_exclusive_patterns: Exclusive pattern analysis
peach._core.types.PatternAssociationResult: Result row structure

peach.tl.hyperparameter_search(adata, *, n_archetypes_range=[3, 4, 5, 6], cv_folds=3, max_epochs_cv=15, pca_key='X_pca', device='cpu', base_model_config=None, **kwargs)[source]#

Perform cross-validation hyperparameter search for archetypal analysis.

Systematically searches hyperparameter space using K-fold cross-validation to find optimal model configurations. Results support manual selection of the best configuration for final training.

Parameters:

adata (AnnData) – Annotated data object with PCA coordinates in adata.obsm[pca_key]. Run scanpy.pp.pca(adata) first.
n_archetypes_range (list[int], default: [3, 4, 5, 6]) – Range of archetype numbers to test. Each value is evaluated via cross-validation.
cv_folds (int, default: 3) – Number of cross-validation folds. Higher values give more reliable estimates but take longer.
max_epochs_cv (int, default: 15) – Maximum training epochs per CV fold. Early stopping typically triggers before this limit.
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates. Auto-detects: ‘X_pca’, ‘X_PCA’, ‘PCA’.
device (str, default: "cpu") – Computing device (‘cpu’, ‘cuda’, or ‘mps’). Default is ‘cpu’ for stability across platforms.
base_model_config (dict | None, default: None) –
Additional base model configuration. If None, uses defaults:
- input_dim : Auto-detected from PCA dimensions
- barycentric_mode : True
- device : From device parameter
**kwargs –
Additional arguments passed to SearchConfig:
- hidden_dims_options : list[list[int]] - Architectures to test
- inflation_factor_range : list[float] - Inflation factors to test
- speed_preset : str - “fast”, “balanced”, or “thorough”
- use_pcha_init : bool - Use PCHA initialization
- subsample_fraction : float - Subsampling for large datasets
- max_cells_cv : int - Maximum cells for CV
- random_state : int - Random seed

Returns:

Complete cross-validation results with analysis methods:

Attributes:

config_results : dict[str, CVResults] - Per-configuration results
summary_df : pd.DataFrame - Summary table
ranked_configs : list[dict] - Configs ranked by R²
cv_info : dict - Search metadata

Methods:

summary_report() : str - Text summary for decision support
rank_by_metric(metric) : list[dict] - Rank by any metric
plot_elbow_r2() : Figure - Primary visualization
plot_metric(metric) : Figure - Generic metric visualization
save(path) / load(path) - Persistence

Ranked config structure:

Each dict in ranked_configs contains:

hyperparameters : dict with n_archetypes, hidden_dims, etc.
metric_value : float - R² value
std_error : float - Standard error across folds
config_summary : str - Human-readable description

Return type:

CVSummary

Raises:

ValueError – If adata.obsm[pca_key] not found.

Notes

Workflow Position: This is Phase 2 of the PEACH pipeline. After finding good hyperparameters here, manually select the best configuration (Phase 3) and train the final model with pc.tl.train_archetypal() (Phase 4).

Large Datasets: Datasets larger than max_cells_cv (default 15000) are automatically subsampled for CV. This doesn’t affect final training.

Selecting Best Configuration: Use summary_report() for a quick overview, rank_by_metric() for detailed rankings, and plot_elbow_r2() to visualize the elbow curve.

Examples

Basic hyperparameter search:

>>> import scanpy as sc
>>> import peach as pc
>>> # Prepare data
>>> sc.pp.pca(adata, n_comps=30)
>>> # Search hyperparameters
>>> cv_summary = pc.tl.hyperparameter_search(adata, n_archetypes_range=[3, 4, 5, 6], cv_folds=5, device="cuda")
>>> # Review results
>>> print(cv_summary.summary_report())

Analyze and visualize results:

>>> # Get top configurations
>>> top_configs = cv_summary.rank_by_metric("archetype_r2")[:3]
>>> for config in top_configs:
...     print(f"{config['config_summary']}: R²={config['metric_value']:.4f}")
>>> # Elbow curve
>>> fig = cv_summary.plot_elbow_r2()
>>> fig.show()
>>> # Compare metrics
>>> fig = cv_summary.plot_metric("rmse")
>>> fig.show()

Use selected hyperparameters for final training:

>>> # Select best configuration
>>> best_config = top_configs[0]["hyperparameters"]
>>> n_archetypes = best_config["n_archetypes"]
>>> # Train final model (Phase 4)
>>> results = pc.tl.train_archetypal(
...     adata, n_archetypes=n_archetypes, n_epochs=200, model_config={"hidden_dims": best_config["hidden_dims"]}
... )

Save and load results:

>>> # Save for later
>>> cv_summary.save("cv_results.pkl")
>>> # Load in new session
>>> from peach._core.utils.grid_search_results import CVSummary
>>> cv_summary = CVSummary.load("cv_results.pkl")

See also

peach.tl.train_archetypal: Train final model with selected hyperparameters
peach._core.utils.hyperparameter_search.SearchConfig: Full configuration options
peach._core.utils.grid_search_results.CVSummary: Return type details

class peach.tl.SearchConfig(n_archetypes_range=None, hidden_dims_options=None, cv_folds=5, max_epochs_cv=100, early_stopping_patience=5, subsample_fraction=0.5, max_cells_cv=15000, speed_preset='balanced', use_pcha_init=True, inflation_factor_range=None, random_state=42)[source]#

Bases: object

Configuration for hyperparameter search space and CV settings.

Defines the hyperparameter grid to search and cross-validation parameters. Speed presets automatically adjust epochs and early stopping for different use cases.

Parameters:

n_archetypes_range (list[int] | None, default: None) – Range of archetype numbers to test. If None, defaults to [2, 3, 4, 5, 6, 7].
hidden_dims_options (list[list[int]] | None, default: None) – Network architectures to test. If None, defaults to standard options: [[128, 64], [256, 128, 64], [128], [512, 256, 128]].
inflation_factor_range (list[float] | None, default: None) – Inflation factors to test. If None, uses [1.5] (Helsinki optimal). Set to multiple values (e.g., [1.0, 1.5, 2.0]) to search inflation.
cv_folds (int, default: 5) – Number of cross-validation folds.
max_epochs_cv (int, default: 100) – Maximum epochs per CV fold (overridden by speed_preset).
early_stopping_patience (int, default: 5) – Patience for early stopping (overridden by speed_preset).
subsample_fraction (float, default: 0.5) – Fraction of data to use for CV when dataset > max_cells_cv.
max_cells_cv (int, default: 15000) – Maximum cells for CV. Larger datasets are subsampled.
speed_preset (str, default: "balanced") –
Training speed preset. Options:
- "fast" : 25 epochs, patience=3 (quick exploration)
- "balanced" : 50 epochs, patience=5 (recommended)
- "thorough" : 100 epochs, patience=8 (comprehensive)
use_pcha_init (bool, default: True) – Whether to use PCHA initialization for archetypes.
random_state (int, default: 42) – Random seed for reproducibility.

_search_inflation#

Internal flag indicating whether inflation is being searched (True if inflation_factor_range was explicitly provided).

Type:: bool

Raises:: ValueError – If n_archetypes_range contains non-positive integers. If cv_folds <= 0. If max_epochs_cv <= 0. If subsample_fraction not in (0, 1]. If max_cells_cv <= 0. If inflation_factor_range contains non-positive values.

Examples

>>> # Basic configuration
>>> config = SearchConfig(n_archetypes_range=[3, 4, 5], cv_folds=3, speed_preset="fast")
>>> # Search inflation factors
>>> config = SearchConfig(n_archetypes_range=[4, 5, 6], inflation_factor_range=[1.0, 1.5, 2.0], cv_folds=5)
>>> print(f"Searching inflation: {config._search_inflation}")  # True

See also

ArchetypalGridSearch: Uses this configuration
peach.tl.hyperparameter_search: User-facing wrapper

n_archetypes_range: list[int] = None#

hidden_dims_options: list[list[int]] = None#

cv_folds: int = 5#

max_epochs_cv: int = 100#

early_stopping_patience: int = 5#

subsample_fraction: float = 0.5#

max_cells_cv: int = 15000#

speed_preset: str = 'balanced'#

use_pcha_init: bool = True#

inflation_factor_range: list[float] = None#

random_state: int = 42#

__post_init__()[source]#: Initialize defaults and validate inputs.

peach.tl.setup_cellrank(adata, high_purity_threshold=0.8, n_neighbors=30, n_pcs=11, compute_paga=True, solver='gmres', tol=1e-06, terminal_obs_key='archetypes', verbose=True)[source]#

Set up CellRank workflow for archetypal or centroid-based trajectory analysis.

This function orchestrates the complete pipeline from terminal state assignments to fate probabilities, including neighbors computation, UMAP, PAGA, and GPCCA.

Parameters:

adata (AnnData) – Annotated data matrix. Must contain: - adata.obs[terminal_obs_key] : Terminal state assignments - adata.obsm[‘X_pca’] : PCA coordinates If using archetypes (default): - adata.obsm[‘cell_archetype_weights’] : Barycentric weights - adata.obsm[‘archetype_distances’] : Distances to archetypes
high_purity_threshold (float, optional (default: 0.80)) – Percentile threshold for defining high-purity cells. 0.80 means top 20% of cells per archetype. Only used when terminal_obs_key=’archetypes’.
n_neighbors (int, optional (default: 30)) – Number of neighbors for k-NN graph construction
n_pcs (int, optional (default: 11)) – Number of principal components to use
compute_paga (bool, optional (default: True)) – Whether to compute PAGA connectivity
solver (str, optional (default: 'gmres')) – Solver for fate probability computation (‘gmres’, ‘direct’, etc.)
tol (float, optional (default: 1e-6)) – Tolerance for iterative solver
terminal_obs_key (str, optional (default: 'archetypes')) – Key in adata.obs containing terminal state assignments. Use ‘archetypes’ for standard archetype-based analysis or ‘centroid_assignments’ for treatment phase centroid trajectories (requires running pc.tl.assign_to_centroids() first).
verbose (bool, optional (default: True)) – Print progress messages

Returns:

ck (cellrank.kernels.ConnectivityKernel) – Computed transition kernel
g (cellrank.estimators.GPCCA) – GPCCA estimator with fate probabilities
Stores in adata
—————
adata.obs[‘terminal_states’] (pd.Series) – Terminal state assignments for high-purity cells
adata.obsm[‘fate_probabilities’] (np.ndarray) – Fate probability matrix (n_obs × n_lineages)
adata.uns[‘lineage_names’] (list) – List of lineage names (archetype or centroid names)
adata.uns[‘cellrank_gpcca’] (GPCCA) – GPCCA estimator object for downstream functions
adata.obsm[‘X_umap’] (np.ndarray) – UMAP coordinates (if not already present)
adata.uns[‘neighbors’] (dict) – k-NN graph (if not already present)
adata.uns[‘paga’] (dict) – PAGA results (if compute_paga=True)

Examples

Basic archetype-based analysis:

>>> import peach as pc
>>> ck, g = pc.tl.setup_cellrank(adata)

Treatment phase centroid-based analysis:

>>> # First compute centroids and assign cells
>>> pc.tl.compute_conditional_centroids(adata, condition_column="treatment_phase")
>>> pc.tl.assign_to_centroids(adata, condition_column="treatment_phase")
>>> # Then run CellRank with centroid assignments
>>> ck, g = pc.tl.setup_cellrank(adata, terminal_obs_key="centroid_assignments")

Access results:

>>> fate_probs = adata.obsm["fate_probabilities"]
>>> lineages = adata.uns["lineage_names"]
>>> terminal_states = adata.obs["terminal_states"]

Customize parameters:

>>> ck, g = pc.tl.setup_cellrank(
...     adata,
...     high_purity_threshold=0.90,  # Top 10% of cells
...     n_neighbors=50,
...     compute_paga=False,
... )

Notes

Requires CellRank installation: pip install cellrank
For GAMR models, set R_HOME before importing cellrank: ```python import os

os.environ[“R_HOME”] = “/Library/Frameworks/R.framework/Resources” ```

See also

assign_to_centroids: Assign cells to treatment phase centroids
compute_lineage_pseudotimes: Convert fate probabilities to pseudotime
compute_lineage_drivers: Identify genes driving lineage commitment

References

peach.tl.compute_lineage_pseudotimes(adata, lineage_names=None, fate_prob_key='fate_probabilities')[source]#

Convert fate probabilities to lineage-specific pseudotimes.

Creates continuous pseudotime variables for each lineage by using fate probabilities as progression measures. Stores results in adata.obs.

Parameters:

adata (AnnData) – Annotated data matrix. Must contain: - adata.obsm[‘fate_probabilities’] : Fate probability matrix - adata.uns[‘lineage_names’] : List of lineage names
lineage_names (list of str, optional) – Specific lineages to compute pseudotime for. If None, computes for all lineages in adata.uns[‘lineage_names’]
fate_prob_key (str, optional (default: 'fate_probabilities')) – Key in adata.obsm containing fate probabilities

Returns:

Stores pseudotime variables in adata.obs with keys: ‘pseudotime_to_{lineage}’ for each lineage

Return type:

None

Examples

Compute pseudotimes for all lineages:

>>> import peach as pc
>>> pc.tl.compute_lineage_pseudotimes(adata)

Access specific pseudotime:

>>> pseudotime = adata.obs["pseudotime_to_archetype_5"]

Compute for specific lineages:

>>> pc.tl.compute_lineage_pseudotimes(adata, lineage_names=["archetype_3", "archetype_5"])

Use for gene trend analysis:

>>> import cellrank as cr
>>> cr.pl.gene_trends(
...     adata, model=cr.models.GAMR(adata), genes=["RARRES1", "SOD2"], time_key="pseudotime_to_archetype_5"
... )

Notes

Must run setup_cellrank() first to compute fate probabilities
Pseudotime values are simply the fate probabilities (range: 0-1)
Higher pseudotime = higher probability of committing to that lineage

See also

setup_cellrank: Compute fate probabilities
compute_lineage_drivers: Identify genes driving lineage commitment

peach.tl.compute_lineage_drivers(adata, lineage, n_genes=100, method='cellrank', **kwargs)[source]#

Identify genes driving commitment to a specific lineage.

Computes correlation between gene expression and fate probabilities to identify lineage-specific marker genes.

Parameters:

adata (AnnData) – Annotated data matrix with fate probabilities computed
lineage (str) – Target lineage name (e.g., ‘archetype_5’)
n_genes (int, optional (default: 100)) – Number of top genes to return
method (str, optional (default: 'cellrank')) – Method for computing drivers: - ‘cellrank’ : Use CellRank’s compute_lineage_drivers (requires GPCCA object) - ‘correlation’ : Simple Spearman correlation (faster, works without GPCCA)
**kwargs – Additional arguments passed to method

Returns:

drivers – Top driver genes with statistics: - ‘gene’ : Gene name - ‘lineage’ : Target lineage name - ‘correlation’ : Spearman correlation with fate probability - ‘pvalue’ : P-value from correlation test

Return type:

pd.DataFrame

Examples

Using CellRank method (GPCCA is automatically stored by setup_cellrank):

>>> import peach as pc
>>> ck, g = pc.tl.setup_cellrank(adata)
>>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="cellrank")

Using correlation method (simpler, faster):

>>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="correlation", n_genes=50)

Top genes:

>>> print(drivers.head(10))

Notes

‘cellrank’ method is more sophisticated (uses GAM models)
‘correlation’ method is faster and works without storing GPCCA object
For publication, recommend ‘cellrank’ method with GAMR models

See also

setup_cellrank: Compute fate probabilities
compute_lineage_pseudotimes: Create pseudotime variables

peach.tl.compute_transition_frequencies(adata, start_weight_threshold=0.5, fate_prob_threshold=0.3, lineages=None)[source]#

Compute frequency of transitions between archetypal states.

Identifies cells transitioning from one archetype to another based on their starting archetypal weights and fate probabilities from CellRank.

A transition is counted when a cell has: - High barycentric weight for source archetype (> start_weight_threshold) - High fate probability for target archetype (> fate_prob_threshold)

Parameters:

adata (AnnData) – Annotated data object with CellRank results. Must contain: - adata.obsm[‘cell_archetype_weights’]: Barycentric weights [n_obs, n_archetypes] - adata.obs[‘archetypes’]: Categorical archetype assignments - adata.obsm[‘fate_probabilities’]: Fate probability matrix [n_obs, n_lineages] - adata.uns[‘lineage_names’]: List of lineage/archetype names - adata.uns[‘cellrank_gpcca’]: GPCCA estimator object (from setup_cellrank)
start_weight_threshold (float, default=0.5) – Minimum barycentric weight to consider a cell as “starting” from an archetype. - 0.5 = top 50% cells per archetype (balanced) - 0.7 = top 30% cells (more stringent) - 0.3 = top 70% cells (more permissive)
fate_prob_threshold (float, default=0.3) – Minimum fate probability to consider a cell as “transitioning to” an archetype. - 0.3 = 30% commitment probability (standard) - 0.5 = 50% commitment (stringent) - 0.2 = 20% commitment (permissive)
lineages (list of str, optional) – Specific lineages/archetypes to analyze. If None, uses all lineages from adata.uns[‘lineage_names’] that start with ‘archetype_’.

Returns:

Transition frequency matrix with shape [n_archetypes, n_archetypes]. - Index: Source archetypes (starting weight) - Columns: Target archetypes (fate probability) - Values: Integer counts of cells satisfying both thresholds - Diagonal: Cells maintaining their archetype identity - Off-diagonal: Cross-archetype transitions

Example:: archetype_0 archetype_1 archetype_2 archetype_3

archetype_0 150 45 23 12 archetype_1 12 200 67 8 archetype_2 8 34 180 45 archetype_3 5 15 30 190

Return type:

pd.DataFrame

Raises:

ValueError – If required CellRank results are missing (run setup_cellrank() first)

Notes

archetype_0 (centroid) uses categorical assignment instead of weight threshold
Returns raw counts (not normalized probabilities)
Cells can appear in multiple transitions if they meet multiple criteria
Use with PAGA connectivity for complete trajectory analysis

Examples

Basic usage with default thresholds:

>>> import peach as pc
>>> # After running setup_cellrank()
>>> transitions = pc.tl.compute_transition_frequencies(adata)
>>> print(transitions)

Stringent thresholds for high-confidence transitions:

>>> transitions = pc.tl.compute_transition_frequencies(
...     adata,
...     start_weight_threshold=0.7,  # Top 30% cells
...     fate_prob_threshold=0.5,  # 50% commitment
... )

Analyze specific archetypes only:

>>> transitions = pc.tl.compute_transition_frequencies(
...     adata, lineages=["archetype_1", "archetype_2", "archetype_3"]
... )

Visualize with seaborn:

>>> import seaborn as sns
>>> import matplotlib.pyplot as plt
>>> sns.heatmap(transitions, annot=True, fmt="d", cmap="YlOrRd")
>>> plt.title("Archetype Transition Frequencies")
>>> plt.show()

See also

setup_cellrank: Complete CellRank workflow setup
compute_lineage_pseudotimes: Convert fate probabilities to pseudotime
compute_lineage_drivers: Identify driver genes for lineage commitment

peach.tl.single_trajectory_analysis(adata, trajectory, trajectories=None, selection_method='discrete', source_weight_threshold=0.4, target_fate_threshold=0.4, verbose=True)[source]#

Analyze single archetype-to-archetype trajectory.

Filters cells based on source archetype assignment/weight and target fate probability. Returns a subset AnnData ready for CellRank gene trends analysis.

IMPORTANT: This function requires CellRank setup to be run first:

>>> ck, g = pc.tl.setup_cellrank(adata, high_purity_threshold=0.80)
>>> pc.tl.compute_lineage_pseudotimes(adata)

For driver genes, use CellRank directly:

>>> drivers = g.compute_lineage_drivers(lineages="archetype_3")

Parameters:

adata (AnnData) – Annotated data matrix. Must contain (from setup_cellrank): - adata.obsm[‘fate_probabilities’] : Fate probability matrix - adata.uns[‘lineage_names’] : List of lineage names - adata.obs[‘pseudotime_to_{archetype}’] : Pseudotime from compute_lineage_pseudotimes - adata.obs[‘archetypes’] : Discrete archetype assignments (for selection_method=’discrete’) - adata.obsm[‘cell_archetype_weights’] : Barycentric weights (for selection_method=’weight’)
trajectory (tuple) – Archetype pair as (source_idx, target_idx), e.g., (0, 3) for archetype_0 → archetype_3.
trajectories (list of tuple, optional) – Multiple trajectory pairs to analyze sequentially. If provided, trajectory is ignored and returns list of results.
selection_method (str, default: 'discrete') – How to select source cells: - ‘discrete’ : Filter by adata.obs[‘archetypes’] == source_archetype - ‘weight’ : Filter by weights[:, source_idx] >= source_weight_threshold - ‘both’ : Compute both and report comparison (uses ‘discrete’ for subset)
source_weight_threshold (float, default: 0.4) – Minimum barycentric weight for source archetype (only used if selection_method=’weight’).
target_fate_threshold (float, default: 0.4) – Minimum fate probability for target archetype selection.
verbose (bool, default: True) – Print progress messages.

Returns:

Tuple[SingleTrajectoryResult, AnnData] –
- result : SingleTrajectoryResult with trajectory metadata
- adata_traj : Subset AnnData containing only trajectory cells, ready for CellRank gene trends. If trajectories list provided, returns list of tuples.
Stores in adata
—————
adata.obs[‘trajectory_{src}_to_{tgt}_cells’] (bool) – Boolean mask for cells in trajectory.
adata.uns[‘trajectory_{src}_to_{tgt}’] (dict) – Trajectory analysis metadata.

Examples

Complete workflow with CellRank:

>>> import peach as pc
>>> import cellrank as cr
>>>
>>> # 1. Setup CellRank (computes fate probabilities)
>>> ck, g = pc.tl.setup_cellrank(adata, high_purity_threshold=0.80)
>>> pc.tl.compute_lineage_pseudotimes(adata)
>>>
>>> # 2. Analyze trajectory (returns subset AnnData)
>>> result, adata_traj = pc.tl.single_trajectory_analysis(adata, trajectory=(4, 5), selection_method="discrete")
>>> print(f"Found {result.n_trajectory_cells} cells")
>>>
>>> # 3. Get drivers from CellRank
>>> drivers = g.compute_lineage_drivers(lineages="archetype_5")
>>> top_genes = drivers.index[:5].tolist()
>>>
>>> # 4. Plot gene trends using subset
>>> cr.pl.gene_trends(adata_traj, model=cr.models.GAMR(adata_traj), genes=top_genes, time_key=result.pseudotime_key)

Compare selection methods:

>>> result, adata_traj = pc.tl.single_trajectory_analysis(adata, trajectory=(1, 2), selection_method="both")
>>> print(f"Discrete: {result.n_discrete_cells} cells")
>>> print(f"Weight-based: {result.n_weight_cells} cells")

Notes

Requires setup_cellrank() and compute_lineage_pseudotimes() to be run first
Driver computation is NOT included - use CellRank’s g.compute_lineage_drivers() directly
Pseudotime uses CellRank-computed values from compute_lineage_pseudotimes()

See also

setup_cellrank: Complete CellRank workflow setup (computes fate probabilities)
compute_lineage_pseudotimes: Compute pseudotime to each lineage

peach.tl.spatial_neighbors(adata, *, spatial_key='spatial', n_neighs=30, coord_type='generic', **kwargs)[source]#

Build spatial neighbor graph from tissue coordinates.

Wrapper around squidpy.gr.spatial_neighbors() with PEACH-appropriate defaults for bead-based and imaging-based spatial transcriptomics.

Parameters:

adata (AnnData) – Annotated data with spatial coordinates in adata.obsm[spatial_key].
spatial_key (str, default: "spatial") – Key in adata.obsm containing 2D spatial coordinates.
n_neighs (int, default: 30) – Number of nearest neighbors for the spatial graph.
coord_type (str, default: "generic") – Coordinate type. Use “generic” for Slide-seq/MERFISH (continuous coords) or “grid” for Visium (hexagonal grid). See squidpy docs for details.
**kwargs (Any) – Additional arguments passed to squidpy.gr.spatial_neighbors().

Returns:

Modifies adata in place:

adata.obsp['spatial_connectivities']: sparse connectivity matrix
adata.obsp['spatial_distances']: sparse distance matrix

Return type:

None

peach.tl.archetype_nhood_enrichment(adata, *, cluster_key='archetypes', n_perms=1000, seed=42, **kwargs)[source]#

Test spatial neighborhood enrichment between archetype groups.

For each pair of archetypes, tests whether cells of one archetype are found more or less frequently in the spatial neighborhood of the other archetype than expected by chance (permutation test).

Parameters:

adata (AnnData) – Annotated data with spatial graph (run spatial_neighbors first) and archetype assignments in adata.obs[cluster_key].
cluster_key (str, default: "archetypes") – Column in adata.obs with archetype labels.
n_perms (int, default: 1000) – Number of permutations for significance testing.
seed (int, default: 42) – Random seed for permutation reproducibility.
**kwargs (Any) – Additional arguments passed to squidpy.gr.nhood_enrichment().

Returns:

Dictionary with ‘zscore’ and ‘count’ arrays [n_archetypes x n_archetypes]. Also stored in adata.uns['archetype_nhood_enrichment'].

Return type:

dict

Examples

>>> pc.tl.spatial_neighbors(adata)
>>> result = pc.tl.archetype_nhood_enrichment(adata)
>>> print(result['zscore'])  # positive = enriched, negative = depleted

peach.tl.archetype_co_occurrence(adata, *, cluster_key='archetypes', spatial_key='spatial', interval=50, **kwargs)[source]#

Compute distance-dependent co-occurrence of archetype groups.

Measures how the co-occurrence ratio between archetype pairs varies with spatial distance. Useful for identifying distance-dependent spatial relationships (e.g., archetypes that co-occur at short but not long distances).

Parameters:

adata (AnnData) – Annotated data with spatial coordinates and archetype assignments.
cluster_key (str, default: "archetypes") – Column in adata.obs with archetype labels.
spatial_key (str, default: "spatial") – Key in adata.obsm with spatial coordinates.
interval (int, default: 50) – Number of distance intervals to evaluate.
**kwargs (Any) – Additional arguments passed to squidpy.gr.co_occurrence().

Returns:

Dictionary with ‘occ’ (co-occurrence ratios) and ‘interval’ (distance bins). Also stored in adata.uns['archetype_co_occurrence'].

Return type:

dict

Examples

>>> pc.tl.spatial_neighbors(adata)
>>> result = pc.tl.archetype_co_occurrence(adata)
>>> print(result['occ'].shape)  # [n_archetypes, n_archetypes, n_intervals]

peach.tl.archetype_spatial_autocorr(adata, *, weights_key='cell_archetype_weights', mode='moran', n_perms=100, n_jobs=1, **kwargs)[source]#

Compute spatial autocorrelation (Moran’s I or Geary’s C) per archetype weight.

Tests whether each archetype weight is spatially smooth (positive autocorrelation), spatially random, or spatially checkered (negative autocorrelation). Moran’s I > 0 indicates spatial clustering of that archetype; I ~ 0 indicates random distribution; I < 0 indicates dispersal.

Parameters:

adata (AnnData) – Annotated data with spatial graph (run spatial_neighbors first) and archetype weights in adata.obsm[weights_key].
weights_key (str, default: "archetype_weights") – Key in adata.obsm with archetype weight matrix [n_cells, n_archetypes].
mode (str, default: "moran") – Autocorrelation statistic: “moran” (Moran’s I) or “geary” (Geary’s C).
n_perms (int, default: 100) – Number of permutations for p-value estimation. Set to None for analytical p-values only.
n_jobs (int, default: 1) – Number of parallel jobs. Default 1 for macOS compatibility.
**kwargs (Any) – Additional arguments passed to squidpy.gr.spatial_autocorr().

Returns:

DataFrame with autocorrelation statistics per archetype weight. Also stored in adata.uns['archetype_spatial_autocorr'].

Return type:

pandas.DataFrame

Examples

>>> pc.tl.spatial_neighbors(adata)
>>> autocorr = pc.tl.archetype_spatial_autocorr(adata)
>>> print(autocorr)  # Moran's I per archetype

peach.tl.archetype_interaction_boundaries(adata, *, cell_type_col='Cell_Type', weights_key='cell_archetype_weights', cell_type_a=None, cell_type_b=None)[source]#

Detect spatial fronts where archetype weight mixtures diverge between cell types.

Uses the existing spatial neighbor graph (from pc.tl.spatial_neighbors()) and continuous archetype weights (from pc.tl.extract_archetype_weights()) to identify tissue regions where two cell types run different archetype programs in the same neighborhood.

For each cell, computes the mean archetype weight vector of its spatial neighbors that belong to cell type A and cell type B separately, then measures divergence (Jensen-Shannon) between these two distributions.

High-scoring cells sit at spatial fronts where the local archetype program of one cell type is shifting in a different direction from the other — e.g., macrophages trending toward inflammatory archetypes while neighboring fibroblasts trend toward fibrotic archetypes.

Parameters:

adata (AnnData) –
Annotated data with:
- Spatial neighbor graph in adata.obsp['spatial_connectivities'] (from pc.tl.spatial_neighbors())
- Archetype weights in adata.obsm[weights_key] (from pc.tl.extract_archetype_weights())
- Cell type labels in adata.obs[cell_type_col]
cell_type_col (str, default: "Cell_Type") – Column in adata.obs with cell type labels.
weights_key (str, default: "archetype_weights") – Key in adata.obsm with archetype weight matrix [n_cells, n_archetypes]. Each row sums to 1.
cell_type_a (str | None, default: None) – First cell type for pairwise comparison. If None, uses the two most abundant cell types.
cell_type_b (str | None, default: None) – Second cell type.

Returns:

Dictionary with:

'boundary_scores': np.ndarray [n_cells] — per-cell JSD boundary score
'mean_weights_a': np.ndarray [n_cells, n_archetypes] — mean weight vector of type-A neighbors per cell
'mean_weights_b': np.ndarray [n_cells, n_archetypes] — mean weight vector of type-B neighbors per cell
'cross_correlations': pd.DataFrame — per-archetype Spearman r between cell types across space
'cell_type_a', 'cell_type_b': str — cell type names
'n_archetypes': int — number of archetype dimensions

Also stored in adata.uns['archetype_interaction_boundaries'] and boundary scores in adata.obs['boundary_score'].

Return type:

dict

Examples

>>> pc.tl.spatial_neighbors(adata, n_neighs=30)
>>> weights = pc.tl.extract_archetype_weights(adata)
>>> result = pc.tl.archetype_interaction_boundaries(
...     adata, cell_type_col="Cell_Type",
...     cell_type_a="Myeloid", cell_type_b="Treg",
... )
>>> pc.pl.interaction_boundaries(adata)

Modules

`peach.tl.archetypal`	Core archetypal analysis functions.
`peach.tl.cellrank_integration`	CellRank integration for archetypal trajectory analysis.
`peach.tl.hyperparameters`	Hyperparameter optimization for archetypal analysis.
`peach.tl.spatial`	Spatial archetype analysis functions.
`peach.tl.statistical`	Statistical Testing for Archetypal Analysis

peach.tl

Contents

peach.tl#