peach.tl#
Tools for archetypal analysis.
- peach.tl.train_archetypal(adata, n_archetypes=5, n_epochs=50, *, layer=None, pca_key='X_pca', hidden_dims=None, inflation_factor=1.5, model_config=None, optimizer_config=None, device='cpu', save_path=None, archetypal_weight=None, kld_weight=None, reconstruction_weight=0.0, vae_recon_weight=0.0, diversity_weight=0.0, activation_func='relu', track_stability=True, validate_constraints=True, lr_factor=0.1, lr_patience=10, seed=42, constraint_tolerance=0.001, stability_history_size=20, store_coords_key='archetype_coordinates', early_stopping=False, early_stopping_patience=10, early_stopping_metric='archetype_r2', min_improvement=0.0001, validation_check_interval=5, validation_data_loader=None, **kwargs)[source]#
Train Deep Archetypal Analysis model to discover cellular archetypes.
This function performs archetypal analysis using a variational autoencoder architecture to identify extreme cellular states (archetypes) that capture the main axes of biological variation. Each cell is represented as a convex combination of the learned archetypes.
The model uses PCHA initialization with inflation factors for optimal archetype positioning and achieves state-of-the-art performance (R² > 0.89) on real single-cell datasets.
- Parameters:
adata (AnnData) – Annotated data object containing single-cell expression data. Must have PCA coordinates in
adata.obsm[pca_key]. Typically generated usingscanpy.pp.pca(adata).n_archetypes (int, default: 5) – Number of archetypal patterns to learn. Should be chosen based on biological knowledge or using hyperparameter optimization. Common values: 3-10 for most datasets.
n_epochs (int, default: 50) – Number of training epochs. Larger datasets may require more epochs. For datasets >5K cells, consider 100-200 epochs.
layer (str | None, default: None) – AnnData layer to use for training. If None, uses PCA coordinates from
adata.obsm[pca_key](recommended).pca_key (str, default: "X_pca") – Key in
adata.obsmcontaining PCA coordinates. The model works best with 5-50 PCA components. Auto-detects: ‘X_pca’, ‘X_PCA’, ‘PCA’.hidden_dims (list[int] | None, default: None) – Encoder/decoder layer dimensions. If None, uses [256, 128, 64]. Smaller architectures like [128, 64] train faster but may underfit. Larger architectures like [512, 256, 128] may overfit on small datasets.
inflation_factor (float, default: 1.5) – PCHA inflation factor for archetype initialization. Values > 1.0 push initial archetypes further from the data centroid, improving separation. Recommended range: 1.2-2.0. Higher values for more distinct archetypes.
model_config (dict | None, default: None) –
Additional model configuration parameters (for advanced users):
archetypal_weight: float - Archetypal loss weight, default 0.9kld_weight: float - KL divergence weight, default 0.1diversity_weight: float - Archetype diversity weight, default 0.05use_barycentric: bool - Use softmax constraints, default True
optimizer_config (dict | None, default: None) –
Optimizer configuration parameters:
lr: float - Learning rate, default 1e-3weight_decay: float - L2 regularization, default 0.0betas: tuple[float, float] - Adam momentum parameters
device (str, default: "cpu") – Compute device for training. One of “cpu”, “cuda”, “mps”.
save_path (str | None, default: None) – Path to save model checkpoints during training.
archetypal_weight (float | None, default: None) – Weight for archetypal loss component. Uses model’s configured value if None.
kld_weight (float | None, default: None) – Weight for KL divergence loss. Uses model’s configured value if None.
reconstruction_weight (float, default: 0.0) – Legacy reconstruction weight parameter.
vae_recon_weight (float, default: 0.0) – VAE reconstruction weight.
diversity_weight (float, default: 0.0) – Weight for archetype diversity loss.
activation_func (str, default: "relu") – Activation function for the model.
track_stability (bool, default: True) – Whether to monitor archetype drift during training. Adds stability metrics to history: archetype_drift_mean, archetype_variance_mean, etc.
validate_constraints (bool, default: True) – Whether to validate archetypal constraints during training. Adds constraint metrics to history: constraints_satisfied, A_sum_error, etc.
lr_factor (float, default: 0.1) – Factor for learning rate reduction on plateau.
lr_patience (int, default: 10) – Number of epochs with no improvement before reducing learning rate.
seed (int, default: 42) – Random seed for reproducibility.
constraint_tolerance (float, default: 1e-3) – Tolerance for constraint validation.
stability_history_size (int, default: 20) – Number of epochs to track for stability analysis.
store_coords_key (str, default: "archetype_coordinates") – Key to store learned archetype positions in
adata.uns.early_stopping (bool, default: False) – Whether to use early stopping based on validation metrics.
early_stopping_patience (int, default: 10) – Patience for early stopping (number of checks without improvement).
early_stopping_metric (str, default: "archetype_r2") – Metric to monitor for early stopping. One of: ‘archetype_r2’, ‘loss’, ‘rmse’.
min_improvement (float, default: 1e-4) – Minimum improvement required to reset patience counter.
validation_check_interval (int, default: 5) – How often to check validation metrics (in epochs).
validation_data_loader (DataLoader | None, default: None) – Validation data loader for early stopping. Uses training data if None.
**kwargs – Additional arguments passed to the core training function.
- Returns:
Training results dictionary with the following structure:
Guaranteed keys (always present):
historydictTraining metrics per epoch. Keys depend on tracking options:
Core metrics (always):
loss,archetypal_loss,archetype_r2,rmseKLD metrics:
kld_loss,KLDStability metrics (if track_stability=True):
archetype_drift_mean,archetype_drift_max,archetype_variance_mean, etc.Constraint metrics (if validate_constraints=True):
constraints_satisfied,A_sum_error,B_sum_error,constraint_violation_rateValidation metrics (if early_stopping=True):
val_loss,val_archetype_r2
final_modeltorch.nn.ModuleTrained Deep_AA model instance.
modeltorch.nn.ModuleAlias for
final_model(same object, for compatibility).
final_optimizertorch.optim.OptimizerFinal optimizer state.
final_analysisdictFinal training analysis containing:
final_constraint_validation: dict with constraint metricsarchetypal_weights: dict with A_matrix and B_matrix analysisfinal_coordinates: dict with ‘A’, ‘B’, ‘Y’ tensorserror: str (only if analysis failed)
epoch_archetype_positionslist[torch.Tensor]Archetype positions at each epoch, shape (n_archetypes, input_dim).
training_configdictTraining configuration with keys:
n_epochs,actual_epochs,early_stop_triggered,archetypal_weight,kld_weight,reconstruction_weight,activation_func,seed,constraint_tolerance,stability_history_size,early_stopping,early_stopping_patience,early_stopping_metric.
Convenience keys (conditional - use .get() to access safely):
final_archetype_r2float | NoneLast value of
history['archetype_r2']if tracked.
final_rmsefloat | NoneLast value of
history['rmse']if tracked.
final_maefloat | NoneLast value of
history['mae']if tracked.
final_lossfloat | NoneLast value of
history['loss']if tracked.
convergence_epochint | NoneEquals
training_config['actual_epochs'].
- Return type:
- Raises:
ValueError – If
adata.obsm[pca_key]is not found. Runscanpy.pp.pca()first. Ifn_archetypesexceeds PCA dimensions.RuntimeError – If CUDA device is requested but not available.
Stores –
------ –
The function stores the following in AnnData: –
:raises -
adata.uns[store_coords_key]: np.ndarray: Archetype positions in PCA space, shape (n_archetypes, n_pcs). Default key: ‘archetype_coordinates’.Notes
Archetypal Analysis Theory: Archetypal analysis represents each data point as a convex combination of extreme points (archetypes). Unlike clustering, which partitions data, archetypal analysis allows cells to have partial membership in multiple archetypes, better reflecting biological continuity.
Model Architecture: Uses a variational autoencoder where the latent space directly represents archetypal coordinates (A matrix). Archetypes are learned as model parameters (Y matrix) rather than constructed from data points.
Accessing Results Safely:
For guaranteed keys, direct access works:
results['history']For convenience keys, use
.get():results.get('final_archetype_r2')Or access from history:
results['history']['archetype_r2'][-1]
Type Validation: For IDE autocomplete and runtime validation:
from peach._core.types import TrainingResults, validate_training_results validated = validate_training_results(results)
Examples
Basic usage with default parameters:
>>> import scanpy as sc >>> import peach as pc >>> # Prepare data with PCA >>> sc.pp.pca(adata, n_comps=30) >>> # Train archetypal model >>> results = pc.tl.train_archetypal(adata, n_archetypes=5, n_epochs=100) >>> # Access final R² safely (may be None if not tracked) >>> r2 = results.get("final_archetype_r2") >>> if r2 is not None: ... print(f"Final R²: {r2:.3f}") >>> # Or access from history (guaranteed if metric was tracked) >>> if results["history"].get("archetype_r2"): ... r2 = results["history"]["archetype_r2"][-1] ... print(f"Final R²: {r2:.3f}")
Advanced usage with custom configuration:
>>> model_config = {"hidden_dims": [512, 256, 128], "inflation_factor": 2.0} >>> results = pc.tl.train_archetypal( ... adata, ... n_archetypes=4, ... n_epochs=150, ... model_config=model_config, ... early_stopping=True, ... early_stopping_patience=15, ... device="cuda", ... ) >>> # Check if early stopping triggered >>> config = results["training_config"] >>> if config["early_stop_triggered"]: ... print(f"Converged at epoch {config['actual_epochs']}")
Accessing archetype coordinates stored in AnnData:
>>> # After training, coordinates are in adata.uns >>> archetype_coords = adata.uns["archetype_coordinates"] >>> print(f"Learned {archetype_coords.shape[0]} archetypes") >>> print(f"Each archetype has {archetype_coords.shape[1]} PCA dimensions")
See also
peach.tl.archetypal_coordinatesExtract cell-archetype distances
peach.tl.assign_archetypesAssign cells to discovered archetypes
peach.tl.extract_archetype_weightsGet barycentric coordinates for cells
peach.pl.training_metricsVisualize training curves
peach._core.types.TrainingResultsType definition for return structure
- peach.tl.archetypal_coordinates(adata, *, pca_key='X_pca', archetype_coords_key='archetype_coordinates', obsm_key='archetype_distances', uns_prefix='archetype', verbose=True, **kwargs)[source]#
Extract archetypal coordinates for all cells.
- Parameters:
adata (AnnData) – Annotated data object with trained model coordinates
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates
archetype_coords_key (str, default: "archetype_coordinates") – Key in adata.uns containing archetype coordinates
obsm_key (str, default: "archetype_distances") – Key to store distance matrix in adata.obsm
uns_prefix (str, default: "archetype") – Prefix for keys stored in adata.uns
verbose (bool, default: True) – Whether to print progress messages
**kwargs – Additional arguments passed to compute_archetype_distances
- Returns:
Dictionary with archetypal coordinates and distances
- Return type:
- peach.tl.assign_archetypes(adata, *, percentage_per_archetype=0.1, obsm_key='archetype_distances', obs_key='archetypes', include_central_archetype=True, verbose=True, **kwargs)[source]#
Assign cells to archetypes based on distances.
- Parameters:
adata (AnnData) – Annotated data object with archetype distances
percentage_per_archetype (float, default: 0.1) – Percentage of cells to assign to each archetype
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing distance matrix
obs_key (str, default: "archetypes") – Key to store assignments in adata.obs
include_central_archetype (bool, default: True) – Whether to include a central archetype (cells far from all extreme archetypes)
verbose (bool, default: True) – Whether to print progress messages
**kwargs – Additional arguments passed to bin_cells_by_archetype
- Return type:
- peach.tl.extract_archetype_weights(adata, model=None, *, pca_key='X_pca', weights_key='cell_archetype_weights', batch_size=256, device='cpu', verbose=True)[source]#
Extract cell archetype weights from trained Deep_AA model.
This function computes the barycentric coordinates (weights) for each cell that describe how it’s composed of the learned archetypes.
- Parameters:
adata (AnnData) – Annotated data object with PCA coordinates
model (Deep_AA model, optional) – Trained model. If None, will look for model in adata.uns[‘trained_model’]
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates
weights_key (str, default: "cell_archetype_weights") – Key to store weights in adata.obsm
batch_size (int, default: 256) – Batch size for processing
device (str, default: "cpu") – Device for computation (‘cpu’, ‘cuda’, or ‘mps’)
verbose (bool, default: True) – Whether to print progress
- Returns:
Cell archetype weights of shape (n_cells, n_archetypes) Also stores weights in adata.obsm[weights_key]
- Return type:
np.ndarray
Examples
>>> # After training >>> results = pc.tl.train_archetypal(adata, n_archetypes=5) >>> >>> # Extract weights >>> weights = pc.tl.extract_archetype_weights(adata, results["model"]) >>> >>> # Weights are now in adata.obsm['cell_archetype_weights'] >>> print(adata.obsm["cell_archetype_weights"].shape)
- peach.tl.compute_conditional_centroids(adata, condition_column, *, pca_key='X_pca', store_key='conditional_centroids', exclude_archetypes=None, groupby=None, verbose=True)[source]#
Compute centroid positions in PCA space for each level of a categorical condition.
This function calculates the mean position (centroid) in PCA space for cells belonging to each level of a categorical variable. Useful for visualizing how different conditions (e.g., treatment phases, timepoints) relate to the archetypal structure.
Following R template patterns: - Uses ALL PCs for centroid calculation (equivalent to R’s colMeans) - Stores full PC centroid but extracts first 3 for 3D visualization - Excludes ‘no_archetype’ and ‘archetype_0’ cells by default
- Parameters:
adata (AnnData) – Annotated data object with PCA coordinates in adata.obsm[pca_key].
condition_column (str) – Name of categorical column in adata.obs to group by. Examples: ‘treatment_phase’, ‘timepoint’, ‘batch’.
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates.
store_key (str, default: "conditional_centroids") – Key in adata.uns to store results.
exclude_archetypes (list, optional) – Archetype labels to exclude from centroid calculation. Default: [‘no_archetype’, ‘archetype_0’] (following R template). Set to empty list [] to include all cells.
groupby (str, optional) – Second categorical column for multi-group trajectories. If provided, centroids are computed for each (group, level) combination. Example: groupby=’response_group’ to get separate trajectories per response.
verbose (bool, default: True) – Whether to print progress messages.
- Returns:
Dictionary with keys:
condition_column: str - name of the condition columnn_levels: int - number of unique levelslevels: List[str] - list of level namescentroids: Dict[str, List[float]] - level → full PCA coordinatescentroids_3d: Dict[str, List[float]] - level → [x, y, z] first 3 PCscell_counts: Dict[str, int] - level → cell countpca_key: str - PCA key usedexclude_archetypes: List[str] - archetypes excludedgroupby: Optional[str] - groupby column if usedgroup_centroids: Optional[Dict] - if groupby: {group: {level: coords}}group_centroids_3d: Optional[Dict] - if groupby: {group: {level: [x,y,z]}}group_cell_counts: Optional[Dict] - if groupby: {group: {level: count}}
- Return type:
- Raises:
ValueError – If condition_column not in adata.obs or PCA coordinates not found.
Stores –
------ –
The function stores results in AnnData: –
:raises -
adata.uns[store_key][condition_column]: dict: Full results dictionary as returned.Examples
>>> # Compute centroids for treatment phase >>> result = pc.tl.compute_conditional_centroids(adata, "treatment_phase") >>> print(result["centroids_3d"]) {'chemo-naive': [1.2, 0.5, -0.3], 'IDS': [0.8, 1.1, 0.2]}
>>> # Then visualize with trajectory >>> fig = pc.pl.archetypal_space( ... adata, show_centroids=True, centroid_condition="treatment_phase", centroid_order=["chemo-naive", "IDS"] ... )
>>> # Multi-group centroids for trajectory comparison >>> result = pc.tl.compute_conditional_centroids(adata, "treatment_phase", groupby="response_group") >>> fig = pc.pl.archetypal_space( ... adata, ... show_centroids=True, ... centroid_condition="treatment_phase", ... centroid_groupby="response_group", ... centroid_order=["chemo-naive", "IDS"], ... centroid_colors={"long": "magenta", "short": "cyan"}, ... )
See also
peach.pl.archetypal_spaceVisualize with centroid trajectory overlay
- peach.tl.assign_to_centroids(adata, condition_column, *, pca_key='X_pca', centroid_key='conditional_centroids', bin_prop=0.15, obs_key='centroid_assignments', exclude_archetypes=None, verbose=True)[source]#
Assign cells to nearest centroid based on distance (top bin_prop% closest).
This function mirrors assign_archetypes but for condition-based centroids. It enables using treatment phase centroids as trajectory endpoints in single_trajectory_analysis by creating categorical assignments that CellRank can use as terminal states.
- Parameters:
adata (AnnData) – Annotated data object. Must have: - PCA coordinates in adata.obsm[pca_key] - Centroids computed via compute_conditional_centroids in adata.uns[centroid_key]
condition_column (str) – Name of the condition column used in compute_conditional_centroids. This identifies which centroid set to use.
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates.
centroid_key (str, default: "conditional_centroids") – Key in adata.uns containing centroid results from compute_conditional_centroids.
bin_prop (float, default: 0.15) – Proportion of cells to assign to each centroid (top 15% closest). Similar to percentage_per_archetype in assign_archetypes.
obs_key (str, default: "centroid_assignments") – Key in adata.obs to store assignments.
exclude_archetypes (list, optional) – Archetype labels to exclude from assignment. Default: [‘no_archetype’] - these cells get ‘unassigned’.
verbose (bool, default: True) – Whether to print progress messages.
- Returns:
Modifies adata.obs[obs_key] with Categorical assignments. Values are condition levels (e.g., ‘chemo_naive’, ‘IDS’) or ‘unassigned’.
- Return type:
None
Examples
>>> # First compute centroids for treatment phases >>> pc.tl.compute_conditional_centroids(adata, "treatment_stage") >>> >>> # Then assign cells to nearest centroid (top 15% closest) >>> pc.tl.assign_to_centroids(adata, "treatment_stage", bin_prop=0.15) >>> >>> # Check assignments >>> print(adata.obs["centroid_assignments"].value_counts()) >>> >>> # Now can use with CellRank for trajectory analysis >>> # (setup_cellrank can use centroid_assignments as terminal states)
See also
compute_conditional_centroidsCompute centroids for condition levels
assign_archetypesSimilar function for archetype assignments
single_trajectory_analysisUses centroid assignments for trajectory analysis
- peach.tl.gene_associations(adata, *, bin_prop=0.1, obsm_key='archetype_distances', obs_key='archetypes', use_layer=None, test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)[source]#
Test gene expression associations with archetypal assignments.
Performs Mann-Whitney U tests to identify genes with significantly different expression between each archetype and all other cells (1-vs-all testing paradigm).
- Parameters:
adata (AnnData) –
Annotated data object with:
obsm[obsm_key]: Archetype distance matrix [n_cells, n_archetypes]obs[obs_key]: Archetype assignments (from bin_cells_by_archetype)Xorlayers[use_layer]: Gene expression data
bin_prop (float, default: 0.1) – Proportion of cells closest to each archetype to use for binning.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
use_layer (str | None, default: None) – Layer for gene expression. If None, uses adata.X. Auto-selects ‘logcounts’ or ‘log1p’ if available.
test_method (str, default: "mannwhitneyu") – Statistical test method. Currently supports ‘mannwhitneyu’.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method: ‘benjamini_hochberg’ or ‘bonferroni’.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') –
Scope of FDR correction:
'global': Correct across all tests (most stringent)'per_archetype': Correct within each archetype'none': No FDR correction (raw p-values)
test_direction (str, default: "two-sided") – Direction of statistical test: ‘two-sided’, ‘greater’, or ‘less’.
min_logfc (float, default: 0.01) – Minimum absolute log fold change threshold for filtering.
min_cells (int, default: 10) – Minimum cells required per archetype for testing.
comparison_group (str, default: 'all') –
Comparison group for statistical tests:
'all': Compare archetype cells vs ALL other cells'archetypes_only': Compare vs cells in other archetypes only (excludes archetype_0 and unassigned cells)
verbose (bool, default: True) – Whether to print progress messages.
- Returns:
Results with columns:
gene: str - Gene symbol/identifierarchetype: str - Archetype identifiern_archetype_cells: int - Cells in archetypen_other_cells: int - Cells in comparison groupmean_archetype: float - Mean expression in archetypemean_other: float - Mean expression in otherslog_fold_change: float - Log fold changestatistic: float - Mann-Whitney U statisticpvalue: float - Raw p-valuefdr_pvalue: float - FDR-corrected p-valuesignificant: bool - Whether FDR < 0.05direction: str - ‘higher’ or ‘lower’ in archetype
- Return type:
pd.DataFrame
- Raises:
ValueError – If required keys not found in adata.
Examples
>>> # Basic usage >>> results = pc.tl.gene_associations(adata) >>> sig_genes = results[results.significant] >>> # Per-archetype FDR correction (less stringent) >>> results = pc.tl.gene_associations(adata, fdr_scope="per_archetype") >>> # Top markers per archetype >>> for arch in results["archetype"].unique(): ... arch_genes = results[ ... (results["archetype"] == arch) & (results["significant"]) & (results["direction"] == "higher") ... ].nlargest(10, "log_fold_change") ... print(f"{arch}: {arch_genes['gene'].tolist()}")
See also
peach.tl.pathway_associationsPathway-level testing
peach.tl.pattern_analysisComprehensive pattern analysis
peach._core.types.GeneAssociationResultResult row structure
- peach.tl.pathway_associations(adata, *, pathway_obsm_key='pathway_scores', obsm_key='archetype_distances', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', test_direction='two-sided', min_logfc=0.01, min_cells=10, comparison_group='all', verbose=True, **kwargs)[source]#
Test pathway activity associations with archetypal assignments.
Performs Mann-Whitney U tests to identify pathways with significantly different activity between each archetype and all other cells.
- Parameters:
adata (AnnData) –
Annotated data object with:
obsm[pathway_obsm_key]: Pathway scores [n_cells, n_pathways]obsm[obsm_key]: Archetype distance matrixobs[obs_key]: Archetype assignmentsuns[pathway_obsm_key + '_pathways']: Pathway names (optional)
pathway_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing pathway activity scores.
obsm_key (str, default: "archetype_distances") – Key in adata.obsm containing archetype distance matrix.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
test_direction (str, default: "two-sided") – Direction of statistical test.
min_logfc (float, default: 0.01) – Minimum effect size threshold (mean_diff for pathways).
min_cells (int, default: 10) – Minimum cells required per archetype.
comparison_group (str, default: 'all') – Comparison group: ‘all’ or ‘archetypes_only’.
verbose (bool, default: True) – Whether to print progress.
- Returns:
Results with columns:
pathway: str - Pathway namearchetype: str - Archetype identifiern_archetype_cells: int - Cells in archetypen_other_cells: int - Cells in comparisonmean_archetype: float - Mean score in archetypemean_other: float - Mean score in othersmean_diff: float - Mean difference (primary effect size)log_fold_change: float - Alias for mean_diffstatistic: float - Test statisticpvalue: float - Raw p-valuefdr_pvalue: float - FDR-corrected p-valuesignificant: bool - Whether significantdirection: str - ‘higher’ or ‘lower’
- Return type:
pd.DataFrame
Notes
Pathway scores (from AUCell, pySCENIC, etc.) represent activity levels, not expression counts. Mean difference is more interpretable than log fold change for these scores.
Examples
>>> # Basic usage >>> results = pc.tl.pathway_associations(adata) >>> # Filter for specific pathway categories >>> metabolism = results[results["pathway"].str.contains("METABOLISM", case=False)] >>> # Top pathways per archetype >>> for arch in results["archetype"].unique(): ... top = results[(results["archetype"] == arch) & (results["significant"])].nlargest(5, "mean_diff") ... print(f"{arch}: {top['pathway'].tolist()}")
See also
peach.tl.gene_associationsGene-level testing
peach._core.types.PathwayAssociationResultResult row structure
- peach.tl.pattern_analysis(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', include_individual_tests=True, include_pattern_tests=True, include_exclusivity_analysis=True, verbose=True, **kwargs)[source]#
Comprehensive archetypal pattern analysis.
Performs systematic analysis combining three complementary approaches:
Individual tests: Standard 1-vs-all archetype characterization
Pattern tests: Systematic archetype combination testing (specialists, binary tradeoffs, complex patterns)
Exclusivity analysis: Features with opposing patterns
- Parameters:
adata (AnnData) – Annotated data object with archetypal assignments and scores.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm containing scores for pattern analysis.
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
include_individual_tests (bool, default: True) – Run individual archetype 1-vs-all tests.
include_pattern_tests (bool, default: True) – Run systematic pattern tests (specialists, tradeoffs).
include_exclusivity_analysis (bool, default: True) – Analyze mutual exclusivity patterns.
verbose (bool, default: True) – Print analysis progress.
- Returns:
Dictionary with keys:
'individual': Individual archetype results'patterns': Pattern-based test results'exclusivity': Mutual exclusivity results
- Return type:
Notes
Pattern Types in ‘patterns’ DataFrame:
specialization: Archetype vs archetype_0 (centroid)tradeoff: Multi-archetype high vs low groups
Pattern Code Format: “12xxx_xx345”
Position = archetype number (0, 1, 2…)
Numbers = high archetypes
‘x’ = low archetypes
Underscore separates high from low group
Examples
>>> # Run comprehensive analysis >>> results = pc.tl.pattern_analysis(adata) >>> # Access individual results >>> individual = results["individual"] >>> # Find specialists (exclusive to one archetype) >>> patterns = results["patterns"] >>> specialists = patterns[patterns["pattern_type"] == "specialization"] >>> # Find mutual exclusivity patterns >>> if not results["exclusivity"].empty: ... exclusive = results["exclusivity"] ... top_tradeoffs = exclusive.nlargest(10, "effect_range")
See also
peach.tl.archetype_exclusive_patternsFocused exclusive pattern analysis
peach.tl.specialization_patternsCentroid comparison analysis
peach.tl.tradeoff_patternsMutual exclusivity analysis
peach._core.types.ComprehensivePatternResultsReturn type structure
- peach.tl.conditional_associations(adata, *, obs_column, archetype_assignments=None, obs_key='archetypes', test_method='hypergeometric', fdr_method='benjamini_hochberg', min_cells=5, verbose=True, **kwargs)[source]#
Test associations between archetypes and categorical metadata.
Performs hypergeometric tests to identify significant enrichment of archetypes within different categorical conditions (samples, treatments, cell types, etc.).
- Parameters:
adata (AnnData) –
Annotated data object with:
obs[obs_key]: Archetype assignmentsobs[obs_column]: Categorical variable to test
obs_column (str) – Column name in adata.obs containing categorical variable.
archetype_assignments (None, optional) – Deprecated. Archetype assignments now read from adata.obs[obs_key].
obs_key (str, default: "archetypes") – Column in adata.obs containing archetypal assignments.
test_method (str, default: "hypergeometric") – Statistical test method (currently only ‘hypergeometric’).
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
min_cells (int, default: 5) – Minimum cells required per archetype-condition combination.
verbose (bool, default: True) – Whether to print progress.
- Returns:
Results with columns:
archetype: str - Archetype identifiercondition: str - Condition value from obs_columnobserved: int - Observed count in overlapexpected: float - Expected count under nulltotal_archetype: int - Total cells in archetypetotal_condition: int - Total cells in conditionodds_ratio: float - Enrichment measure (>1 = enriched)ci_lower: float - Lower 95% CI for odds ratioci_upper: float - Upper 95% CI for odds ratiopvalue: float - Hypergeometric p-valuefdr_pvalue: float - FDR-corrected p-valuesignificant: bool - Whether significant
- Return type:
pd.DataFrame
Examples
>>> # Test sample associations >>> results = pc.tl.conditional_associations(adata, obs_column="sample") >>> # Find enriched archetypes per condition >>> enriched = results[(results["significant"]) & (results["odds_ratio"] > 2)] >>> # Test treatment effects >>> treatment_results = pc.tl.conditional_associations(adata, obs_column="treatment")
See also
peach._core.types.ConditionalAssociationResultResult row structure
- peach.tl.archetype_exclusive_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_effect_size=0.05, min_cells=10, use_pairwise=True, verbose=True, **kwargs)[source]#
Identify features exclusively high in single archetypes.
Finds genes or pathways specifically elevated in only one archetype compared to all others. Supports two methods:
Pairwise (default): Tests each archetype vs every other archetype individually. Feature is exclusive if significantly higher vs ALL others. More stringent.
1-vs-all filtering: Tests each archetype vs all other cells. Feature is exclusive if significant in only ONE archetype’s test. More permissive, higher statistical power.
- Parameters:
adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores. Use None for gene expression.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_effect_size (float, default: 0.05) – Minimum effect size (mean_diff for pathways, log_fc for genes).
min_cells (int, default: 10) – Minimum cells per archetype.
use_pairwise (bool, default: True) – If True, use rigorous pairwise comparisons. If False, use 1-vs-all filtering.
verbose (bool, default: True) – Print progress.
- Returns:
Results with columns:
pathway/gene: Feature identifierarchetype: Exclusive archetypemean_archetype: Mean in exclusive archetypemean_other: Mean in other archetypesmean_diff/log_fold_change: Effect sizeexclusivity_score: Ratio vs max other archetypepvalue,fdr_pvalue,significantpattern_type: ‘exclusive’ or ‘exclusive_pairwise’
- Return type:
pd.DataFrame
Examples
>>> # Pairwise method (more stringent) >>> exclusive = pc.tl.archetype_exclusive_patterns(adata) >>> # 1-vs-all method (more permissive) >>> exclusive = pc.tl.archetype_exclusive_patterns(adata, use_pairwise=False) >>> # Find markers for specific archetype >>> arch3_markers = exclusive[exclusive["archetype"] == "archetype_3"] >>> top_markers = arch3_markers.nlargest(10, "exclusivity_score")
See also
peach.tl.pattern_analysisComprehensive pattern analysis
peach._core.types.ExclusivePatternResultResult row structure
- peach.tl.specialization_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, verbose=True, **kwargs)[source]#
Identify specialization features relative to centroid archetype.
Compares each archetype to archetype_0 (centroid/generalist) to find features representing specialized states or differentiation away from the central cellular state.
- Parameters:
adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_cells (int, default: 10) – Minimum cells per archetype.
verbose (bool, default: True) – Print progress.
- Returns:
Results showing specialization from archetype_0.
- Return type:
pd.DataFrame
Notes
Archetype_0 typically represents the centroid or generalist state where cells have balanced contributions from all archetypes. Features elevated in other archetypes relative to archetype_0 represent specialized cellular programs.
Examples
>>> spec = pc.tl.specialization_patterns(adata) >>> # Find archetype_4 specialization features >>> arch4_spec = spec[(spec["archetype"] == "archetype_4") & (spec["significant"])]
See also
peach.tl.archetype_exclusive_patternsExclusive pattern analysis
- peach.tl.tradeoff_patterns(adata, *, data_obsm_key='pathway_scores', obs_key='archetypes', tradeoffs='pairs', test_method='mannwhitneyu', fdr_method='benjamini_hochberg', fdr_scope='global', min_cells=10, min_effect_size=0.1, verbose=True, **kwargs)[source]#
Identify mutual exclusivity and tradeoff patterns.
Finds features showing opposing patterns between archetypes, indicating biological tradeoffs or mutually exclusive states.
- Parameters:
adata (AnnData) – Annotated data object with archetypal assignments.
data_obsm_key (str, default: "pathway_scores") – Key in adata.obsm for scores.
obs_key (str, default: "archetypes") – Column in adata.obs with archetypal assignments.
tradeoffs ({'pairs', 'patterns'}, default: 'pairs') –
Type of tradeoff analysis:
'pairs': Binary pairwise (A high, B low)'patterns': Complex multi-archetype (AB high, CD low)
test_method (str, default: "mannwhitneyu") – Statistical test method.
fdr_method (str, default: "benjamini_hochberg") – FDR correction method.
fdr_scope ({'global', 'per_archetype', 'none'}, default: 'global') – Scope of FDR correction.
min_cells (int, default: 10) – Minimum cells per group.
min_effect_size (float, default: 0.1) – Minimum effect size for tradeoffs.
verbose (bool, default: True) – Print progress.
**kwargs –
Additional parameters:
max_pattern_sizeint, default: 2Maximum archetypes per group for complex patterns.
exclude_archetype_0bool, default: TrueExclude archetype_0 from tradeoff patterns.
specific_patternsList[str], optionalTest only specific patterns (e.g., [‘2v3’, ‘1v45’]).
- Returns:
Results with tradeoff patterns:
pattern_code: Visual pattern codehigh_archetypes,low_archetypes: Groupsmean_high,mean_low: Group meanslog_fold_change: Effect sizepattern_complexity: Number of archetypes involved
- Return type:
pd.DataFrame
Examples
>>> # Find pairwise tradeoffs >>> pairs = pc.tl.tradeoff_patterns(adata, tradeoffs="pairs") >>> # Find complex patterns >>> patterns = pc.tl.tradeoff_patterns(adata, tradeoffs="patterns", max_pattern_size=3) >>> # Test specific hypothesis >>> specific = pc.tl.tradeoff_patterns(adata, specific_patterns=["2v3", "1v4"])
See also
peach.tl.archetype_exclusive_patternsExclusive pattern analysis
peach._core.types.PatternAssociationResultResult row structure
- peach.tl.hyperparameter_search(adata, *, n_archetypes_range=[3, 4, 5, 6], cv_folds=3, max_epochs_cv=15, pca_key='X_pca', device='cpu', base_model_config=None, **kwargs)[source]#
Perform cross-validation hyperparameter search for archetypal analysis.
Systematically searches hyperparameter space using K-fold cross-validation to find optimal model configurations. Results support manual selection of the best configuration for final training.
- Parameters:
adata (AnnData) – Annotated data object with PCA coordinates in
adata.obsm[pca_key]. Runscanpy.pp.pca(adata)first.n_archetypes_range (list[int], default: [3, 4, 5, 6]) – Range of archetype numbers to test. Each value is evaluated via cross-validation.
cv_folds (int, default: 3) – Number of cross-validation folds. Higher values give more reliable estimates but take longer.
max_epochs_cv (int, default: 15) – Maximum training epochs per CV fold. Early stopping typically triggers before this limit.
pca_key (str, default: "X_pca") – Key in
adata.obsmcontaining PCA coordinates. Auto-detects: ‘X_pca’, ‘X_PCA’, ‘PCA’.device (str, default: "cpu") – Computing device (‘cpu’, ‘cuda’, or ‘mps’). Default is ‘cpu’ for stability across platforms.
base_model_config (dict | None, default: None) –
Additional base model configuration. If None, uses defaults:
input_dim: Auto-detected from PCA dimensionsbarycentric_mode: Truedevice: From device parameter
**kwargs –
Additional arguments passed to SearchConfig:
hidden_dims_options: list[list[int]] - Architectures to testinflation_factor_range: list[float] - Inflation factors to testspeed_preset: str - “fast”, “balanced”, or “thorough”use_pcha_init: bool - Use PCHA initializationsubsample_fraction: float - Subsampling for large datasetsmax_cells_cv: int - Maximum cells for CVrandom_state: int - Random seed
- Returns:
Complete cross-validation results with analysis methods:
Attributes:
config_results: dict[str, CVResults] - Per-configuration resultssummary_df: pd.DataFrame - Summary tableranked_configs: list[dict] - Configs ranked by R²cv_info: dict - Search metadata
Methods:
summary_report(): str - Text summary for decision supportrank_by_metric(metric): list[dict] - Rank by any metricplot_elbow_r2(): Figure - Primary visualizationplot_metric(metric): Figure - Generic metric visualizationsave(path)/load(path)- Persistence
Ranked config structure:
Each dict in
ranked_configscontains:hyperparameters: dict with n_archetypes, hidden_dims, etc.metric_value: float - R² valuestd_error: float - Standard error across foldsconfig_summary: str - Human-readable description
- Return type:
CVSummary
- Raises:
ValueError – If
adata.obsm[pca_key]not found.
Notes
Workflow Position: This is Phase 2 of the PEACH pipeline. After finding good hyperparameters here, manually select the best configuration (Phase 3) and train the final model with
pc.tl.train_archetypal()(Phase 4).Large Datasets: Datasets larger than
max_cells_cv(default 15000) are automatically subsampled for CV. This doesn’t affect final training.Selecting Best Configuration: Use
summary_report()for a quick overview,rank_by_metric()for detailed rankings, andplot_elbow_r2()to visualize the elbow curve.Examples
Basic hyperparameter search:
>>> import scanpy as sc >>> import peach as pc >>> # Prepare data >>> sc.pp.pca(adata, n_comps=30) >>> # Search hyperparameters >>> cv_summary = pc.tl.hyperparameter_search(adata, n_archetypes_range=[3, 4, 5, 6], cv_folds=5, device="cuda") >>> # Review results >>> print(cv_summary.summary_report())
Analyze and visualize results:
>>> # Get top configurations >>> top_configs = cv_summary.rank_by_metric("archetype_r2")[:3] >>> for config in top_configs: ... print(f"{config['config_summary']}: R²={config['metric_value']:.4f}") >>> # Elbow curve >>> fig = cv_summary.plot_elbow_r2() >>> fig.show() >>> # Compare metrics >>> fig = cv_summary.plot_metric("rmse") >>> fig.show()
Use selected hyperparameters for final training:
>>> # Select best configuration >>> best_config = top_configs[0]["hyperparameters"] >>> n_archetypes = best_config["n_archetypes"] >>> # Train final model (Phase 4) >>> results = pc.tl.train_archetypal( ... adata, n_archetypes=n_archetypes, n_epochs=200, model_config={"hidden_dims": best_config["hidden_dims"]} ... )
Save and load results:
>>> # Save for later >>> cv_summary.save("cv_results.pkl") >>> # Load in new session >>> from peach._core.utils.grid_search_results import CVSummary >>> cv_summary = CVSummary.load("cv_results.pkl")
See also
peach.tl.train_archetypalTrain final model with selected hyperparameters
peach._core.utils.hyperparameter_search.SearchConfigFull configuration options
peach._core.utils.grid_search_results.CVSummaryReturn type details
- class peach.tl.SearchConfig(n_archetypes_range=None, hidden_dims_options=None, cv_folds=5, max_epochs_cv=100, early_stopping_patience=5, subsample_fraction=0.5, max_cells_cv=15000, speed_preset='balanced', use_pcha_init=True, inflation_factor_range=None, random_state=42)[source]#
Bases:
objectConfiguration for hyperparameter search space and CV settings.
Defines the hyperparameter grid to search and cross-validation parameters. Speed presets automatically adjust epochs and early stopping for different use cases.
- Parameters:
n_archetypes_range (list[int] | None, default: None) – Range of archetype numbers to test. If None, defaults to [2, 3, 4, 5, 6, 7].
hidden_dims_options (list[list[int]] | None, default: None) – Network architectures to test. If None, defaults to standard options: [[128, 64], [256, 128, 64], [128], [512, 256, 128]].
inflation_factor_range (list[float] | None, default: None) – Inflation factors to test. If None, uses [1.5] (Helsinki optimal). Set to multiple values (e.g., [1.0, 1.5, 2.0]) to search inflation.
cv_folds (int, default: 5) – Number of cross-validation folds.
max_epochs_cv (int, default: 100) – Maximum epochs per CV fold (overridden by speed_preset).
early_stopping_patience (int, default: 5) – Patience for early stopping (overridden by speed_preset).
subsample_fraction (float, default: 0.5) – Fraction of data to use for CV when dataset > max_cells_cv.
max_cells_cv (int, default: 15000) – Maximum cells for CV. Larger datasets are subsampled.
speed_preset (str, default: "balanced") –
Training speed preset. Options:
"fast": 25 epochs, patience=3 (quick exploration)"balanced": 50 epochs, patience=5 (recommended)"thorough": 100 epochs, patience=8 (comprehensive)
use_pcha_init (bool, default: True) – Whether to use PCHA initialization for archetypes.
random_state (int, default: 42) – Random seed for reproducibility.
- _search_inflation#
Internal flag indicating whether inflation is being searched (True if inflation_factor_range was explicitly provided).
- Type:
- Raises:
ValueError – If n_archetypes_range contains non-positive integers. If cv_folds <= 0. If max_epochs_cv <= 0. If subsample_fraction not in (0, 1]. If max_cells_cv <= 0. If inflation_factor_range contains non-positive values.
Examples
>>> # Basic configuration >>> config = SearchConfig(n_archetypes_range=[3, 4, 5], cv_folds=3, speed_preset="fast") >>> # Search inflation factors >>> config = SearchConfig(n_archetypes_range=[4, 5, 6], inflation_factor_range=[1.0, 1.5, 2.0], cv_folds=5) >>> print(f"Searching inflation: {config._search_inflation}") # True
See also
ArchetypalGridSearchUses this configuration
peach.tl.hyperparameter_searchUser-facing wrapper
- peach.tl.setup_cellrank(adata, high_purity_threshold=0.8, n_neighbors=30, n_pcs=11, compute_paga=True, solver='gmres', tol=1e-06, terminal_obs_key='archetypes', verbose=True)[source]#
Set up CellRank workflow for archetypal or centroid-based trajectory analysis.
This function orchestrates the complete pipeline from terminal state assignments to fate probabilities, including neighbors computation, UMAP, PAGA, and GPCCA.
- Parameters:
adata (AnnData) – Annotated data matrix. Must contain: - adata.obs[terminal_obs_key] : Terminal state assignments - adata.obsm[‘X_pca’] : PCA coordinates If using archetypes (default): - adata.obsm[‘cell_archetype_weights’] : Barycentric weights - adata.obsm[‘archetype_distances’] : Distances to archetypes
high_purity_threshold (float, optional (default: 0.80)) – Percentile threshold for defining high-purity cells. 0.80 means top 20% of cells per archetype. Only used when terminal_obs_key=’archetypes’.
n_neighbors (int, optional (default: 30)) – Number of neighbors for k-NN graph construction
n_pcs (int, optional (default: 11)) – Number of principal components to use
compute_paga (bool, optional (default: True)) – Whether to compute PAGA connectivity
solver (str, optional (default: 'gmres')) – Solver for fate probability computation (‘gmres’, ‘direct’, etc.)
tol (float, optional (default: 1e-6)) – Tolerance for iterative solver
terminal_obs_key (str, optional (default: 'archetypes')) – Key in adata.obs containing terminal state assignments. Use ‘archetypes’ for standard archetype-based analysis or ‘centroid_assignments’ for treatment phase centroid trajectories (requires running pc.tl.assign_to_centroids() first).
verbose (bool, optional (default: True)) – Print progress messages
- Returns:
ck (cellrank.kernels.ConnectivityKernel) – Computed transition kernel
g (cellrank.estimators.GPCCA) – GPCCA estimator with fate probabilities
Stores in adata
—————
adata.obs[‘terminal_states’] (pd.Series) – Terminal state assignments for high-purity cells
adata.obsm[‘fate_probabilities’] (np.ndarray) – Fate probability matrix (n_obs × n_lineages)
adata.uns[‘lineage_names’] (list) – List of lineage names (archetype or centroid names)
adata.uns[‘cellrank_gpcca’] (GPCCA) – GPCCA estimator object for downstream functions
adata.obsm[‘X_umap’] (np.ndarray) – UMAP coordinates (if not already present)
adata.uns[‘neighbors’] (dict) – k-NN graph (if not already present)
adata.uns[‘paga’] (dict) – PAGA results (if compute_paga=True)
Examples
Basic archetype-based analysis:
>>> import peach as pc >>> ck, g = pc.tl.setup_cellrank(adata)
Treatment phase centroid-based analysis:
>>> # First compute centroids and assign cells >>> pc.tl.compute_conditional_centroids(adata, condition_column="treatment_phase") >>> pc.tl.assign_to_centroids(adata, condition_column="treatment_phase") >>> # Then run CellRank with centroid assignments >>> ck, g = pc.tl.setup_cellrank(adata, terminal_obs_key="centroid_assignments")
Access results:
>>> fate_probs = adata.obsm["fate_probabilities"] >>> lineages = adata.uns["lineage_names"] >>> terminal_states = adata.obs["terminal_states"]
Customize parameters:
>>> ck, g = pc.tl.setup_cellrank( ... adata, ... high_purity_threshold=0.90, # Top 10% of cells ... n_neighbors=50, ... compute_paga=False, ... )
Notes
See also
assign_to_centroidsAssign cells to treatment phase centroids
compute_lineage_pseudotimesConvert fate probabilities to pseudotime
compute_lineage_driversIdentify genes driving lineage commitment
References
- peach.tl.compute_lineage_pseudotimes(adata, lineage_names=None, fate_prob_key='fate_probabilities')[source]#
Convert fate probabilities to lineage-specific pseudotimes.
Creates continuous pseudotime variables for each lineage by using fate probabilities as progression measures. Stores results in adata.obs.
- Parameters:
adata (AnnData) – Annotated data matrix. Must contain: - adata.obsm[‘fate_probabilities’] : Fate probability matrix - adata.uns[‘lineage_names’] : List of lineage names
lineage_names (list of str, optional) – Specific lineages to compute pseudotime for. If None, computes for all lineages in adata.uns[‘lineage_names’]
fate_prob_key (str, optional (default: 'fate_probabilities')) – Key in adata.obsm containing fate probabilities
- Returns:
Stores pseudotime variables in adata.obs with keys: ‘pseudotime_to_{lineage}’ for each lineage
- Return type:
None
Examples
Compute pseudotimes for all lineages:
>>> import peach as pc >>> pc.tl.compute_lineage_pseudotimes(adata)
Access specific pseudotime:
>>> pseudotime = adata.obs["pseudotime_to_archetype_5"]
Compute for specific lineages:
>>> pc.tl.compute_lineage_pseudotimes(adata, lineage_names=["archetype_3", "archetype_5"])
Use for gene trend analysis:
>>> import cellrank as cr >>> cr.pl.gene_trends( ... adata, model=cr.models.GAMR(adata), genes=["RARRES1", "SOD2"], time_key="pseudotime_to_archetype_5" ... )
Notes
Must run setup_cellrank() first to compute fate probabilities
Pseudotime values are simply the fate probabilities (range: 0-1)
Higher pseudotime = higher probability of committing to that lineage
See also
setup_cellrankCompute fate probabilities
compute_lineage_driversIdentify genes driving lineage commitment
- peach.tl.compute_lineage_drivers(adata, lineage, n_genes=100, method='cellrank', **kwargs)[source]#
Identify genes driving commitment to a specific lineage.
Computes correlation between gene expression and fate probabilities to identify lineage-specific marker genes.
- Parameters:
adata (AnnData) – Annotated data matrix with fate probabilities computed
lineage (str) – Target lineage name (e.g., ‘archetype_5’)
n_genes (int, optional (default: 100)) – Number of top genes to return
method (str, optional (default: 'cellrank')) – Method for computing drivers: - ‘cellrank’ : Use CellRank’s compute_lineage_drivers (requires GPCCA object) - ‘correlation’ : Simple Spearman correlation (faster, works without GPCCA)
**kwargs – Additional arguments passed to method
- Returns:
drivers – Top driver genes with statistics: - ‘gene’ : Gene name - ‘lineage’ : Target lineage name - ‘correlation’ : Spearman correlation with fate probability - ‘pvalue’ : P-value from correlation test
- Return type:
pd.DataFrame
Examples
Using CellRank method (GPCCA is automatically stored by setup_cellrank):
>>> import peach as pc >>> ck, g = pc.tl.setup_cellrank(adata) >>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="cellrank")
Using correlation method (simpler, faster):
>>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="correlation", n_genes=50)
Top genes:
>>> print(drivers.head(10))
Notes
‘cellrank’ method is more sophisticated (uses GAM models)
‘correlation’ method is faster and works without storing GPCCA object
For publication, recommend ‘cellrank’ method with GAMR models
See also
setup_cellrankCompute fate probabilities
compute_lineage_pseudotimesCreate pseudotime variables
- peach.tl.compute_transition_frequencies(adata, start_weight_threshold=0.5, fate_prob_threshold=0.3, lineages=None)[source]#
Compute frequency of transitions between archetypal states.
Identifies cells transitioning from one archetype to another based on their starting archetypal weights and fate probabilities from CellRank.
A transition is counted when a cell has: - High barycentric weight for source archetype (> start_weight_threshold) - High fate probability for target archetype (> fate_prob_threshold)
- Parameters:
adata (AnnData) – Annotated data object with CellRank results. Must contain: - adata.obsm[‘cell_archetype_weights’]: Barycentric weights [n_obs, n_archetypes] - adata.obs[‘archetypes’]: Categorical archetype assignments - adata.obsm[‘fate_probabilities’]: Fate probability matrix [n_obs, n_lineages] - adata.uns[‘lineage_names’]: List of lineage/archetype names - adata.uns[‘cellrank_gpcca’]: GPCCA estimator object (from setup_cellrank)
start_weight_threshold (float, default=0.5) – Minimum barycentric weight to consider a cell as “starting” from an archetype. - 0.5 = top 50% cells per archetype (balanced) - 0.7 = top 30% cells (more stringent) - 0.3 = top 70% cells (more permissive)
fate_prob_threshold (float, default=0.3) – Minimum fate probability to consider a cell as “transitioning to” an archetype. - 0.3 = 30% commitment probability (standard) - 0.5 = 50% commitment (stringent) - 0.2 = 20% commitment (permissive)
lineages (list of str, optional) – Specific lineages/archetypes to analyze. If None, uses all lineages from adata.uns[‘lineage_names’] that start with ‘archetype_’.
- Returns:
Transition frequency matrix with shape [n_archetypes, n_archetypes]. - Index: Source archetypes (starting weight) - Columns: Target archetypes (fate probability) - Values: Integer counts of cells satisfying both thresholds - Diagonal: Cells maintaining their archetype identity - Off-diagonal: Cross-archetype transitions
- Example:
archetype_0 archetype_1 archetype_2 archetype_3
archetype_0 150 45 23 12 archetype_1 12 200 67 8 archetype_2 8 34 180 45 archetype_3 5 15 30 190
- Return type:
pd.DataFrame
- Raises:
ValueError – If required CellRank results are missing (run setup_cellrank() first)
Notes
archetype_0 (centroid) uses categorical assignment instead of weight threshold
Returns raw counts (not normalized probabilities)
Cells can appear in multiple transitions if they meet multiple criteria
Use with PAGA connectivity for complete trajectory analysis
Examples
Basic usage with default thresholds:
>>> import peach as pc >>> # After running setup_cellrank() >>> transitions = pc.tl.compute_transition_frequencies(adata) >>> print(transitions)
Stringent thresholds for high-confidence transitions:
>>> transitions = pc.tl.compute_transition_frequencies( ... adata, ... start_weight_threshold=0.7, # Top 30% cells ... fate_prob_threshold=0.5, # 50% commitment ... )
Analyze specific archetypes only:
>>> transitions = pc.tl.compute_transition_frequencies( ... adata, lineages=["archetype_1", "archetype_2", "archetype_3"] ... )
Visualize with seaborn:
>>> import seaborn as sns >>> import matplotlib.pyplot as plt >>> sns.heatmap(transitions, annot=True, fmt="d", cmap="YlOrRd") >>> plt.title("Archetype Transition Frequencies") >>> plt.show()
See also
setup_cellrankComplete CellRank workflow setup
compute_lineage_pseudotimesConvert fate probabilities to pseudotime
compute_lineage_driversIdentify driver genes for lineage commitment
- peach.tl.single_trajectory_analysis(adata, trajectory, trajectories=None, selection_method='discrete', source_weight_threshold=0.4, target_fate_threshold=0.4, verbose=True)[source]#
Analyze single archetype-to-archetype trajectory.
Filters cells based on source archetype assignment/weight and target fate probability. Returns a subset AnnData ready for CellRank gene trends analysis.
- IMPORTANT: This function requires CellRank setup to be run first:
>>> ck, g = pc.tl.setup_cellrank(adata, high_purity_threshold=0.80) >>> pc.tl.compute_lineage_pseudotimes(adata)
- For driver genes, use CellRank directly:
>>> drivers = g.compute_lineage_drivers(lineages="archetype_3")
- Parameters:
adata (AnnData) – Annotated data matrix. Must contain (from setup_cellrank): - adata.obsm[‘fate_probabilities’] : Fate probability matrix - adata.uns[‘lineage_names’] : List of lineage names - adata.obs[‘pseudotime_to_{archetype}’] : Pseudotime from compute_lineage_pseudotimes - adata.obs[‘archetypes’] : Discrete archetype assignments (for selection_method=’discrete’) - adata.obsm[‘cell_archetype_weights’] : Barycentric weights (for selection_method=’weight’)
trajectory (tuple) – Archetype pair as (source_idx, target_idx), e.g., (0, 3) for archetype_0 → archetype_3.
trajectories (list of tuple, optional) – Multiple trajectory pairs to analyze sequentially. If provided, trajectory is ignored and returns list of results.
selection_method (str, default: 'discrete') – How to select source cells: - ‘discrete’ : Filter by adata.obs[‘archetypes’] == source_archetype - ‘weight’ : Filter by weights[:, source_idx] >= source_weight_threshold - ‘both’ : Compute both and report comparison (uses ‘discrete’ for subset)
source_weight_threshold (float, default: 0.4) – Minimum barycentric weight for source archetype (only used if selection_method=’weight’).
target_fate_threshold (float, default: 0.4) – Minimum fate probability for target archetype selection.
verbose (bool, default: True) – Print progress messages.
- Returns:
Tuple[SingleTrajectoryResult, AnnData] –
result : SingleTrajectoryResult with trajectory metadata
adata_traj : Subset AnnData containing only trajectory cells, ready for CellRank gene trends. If trajectories list provided, returns list of tuples.
Stores in adata
—————
adata.obs[‘trajectory_{src}_to_{tgt}_cells’] (bool) – Boolean mask for cells in trajectory.
adata.uns[‘trajectory_{src}_to_{tgt}’] (dict) – Trajectory analysis metadata.
Examples
Complete workflow with CellRank:
>>> import peach as pc >>> import cellrank as cr >>> >>> # 1. Setup CellRank (computes fate probabilities) >>> ck, g = pc.tl.setup_cellrank(adata, high_purity_threshold=0.80) >>> pc.tl.compute_lineage_pseudotimes(adata) >>> >>> # 2. Analyze trajectory (returns subset AnnData) >>> result, adata_traj = pc.tl.single_trajectory_analysis(adata, trajectory=(4, 5), selection_method="discrete") >>> print(f"Found {result.n_trajectory_cells} cells") >>> >>> # 3. Get drivers from CellRank >>> drivers = g.compute_lineage_drivers(lineages="archetype_5") >>> top_genes = drivers.index[:5].tolist() >>> >>> # 4. Plot gene trends using subset >>> cr.pl.gene_trends(adata_traj, model=cr.models.GAMR(adata_traj), genes=top_genes, time_key=result.pseudotime_key)
Compare selection methods:
>>> result, adata_traj = pc.tl.single_trajectory_analysis(adata, trajectory=(1, 2), selection_method="both") >>> print(f"Discrete: {result.n_discrete_cells} cells") >>> print(f"Weight-based: {result.n_weight_cells} cells")
Notes
Requires setup_cellrank() and compute_lineage_pseudotimes() to be run first
Driver computation is NOT included - use CellRank’s g.compute_lineage_drivers() directly
Pseudotime uses CellRank-computed values from compute_lineage_pseudotimes()
See also
setup_cellrankComplete CellRank workflow setup (computes fate probabilities)
compute_lineage_pseudotimesCompute pseudotime to each lineage
- peach.tl.spatial_neighbors(adata, *, spatial_key='spatial', n_neighs=30, coord_type='generic', **kwargs)[source]#
Build spatial neighbor graph from tissue coordinates.
Wrapper around
squidpy.gr.spatial_neighbors()with PEACH-appropriate defaults for bead-based and imaging-based spatial transcriptomics.- Parameters:
adata (AnnData) – Annotated data with spatial coordinates in
adata.obsm[spatial_key].spatial_key (str, default: "spatial") – Key in
adata.obsmcontaining 2D spatial coordinates.n_neighs (int, default: 30) – Number of nearest neighbors for the spatial graph.
coord_type (str, default: "generic") – Coordinate type. Use “generic” for Slide-seq/MERFISH (continuous coords) or “grid” for Visium (hexagonal grid). See squidpy docs for details.
**kwargs (
Any) – Additional arguments passed tosquidpy.gr.spatial_neighbors().
- Returns:
Modifies
adatain place:adata.obsp['spatial_connectivities']: sparse connectivity matrixadata.obsp['spatial_distances']: sparse distance matrix
- Return type:
None
- peach.tl.archetype_nhood_enrichment(adata, *, cluster_key='archetypes', n_perms=1000, seed=42, **kwargs)[source]#
Test spatial neighborhood enrichment between archetype groups.
For each pair of archetypes, tests whether cells of one archetype are found more or less frequently in the spatial neighborhood of the other archetype than expected by chance (permutation test).
- Parameters:
adata (AnnData) – Annotated data with spatial graph (run
spatial_neighborsfirst) and archetype assignments inadata.obs[cluster_key].cluster_key (str, default: "archetypes") – Column in
adata.obswith archetype labels.n_perms (int, default: 1000) – Number of permutations for significance testing.
seed (int, default: 42) – Random seed for permutation reproducibility.
**kwargs (
Any) – Additional arguments passed tosquidpy.gr.nhood_enrichment().
- Returns:
Dictionary with ‘zscore’ and ‘count’ arrays [n_archetypes x n_archetypes]. Also stored in
adata.uns['archetype_nhood_enrichment'].- Return type:
Examples
>>> pc.tl.spatial_neighbors(adata) >>> result = pc.tl.archetype_nhood_enrichment(adata) >>> print(result['zscore']) # positive = enriched, negative = depleted
- peach.tl.archetype_co_occurrence(adata, *, cluster_key='archetypes', spatial_key='spatial', interval=50, **kwargs)[source]#
Compute distance-dependent co-occurrence of archetype groups.
Measures how the co-occurrence ratio between archetype pairs varies with spatial distance. Useful for identifying distance-dependent spatial relationships (e.g., archetypes that co-occur at short but not long distances).
- Parameters:
adata (AnnData) – Annotated data with spatial coordinates and archetype assignments.
cluster_key (str, default: "archetypes") – Column in
adata.obswith archetype labels.spatial_key (str, default: "spatial") – Key in
adata.obsmwith spatial coordinates.interval (int, default: 50) – Number of distance intervals to evaluate.
**kwargs (
Any) – Additional arguments passed tosquidpy.gr.co_occurrence().
- Returns:
Dictionary with ‘occ’ (co-occurrence ratios) and ‘interval’ (distance bins). Also stored in
adata.uns['archetype_co_occurrence'].- Return type:
Examples
>>> pc.tl.spatial_neighbors(adata) >>> result = pc.tl.archetype_co_occurrence(adata) >>> print(result['occ'].shape) # [n_archetypes, n_archetypes, n_intervals]
- peach.tl.archetype_spatial_autocorr(adata, *, weights_key='cell_archetype_weights', mode='moran', n_perms=100, n_jobs=1, **kwargs)[source]#
Compute spatial autocorrelation (Moran’s I or Geary’s C) per archetype weight.
Tests whether each archetype weight is spatially smooth (positive autocorrelation), spatially random, or spatially checkered (negative autocorrelation). Moran’s I > 0 indicates spatial clustering of that archetype; I ~ 0 indicates random distribution; I < 0 indicates dispersal.
- Parameters:
adata (AnnData) – Annotated data with spatial graph (run
spatial_neighborsfirst) and archetype weights inadata.obsm[weights_key].weights_key (str, default: "archetype_weights") – Key in
adata.obsmwith archetype weight matrix [n_cells, n_archetypes].mode (str, default: "moran") – Autocorrelation statistic: “moran” (Moran’s I) or “geary” (Geary’s C).
n_perms (int, default: 100) – Number of permutations for p-value estimation. Set to None for analytical p-values only.
n_jobs (int, default: 1) – Number of parallel jobs. Default 1 for macOS compatibility.
**kwargs (
Any) – Additional arguments passed tosquidpy.gr.spatial_autocorr().
- Returns:
DataFrame with autocorrelation statistics per archetype weight. Also stored in
adata.uns['archetype_spatial_autocorr'].- Return type:
Examples
>>> pc.tl.spatial_neighbors(adata) >>> autocorr = pc.tl.archetype_spatial_autocorr(adata) >>> print(autocorr) # Moran's I per archetype
- peach.tl.archetype_interaction_boundaries(adata, *, cell_type_col='Cell_Type', weights_key='cell_archetype_weights', cell_type_a=None, cell_type_b=None)[source]#
Detect spatial fronts where archetype weight mixtures diverge between cell types.
Uses the existing spatial neighbor graph (from
pc.tl.spatial_neighbors()) and continuous archetype weights (frompc.tl.extract_archetype_weights()) to identify tissue regions where two cell types run different archetype programs in the same neighborhood.For each cell, computes the mean archetype weight vector of its spatial neighbors that belong to cell type A and cell type B separately, then measures divergence (Jensen-Shannon) between these two distributions.
High-scoring cells sit at spatial fronts where the local archetype program of one cell type is shifting in a different direction from the other — e.g., macrophages trending toward inflammatory archetypes while neighboring fibroblasts trend toward fibrotic archetypes.
- Parameters:
adata (AnnData) –
Annotated data with:
Spatial neighbor graph in
adata.obsp['spatial_connectivities'](frompc.tl.spatial_neighbors())Archetype weights in
adata.obsm[weights_key](frompc.tl.extract_archetype_weights())Cell type labels in
adata.obs[cell_type_col]
cell_type_col (str, default: "Cell_Type") – Column in
adata.obswith cell type labels.weights_key (str, default: "archetype_weights") – Key in
adata.obsmwith archetype weight matrix [n_cells, n_archetypes]. Each row sums to 1.cell_type_a (str | None, default: None) – First cell type for pairwise comparison. If None, uses the two most abundant cell types.
cell_type_b (str | None, default: None) – Second cell type.
- Returns:
Dictionary with:
'boundary_scores': np.ndarray [n_cells] — per-cell JSD boundary score'mean_weights_a': np.ndarray [n_cells, n_archetypes] — mean weight vector of type-A neighbors per cell'mean_weights_b': np.ndarray [n_cells, n_archetypes] — mean weight vector of type-B neighbors per cell'cross_correlations': pd.DataFrame — per-archetype Spearman r between cell types across space'cell_type_a','cell_type_b': str — cell type names'n_archetypes': int — number of archetype dimensions
Also stored in
adata.uns['archetype_interaction_boundaries']and boundary scores inadata.obs['boundary_score'].- Return type:
Examples
>>> pc.tl.spatial_neighbors(adata, n_neighs=30) >>> weights = pc.tl.extract_archetype_weights(adata) >>> result = pc.tl.archetype_interaction_boundaries( ... adata, cell_type_col="Cell_Type", ... cell_type_a="Myeloid", cell_type_b="Treg", ... ) >>> pc.pl.interaction_boundaries(adata)
Modules
Core archetypal analysis functions. |
|
CellRank integration for archetypal trajectory analysis. |
|
Hyperparameter optimization for archetypal analysis. |
|
Spatial archetype analysis functions. |
|
Statistical Testing for Archetypal Analysis |