peach.tl.cellrank_integration

CellRank integration for archetypal trajectory analysis.

Functions

compute_lineage_drivers(adata, lineage[, ...])

Identify genes driving commitment to a specific lineage.

compute_lineage_pseudotimes(adata[, ...])

Convert fate probabilities to lineage-specific pseudotimes.

compute_transition_frequencies(adata[, ...])

Compute frequency of transitions between archetypal states.

setup_cellrank(adata[, ...])

Set up CellRank workflow for archetypal or centroid-based trajectory analysis.

single_trajectory_analysis(adata, trajectory[, ...])

Analyze a single archetype-to-archetype trajectory.

peach.tl.cellrank_integration.setup_cellrank(adata, high_purity_threshold=0.8, n_neighbors=30, n_pcs=11, compute_paga=True, solver='gmres', tol=1e-06, terminal_obs_key='archetypes', verbose=True)

Set up CellRank workflow for archetypal or centroid-based trajectory analysis.

This function orchestrates the complete pipeline from terminal state assignments to fate probabilities, including neighbors computation, UMAP, PAGA, and GPCCA.

Parameters:
  • adata (AnnData) – Annotated data matrix. Must contain:

    • adata.obs[terminal_obs_key] : Terminal state assignments

    • adata.obsm['X_pca'] : PCA coordinates

    If using archetypes (default), it must also contain:

    • adata.obsm['cell_archetype_weights'] : Barycentric weights

    • adata.obsm['archetype_distances'] : Distances to archetypes

  • high_purity_threshold (float, optional (default: 0.80)) – Percentile threshold for defining high-purity cells. A value of 0.80 keeps the top 20% of cells per archetype. Only used when terminal_obs_key='archetypes'.

  • n_neighbors (int, optional (default: 30)) – Number of neighbors for k-NN graph construction

  • n_pcs (int, optional (default: 11)) – Number of principal components to use

  • compute_paga (bool, optional (default: True)) – Whether to compute PAGA connectivity

  • solver (str, optional (default: 'gmres')) – Solver for fate probability computation ('gmres', 'direct', etc.)

  • tol (float, optional (default: 1e-6)) – Tolerance for iterative solver

  • terminal_obs_key (str, optional (default: 'archetypes')) – Key in adata.obs containing terminal state assignments. Use 'archetypes' for standard archetype-based analysis or 'centroid_assignments' for treatment phase centroid trajectories (requires running pc.tl.assign_to_centroids() first).

  • verbose (bool, optional (default: True)) – Print progress messages

Returns:

  • ck (cellrank.kernels.ConnectivityKernel) – Computed transition kernel

  • g (cellrank.estimators.GPCCA) – GPCCA estimator with fate probabilities

Stores in adata:

  • adata.obs['terminal_states'] (pd.Series) – Terminal state assignments for high-purity cells

  • adata.obsm['fate_probabilities'] (np.ndarray) – Fate probability matrix (n_obs × n_lineages)

  • adata.uns['lineage_names'] (list) – List of lineage names (archetype or centroid names)

  • adata.uns['cellrank_gpcca'] (GPCCA) – GPCCA estimator object for downstream functions

  • adata.obsm['X_umap'] (np.ndarray) – UMAP coordinates (if not already present)

  • adata.uns['neighbors'] (dict) – k-NN graph (if not already present)

  • adata.uns['paga'] (dict) – PAGA results (if compute_paga=True)

Examples

Basic archetype-based analysis:

>>> import peach as pc
>>> ck, g = pc.tl.setup_cellrank(adata)

Treatment phase centroid-based analysis:

>>> # First compute centroids and assign cells
>>> pc.tl.compute_conditional_centroids(adata, condition_column="treatment_phase")
>>> pc.tl.assign_to_centroids(adata, condition_column="treatment_phase")
>>> # Then run CellRank with centroid assignments
>>> ck, g = pc.tl.setup_cellrank(adata, terminal_obs_key="centroid_assignments")

Access results:

>>> fate_probs = adata.obsm["fate_probabilities"]
>>> lineages = adata.uns["lineage_names"]
>>> terminal_states = adata.obs["terminal_states"]
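
A minimal sketch of working with these results, using toy arrays in place of adata. It assumes the columns of the fate probability matrix follow the order of adata.uns["lineage_names"], which this page does not state explicitly; the variable names dominant_lineage and dominant_prob are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for adata.obsm["fate_probabilities"] and adata.uns["lineage_names"].
fate_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
])
lineages = ["archetype_1", "archetype_2", "archetype_3"]

# Label the columns (assumption: column order matches the lineage name list).
fate_df = pd.DataFrame(fate_probs, columns=lineages)

# Most probable fate per cell, and the probability behind that call.
dominant_lineage = fate_df.idxmax(axis=1)
dominant_prob = fate_df.max(axis=1)
```

With a real AnnData object, passing index=adata.obs_names to the DataFrame keeps the result aligned with adata.obs.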

Customize parameters:

>>> ck, g = pc.tl.setup_cellrank(
...     adata,
...     high_purity_threshold=0.90,  # Top 10% of cells
...     n_neighbors=50,
...     compute_paga=False,
... )
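
To illustrate what a percentile threshold of this kind means, the sketch below flags the top 20% of cells per archetype from a per-cell purity score. It is not the PEACH implementation: the helper high_purity_mask is hypothetical, and the choice of score (for example the barycentric weight of the assigned archetype) is an assumption.

```python
import numpy as np
import pandas as pd

def high_purity_mask(labels, scores, threshold=0.80):
    """Flag cells at or above the per-label `threshold` quantile of `scores`."""
    scores = pd.Series(np.asarray(scores, dtype=float))
    labels = pd.Series(np.asarray(labels))
    # Compute one cutoff per archetype and broadcast it back to every cell.
    cutoffs = scores.groupby(labels).transform(lambda s: s.quantile(threshold))
    return (scores >= cutoffs).to_numpy()

# Toy data: 50 cells per archetype with random purity scores.
rng = np.random.default_rng(0)
labels = np.repeat(["archetype_1", "archetype_2"], 50)
scores = rng.random(100)

# threshold=0.80 keeps the top 20% of cells within each archetype.
mask = high_purity_mask(labels, scores, threshold=0.80)
```

Raising the threshold to 0.90 would keep only the top 10% per archetype, matching the comment in the example above.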

Notes

  • Requires CellRank installation: pip install cellrank

  • For GAMR models, set R_HOME before importing cellrank. The path below is the macOS framework location; adjust it for your system:

    ```python
    import os

    os.environ["R_HOME"] = "/Library/Frameworks/R.framework/Resources"
    ```

See also

assign_to_centroids

Assign cells to treatment phase centroids

compute_lineage_pseudotimes

Convert fate probabilities to pseudotime

compute_lineage_drivers

Identify genes driving lineage commitment

peach.tl.cellrank_integration.compute_lineage_pseudotimes(adata, lineage_names=None, fate_prob_key='fate_probabilities')

Convert fate probabilities to lineage-specific pseudotimes.

Creates continuous pseudotime variables for each lineage by using fate probabilities as progression measures. Stores results in adata.obs.

Parameters:
  • adata (AnnData) – Annotated data matrix. Must contain:

    • adata.obsm['fate_probabilities'] : Fate probability matrix

    • adata.uns['lineage_names'] : List of lineage names

  • lineage_names (list of str, optional) – Specific lineages to compute pseudotime for. If None, computes for all lineages in adata.uns['lineage_names'].

  • fate_prob_key (str, optional (default: 'fate_probabilities')) – Key in adata.obsm containing fate probabilities

Returns:

Stores pseudotime variables in adata.obs under the key 'pseudotime_to_{lineage}' for each lineage.

Return type:

None

Examples

Compute pseudotimes for all lineages:

>>> import peach as pc
>>> pc.tl.compute_lineage_pseudotimes(adata)

Access specific pseudotime:

>>> pseudotime = adata.obs["pseudotime_to_archetype_5"]

Compute for specific lineages:

>>> pc.tl.compute_lineage_pseudotimes(adata, lineage_names=["archetype_3", "archetype_5"])

Use for gene trend analysis:

>>> import cellrank as cr
>>> cr.pl.gene_trends(
...     adata, model=cr.models.GAMR(adata), genes=["RARRES1", "SOD2"], time_key="pseudotime_to_archetype_5"
... )

Notes

  • Must run setup_cellrank() first to compute fate probabilities

  • Pseudotime values are simply the fate probabilities (range: 0-1)

  • Higher pseudotime = higher probability of committing to that lineage
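
Since the pseudotime values are simply the fate probabilities, the conversion can be sketched as a column copy. The helper add_lineage_pseudotimes below is hypothetical and uses a plain DataFrame in place of adata.obs; it assumes the columns of the fate probability matrix follow the order of the lineage name list.

```python
import numpy as np
import pandas as pd

def add_lineage_pseudotimes(obs, fate_probs, lineage_names, selected=None):
    """Copy each lineage's fate-probability column into obs['pseudotime_to_{lineage}']."""
    selected = lineage_names if selected is None else selected
    for name in selected:
        col = lineage_names.index(name)  # raises ValueError for an unknown lineage
        obs[f"pseudotime_to_{name}"] = fate_probs[:, col]
    return obs

# Toy stand-ins for adata.obs, adata.obsm["fate_probabilities"], adata.uns["lineage_names"].
obs = pd.DataFrame(index=["cell_0", "cell_1", "cell_2"])
fate_probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
lineage_names = ["archetype_3", "archetype_5"]

add_lineage_pseudotimes(obs, fate_probs, lineage_names)
```

Because each column is a probability, these values order cells by commitment to a lineage rather than by elapsed developmental time.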

See also

setup_cellrank

Compute fate probabilities

compute_lineage_drivers

Identify genes driving lineage commitment

peach.tl.cellrank_integration.compute_lineage_drivers(adata, lineage, n_genes=100, method='cellrank', **kwargs)

Identify genes driving commitment to a specific lineage.

Computes correlation between gene expression and fate probabilities to identify lineage-specific marker genes.

Parameters:
  • adata (AnnData) – Annotated data matrix with fate probabilities computed

  • lineage (str) – Target lineage name (e.g., 'archetype_5')

  • n_genes (int, optional (default: 100)) – Number of top genes to return

  • method (str, optional (default: 'cellrank')) – Method for computing drivers:

    • 'cellrank' : Use CellRank's compute_lineage_drivers (requires GPCCA object)

    • 'correlation' : Simple Spearman correlation (faster, works without GPCCA)

  • **kwargs – Additional arguments passed to method

Returns:

drivers – Top driver genes with statistics:

  • 'gene' : Gene name

  • 'lineage' : Target lineage name

  • 'correlation' : Spearman correlation with fate probability

  • 'pvalue' : P-value from correlation test

Return type:

pd.DataFrame

Examples

Using CellRank method (GPCCA is automatically stored by setup_cellrank):

>>> import peach as pc
>>> ck, g = pc.tl.setup_cellrank(adata)
>>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="cellrank")

Using correlation method (simpler, faster):

>>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="correlation", n_genes=50)

Top genes:

>>> print(drivers.head(10))

Notes

  • The 'cellrank' method is more sophisticated (uses GAM models)

  • The 'correlation' method is faster and works without a stored GPCCA object

  • For publication, the 'cellrank' method with GAMR models is recommended
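
The idea behind the 'correlation' method can be sketched with pandas alone, using the fact that Spearman correlation is the Pearson correlation of ranks. The helper correlation_drivers is hypothetical and is not the PEACH code; it takes a dense cells × genes DataFrame rather than an AnnData object and omits the 'pvalue' column.

```python
import numpy as np
import pandas as pd

def correlation_drivers(expr, fate_prob, lineage, n_genes=100):
    """Rank genes by Spearman correlation with one lineage's fate probability."""
    # Spearman correlation equals the Pearson correlation of the ranks.
    ranked_expr = expr.rank(axis=0)
    ranked_fate = pd.Series(fate_prob, index=expr.index).rank()
    corr = ranked_expr.corrwith(ranked_fate)
    top = corr.sort_values(ascending=False).head(n_genes)
    return pd.DataFrame({
        "gene": top.index.to_numpy(),
        "lineage": lineage,
        "correlation": top.to_numpy(),
    })

# Toy data: one gene rising with fate probability, one falling, one unrelated.
rng = np.random.default_rng(0)
fate_prob = np.linspace(0.0, 1.0, 50)
expr = pd.DataFrame({
    "gene_up": fate_prob ** 2,
    "gene_down": -fate_prob,
    "gene_noise": rng.random(50),
})

drivers = correlation_drivers(expr, fate_prob, lineage="archetype_5", n_genes=2)
```

A monotonic but nonlinear gene such as gene_up still reaches a correlation of 1 here, which is the main reason to prefer rank correlation over Pearson for this kind of screen.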

See also

setup_cellrank

Compute fate probabilities

compute_lineage_pseudotimes

Create pseudotime variables

peach.tl.cellrank_integration.compute_transition_frequencies(adata, start_weight_threshold=0.5, fate_prob_threshold=0.3, lineages=None)

Compute frequency of transitions between archetypal states.

Identifies cells transitioning from one archetype to another based on their starting archetypal weights and fate probabilities from CellRank.

A transition is counted when a cell has:

  • High barycentric weight for source archetype (> start_weight_threshold)

  • High fate probability for target archetype (> fate_prob_threshold)

Parameters:
  • adata (AnnData) – Annotated data object with CellRank results. Must contain:

    • adata.obsm['cell_archetype_weights'] : Barycentric weights [n_obs, n_archetypes]

    • adata.obs['archetypes'] : Categorical archetype assignments

    • adata.obsm['fate_probabilities'] : Fate probability matrix [n_obs, n_lineages]

    • adata.uns['lineage_names'] : List of lineage/archetype names

    • adata.uns['cellrank_gpcca'] : GPCCA estimator object (from setup_cellrank)

  • start_weight_threshold (float, default=0.5) – Minimum barycentric weight to consider a cell as “starting” from an archetype.

    • 0.5 = top 50% cells per archetype (balanced)

    • 0.7 = top 30% cells (more stringent)

    • 0.3 = top 70% cells (more permissive)

  • fate_prob_threshold (float, default=0.3) – Minimum fate probability to consider a cell as “transitioning to” an archetype.

    • 0.3 = 30% commitment probability (standard)

    • 0.5 = 50% commitment (stringent)

    • 0.2 = 20% commitment (permissive)

  • lineages (list of str, optional) – Specific lineages/archetypes to analyze. If None, uses all lineages from adata.uns['lineage_names'] that start with 'archetype_'.

Returns:

Transition frequency matrix with shape [n_archetypes, n_archetypes].

  • Index: Source archetypes (starting weight)

  • Columns: Target archetypes (fate probability)

  • Values: Integer counts of cells satisfying both thresholds

  • Diagonal: Cells maintaining their archetype identity

  • Off-diagonal: Cross-archetype transitions

Example:

                 archetype_0  archetype_1  archetype_2  archetype_3
    archetype_0          150           45           23           12
    archetype_1           12          200           67            8
    archetype_2            8           34          180           45
    archetype_3            5           15           30          190

Return type:

pd.DataFrame

Raises:

ValueError – If required CellRank results are missing (run setup_cellrank() first)

Notes

  • archetype_0 (centroid) uses categorical assignment instead of weight threshold

  • Returns raw counts (not normalized probabilities)

  • Cells can appear in multiple transitions if they meet multiple criteria

  • Use with PAGA connectivity for complete trajectory analysis
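
The counting rule can be sketched as a product of two thresholded indicator matrices. This is an illustration on toy arrays, not the PEACH implementation: the helper transition_counts is hypothetical, it applies the weight threshold to every archetype (ignoring the categorical handling of archetype_0 noted above), and it assumes weight and fate probability columns share the same archetype order.

```python
import numpy as np
import pandas as pd

def transition_counts(weights, fate_probs, names,
                      start_weight_threshold=0.5, fate_prob_threshold=0.3):
    """Count cells that start in each source and commit to each target archetype."""
    starts = (weights > start_weight_threshold).astype(int)   # [n_obs, n_archetypes]
    targets = (fate_probs > fate_prob_threshold).astype(int)  # [n_obs, n_lineages]
    # Entry (i, j) counts cells passing both the source-i and target-j tests.
    counts = starts.T @ targets
    return pd.DataFrame(counts, index=names, columns=names)

names = ["archetype_1", "archetype_2", "archetype_3"]
weights = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
])
fate_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.35, 0.60, 0.05],
    [0.10, 0.50, 0.40],
    [0.10, 0.20, 0.70],
])

counts = transition_counts(weights, fate_probs, names)

# Optional: convert raw counts to per-source fractions (rows with no cells become NaN).
row_fractions = counts.div(counts.sum(axis=1), axis=0)
```

In this toy input the second cell exceeds the fate threshold for two targets, so it contributes to two entries in its source row, which is why row sums can exceed the number of source cells.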

Examples

Basic usage with default thresholds:

>>> import peach as pc
>>> # After running setup_cellrank()
>>> transitions = pc.tl.compute_transition_frequencies(adata)
>>> print(transitions)

Stringent thresholds for high-confidence transitions:

>>> transitions = pc.tl.compute_transition_frequencies(
...     adata,
...     start_weight_threshold=0.7,  # Top 30% cells
...     fate_prob_threshold=0.5,  # 50% commitment
... )

Analyze specific archetypes only:

>>> transitions = pc.tl.compute_transition_frequencies(
...     adata, lineages=["archetype_1", "archetype_2", "archetype_3"]
... )

Visualize with seaborn:

>>> import seaborn as sns
>>> import matplotlib.pyplot as plt
>>> sns.heatmap(transitions, annot=True, fmt="d", cmap="YlOrRd")
>>> plt.title("Archetype Transition Frequencies")
>>> plt.show()

See also

setup_cellrank

Complete CellRank workflow setup

compute_lineage_pseudotimes

Convert fate probabilities to pseudotime

compute_lineage_drivers

Identify driver genes for lineage commitment

peach.tl.cellrank_integration.single_trajectory_analysis(adata, trajectory, trajectories=None, selection_method='discrete', source_weight_threshold=0.4, target_fate_threshold=0.4, verbose=True)

Analyze a single archetype-to-archetype trajectory.

Filters cells based on source archetype assignment/weight and target fate probability. Returns a subset AnnData ready for CellRank gene trends analysis.

IMPORTANT: This function requires CellRank setup to be run first:

>>> ck, g = pc.tl.setup_cellrank(adata, high_purity_threshold=0.80)
>>> pc.tl.compute_lineage_pseudotimes(adata)

For driver genes, use CellRank directly:

>>> drivers = g.compute_lineage_drivers(lineages="archetype_3")

Parameters:
  • adata (AnnData) – Annotated data matrix. Must contain (from setup_cellrank):

    • adata.obsm['fate_probabilities'] : Fate probability matrix

    • adata.uns['lineage_names'] : List of lineage names

    • adata.obs['pseudotime_to_{archetype}'] : Pseudotime from compute_lineage_pseudotimes

    • adata.obs['archetypes'] : Discrete archetype assignments (for selection_method='discrete')

    • adata.obsm['cell_archetype_weights'] : Barycentric weights (for selection_method='weight')

  • trajectory (tuple) – Archetype pair as (source_idx, target_idx), e.g., (0, 3) for archetype_0 → archetype_3.

  • trajectories (list of tuple, optional) – Multiple trajectory pairs to analyze sequentially. If provided, trajectory is ignored and a list of results is returned.

  • selection_method (str, default: 'discrete') – How to select source cells:

    • 'discrete' : Filter by adata.obs['archetypes'] == source_archetype

    • 'weight' : Filter by weights[:, source_idx] >= source_weight_threshold

    • 'both' : Compute both and report comparison (uses 'discrete' for subset)

  • source_weight_threshold (float, default: 0.4) – Minimum barycentric weight for source archetype (only used if selection_method='weight').

  • target_fate_threshold (float, default: 0.4) – Minimum fate probability for target archetype selection.

  • verbose (bool, default: True) – Print progress messages.

Returns:

  • Tuple[SingleTrajectoryResult, AnnData]

    • result : SingleTrajectoryResult with trajectory metadata

    • adata_traj : Subset AnnData containing only trajectory cells, ready for CellRank gene trends. If trajectories list provided, returns list of tuples.

Stores in adata:

  • adata.obs['trajectory_{src}_to_{tgt}_cells'] (bool) – Boolean mask for cells in trajectory.

  • adata.uns['trajectory_{src}_to_{tgt}'] (dict) – Trajectory analysis metadata.

Examples

Complete workflow with CellRank:

>>> import peach as pc
>>> import cellrank as cr
>>>
>>> # 1. Setup CellRank (computes fate probabilities)
>>> ck, g = pc.tl.setup_cellrank(adata, high_purity_threshold=0.80)
>>> pc.tl.compute_lineage_pseudotimes(adata)
>>>
>>> # 2. Analyze trajectory (returns subset AnnData)
>>> result, adata_traj = pc.tl.single_trajectory_analysis(adata, trajectory=(4, 5), selection_method="discrete")
>>> print(f"Found {result.n_trajectory_cells} cells")
>>>
>>> # 3. Get drivers from CellRank
>>> drivers = g.compute_lineage_drivers(lineages="archetype_5")
>>> top_genes = drivers.index[:5].tolist()
>>>
>>> # 4. Plot gene trends using subset
>>> cr.pl.gene_trends(adata_traj, model=cr.models.GAMR(adata_traj), genes=top_genes, time_key=result.pseudotime_key)

Compare selection methods:

>>> result, adata_traj = pc.tl.single_trajectory_analysis(adata, trajectory=(1, 2), selection_method="both")
>>> print(f"Discrete: {result.n_discrete_cells} cells")
>>> print(f"Weight-based: {result.n_weight_cells} cells")

Notes

  • Requires setup_cellrank() and compute_lineage_pseudotimes() to be run first

  • Driver computation is NOT included - use CellRank’s g.compute_lineage_drivers() directly

  • Pseudotime uses CellRank-computed values from compute_lineage_pseudotimes()
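
The cell selection described for this function can be sketched as the intersection of a source filter and a target filter. The helper trajectory_mask below is hypothetical and works on toy arrays; it assumes that index i corresponds both to the label "archetype_i" and to column i of the weight and fate probability matrices, and that the target filter uses >=, neither of which this page confirms.

```python
import numpy as np

def trajectory_mask(archetypes, weights, fate_probs, source_idx, target_idx,
                    selection_method="discrete",
                    source_weight_threshold=0.4, target_fate_threshold=0.4):
    """Boolean mask of cells leaving the source archetype toward the target."""
    if selection_method == "discrete":
        source = np.asarray(archetypes) == f"archetype_{source_idx}"
    elif selection_method == "weight":
        source = weights[:, source_idx] >= source_weight_threshold
    else:
        raise ValueError("selection_method must be 'discrete' or 'weight'")
    # Target filter: sufficient fate probability toward the target lineage.
    target = fate_probs[:, target_idx] >= target_fate_threshold
    return source & target

archetypes = ["archetype_0", "archetype_0", "archetype_1", "archetype_1"]
weights = np.array([[0.90, 0.10], [0.45, 0.55], [0.50, 0.50], [0.10, 0.90]])
fate_probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.5, 0.5], [0.1, 0.9]])

discrete_mask = trajectory_mask(archetypes, weights, fate_probs, 0, 1)
weight_mask = trajectory_mask(archetypes, weights, fate_probs, 0, 1,
                              selection_method="weight")
```

Comparing the two masks on the same trajectory is the kind of check that selection_method="both" is described as reporting: here the weight-based filter admits one more cell than the discrete one.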

See also

setup_cellrank

Complete CellRank workflow setup (computes fate probabilities)

compute_lineage_pseudotimes

Compute pseudotime to each lineage