peach.tl.cellrank_integration


CellRank integration for archetypal trajectory analysis.

Functions

`compute_lineage_drivers`(adata, lineage[, ...])	Identify genes driving commitment to a specific lineage.
`compute_lineage_pseudotimes`(adata[, ...])	Convert fate probabilities to lineage-specific pseudotimes.
`compute_transition_frequencies`(adata[, ...])	Compute frequency of transitions between archetypal states.
`setup_cellrank`(adata[, ...])	Set up CellRank workflow for archetypal or centroid-based trajectory analysis.
`single_trajectory_analysis`(adata, trajectory)	Analyze single archetype-to-archetype trajectory.

peach.tl.cellrank_integration.setup_cellrank(adata, high_purity_threshold=0.8, n_neighbors=30, n_pcs=11, compute_paga=True, solver='gmres', tol=1e-06, terminal_obs_key='archetypes', verbose=True)

Set up CellRank workflow for archetypal or centroid-based trajectory analysis.

This function orchestrates the complete pipeline from terminal state assignments to fate probabilities, including neighbors computation, UMAP, PAGA, and GPCCA.

Parameters:

adata (AnnData) – Annotated data matrix. Must contain:
- adata.obs[terminal_obs_key] : Terminal state assignments
- adata.obsm['X_pca'] : PCA coordinates
If using archetypes (default), must also contain:
- adata.obsm['cell_archetype_weights'] : Barycentric weights
- adata.obsm['archetype_distances'] : Distances to archetypes
high_purity_threshold (float, optional (default: 0.80)) – Quantile threshold, between 0 and 1, for defining high-purity cells. A value of 0.80 keeps the top 20% of cells per archetype. Only used when terminal_obs_key='archetypes'.
n_neighbors (int, optional (default: 30)) – Number of neighbors for k-NN graph construction
n_pcs (int, optional (default: 11)) – Number of principal components to use
compute_paga (bool, optional (default: True)) – Whether to compute PAGA connectivity
solver (str, optional (default: 'gmres')) – Solver for fate probability computation ('gmres', 'direct', etc.)
tol (float, optional (default: 1e-6)) – Tolerance for iterative solver
terminal_obs_key (str, optional (default: 'archetypes')) – Key in adata.obs containing terminal state assignments. Use 'archetypes' for standard archetype-based analysis or 'centroid_assignments' for treatment phase centroid trajectories (requires running pc.tl.assign_to_centroids() first).
verbose (bool, optional (default: True)) – Print progress messages
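
The high_purity_threshold description can be read as a per-archetype quantile cut. The sketch below illustrates that reading on synthetic data. It assumes purity is ranked by each cell's barycentric weight for its assigned archetype; the source does not say whether weights or distances are used. The helper high_purity_mask is hypothetical and not part of peach.

```python
import numpy as np


def high_purity_mask(weights, labels, threshold=0.80):
    """Flag cells at or above the `threshold` quantile of their own archetype.

    weights : (n_cells, n_archetypes) barycentric weights
    labels  : (n_cells,) integer archetype index assigned to each cell
    Assumption: purity is ranked by weight; peach may rank it differently.
    """
    weights = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)
    mask = np.zeros(labels.shape[0], dtype=bool)
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)        # cells assigned to archetype k
        own_weight = weights[members, k]             # their weight for archetype k
        cutoff = np.quantile(own_weight, threshold)  # per-archetype cutoff
        mask[members[own_weight >= cutoff]] = True
    return mask


# Synthetic stand-in for adata.obsm['cell_archetype_weights']
rng = np.random.default_rng(0)
weights = rng.dirichlet(np.ones(3), size=300)  # each row sums to 1
labels = weights.argmax(axis=1)                # assign each cell to its top archetype
mask = high_purity_mask(weights, labels, threshold=0.80)
print(mask.sum(), "of", mask.size, "cells flagged as high purity")
```

Raising threshold to 0.90 in this sketch keeps roughly the top 10% of cells per archetype, matching the comment in the "Customize parameters" example.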

Returns:

ck (cellrank.kernels.ConnectivityKernel) – Computed transition kernel
g (cellrank.estimators.GPCCA) – GPCCA estimator with fate probabilities
Stores in adata:

adata.obs['terminal_states'] (pd.Series) – Terminal state assignments for high-purity cells
adata.obsm['fate_probabilities'] (np.ndarray) – Fate probability matrix (n_obs × n_lineages)
adata.uns['lineage_names'] (list) – List of lineage names (archetype or centroid names)
adata.uns['cellrank_gpcca'] (GPCCA) – GPCCA estimator object for downstream functions
adata.obsm['X_umap'] (np.ndarray) – UMAP coordinates (if not already present)
adata.uns['neighbors'] (dict) – k-NN graph (if not already present)
adata.uns['paga'] (dict) – PAGA results (if compute_paga=True)

Examples

Basic archetype-based analysis:

>>> import peach as pc
>>> ck, g = pc.tl.setup_cellrank(adata)

Treatment phase centroid-based analysis:

>>> # First compute centroids and assign cells
>>> pc.tl.compute_conditional_centroids(adata, condition_column="treatment_phase")
>>> pc.tl.assign_to_centroids(adata, condition_column="treatment_phase")
>>> # Then run CellRank with centroid assignments
>>> ck, g = pc.tl.setup_cellrank(adata, terminal_obs_key="centroid_assignments")

Access results:

>>> fate_probs = adata.obsm["fate_probabilities"]
>>> lineages = adata.uns["lineage_names"]
>>> terminal_states = adata.obs["terminal_states"]
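
Once the results are retrieved, a common next step is to label each cell with its most likely lineage. The sketch below uses small hand-written arrays in place of adata.obsm["fate_probabilities"] and adata.uns["lineage_names"]. It assumes each row sums to 1, as CellRank absorption probabilities do, and the 0.5 commitment cutoff is purely illustrative.

```python
import numpy as np

# Stand-ins for adata.obsm["fate_probabilities"] and adata.uns["lineage_names"]
fate_probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.15, 0.75],
    [0.34, 0.33, 0.33],
])
lineages = ["archetype_1", "archetype_2", "archetype_3"]

# Sanity check: each row should be a probability distribution over lineages
rows_sum_to_one = bool(np.allclose(fate_probs.sum(axis=1), 1.0))

# Most likely lineage per cell
dominant = [lineages[i] for i in fate_probs.argmax(axis=1)]

# Cells whose top probability clears an illustrative 0.5 cutoff
committed = fate_probs.max(axis=1) >= 0.5

print(rows_sum_to_one)
print(dominant)
print(committed.tolist())
```

The third cell shows why the cutoff matters: argmax still returns a lineage even when the probabilities are nearly uniform.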

Customize parameters:

>>> ck, g = pc.tl.setup_cellrank(
...     adata,
...     high_purity_threshold=0.90,  # Top 10% of cells
...     n_neighbors=50,
...     compute_paga=False,
... )

Notes

Requires CellRank installation: pip install cellrank
For GAMR models, set R_HOME before importing cellrank:

```python
import os

# Default R location on macOS; adjust the path for your system
os.environ["R_HOME"] = "/Library/Frameworks/R.framework/Resources"
```
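
Both notes above can be checked before calling setup_cellrank. The sketch below uses only the standard library; the helper cellrank_ready is hypothetical and not part of peach or CellRank.

```python
import importlib.util
import os


def cellrank_ready(r_home=None):
    """Return True if the cellrank package can be located.

    If `r_home` is given and R_HOME is not already set, set it first so the
    variable is in place before cellrank is imported.
    """
    if r_home is not None:
        os.environ.setdefault("R_HOME", r_home)
    # find_spec returns None when a top-level package is not installed
    return importlib.util.find_spec("cellrank") is not None


if not cellrank_ready():
    print("CellRank not found; install it with: pip install cellrank")
```

Because setdefault is used, an R_HOME value already exported in the shell takes precedence over the argument.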

See also

assign_to_centroids: Assign cells to treatment phase centroids
compute_lineage_pseudotimes: Convert fate probabilities to pseudotime
compute_lineage_drivers: Identify genes driving lineage commitment


peach.tl.cellrank_integration.compute_lineage_pseudotimes(adata, lineage_names=None, fate_prob_key='fate_probabilities')

Convert fate probabilities to lineage-specific pseudotimes.

Creates continuous pseudotime variables for each lineage by using fate probabilities as progression measures. Stores results in adata.obs.

Parameters:

adata (AnnData) – Annotated data matrix. Must contain:
- adata.obsm['fate_probabilities'] : Fate probability matrix
- adata.uns['lineage_names'] : List of lineage names
lineage_names (list of str, optional) – Specific lineages to compute pseudotime for. If None, computes for all lineages in adata.uns['lineage_names'].
fate_prob_key (str, optional (default: 'fate_probabilities')) – Key in adata.obsm containing fate probabilities

Returns:

Stores pseudotime variables in adata.obs with keys: ‘pseudotime_to_{lineage}’ for each lineage

Return type:

None

Examples

Compute pseudotimes for all lineages:

>>> import peach as pc
>>> pc.tl.compute_lineage_pseudotimes(adata)

Access specific pseudotime:

>>> pseudotime = adata.obs["pseudotime_to_archetype_5"]

Compute for specific lineages:

>>> pc.tl.compute_lineage_pseudotimes(adata, lineage_names=["archetype_3", "archetype_5"])

Use for gene trend analysis:

>>> import cellrank as cr
>>> cr.pl.gene_trends(
...     adata, model=cr.models.GAMR(adata), genes=["RARRES1", "SOD2"], time_key="pseudotime_to_archetype_5"
... )

Notes

Must run setup_cellrank() first to compute fate probabilities
Pseudotime values are simply the fate probabilities (range: 0-1)
Higher pseudotime = higher probability of committing to that lineage
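
Since the notes state that each pseudotime is simply the corresponding fate probability, the mapping can be sketched in a few lines. The helper pseudotime_columns is hypothetical and returns a plain dict instead of writing to adata.obs. It assumes the columns of the fate probability matrix follow the order of the lineage names.

```python
import numpy as np


def pseudotime_columns(fate_probs, lineage_names, selected=None):
    """Map each requested lineage to its fate-probability column.

    Keys follow the 'pseudotime_to_{lineage}' naming described above.
    Assumption: column j of `fate_probs` belongs to lineage_names[j].
    """
    fate_probs = np.asarray(fate_probs, dtype=float)
    lineage_names = list(lineage_names)
    selected = lineage_names if selected is None else list(selected)
    columns = {}
    for name in selected:
        j = lineage_names.index(name)  # raises ValueError for an unknown lineage
        columns[f"pseudotime_to_{name}"] = fate_probs[:, j]
    return columns


fate_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
names = ["archetype_3", "archetype_5"]
cols = pseudotime_columns(fate_probs, names, selected=["archetype_5"])
print(cols)
```

Because the values are probabilities, they rank cells by commitment to a lineage but are not a measure of elapsed time.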

See also

setup_cellrank: Compute fate probabilities
compute_lineage_drivers: Identify genes driving lineage commitment

peach.tl.cellrank_integration.compute_lineage_drivers(adata, lineage, n_genes=100, method='cellrank', **kwargs)

Identify genes driving commitment to a specific lineage.

Computes correlation between gene expression and fate probabilities to identify lineage-specific marker genes.

Parameters:

adata (AnnData) – Annotated data matrix with fate probabilities computed
lineage (str) – Target lineage name (e.g., 'archetype_5')
n_genes (int, optional (default: 100)) – Number of top genes to return
method (str, optional (default: 'cellrank')) – Method for computing drivers:
- 'cellrank' : Use CellRank's compute_lineage_drivers (requires GPCCA object)
- 'correlation' : Simple Spearman correlation (faster, works without GPCCA)
**kwargs – Additional arguments passed to method

Returns:

drivers – Top driver genes with statistics:
- 'gene' : Gene name
- 'lineage' : Target lineage name
- 'correlation' : Spearman correlation with fate probability
- 'pvalue' : P-value from correlation test

Return type:

pd.DataFrame

Examples

Using CellRank method (GPCCA is automatically stored by setup_cellrank):

>>> import peach as pc
>>> ck, g = pc.tl.setup_cellrank(adata)
>>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="cellrank")

Using correlation method (simpler, faster):

>>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="correlation", n_genes=50)

Top genes:

>>> print(drivers.head(10))

Notes

The 'cellrank' method is more sophisticated (uses GAM models)
The 'correlation' method is faster and works without a stored GPCCA object
For publication, the 'cellrank' method with GAMR models is recommended
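
The idea behind the 'correlation' method, ranking genes by Spearman correlation between expression and fate probability, can be sketched with NumPy alone. This is an illustration under stated assumptions and not peach's implementation: the helpers average_ranks and spearman_drivers are hypothetical, p-values are omitted, and genes with constant expression are dropped because their correlation is undefined.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks where tied values share the mean of their ranks."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)                 # highest rank within each tie group
    return (last - (counts - 1) / 2.0)[inverse]


def spearman_drivers(expr, fate_prob, gene_names, n_genes=100):
    """Return up to n_genes (gene, rho) pairs, sorted by decreasing rho.

    expr      : (n_cells, n_genes) dense expression matrix
    fate_prob : (n_cells,) fate probability toward one lineage
    """
    expr = np.asarray(expr, dtype=float)
    target = average_ranks(np.asarray(fate_prob, dtype=float))
    target = target - target.mean()
    results = []
    for j, gene in enumerate(gene_names):
        ranks = average_ranks(expr[:, j])
        ranks = ranks - ranks.mean()
        denom = np.sqrt((ranks ** 2).sum() * (target ** 2).sum())
        if denom > 0:                        # skip genes with constant expression
            # Spearman rho is the Pearson correlation of the ranks
            results.append((gene, float((ranks * target).sum() / denom)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:n_genes]


fate = [0.1, 0.2, 0.5, 0.9]
expr = [[1, 4, 0], [2, 3, 0], [3, 2, 0], [4, 1, 0]]  # columns: up, down, flat
top = spearman_drivers(expr, fate, ["up", "down", "flat"])
print(top)
```

Tie-aware ranking matters for expression data, where many cells share a value of zero for a given gene.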

See also

setup_cellrank: Compute fate probabilities
compute_lineage_pseudotimes: Create pseudotime variables

peach.tl.cellrank_integration.compute_transition_frequencies(adata, start_weight_threshold=0.5, fate_prob_threshold=0.3, lineages=None)

Compute frequency of transitions between archetypal states.

Identifies cells transitioning from one archetype to another based on their starting archetypal weights and fate probabilities from CellRank.

A transition is counted when a cell has:
- High barycentric weight for source archetype (> start_weight_threshold)
- High fate probability for target archetype (> fate_prob_threshold)

Parameters:

adata (AnnData) – Annotated data object with CellRank results. Must contain:
- adata.obsm['cell_archetype_weights'] : Barycentric weights [n_obs, n_archetypes]
- adata.obs['archetypes'] : Categorical archetype assignments
- adata.obsm['fate_probabilities'] : Fate probability matrix [n_obs, n_lineages]
- adata.uns['lineage_names'] : List of lineage/archetype names
- adata.uns['cellrank_gpcca'] : GPCCA estimator object (from setup_cellrank)
start_weight_threshold (float, default=0.5) – Minimum barycentric weight to consider a cell as "starting" from an archetype.
- 0.5 = top 50% cells per archetype (balanced)
- 0.7 = top 30% cells (more stringent)
- 0.3 = top 70% cells (more permissive)
fate_prob_threshold (float, default=0.3) – Minimum fate probability to consider a cell as "transitioning to" an archetype.
- 0.3 = 30% commitment probability (standard)
- 0.5 = 50% commitment (stringent)
- 0.2 = 20% commitment (permissive)
lineages (list of str, optional) – Specific lineages/archetypes to analyze. If None, uses all lineages from adata.uns['lineage_names'] that start with 'archetype_'.

Returns:

Transition frequency matrix with shape [n_archetypes, n_archetypes].
- Index: Source archetypes (starting weight)
- Columns: Target archetypes (fate probability)
- Values: Integer counts of cells satisfying both thresholds
- Diagonal: Cells maintaining their archetype identity
- Off-diagonal: Cross-archetype transitions

Example:

             archetype_0  archetype_1  archetype_2  archetype_3
archetype_0          150           45           23           12
archetype_1           12          200           67            8
archetype_2            8           34          180           45
archetype_3            5           15           30          190

Return type:

pd.DataFrame

Raises:

ValueError – If required CellRank results are missing (run setup_cellrank() first)

Notes

archetype_0 (centroid) uses categorical assignment instead of weight threshold
Returns raw counts (not normalized probabilities)
Cells can appear in multiple transitions if they meet multiple criteria
Use with PAGA connectivity for complete trajectory analysis
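
The counting rule described above can be sketched as a product of two boolean masks. The helper count_transitions is hypothetical. It assumes column k of both matrices refers to the same archetype, and it does not model the categorical special case for archetype_0.

```python
import numpy as np


def count_transitions(weights, fate_probs, start_weight_threshold=0.5,
                      fate_prob_threshold=0.3):
    """Count cells per (source, target) archetype pair.

    A cell contributes to entry [i, j] when its weight for archetype i and
    its fate probability for archetype j both exceed their thresholds.
    """
    starts = np.asarray(weights) > start_weight_threshold    # (n_cells, n_arch)
    targets = np.asarray(fate_probs) > fate_prob_threshold   # (n_cells, n_arch)
    # Summing over cells: counts[i, j] = number of cells with starts[:, i] & targets[:, j]
    return starts.astype(int).T @ targets.astype(int)


weights = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
fate_probs = [[0.9, 0.1], [0.35, 0.65], [0.2, 0.8]]
counts = count_transitions(weights, fate_probs)
print(counts)
```

In this toy input the second cell passes the fate threshold for both targets, so it is counted in two entries of the first row, which illustrates the note that a cell can appear in multiple transitions.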

Examples

Basic usage with default thresholds:

>>> import peach as pc
>>> # After running setup_cellrank()
>>> transitions = pc.tl.compute_transition_frequencies(adata)
>>> print(transitions)

Stringent thresholds for high-confidence transitions:

>>> transitions = pc.tl.compute_transition_frequencies(
...     adata,
...     start_weight_threshold=0.7,  # Top 30% cells
...     fate_prob_threshold=0.5,  # 50% commitment
... )

Analyze specific archetypes only:

>>> transitions = pc.tl.compute_transition_frequencies(
...     adata, lineages=["archetype_1", "archetype_2", "archetype_3"]
... )

Visualize with seaborn:

>>> import seaborn as sns
>>> import matplotlib.pyplot as plt
>>> sns.heatmap(transitions, annot=True, fmt="d", cmap="YlOrRd")
>>> plt.title("Archetype Transition Frequencies")
>>> plt.show()
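
Because the function returns raw counts, comparing source archetypes of different sizes is easier after row normalization. The sketch below applies it to the example matrix shown in the Returns section, using a plain NumPy array in place of the returned DataFrame.

```python
import numpy as np

# Example counts from the Returns section (rows: source, columns: target)
counts = np.array([
    [150, 45, 23, 12],
    [12, 200, 67, 8],
    [8, 34, 180, 45],
    [5, 15, 30, 190],
])

row_totals = counts.sum(axis=1, keepdims=True)
# Divide each row by its total; rows with a zero total stay at 0.0
fractions = np.divide(
    counts,
    row_totals,
    out=np.zeros(counts.shape, dtype=float),
    where=row_totals > 0,
)
print(np.round(fractions, 2))
```

With the returned DataFrame, the pandas equivalent is transitions.div(transitions.sum(axis=1), axis=0), and the heatmap format should change from fmt="d" to something like fmt=".2f". Since a cell can be counted in several entries of a row, these values are shares of counted transitions and not shares of cells.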

See also

setup_cellrank: Complete CellRank workflow setup
compute_lineage_pseudotimes: Convert fate probabilities to pseudotime
compute_lineage_drivers: Identify driver genes for lineage commitment

peach.tl.cellrank_integration.single_trajectory_analysis(adata, trajectory, trajectories=None, selection_method='discrete', source_weight_threshold=0.4, target_fate_threshold=0.4, verbose=True)

Analyze single archetype-to-archetype trajectory.

Filters cells based on source archetype assignment/weight and target fate probability. Returns a subset AnnData ready for CellRank gene trends analysis.

IMPORTANT: This function requires CellRank setup to be run first:

>>> ck, g = pc.tl.setup_cellrank(adata, high_purity_threshold=0.80)
>>> pc.tl.compute_lineage_pseudotimes(adata)

For driver genes, use CellRank directly:

>>> drivers = g.compute_lineage_drivers(lineages="archetype_3")

Parameters:

adata (AnnData) – Annotated data matrix. Must contain (from setup_cellrank):
- adata.obsm['fate_probabilities'] : Fate probability matrix
- adata.uns['lineage_names'] : List of lineage names
- adata.obs['pseudotime_to_{archetype}'] : Pseudotime from compute_lineage_pseudotimes
- adata.obs['archetypes'] : Discrete archetype assignments (for selection_method='discrete')
- adata.obsm['cell_archetype_weights'] : Barycentric weights (for selection_method='weight')
trajectory (tuple) – Archetype pair as (source_idx, target_idx), e.g., (0, 3) for archetype_0 → archetype_3.
trajectories (list of tuple, optional) – Multiple trajectory pairs to analyze sequentially. If provided, trajectory is ignored and a list of results is returned.
selection_method (str, default: 'discrete') – How to select source cells:
- 'discrete' : Filter by adata.obs['archetypes'] == source_archetype
- 'weight' : Filter by weights[:, source_idx] >= source_weight_threshold
- 'both' : Compute both and report comparison (uses 'discrete' for subset)
source_weight_threshold (float, default: 0.4) – Minimum barycentric weight for source archetype (only used if selection_method='weight').
target_fate_threshold (float, default: 0.4) – Minimum fate probability for target archetype selection.
verbose (bool, default: True) – Print progress messages.
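
The 'discrete' and 'weight' selection rules can be sketched as boolean masks combined with the target fate filter. The helper trajectory_mask is hypothetical. It assumes archetype labels are named 'archetype_{idx}', that column idx of both matrices refers to that archetype, and that the source and target conditions are combined with a logical AND.

```python
import numpy as np


def trajectory_mask(archetypes, weights, fate_probs, source_idx, target_idx,
                    selection_method="discrete", source_weight_threshold=0.4,
                    target_fate_threshold=0.4):
    """Boolean mask of cells on the source -> target trajectory."""
    if selection_method == "discrete":
        # Source cells by categorical assignment
        source = np.asarray(archetypes) == f"archetype_{source_idx}"
    elif selection_method == "weight":
        # Source cells by barycentric weight
        source = np.asarray(weights)[:, source_idx] >= source_weight_threshold
    else:
        raise ValueError("selection_method must be 'discrete' or 'weight'")
    # Target cells by fate probability
    target = np.asarray(fate_probs)[:, target_idx] >= target_fate_threshold
    return source & target


archetypes = ["archetype_0", "archetype_0", "archetype_1", "archetype_1"]
weights = [[0.7, 0.3], [0.45, 0.55], [0.5, 0.5], [0.1, 0.9]]
fate_probs = [[0.3, 0.7], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]

discrete = trajectory_mask(archetypes, weights, fate_probs, 0, 1)
weighted = trajectory_mask(archetypes, weights, fate_probs, 0, 1,
                           selection_method="weight")
print(discrete.tolist())
print(weighted.tolist())
```

The two masks differ for the third cell, which is labeled archetype_1 but still carries a weight of 0.5 for archetype_0. Differences of this kind are what selection_method='both' is described as reporting.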

Returns:

Tuple[SingleTrajectoryResult, AnnData] –
- result : SingleTrajectoryResult with trajectory metadata
- adata_traj : Subset AnnData containing only trajectory cells, ready for CellRank gene trends.
If a trajectories list is provided, returns a list of such tuples.

Stores in adata:

adata.obs['trajectory_{src}_to_{tgt}_cells'] (bool) – Boolean mask for cells in trajectory.
adata.uns['trajectory_{src}_to_{tgt}'] (dict) – Trajectory analysis metadata.

Examples

Complete workflow with CellRank:

>>> import peach as pc
>>> import cellrank as cr
>>>
>>> # 1. Setup CellRank (computes fate probabilities)
>>> ck, g = pc.tl.setup_cellrank(adata, high_purity_threshold=0.80)
>>> pc.tl.compute_lineage_pseudotimes(adata)
>>>
>>> # 2. Analyze trajectory (returns subset AnnData)
>>> result, adata_traj = pc.tl.single_trajectory_analysis(adata, trajectory=(4, 5), selection_method="discrete")
>>> print(f"Found {result.n_trajectory_cells} cells")
>>>
>>> # 3. Get drivers from CellRank
>>> drivers = g.compute_lineage_drivers(lineages="archetype_5")
>>> top_genes = drivers.index[:5].tolist()
>>>
>>> # 4. Plot gene trends using subset
>>> cr.pl.gene_trends(adata_traj, model=cr.models.GAMR(adata_traj), genes=top_genes, time_key=result.pseudotime_key)

Compare selection methods:

>>> result, adata_traj = pc.tl.single_trajectory_analysis(adata, trajectory=(1, 2), selection_method="both")
>>> print(f"Discrete: {result.n_discrete_cells} cells")
>>> print(f"Weight-based: {result.n_weight_cells} cells")

Notes

Requires setup_cellrank() and compute_lineage_pseudotimes() to be run first
Driver computation is NOT included - use CellRank’s g.compute_lineage_drivers() directly
Pseudotime uses CellRank-computed values from compute_lineage_pseudotimes()

See also

setup_cellrank: Complete CellRank workflow setup (computes fate probabilities)
compute_lineage_pseudotimes: Compute pseudotime to each lineage
