peach.tl.compute_lineage_drivers

peach.tl.compute_lineage_drivers(adata, lineage, n_genes=100, method='cellrank', **kwargs)

Identify genes driving commitment to a specific lineage.

Computes correlation between gene expression and fate probabilities to identify lineage-specific marker genes.

Parameters:
  • adata (AnnData) – Annotated data matrix with fate probabilities computed

  • lineage (str) – Target lineage name (e.g., ‘archetype_5’)

  • n_genes (int, optional (default: 100)) – Number of top genes to return

  • method (str, optional (default: 'cellrank')) – Method for computing drivers:
      - ‘cellrank’ : Use CellRank’s compute_lineage_drivers (requires GPCCA object)
      - ‘correlation’ : Simple Spearman correlation (faster, works without GPCCA)

  • **kwargs – Additional keyword arguments passed to the selected method

Returns:

drivers – Top driver genes with statistics:
  - ‘gene’ : Gene name
  - ‘lineage’ : Target lineage name
  - ‘correlation’ : Spearman correlation with fate probability
  - ‘pvalue’ : P-value from correlation test

Return type:

pd.DataFrame

Examples

Using CellRank method (GPCCA is automatically stored by setup_cellrank):

>>> import peach as pc
>>> ck, g = pc.tl.setup_cellrank(adata)
>>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="cellrank")

Using correlation method (simpler, faster):

>>> drivers = pc.tl.compute_lineage_drivers(adata, lineage="archetype_5", method="correlation", n_genes=50)

Top genes:

>>> print(drivers.head(10))
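
A common next step is to keep only the drivers that pass a significance and effect-size cutoff. The sketch below assumes only the column names documented under Returns; the helper select_significant, the thresholds alpha and min_abs_corr, and the placeholder table are hypothetical and are not part of peach. The values in the table are made up so the snippet runs without an AnnData object.

```python
import pandas as pd


def select_significant(drivers, alpha=0.05, min_abs_corr=0.3):
    """Keep rows that pass both a p-value and an absolute-correlation cutoff.

    Hypothetical helper for illustration; thresholds are arbitrary defaults.
    """
    mask = (drivers["pvalue"] < alpha) & (drivers["correlation"].abs() >= min_abs_corr)
    kept = drivers.loc[mask].sort_values("correlation", ascending=False)
    return kept.reset_index(drop=True)


# Placeholder table using the documented columns; the values are invented.
toy = pd.DataFrame(
    {
        "gene": ["geneA", "geneB", "geneC"],
        "lineage": ["archetype_5"] * 3,
        "correlation": [0.62, 0.10, -0.45],
        "pvalue": [1e-8, 0.40, 1e-4],
    }
)

selected = select_significant(toy)
print(selected)
```

In practice the same helper could be applied to the DataFrame returned by compute_lineage_drivers, with cutoffs chosen for the dataset at hand.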

Notes

  • The ‘cellrank’ method is more sophisticated (uses GAM models)

  • The ‘correlation’ method is faster and works without a stored GPCCA object

  • For publication, the ‘cellrank’ method with GAMR models is recommended
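
To make the idea behind the ‘correlation’ method concrete, here is a minimal sketch of ranking genes by Spearman correlation with a lineage's fate probability. It is not peach's implementation: the function spearman_drivers_sketch, its arguments, and the synthetic data are assumptions for illustration, and it takes plain NumPy arrays rather than an AnnData object. The only library call relied on is scipy.stats.spearmanr.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def spearman_drivers_sketch(expr, fate_prob, gene_names, lineage, n_genes=100):
    """Rank genes by Spearman correlation with one lineage's fate probability.

    expr       : dense array of shape (n_cells, n_genes)
    fate_prob  : array of shape (n_cells,)
    Hypothetical helper for illustration; not part of peach.
    """
    rows = []
    for j, gene in enumerate(gene_names):
        rho, pval = spearmanr(expr[:, j], fate_prob)
        rows.append(
            {"gene": gene, "lineage": lineage, "correlation": rho, "pvalue": pval}
        )
    # Constant genes yield NaN correlations; drop them before ranking.
    drivers = pd.DataFrame(rows).dropna(subset=["correlation"])
    drivers = drivers.sort_values("correlation", ascending=False)
    return drivers.head(n_genes).reset_index(drop=True)


# Synthetic data: gene_2 is constructed to track the fate probability.
rng = np.random.default_rng(0)
fate = rng.uniform(size=200)
expr = rng.normal(size=(200, 5))
expr[:, 2] = 3 * fate + rng.normal(scale=0.1, size=200)
genes = [f"gene_{i}" for i in range(5)]

demo = spearman_drivers_sketch(expr, fate, genes, lineage="archetype_5", n_genes=3)
print(demo)
```

The sketch sorts by signed correlation so positively associated genes come first; ranking by absolute value instead would also surface genes that are down-regulated along the lineage.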

See also

setup_cellrank – Compute fate probabilities

compute_lineage_pseudotimes – Create pseudotime variables