peach.tl.hyperparameter_search
- peach.tl.hyperparameter_search(adata, *, n_archetypes_range=[3, 4, 5, 6], cv_folds=3, max_epochs_cv=15, pca_key='X_pca', device='cpu', base_model_config=None, **kwargs)
Perform cross-validation hyperparameter search for archetypal analysis.
Systematically searches hyperparameter space using K-fold cross-validation to find optimal model configurations. Results support manual selection of the best configuration for final training.
- Parameters:
adata (AnnData) – Annotated data object with PCA coordinates in adata.obsm[pca_key]. Run scanpy.pp.pca(adata) first.
n_archetypes_range (list[int], default: [3, 4, 5, 6]) – Range of archetype numbers to test. Each value is evaluated via cross-validation.
cv_folds (int, default: 3) – Number of cross-validation folds. Higher values give more reliable estimates but take longer.
max_epochs_cv (int, default: 15) – Maximum training epochs per CV fold. Early stopping typically triggers before this limit.
pca_key (str, default: "X_pca") – Key in adata.obsm containing PCA coordinates. Auto-detects: ‘X_pca’, ‘X_PCA’, ‘PCA’.
device (str, default: "cpu") – Computing device (‘cpu’, ‘cuda’, or ‘mps’). Default is ‘cpu’ for stability across platforms.
base_model_config (dict | None, default: None) – Additional base model configuration. If None, uses defaults:
  - input_dim: Auto-detected from PCA dimensions
  - barycentric_mode: True
  - device: From device parameter
**kwargs – Additional arguments passed to SearchConfig:
  - hidden_dims_options: list[list[int]] - Architectures to test
  - inflation_factor_range: list[float] - Inflation factors to test
  - speed_preset: str - “fast”, “balanced”, or “thorough”
  - use_pcha_init: bool - Use PCHA initialization
  - subsample_fraction: float - Subsampling for large datasets
  - max_cells_cv: int - Maximum cells for CV
  - random_state: int - Random seed
- Returns:
Complete cross-validation results with analysis methods:
Attributes:
  - config_results: dict[str, CVResults] - Per-configuration results
  - summary_df: pd.DataFrame - Summary table
  - ranked_configs: list[dict] - Configs ranked by R²
  - cv_info: dict - Search metadata
Methods:
  - summary_report(): str - Text summary for decision support
  - rank_by_metric(metric): list[dict] - Rank by any metric
  - plot_elbow_r2(): Figure - Primary visualization
  - plot_metric(metric): Figure - Generic metric visualization
  - save(path) / load(path) - Persistence
Ranked config structure:
Each dict in ranked_configs contains:
  - hyperparameters: dict with n_archetypes, hidden_dims, etc.
  - metric_value: float - R² value
  - std_error: float - Standard error across folds
  - config_summary: str - Human-readable description
- Return type:
CVSummary
- Raises:
ValueError – If adata.obsm[pca_key] is not found.
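The interplay between pca_key, the auto-detected alternatives, and the ValueError can be pictured with a small stand-alone sketch. This is not PEACH's implementation: the helper name resolve_pca_key and the fallback order are assumptions made for illustration, and a plain dict stands in for adata.obsm.

```python
# Illustrative sketch only. resolve_pca_key is a hypothetical helper, a plain
# dict stands in for adata.obsm, and the fallback order is an assumption.
FALLBACK_KEYS = ("X_pca", "X_PCA", "PCA")


def resolve_pca_key(obsm, pca_key="X_pca"):
    """Return the requested key if present, otherwise the first fallback found."""
    for key in (pca_key, *FALLBACK_KEYS):
        if key in obsm:
            return key
    raise ValueError(
        f"No PCA coordinates found under {pca_key!r}; run scanpy.pp.pca(adata) first."
    )


toy_obsm = {"X_PCA": [[0.1, 0.2], [0.3, 0.4]]}  # toy stand-in for adata.obsm
print(resolve_pca_key(toy_obsm))  # falls back from 'X_pca' to 'X_PCA'
```

Running an equivalent check before the search starts turns a missing PCA step into an immediate, readable error.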
Notes
Workflow Position: This is Phase 2 of the PEACH pipeline. After finding good hyperparameters here, manually select the best configuration (Phase 3) and train the final model with pc.tl.train_archetypal() (Phase 4).
Large Datasets: Datasets larger than max_cells_cv (default 15000) are automatically subsampled for CV. This doesn’t affect final training.
Selecting Best Configuration: Use summary_report() for a quick overview, rank_by_metric() for detailed rankings, and plot_elbow_r2() to visualize the elbow curve.
Examples
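Subsampling for cross-validation (illustrative sketch): the max_cells_cv cap described in the Notes can be pictured as drawing a fixed-size random subset of cell indices. The helper subsample_indices is hypothetical, and uniform sampling without replacement is an assumption; the strategy PEACH actually uses is not stated on this page.

```python
import random


def subsample_indices(n_cells, max_cells_cv=15000, random_state=0):
    """Hypothetical helper: cap the number of cells used for CV.

    Assumes uniform sampling without replacement; PEACH may do otherwise.
    """
    if n_cells <= max_cells_cv:
        return list(range(n_cells))  # small dataset: keep every cell
    rng = random.Random(random_state)
    return sorted(rng.sample(range(n_cells), max_cells_cv))


small = subsample_indices(10_000)
large = subsample_indices(40_000)
print(len(small), len(large))  # 10000 15000
```

Fixing random_state keeps the subset reproducible, so repeated searches are compared on the same cells.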
Basic hyperparameter search:
>>> import scanpy as sc
>>> import peach as pc
>>> # Prepare data
>>> sc.pp.pca(adata, n_comps=30)
>>> # Search hyperparameters
>>> cv_summary = pc.tl.hyperparameter_search(adata, n_archetypes_range=[3, 4, 5, 6], cv_folds=5, device="cuda")
>>> # Review results
>>> print(cv_summary.summary_report())
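To gauge how much work a call like the one above implies, the following sketch counts training runs for a grid evaluated with K-fold cross-validation. It assumes the search enumerates the Cartesian product of the option lists, which this page does not confirm. The helper names, the hidden_dims_options values, and the inflation factor are placeholders chosen for illustration.

```python
from itertools import product
import random


def enumerate_grid(n_archetypes_range, hidden_dims_options, inflation_factor_range):
    """Hypothetical helper: one dict per combination (assumes a full Cartesian grid)."""
    return [
        {"n_archetypes": k, "hidden_dims": dims, "inflation_factor": factor}
        for k, dims, factor in product(
            n_archetypes_range, hidden_dims_options, inflation_factor_range
        )
    ]


def kfold_indices(n_cells, cv_folds, random_state=0):
    """Shuffle cell indices and deal them into cv_folds disjoint validation folds."""
    indices = list(range(n_cells))
    random.Random(random_state).shuffle(indices)
    return [sorted(indices[i::cv_folds]) for i in range(cv_folds)]


# Placeholder option lists; only n_archetypes_range and cv_folds mirror the example.
grid = enumerate_grid([3, 4, 5, 6], [[128, 64], [256, 128, 64]], [1.5])
folds = kfold_indices(n_cells=1000, cv_folds=5)
print(f"{len(grid)} configurations x {len(folds)} folds = {len(grid) * len(folds)} training runs")
# prints: 8 configurations x 5 folds = 40 training runs
```

Each training run is additionally bounded by max_epochs_cv, so widening any option list or raising cv_folds scales the total cost multiplicatively.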
Analyze and visualize results:
>>> # Get top configurations
>>> top_configs = cv_summary.rank_by_metric("archetype_r2")[:3]
>>> for config in top_configs:
...     print(f"{config['config_summary']}: R²={config['metric_value']:.4f}")
>>> # Elbow curve
>>> fig = cv_summary.plot_elbow_r2()
>>> fig.show()
>>> # Compare metrics
>>> fig = cv_summary.plot_metric("rmse")
>>> fig.show()
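The ranking step can be reproduced on plain dicts shaped like the documented ranked_configs entries. The sketch below is not the rank_by_metric implementation: rank_configs is a hypothetical helper, and the numbers are placeholders, not results from a real search. It makes the sort direction explicit, since R² is better when higher and RMSE is better when lower. Whether rank_by_metric handles that direction automatically is not stated on this page.

```python
# Placeholder entries shaped like the documented ranked_configs dicts.
# The values are invented for illustration, not output from a real search.
entries = [
    {"hyperparameters": {"n_archetypes": 3, "hidden_dims": [128, 64]},
     "metric_value": 0.81, "std_error": 0.02, "config_summary": "3 archetypes"},
    {"hyperparameters": {"n_archetypes": 4, "hidden_dims": [128, 64]},
     "metric_value": 0.88, "std_error": 0.02, "config_summary": "4 archetypes"},
    {"hyperparameters": {"n_archetypes": 5, "hidden_dims": [128, 64]},
     "metric_value": 0.90, "std_error": 0.02, "config_summary": "5 archetypes"},
]


def rank_configs(configs, higher_is_better=True):
    """Hypothetical helper: sort entries by metric_value in the chosen direction."""
    return sorted(configs, key=lambda c: c["metric_value"], reverse=higher_is_better)


for entry in rank_configs(entries):
    print(f"{entry['config_summary']}: {entry['metric_value']:.4f} +/- {entry['std_error']:.4f}")
```

For an error metric such as RMSE, the same helper would be called with higher_is_better=False.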
Use selected hyperparameters for final training:
>>> # Select best configuration
>>> best_config = top_configs[0]["hyperparameters"]
>>> n_archetypes = best_config["n_archetypes"]
>>> # Train final model (Phase 4)
>>> results = pc.tl.train_archetypal(
...     adata, n_archetypes=n_archetypes, n_epochs=200, model_config={"hidden_dims": best_config["hidden_dims"]}
... )
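Taking top_configs[0] is the simplest choice. Because each ranked entry also carries std_error, a common alternative is the one-standard-error rule: prefer the smallest n_archetypes whose score lies within one standard error of the best. This is a general model-selection heuristic, not something PEACH is stated to implement. The helper one_se_choice is hypothetical and the entries are placeholder values.

```python
def one_se_choice(entries):
    """Smallest n_archetypes scoring within one std_error of the best entry."""
    best = max(entries, key=lambda e: e["metric_value"])
    threshold = best["metric_value"] - best["std_error"]
    eligible = [e for e in entries if e["metric_value"] >= threshold]
    return min(eligible, key=lambda e: e["hyperparameters"]["n_archetypes"])


# Placeholder entries shaped like ranked_configs; values are illustrative only.
entries = [
    {"hyperparameters": {"n_archetypes": 3, "hidden_dims": [128, 64]},
     "metric_value": 0.81, "std_error": 0.02},
    {"hyperparameters": {"n_archetypes": 4, "hidden_dims": [128, 64]},
     "metric_value": 0.88, "std_error": 0.02},
    {"hyperparameters": {"n_archetypes": 5, "hidden_dims": [128, 64]},
     "metric_value": 0.90, "std_error": 0.02},
    {"hyperparameters": {"n_archetypes": 6, "hidden_dims": [128, 64]},
     "metric_value": 0.905, "std_error": 0.02},
]

choice = one_se_choice(entries)
print(choice["hyperparameters"]["n_archetypes"])  # 5
```

Here six archetypes score highest, but five falls within one standard error of that score, so the simpler model is preferred. The returned hyperparameters dict can then be passed on exactly as best_config is in the example above.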
Save and load results:
>>> # Save for later
>>> cv_summary.save("cv_results.pkl")
>>> # Load in new session
>>> from peach._core.utils.grid_search_results import CVSummary
>>> cv_summary = CVSummary.load("cv_results.pkl")
See also
peach.tl.train_archetypal – Train final model with selected hyperparameters
peach._core.utils.hyperparameter_search.SearchConfig – Full configuration options
peach._core.utils.grid_search_results.CVSummary – Return type details