peach.pp.prepare_training

peach.pp.prepare_training(adata, batch_size=128, shuffle=True, pca_key=None, num_workers='auto', pin_memory='auto', persistent_workers='auto', prefetch_factor=2)

Create a PyTorch DataLoader from an AnnData object for training, with loader settings tuned for the execution environment (including HPC).

Parameters:
  • adata (AnnData) – Annotated data object with PCA coordinates

  • batch_size (int, default: 128) – Batch size for training

  • shuffle (bool, default: True) – Whether to shuffle data in DataLoader

  • pca_key (str, default: None) – Key in adata.obsm containing PCA coordinates (auto-detected if None)

  • num_workers (int or 'auto', default: 'auto') – Number of subprocesses for data loading. 'auto' detects optimal value based on environment (0 for Apple Silicon, 6 for HPC, 2 for local)

  • pin_memory (bool or 'auto', default: 'auto') – Use pinned memory for faster GPU transfer. 'auto' sets True if CUDA available

  • persistent_workers (bool or 'auto', default: 'auto') – Keep workers alive between epochs. 'auto' sets True if num_workers > 0

  • prefetch_factor (int, default: 2) – Number of batches loaded in advance by each worker
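
The 'auto' rules described in the parameters above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name resolve_loader_settings and the flags apple_silicon, hpc and cuda_available are hypothetical, and how peach actually detects an HPC environment is not documented here, so the sketch takes the detection results as inputs.

```python
def resolve_loader_settings(
    num_workers="auto",
    pin_memory="auto",
    persistent_workers="auto",
    *,
    apple_silicon=False,   # assumed to be detected by the caller
    hpc=False,             # assumed to be detected by the caller
    cuda_available=False,  # e.g. the result of torch.cuda.is_available()
):
    """Resolve 'auto' values following the documented rules (hypothetical helper)."""
    if num_workers == "auto":
        if apple_silicon:
            num_workers = 0      # documented: 0 for Apple Silicon
        elif hpc:
            num_workers = 6      # documented: 6 for HPC
        else:
            num_workers = 2      # documented: 2 for local
    if pin_memory == "auto":
        pin_memory = bool(cuda_available)   # True only if CUDA is available
    if persistent_workers == "auto":
        persistent_workers = num_workers > 0  # True only with worker processes
    return {
        "num_workers": num_workers,
        "pin_memory": pin_memory,
        "persistent_workers": persistent_workers,
    }


# Explicit values always win over 'auto'; the rest are still resolved.
settings = resolve_loader_settings(num_workers=8, pin_memory=True)
print(settings)
```

Passing explicit values for any of the three parameters bypasses the corresponding rule, which matches the "Force HPC settings" usage in the examples.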

Returns:

PyTorch DataLoader optimized for the execution environment

Return type:

DataLoader

Examples

>>> # Auto-detect optimal settings
>>> dataloader = peach.pp.prepare_training(adata)
>>> # Force HPC settings
>>> dataloader = peach.pp.prepare_training(adata, num_workers=8, pin_memory=True)
>>> # Minimal settings for debugging
>>> dataloader = peach.pp.prepare_training(adata, num_workers=0)
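
For readers building a comparable torch.utils.data.DataLoader by hand, note that recent PyTorch releases raise a ValueError when prefetch_factor is set, or persistent_workers is True, while num_workers is 0. The sketch below shows one way to assemble keyword arguments that respect this constraint. The helper name dataloader_kwargs is hypothetical, and whether prepare_training applies the same handling internally is an assumption, not something stated on this page.

```python
def dataloader_kwargs(
    batch_size=128,
    shuffle=True,
    num_workers=0,
    pin_memory=False,
    persistent_workers=False,
    prefetch_factor=2,
):
    """Build DataLoader keyword arguments (hypothetical helper).

    Worker-only options are included only when worker processes are used,
    because PyTorch rejects them in single-process loading.
    """
    kwargs = {
        "batch_size": batch_size,
        "shuffle": shuffle,
        "num_workers": num_workers,
        "pin_memory": pin_memory,
    }
    if num_workers > 0:
        # Only meaningful (and only accepted) with multiprocessing workers.
        kwargs["persistent_workers"] = persistent_workers
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


debug_kwargs = dataloader_kwargs(num_workers=0)
hpc_kwargs = dataloader_kwargs(num_workers=8, pin_memory=True, persistent_workers=True)
print(sorted(debug_kwargs))
print(sorted(hpc_kwargs))
```

The resulting dictionary could then be unpacked into a DataLoader call, for example DataLoader(dataset, **debug_kwargs), where dataset is any PyTorch dataset built from the PCA coordinates.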