peach.pp.prepare_training
- peach.pp.prepare_training(adata, batch_size=128, shuffle=True, pca_key=None, num_workers='auto', pin_memory='auto', persistent_workers='auto', prefetch_factor=2)
Create a PyTorch DataLoader from an AnnData object for training, with data-loading settings tuned to the execution environment (for example, HPC clusters).
- Parameters:
adata (AnnData) – Annotated data object with PCA coordinates stored in adata.obsm
batch_size (int, default: 128) – Number of samples per training batch
shuffle (bool, default: True) – Whether the DataLoader reshuffles the data at every epoch
pca_key (str or None, default: None) – Key in adata.obsm that holds the PCA coordinates; auto-detected if None
num_workers (int or 'auto', default: 'auto') – Number of subprocesses used for data loading. 'auto' picks a value based on the detected environment: 0 on Apple Silicon, 6 on HPC systems, 2 on other local machines
pin_memory (bool or 'auto', default: 'auto') – Use pinned (page-locked) host memory for faster transfer to the GPU. 'auto' sets True if CUDA is available
persistent_workers (bool or 'auto', default: 'auto') – Keep worker processes alive between epochs. 'auto' sets True if num_workers > 0
prefetch_factor (int, default: 2) – Number of batches loaded in advance by each worker; has no effect when num_workers=0
- Returns:
A PyTorch DataLoader configured for the detected execution environment
- Return type:
DataLoader
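The 'auto' rules listed for num_workers, pin_memory and persistent_workers can be sketched in plain Python as follows. This is not peach's implementation: the helpers detect_environment and resolve_loader_settings are hypothetical, the HPC check (a SLURM_JOB_ID or PBS_JOBID environment variable) and the Apple Silicon check (macOS on arm64) are assumed heuristics, and CUDA availability is passed in as cuda_available instead of being queried with torch.cuda.is_available(), so the sketch runs without PyTorch installed.

```python
import os
import platform
from typing import Optional, Union


def detect_environment() -> str:
    """Hypothetical heuristic: classify the host as 'apple_silicon', 'hpc' or 'local'."""
    # Assumption: Apple Silicon means macOS running on an arm64 CPU.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "apple_silicon"
    # Assumption: a SLURM or PBS job variable marks a compute node on an HPC cluster.
    if "SLURM_JOB_ID" in os.environ or "PBS_JOBID" in os.environ:
        return "hpc"
    return "local"


def resolve_loader_settings(
    num_workers: Union[int, str] = "auto",
    pin_memory: Union[bool, str] = "auto",
    persistent_workers: Union[bool, str] = "auto",
    prefetch_factor: int = 2,
    *,
    environment: Optional[str] = None,
    cuda_available: bool = False,
) -> dict:
    """Hypothetical helper: turn 'auto' values into concrete DataLoader keyword arguments."""
    env = environment or detect_environment()
    if num_workers == "auto":
        # Per-environment defaults taken from the num_workers description above.
        num_workers = {"apple_silicon": 0, "hpc": 6, "local": 2}[env]
    if pin_memory == "auto":
        # Pinned host memory only speeds up copies to a CUDA device.
        pin_memory = cuda_available
    if persistent_workers == "auto":
        persistent_workers = num_workers > 0
    return {
        "num_workers": num_workers,
        "pin_memory": pin_memory,
        # PyTorch rejects persistent_workers=True without worker processes,
        # so this sketch forces it off when num_workers == 0.
        "persistent_workers": persistent_workers and num_workers > 0,
        # PyTorch 2.x requires prefetch_factor=None when num_workers == 0.
        "prefetch_factor": prefetch_factor if num_workers > 0 else None,
    }
```

For example, resolve_loader_settings(environment="hpc", cuda_available=True) yields 6 workers with pinned memory and persistent workers, and the resulting dict could be unpacked into torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True, **settings).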
Examples
>>> import peach
>>> # Auto-detect optimal settings
>>> dataloader = peach.pp.prepare_training(adata)
>>> # Force HPC settings
>>> dataloader = peach.pp.prepare_training(adata, num_workers=8, pin_memory=True)
>>> # Minimal settings for debugging (load data in the main process)
>>> dataloader = peach.pp.prepare_training(adata, num_workers=0)
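When pca_key is None, the key is auto-detected. One plausible way to do this is sketched below; it is an assumption rather than peach's documented behavior, and both find_pca_key and DEFAULT_PCA_KEYS are hypothetical names. The sketch prefers an explicitly given key, then "X_pca" (the adata.obsm key where Scanpy's PCA stores its coordinates) and a couple of variants, then any key whose name contains "pca".

```python
from typing import Optional, Sequence

# Hypothetical search order; "X_pca" is the key Scanpy's PCA writes to adata.obsm.
DEFAULT_PCA_KEYS = ("X_pca", "X_PCA", "pca")


def find_pca_key(
    obsm_keys: Sequence[str],
    pca_key: Optional[str] = None,
    candidates: Sequence[str] = DEFAULT_PCA_KEYS,
) -> str:
    """Hypothetical helper: pick the adata.obsm key holding PCA coordinates."""
    if pca_key is not None:
        # An explicit key must exist; do not silently fall back to another one.
        if pca_key not in obsm_keys:
            raise KeyError(f"{pca_key!r} not found in adata.obsm")
        return pca_key
    # Try well-known names first, in order.
    for key in candidates:
        if key in obsm_keys:
            return key
    # Fall back to the first key that mentions PCA, case-insensitively.
    for key in obsm_keys:
        if "pca" in key.lower():
            return key
    raise KeyError("no PCA coordinates found in adata.obsm; run PCA first or pass pca_key")
```

With an AnnData object this would be called as find_pca_key(list(adata.obsm.keys())); raising on a missing explicit key keeps a typo in pca_key from quietly selecting a different embedding.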