Training Demo#
Here we demonstrate how to train Popari on a multisample spatial transcriptomics dataset. In particular, we will work with the Alzheimer’s Disease (AD) dataset prepared in the “Preprocessing Demo” notebook.
# Disable warnings for prettier notebook
import warnings
warnings.filterwarnings("ignore")
from pathlib import Path
from tqdm.auto import trange
import torch
import popari
from popari.model import Popari
from popari import pl, tl
from popari.train import TrainParameters, Trainer
# Replace with the directory that contains the preprocessed dataset
data_directory = Path("/path/to/directory/")
model_parameters = {
    'K': 15,
    'dataset_path': data_directory / "preprocessed_dataset.h5ad",
    'lambda_Sigma_x_inv': 1e-4,
    'lambda_Sigma_bar': 1e-4,
    'initial_context': {
        'device': 'cuda:0',
        'dtype': torch.float64,
    },
    'torch_context': {
        'device': 'cuda:0',
        'dtype': torch.float64,
    },
    'verbose': 0,
    'spatial_affinity_mode': 'differential lookup',
}
popari_example = Popari(**model_parameters)
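The parameters above hard-code `'cuda:0'`, which fails on a machine without a CUDA GPU. A minimal sketch of a CPU fallback follows. `choose_device` is a hypothetical helper, not part of Popari, and the sketch assumes that Popari accepts any valid PyTorch device string in its context dictionaries, which this demo does not confirm.

```python
def choose_device(cuda_available: bool, index: int = 0) -> str:
    """Return a PyTorch-style device string, falling back to the CPU."""
    return f"cuda:{index}" if cuda_available else "cpu"


try:
    import torch

    cuda_available = torch.cuda.is_available()
except ImportError:
    # Without PyTorch installed we can only report the CPU fallback.
    cuda_available = False

device = choose_device(cuda_available)
print(f"Selected device: {device}")

# Hypothetical usage: pass `device` as the 'device' entry of both
# 'initial_context' and 'torch_context' instead of the literal 'cuda:0'.
```

Training on the CPU is likely to be much slower than the GPU timing quoted in this demo.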
Training Loop#
train_parameters = TrainParameters(
    nmf_iterations=10,
    iterations=200,
    savepath=data_directory / "results.h5ad",
)
trainer = Trainer(
    parameters=train_parameters,
    model=popari_example,
    verbose=1,
)
Below, we train Popari for 200 iterations; this should take ~30 minutes on a standard GPU.
trainer.train()
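Training time depends heavily on the hardware, so it can be useful to measure it directly. The sketch below uses only the Python standard library. `timed` and `timings` are hypothetical names introduced for illustration, and the toy workload stands in for the real training call.

```python
import time
from contextlib import contextmanager

# Elapsed wall-clock time in seconds, keyed by label.
timings = {}


@contextmanager
def timed(label):
    """Record how long the enclosed block takes to run."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start
        print(f"{label}: {timings[label] / 60:.2f} min")


# Toy workload so that the sketch runs on its own.
with timed("toy workload"):
    sum(range(100_000))

# Hypothetical usage with the trainer defined above:
# with timed("Popari training"):
#     trainer.train()
```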
Hierarchical Training#
Using hierarchical mode, we can train Popari more robustly using a lower resolution view of the original spatial transcriptomics data. We can then “superresolve” the embeddings at the higher resolution to regain a fine-grained view.
hierarchical_parameters = {
    'K': 15,
    'dataset_path': data_directory / "preprocessed_dataset.h5ad",
    'lambda_Sigma_x_inv': 1e-4,
    'lambda_Sigma_bar': 1e-4,
    'initial_context': {
        'device': 'cuda:0',
        'dtype': torch.float64,
    },
    'torch_context': {
        'device': 'cuda:0',
        'dtype': torch.float64,
    },
    'verbose': 0,
    'spatial_affinity_mode': 'differential lookup',
    'downsampling_method': 'partition',
    'hierarchical_levels': 2,
    'superresolution_lr': 1e-1,
}
# Standalone copies of the key settings above (not referenced by the code below)
K = 15
dataset_path = data_directory / "preprocessed_dataset.h5ad"
context = {"device": "cuda:0", "dtype": torch.float64}
hierarchical_levels = 2
superresolution_lr = 1e-1
hierarchical_example = Popari(**hierarchical_parameters)
[2024/10/15 00:37:39] Initializing hierarchy level 1
[2024/10/15 00:37:40] Downsized dataset from 8186 to 1637 spots.
[2024/10/15 00:37:42] Downsized dataset from 10372 to 2074 spots.
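The log output above gives a sense of how much coarser the second level is. The short calculation below reads the reduction factor off the logged spot counts and lists the level indices for `hierarchical_levels = 2`. Treating level 0 as the original resolution is an assumption based on the zero-based level numbers in the logs, not something the demo states explicitly.

```python
# Spot counts copied from the log output above: (original, downsized).
logged_counts = [(8186, 1637), (10372, 2074)]

ratios = []
for original, downsized in logged_counts:
    ratio = original / downsized
    ratios.append(ratio)
    print(f"{original} -> {downsized} spots (about {ratio:.2f}x fewer)")

# With two levels and zero-based indexing, the levels are 0 and 1.
hierarchical_levels = 2
levels = list(range(hierarchical_levels))

# Assumption: level 0 is the original data; the last level is the coarsest.
coarsest_level = hierarchical_levels - 1
print(f"levels: {levels}, coarsest level: {coarsest_level}")
```

In this run both samples shrink by roughly the same factor, so the coarse level has about one fifth as many spots as the original data.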
hierarchical_train_parameters = TrainParameters(
    nmf_iterations=10,
    iterations=200,
    savepath=data_directory / "hierarchical_results.h5ad",
)
hierarchical_trainer = Trainer(
    parameters=hierarchical_train_parameters,
    model=hierarchical_example,
    verbose=True,
)
hierarchical_trainer.train()
The optimization for the hierarchical trainer is done at the lowest resolution (level = model.hierarchical_levels - 1). In order to recover spatially-informed embeddings X for the higher resolutions, we provide a superresolution subroutine that can be run after the main training loop.
hierarchical_trainer.superresolve(n_epochs=10000, tol=1e-6)
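The argument names `n_epochs` and `tol` suggest an iteration cap combined with a convergence tolerance, although this demo does not define them. The toy example below illustrates that general pattern with plain gradient descent on a one-dimensional quadratic. It is not Popari's superresolution routine, and `minimize_with_tolerance` is a hypothetical function written only for this illustration.

```python
def minimize_with_tolerance(grad, x0, lr=1e-1, n_epochs=10000, tol=1e-6):
    """Gradient descent that stops early once the update is smaller than tol.

    Returns the final value and the number of epochs actually run.
    """
    x = x0
    for epoch in range(1, n_epochs + 1):
        step = lr * grad(x)
        x -= step
        if abs(step) < tol:
            # Converged: further updates would change x by less than tol.
            return x, epoch
    # Iteration cap reached without meeting the tolerance.
    return x, n_epochs


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_final, epochs_used = minimize_with_tolerance(lambda x: 2 * (x - 3), x0=0.0)
print(f"stopped after {epochs_used} epochs at x = {x_final:.6f}")
```

Under this reading, a large `n_epochs` acts only as a safety cap, and the tolerance decides when the loop actually ends.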
Save results to disk#
hierarchical_example.save_results(data_directory / "hierarchical_results", ignore_raw_data=False)
Load a pretrained model#
from popari.model import load_trained_model
reloaded_model = load_trained_model(data_directory / "hierarchical_results")
[2024/10/13 15:53:55] Reloading level 0
[2024/10/13 15:53:55] Reloading level 1