Modelscape: Machine learning mega training experiment repo released!
Training models now made much simpler!
Introduction
I made modelscape, a repo extending MLPscape. It lets you iterate quickly on ML experiments without the usual hassle: editing the trainloop, maintaining a massive stack of for loops defining what gets iterated over, worrying about .py vs .ipynb differences, multiprocessing, offline vs online training, and so on!
In this post, I’ll give a detailed overview of what modelscape is and how to use it. If you’ve used MLPscape in the past, the one-line update is that the ML model, optimizer, and loss function can now be defined dynamically.
Using modelscape
Everything in modelscape is fairly modular: you can define which model, dataset, optimizer, and loss to use, decide whether training runs offline or online, and choose whether to use multiprocessing. Variables can be set through a config or from the command line, and the entire training run can be tracked if desired.
To use modelscape, simply follow one of the provided notebooks/Python files in the examples folder! Typically, your file will look roughly like
- Imports
- Your model/loss function/optimizer definitions
- Your model grab definitions
- Hyperparameter selection
- Iterator specification
- Data selection
- Batch function selection
- Other pre-trainloop setup
- Trainloop execution
- Results
where most of the code powering modelscape is hidden away in what I call the backend, which handles multiprocessing, the trainloop (which runs whatever functions you hand it along the way), and so on. Let’s run through each step, as a reference to come back to in case anything goes wrong:
Imports
This is quite standard; the only things that really need attention are importing the trainloop and the argument creators (parse_args for .py or base_args for .ipynb):
import numpy as np
import torch
import torch.multiprocessing as mp
from modelscape.backend.cli import parse_args, base_args
from modelscape.backend.job_iterator import main as run_job_iterator
Your model/loss function/optimizer definitions
Here (or in another .py file), have your model/custom loss function/custom optimizer defined! We’ll let modelscape know how to use it within the hyperparameter selection section.
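For example, here is a minimal sketch of what these definitions might look like (the constructor arguments are placeholders of mine; exactly how modelscape instantiates MODEL_CLASS depends on your args, so check the examples folder):

import torch.nn as nn

class YourModel(nn.Module):
    # a small placeholder MLP; any nn.Module can be plugged in
    def __init__(self, in_dim=784, width=512, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, width),
            nn.ReLU(),
            nn.Linear(width, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class YourLoss(nn.Module):
    # a custom loss is just another nn.Module (here, mean squared error by hand)
    def forward(self, pred, target):
        return ((pred - target) ** 2).mean()

A custom optimizer would be a torch.optim.Optimizer subclass, or you can skip it entirely and point the optimizer slot at a built-in like torch.optim.Adam (see the hyperparameter section below).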
Your model grab definitions
Want to check something out while your model is training? Great! Just define whatever function here, and we’ll worry about it later.
The important things here are to always take **kwargs, and to use model, X_tr, y_tr, X_te, y_te, or other variables that are being iterated over! So far, this is the most general way I’ve found to keep the trainloop itself unchanged.
def my_mlp_grab(model, X_tr, y_tr, **kwargs):
    # any other variables the trainloop passes (e.g. GAMMA) are absorbed by **kwargs
    return (y_tr.T @ model(X_tr)).squeeze()
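As another sketch, here is a grab that tracks test error (test_mse_grab is a hypothetical name of mine; it only relies on the documented model, X_te, y_te variables):

def test_mse_grab(model, X_te, y_te, **kwargs):
    # runs whenever the trainloop calls the grabs; extra kwargs are ignored
    with torch.no_grad():
        return ((model(X_te) - y_te) ** 2).mean().item()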
Hyperparameter selection
We’ll start here by taking in the default args, and changing whatever we don’t want! A list of the default arguments can be found at the bottom of this post, and in the README of the repo.
Here’s also where we define the model/loss/optimizer being used! Of course, these can be PyTorch defaults, like torch.optim.Adam for the optimizer and args.LOSS_CLASS = nn.CrossEntropyLoss for the loss (a short sketch of this follows the block below). See examples/example_resnet_run.py for more detail.
Note: for custom models that don’t have a defined width, or if you want the model to not be rescaled via mupify, please set mup_param='sp'.
args = parse_args()  # or base_args() in a notebook
args.MODEL_CLASS = YourModel
args.OPTIMIZER_CLASS = YourOptimizer
args.LOSS_CLASS = YourLoss
args.ONLINE = False
args.N_TRAIN = 4000
args.N_TEST = 1000
args.N_TOT = args.N_TEST + args.N_TRAIN
args.CLASSES = [[0], [6]]
args.NORMALIZED = True
args.NUM_TRIALS = 2
args.N_SAMPLES = [1024]
args.GAMMA = [0.1, 1, 10]
args.mup_param = 'mup'
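If you'd rather stick to the PyTorch defaults mentioned above, those slots can simply point at the stock classes, and (per the note above) a custom model without a defined width should be paired with mup_param='sp':

import torch.nn as nn

args.OPTIMIZER_CLASS = torch.optim.Adam
args.LOSS_CLASS = nn.CrossEntropyLoss
args.mup_param = 'sp'   # skip mupify rescaling for models without a single width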
Iterator specification
Now that the arguments are set, let’s make sure our iterators are set up properly. You’ll need to make sure the number of samples is iterator 0 and the number of trials is iterator 1; beyond that, the iterators can be set dynamically!
iterators = [args.N_SAMPLES, range(args.NUM_TRIALS), args.GAMMA]
iterator_names = ["ntrain", "trial", "GAMMA"]
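Roughly speaking (this is just a mental model, not the backend's actual code), the trainloop runs one job per combination of the iterator values, so the example arguments above give 1 × 2 × 3 = 6 jobs:

import itertools

# mental picture only: one job per combination of the iterator values
for ntrain, trial, gamma in itertools.product(*iterators):
    print(dict(zip(iterator_names, (ntrain, trial, gamma))))
# -> 6 combinations here (1 sample count x 2 trials x 3 gamma values)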
Data selection
Load in the data here! Hopefully, I don’t need to explain this…
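That said, as one concrete sketch (torchvision's MNIST purely for illustration; this ignores the class filtering implied by args.CLASSES, and the examples folder shows how the repo itself loads data), here is one way to end up with the flat X_full / y_full tensors used below:

from torchvision.datasets import MNIST

# any data source works; the goal is just a pair of tensors to hand to the batch function
ds = MNIST(root="./data", train=True, download=True)
X_full = ds.data.reshape(len(ds), -1).float() / 255.0   # (N, 784)
y_full = ds.targets.float()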
Batch function selection
A large portion of modelscape is built around batch functions. Batch functions should be set up in a particular way, like
def your_bfn(X_total, y_total, X=None, y=None, bsz=128,
             gen=None, **kwargs):
    def batch_fn(step: int, X=X, y=y):
        # if a fixed batch was supplied, always return it
        if (X is not None) and (y is not None):
            X = ensure_torch(X)
            y = ensure_torch(y)
            return X, y
        # otherwise, sample a fresh random batch of size bsz on each call
        with torch.no_grad():
            N_total = X_total.shape[0]
            indices = torch.randint(0, N_total, (bsz,), generator=gen, device=gen.device)
            X_batch = ensure_torch(X_total.to(gen.device)[indices])
            y_batch = ensure_torch(y_total.to(gen.device)[indices])
            return X_batch, y_batch
    return batch_fn
which should be placed outside of any if __name__ == "__main__": block so it can be found by modelscape. Make sure **kwargs is included in the signature! Once that’s done, make sure bfn_config includes
other_args = dict(X_total=X_full, y_total=y_full)  # one example
bfn_config = dict(base_bfn=your_bfn, **other_args)
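To sanity-check the batch function on its own before handing bfn_config to the trainloop, you can call it directly (assuming X_full / y_full from the data step, and that ensure_torch is already imported in your file):

gen = torch.Generator(device="cpu")
bfn = your_bfn(X_full, y_full, bsz=128, gen=gen)
X_batch, y_batch = bfn(step=0)   # a fresh random batch of 128 samples each call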
Other pre-trainloop setup
We’re almost ready to run the trainloop; we just need to build a global_config that gets used everywhere, which roughly takes the form of:
global_config = args.__dict__.copy()
Also, we’ll want to register any grabs so they run throughout the trainloop:
grabs.update({"my_mlp_grab": my_mlp_grab})
global_config.update({"otherreturns": grabs})
Quick note:
grabs = build_other_grabs(args.other_model_grabs, per_alias_kwargs=args.other_model_kwargs)
is often included in my examples, but it is not strictly necessary; only include it if grabs aren’t already defined within your file (grabs just needs to exist as a dict before the update above, so a plain grabs = {} also works as a starting point).
Trainloop execution
Finally, we can run everything! If using multiprocessing, call
mp.set_start_method("spawn", force=True)
Note: if using a Python notebook, the multiprocessing step doesn’t play nicely with an in-notebook batch function. Please either place the function in a .py file, or don’t use multiprocessing.
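Concretely, that workaround is just an import from a module of your own (batch_fns.py is a hypothetical file name here):

# in the notebook: import the batch function from a .py file instead of defining it inline,
# since spawned workers generally can't pickle functions defined inside the notebook
from batch_fns import your_bfn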
Then, we call
result = run_job_iterator(iterators, iterator_names, global_config, bfn_config=bfn_config)
torch.cuda.empty_cache()
which is really simple!
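For a .py script with multiprocessing, the tail of the file would sit under the standard Python main guard (the same if __name__ == "__main__": mentioned in the batch function section), roughly:

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    result = run_job_iterator(iterators, iterator_names, global_config, bfn_config=bfn_config)
    torch.cuda.empty_cache()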
I hope this helps with your future model experiments!