# Deep Learning & MLOps Workflows
Amazing tools exist for the R&D/training phase (like wandb.ai). They let you easily create experiments without worrying about version control or having to set up a project, and they have features that QA-Board doesn't have (comparing N training curves at the same time, early stopping, nice built-in reports...).
However, in our experience, tools built for training usually lack important features for evaluating inference:
- digging into specific difficult inputs (in our case images...): the training loss doesn't tell the whole story
- comparing against reference/previous versions
- giving users tools to trigger runs on new inputs (e.g. a test database supplied by a client)
- integration with source control, CI systems, etc.
- rich/custom visualization
For instance, here is how users can compare runs in QA-Board:
So how do we use QA-Board together with deep learning tools? Well, we set up our projects so that they work well together.
## Workflows
Since training is best done with dedicated tools, our projects are set up to show in QA-Board:
- the evaluation metrics
- the model parameters and a training summary
- a link to the training page
In practice:
Users get the best of both worlds: QA-Board, and the experiment/training-loss UI from wandb.
...
## What is needed to make this work?
Users will write a `run()` function that:
- Reads from `context.params` some model ID (or defaults to the model that was trained for the commit)
- Fetches details about that model from some API: hyper-params, a link to the training page... It will return those as badges
- Runs the model on the `context.input_path`
- Does postprocessing to compute various evaluation metrics/plots
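As a rough sketch (not a drop-in implementation), such a `run()` function could look like the code below. `fetch_model_info()`, `run_inference()` and `compute_metrics()` are hypothetical project-specific helpers, the `model_id` parameter name and fallback are illustrative, and the use of `context.output_dir` and of the returned dict as metrics/badges follows the usual QA-Board entrypoint conventions:

```python
def run(context):
    # Model to evaluate: read from the run's parameters, with a fallback
    # to a model trained for the current commit (the parameter name and
    # the fallback value are illustrative, not part of QA-Board's API).
    model_id = context.params.get("model_id", "model-trained-for-this-commit")

    # Fetch hyper-params and the training page URL from the experiment
    # tracker; fetch_model_info() is a hypothetical helper wrapping e.g.
    # wandb's public API.
    model_info = fetch_model_info(model_id)

    # Run inference on the input QA-Board assigned to this run, writing
    # images/plots under the run's output directory so they show up in the UI.
    predictions = run_inference(model_id, context.input_path, context.output_dir)

    # Post-process to compute the evaluation metrics (project-specific).
    metrics = compute_metrics(predictions, context.input_path)

    # Whatever is returned is displayed as metrics/badges next to the run:
    # here the model ID, its hyper-params and a link to the training page.
    return {
        "model_id": model_id,
        "training_page": model_info["url"],
        **model_info["hyperparams"],
        **metrics,
    }
```

This function typically lives in the entrypoint script declared in `qaboard.yaml`.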
:::tip
Some engineers have a script that filters/sorts their recent models according to various hyper-params/metrics, and starts an evaluation with QA-Board (see the sketch below). Ideally we would be able to trigger inferences directly from the training page, but not all ML tools make it easy!
:::
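For example, such a script could use wandb's public API to shortlist models and then kick off the evaluation. In the sketch below, the project path, the hyper-param/metric names, the batch name and the environment variable used to forward the model ID are all placeholders; how the model ID actually reaches `run()` depends on your setup:

```python
import os
import subprocess
import wandb

# List recent finished training runs via wandb's public API
# ("my-team/my-project" is a placeholder project path).
api = wandb.Api()
runs = [r for r in api.runs("my-team/my-project") if r.state == "finished"]

# Filter by hyper-params and sort by a validation metric
# ("backbone" and "val_loss" are just examples).
candidates = [r for r in runs if r.config.get("backbone") == "resnet50"]
candidates.sort(key=lambda r: r.summary.get("val_loss", float("inf")))

# Start a QA-Board evaluation for the most promising model. Here the model
# ID is forwarded through an environment variable, but it could just as
# well be a parameter of the evaluation runs.
best = candidates[0]
print(f"Evaluating {best.name} ({best.url})")
subprocess.run(
    ["qa", "batch", "my-evaluation-batch"],
    env={**os.environ, "QA_MODEL_ID": best.id},
    check=True,
)
```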
## MLOps
When using MLOps flows, users will have the CI:
- Train a network
- Fail if the training loss is too high
- Run inferences on an evaluation batch using QA-Board
- Fail if the evaluation metrics are bad
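For the two gating steps, a simple approach is a script that reads the metrics produced by the previous stages and fails the pipeline when they are out of bounds. Below is a minimal sketch; the metrics file path, metric names and thresholds are placeholders to adapt to your project:

```python
#!/usr/bin/env python
"""CI gate: fail the pipeline when training/evaluation metrics regress.
Assumes the previous stages wrote their metrics to a JSON file; the path,
metric names and thresholds are placeholders."""
import json
import sys
from pathlib import Path

METRICS_FILE = Path("output/metrics.json")  # written by the training/evaluation runs
THRESHOLDS = {
    "training_loss": 0.5,  # fail if the training loss is too high
    "error_rate": 0.05,    # fail if the evaluation metrics are bad
}

metrics = json.loads(METRICS_FILE.read_text())

failed = False
for name, maximum in THRESHOLDS.items():
    value = metrics.get(name)
    if value is None or value > maximum:
        print(f"FAIL: {name}={value} (maximum allowed: {maximum})")
        failed = True
    else:
        print(f"OK:   {name}={value}")

sys.exit(1 if failed else 0)
```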