Skip to main content

Using celery as a task runner

Celery is a simple, flexible, and reliable distributed task queue.

Starting Celery workers

  1. To manage the task queue we'll need what they call a broker. The QA-Board server already starts one on port 5672. If you want to start another one, it's easy:
docker run --detach -p 5672:5672 rabbitmq
#=> runs in the background
  1. Next you need to start at least a worker that will execute async tasks:
# need python3
pip install celery qaboard
celery -A qaboard.runners.celery_app worker --concurrency=10 --loglevel=info
note

Ideally we should run workers as daemons to handle failures, reboots... Read the docs to do it nicely... Currently we just use screen:

sudo apt-get install screen
screen -dmS qaboard-worker-01 <celery-command>
  1. To have qa batch use Celery runners, just configure:
qaboard.yaml
runners:
default: celery
# by default we assume you QA-Board server hostname is qaboard...
# to change it (for example to "localhost", where QA-Board might be running), define:
celery:
broker_url: pyamqp://guest:guest@localhost:5672// # also read from the ENV variable CELERY_BROKER_URL
tip

You can choose on the CLI what runner you want:

qa batch --runner=celery my-batch
# can be useful to see real-time logs in the CLI...
qa batch --runner=local my-batch

Note that unless you have a transparent shared storage for your working directory, you'll need to use qa --share batch to see runs in QA-Board...

Monitoring

  • A flower instance is available at <QABOARD_HOST>/flower
  • Read the docs.

Advanced Celery Configuration

To configure Celery at the project level:

qaboard.yaml
runners:
default: celery
celery:
# assuming your QA-Board server's hostname is qaboard
broker_url: pyamqp://guest:guest@qaboard:5672// # also read from ENV vars with CELERY_BROKER_URL
result_backend: rpc:// # also read from ENV vars with CELERY_RESULT_BACKEND
# To know all the options and tweak priorities, rate-limiting... Read:
# http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#configuration
# http://docs.celeryproject.org/en/latest/userguide/configuration.html#configuration
# For example:
timezone: Europe/Paris

# By default tasks will be named "qaboard" unless you define
qaboard_task_name: qaboard

It's often useful to give batches their own settings. For instance you may want to use different queues if you manage different types of resources (GPUs, Windows/Linux/Android...):

# On a server with a GPU:
celery -A qaboard.runners.celery_app worker --concurrency=1 --queues gpu,large-gpu
qa/batches.yaml
my-batch-that-needs-a-gpu:
inputs:
- my/training/images
configuration:
- hyperparams.yaml
celery:
task_routes:
qaboard: gpu

Celery's worker user guide has lots of information on how to run worker in the background, set concurrency... Check it out too as needed!