Multiprocessing and scarplet

This simple example shows how to use the match_template and compare methods with a multiprocessing worker pool.

It is available as a Jupyter notebook (link) in the repository. Sample data is provided in the data folder.

import numpy as np
import matplotlib.pyplot as plt

from functools import partial
from multiprocessing import Pool

import scarplet as sl
from scarplet.datasets import load_synthetic
from scarplet.WindowedTemplate import Scarp
data = load_synthetic()
# Define parameters for search
scale = 10
age = 10.
angles = np.linspace(-np.pi / 2, np.pi / 2, 181)
nprocs = 3

For each orientation angle, we can start a separate template-matching task. These tasks can run in parallel, which is what scarplet does by default.

# Start separate search tasks
pool = Pool(processes=nprocs)
wrapper = partial(sl.match_template, data, Scarp, scale, age)
results = pool.imap(wrapper, angles, chunksize=1)
# Reduce the final results as they are completed
ny, nx = data.shape
best = sl.compare(results, nx, ny)
CPU times: user 720 ms, sys: 296 ms, total: 1.02 s
Wall time: 2.48 s

For comparison, we can use a loop to fit the templates sequentially.

best = np.zeros((4, ny, nx))
for angle in angles:
    results = sl.match_template(data, Scarp, scale, age, angle)
    best = sl.compare([best, results], nx, ny)
CPU times: user 3.76 s, sys: 708 ms, total: 4.47 s
Wall time: 3.62 s

We get a fairly good speed-up using just three processes on this small test case. Distributing tasks and reducing results on a cluster can make processing large datasets feasible. For example, dask provides convenient distributed task management in Python.