Multiprocessing and scarplet

This example shows how to use the match_template and compare functions with a multiprocessing worker pool.

It is available as a Jupyter notebook (link) in the repository. Sample data is provided in the data folder.

[1]:
import numpy as np
import matplotlib.pyplot as plt

from functools import partial
from multiprocessing import Pool

import scarplet as sl
from scarplet.datasets import load_synthetic
from scarplet.WindowedTemplate import Scarp
[2]:
data = load_synthetic()
[3]:
# Define parameters for the search
scale = 10                                        # scale of the template
age = 10.                                         # age parameter for the Scarp template
angles = np.linspace(-np.pi / 2, np.pi / 2, 181)  # scarp orientations to test (radians)
nprocs = 3                                        # number of worker processes

For each set of input parameters, we can start a separate template matching task. These tasks can run in parallel, which is what scarplet does by default.

[4]:
# Start separate search tasks
pool = Pool(processes=nprocs)
# Fix all arguments except the angle, then map over the angles lazily
wrapper = partial(sl.match_template, data, Scarp, scale, age)
results = pool.imap(wrapper, angles, chunksize=1)
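
Note that imap returns a lazy iterator, so the pool has to stay alive until the results are consumed in the next cell. In a standalone script, one option is to use the pool as a context manager and consume the results inside the block. A minimal sketch, assuming the same setup as above:

ny, nx = data.shape
with Pool(processes=nprocs) as pool:
    wrapper = partial(sl.match_template, data, Scarp, scale, age)
    results = pool.imap(wrapper, angles, chunksize=1)
    # Consume the lazy iterator before the pool shuts down
    best = sl.compare(results, nx, ny)
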
[5]:
%%time
# Reduce the final results as they are completed
ny, nx = data.shape
best = sl.compare(results, nx, ny)
CPU times: user 720 ms, sys: 296 ms, total: 1.02 s
Wall time: 2.48 s

For comparison, we can use a loop to fit the templates sequentially.

[6]:
%%time
best = np.zeros((4, ny, nx))  # running best-fit results
for angle in angles:
    results = sl.match_template(data, Scarp, scale, age, angle)
    best = sl.compare([best, results], nx, ny)
CPU times: user 3.76 s, sys: 708 ms, total: 4.47 s
Wall time: 3.62 s

We get a fairly good speed-up using just three processes on this small test case. Distributing tasks and reducing results on a cluster can make processing large datasets feasible; for example, dask provides distributed task scheduling in Python.
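
As an illustration, here is a minimal sketch of the same angle sweep written with dask.delayed. It assumes dask is installed; the delayed/compute calls are standard dask, but combining them with scarplet this way is a sketch rather than a documented scarplet workflow:

import dask
from dask import delayed

# Build one delayed matching task per angle
tasks = [delayed(sl.match_template)(data, Scarp, scale, age, angle)
         for angle in angles]

# Run the tasks on local worker processes and reduce the results
results = dask.compute(*tasks, scheduler='processes')
best = sl.compare(results, nx, ny)

On a cluster, the same delayed tasks can be handed to a dask.distributed scheduler without changing the task-building code.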