Python parallel processing on Unity HPC.¶
Objectives¶
- Learn how to request a parallel processing 'job' and set up a Python environment that will use that job.
- Write a short function to test and confirm that parallel processing is taking place (in-class).
- Use these concepts to modify landsatexplore.py from Week09 and implement the NDVI calculation (take-home).
- Use the Python library Dask to extend NumPy array and Pandas DataFrame calculations over multiple processors.
The Dask documentation explains how Dask works with parallel computing: https://tutorial.dask.org/02_array.html
Use the Unity documentation to learn more about using HPC resources: https://docs.unity.uri.edu/documentation/
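As a preview of the take-home objective, here is a minimal, hypothetical sketch of an NDVI-style calculation (NDVI = (NIR - Red) / (NIR + Red)) with Dask arrays. The band arrays below are random placeholders rather than real Landsat data, and the shapes and chunk sizes are arbitrary choices for illustration.
import dask.array as da

# Placeholder arrays standing in for the Landsat NIR and red bands.
nir = da.random.random((4000, 4000), chunks=(1000, 1000))
red = da.random.random((4000, 4000), chunks=(1000, 1000))

# Elementwise arithmetic builds a lazy task graph, one task per chunk.
ndvi = (nir - red) / (nir + red)

# compute() executes the graph, spreading the chunk tasks over the
# available processors.
print(ndvi.mean().compute())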
Step 1: Log in to Unity via OOD.¶
Use your login info to connect to https://ood.unity.rc.umass.edu/ as before.
First, load your conda environment, following the same steps as in Week09.
$ module load anaconda/2022.10
$ source activate your_env_here
Remember to type $ source deactivate to close your Anaconda environment.
Step 2: Request resources for your compute node.¶
The Unity documentation describes how to request cluster computing resources, or jobs, which are grouped into several distinct partitions. Unity uses SLURM to manage and allocate resources, but we won't dedicate much time to understanding how SLURM works; the Python library for distributed processing of array data (Dask) is the tool we will focus on.
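If you are curious which partitions are available, SLURM's standard sinfo utility (not specific to Unity, and not something we will use further) prints a summary:
$ sinfo -s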
The salloc command allows you to request cluster jobs on Unity; documentation for both is linked above. With salloc, you can request the number of CPU cores (-c), the amount of time you want the job to last (--time), the amount of RAM or memory (--mem), and which partition to use (-p). A simple request looks like this:
$ salloc -c 5 # Request 5 CPU cores
The command below requests 12 CPU cores and 150 GB of RAM for 60 minutes on the partition called 'cpu'. I find that the fewer processors you request, the faster your job is allocated. NOTE: Please do not request more than 24 processors.
$ salloc -J interact -c 12 --time=0:60:00 --mem=150G -p cpu
Note that the module and environment we loaded (Anaconda) get unloaded when the job is allocated. This is because Unity has assigned us a new compute node with the processing resources we requested. Because we now occupy a different physical machine in the cluster, our computing environment has been reset to the default. Before we can do our work, we must reload the module and reactivate the environment, repeating the commands from Step 1:
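$ module load anaconda/2022.10
$ source activate your_env_here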
Aside: You can put all of these commands into a text file called a shell script and simply run the script to speed up the process. The script must begin with the line #!/usr/bin/bash. You can use nano to create the shell script; the convention is to give it the file extension .sh, e.g. start.sh. After you have created the shell script, you need to make it executable with the chmod command.
$ chmod a+x start.sh
The script can then be run at the command line using
$ source start.sh
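For example, a minimal start.sh for this exercise might contain just the environment setup from Step 1 (the environment name is a placeholder; substitute your own):
#!/usr/bin/bash
# start.sh: reload the Anaconda module and reactivate the conda
# environment after a job has been allocated.
module load anaconda/2022.10
source activate your_env_here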
Step 3: Make a coreclock script to confirm parallel processing of computations.¶
- Download the script coreclock.py and examine the comments and contents.
- Add a function called coreclock() to the script, following the comments in the script.
- Upload the script to Unity.
- Open the Unity OOD Shell.
- Request compute resources for your job following Step 2.
- Load modules, activate your conda environment.
Notes about the code in coreclock.py¶
# Client() and LocalCluster() will be used to connect to the job resources that
# were requested.
from dask.distributed import Client, LocalCluster
# Progress function reports the computation status to the screen
from dask.distributed import progress
# Use time library for sleep
import time
# Connect to resources.
cluster = LocalCluster()
job = Client(cluster)
print(job)
The code block above creates a connection to the salloc resources that were requested before starting Python. The resources can be viewed with print(job).
- Use client.map() (here, job.map(), since our Client object is named job) to execute coreclock() 500 times and distribute the calls over the requested CPUs; see the sketch after the questions below.
- Run coreclock.py at the command line:
$ python coreclock.py
[#### ] | 10% Completed | 11.6s
- How long does it take for the code to complete execution?
- Based on 500 instances of coreclock() and the 1 second delay, how long would it take for a single processor to complete the same task?
- Is parallel computation working as expected?
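As a reference point, here is a minimal, hypothetical sketch of the mapping step. The body of coreclock() below is a placeholder that just sleeps for one second; follow the comments in the provided script for what your function should actually do.
# Hypothetical sketch of distributing coreclock() with the Client created
# above (named job). Passing a distinct argument to each call gives every
# task its own key, so Dask runs all 500 calls instead of deduplicating them.
def coreclock(i):
    time.sleep(1)  # placeholder: one second of simulated work
    return i

futures = job.map(coreclock, range(500))  # schedule 500 tasks on the workers
progress(futures)                         # print a live completion bar
results = job.gather(futures)             # block until every task finishes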
Step 4: What to turn in?¶
- Answer the questions from Step 3 in this .ipynb.
- Upload this .ipynb.
- Upload your modified version of coreclock.py.