NDVI computation on the Unity cluster.
Use the Unity documentation to supplement this guide: https://docs.unity.uri.edu/documentation/
Objectives
- Learn how to request a parallel processor 'job' and set up a Python environment that will use that job.
- Write a short function to test and confirm that parallel processing is taking place.
- Use these concepts to modify landsatexplore.py from Week09 and implement the NDVI calculation.
- Use the Unity cluster to split this calculation amongst the processors and perform the computation for all the satellite images in your items list.
- Observe whether there is a seasonal difference in the NDVI for the RI/New England region.
NOTE: This exercise uses cumulative concepts from throughout the course, including the use of lists and NumPy arrays, creation of modules, reading and writing files, using Pandas dataframes, and Unix commands. Refer to your code from previous weeks and to the online library documentation as needed.
Step 1: Modify landsatexplore.py code to carry out NDVI computation.
The Normalized Difference Vegetation Index can be computed directly from two frequency bands in the Landsat satellite data - the Near Infrared band (NIR) and the visible red (Red) spectral reflectance:
$$ NDVI = \frac{NIR - Red}{NIR + Red} $$
NOTE: The .tif files ending in _B4.tif contain the near-infrared (NIR) spectral wavelengths, and the .tif files ending in _B3.tif contain the Red spectral reflectance.
NOTE: A compilation of Landsat ETM files for the Rhode Island region can be found in /work/pi_csc593_uri_edu/Landsat on the Unity HPC.
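As a quick check of the formula, here is a minimal sketch that evaluates NDVI for a single pixel. The reflectance values are made up purely to illustrate the arithmetic and do not come from the Landsat files; the function name `ndvi` is also just an illustrative choice.

```python
def ndvi(nir, red):
    """NDVI for a single pixel, given NIR and Red reflectance."""
    return (nir - red) / (nir + red)

# Made-up reflectance values, chosen only to illustrate the arithmetic.
leafy = ndvi(nir=0.50, red=0.10)   # 0.40 / 0.60
bare = ndvi(nir=0.30, red=0.25)    # 0.05 / 0.55
print(round(leafy, 3), round(bare, 3))  # 0.667 0.091
```

For non-negative reflectances the result always lies between -1 and 1. A pixel with NIR much larger than Red gives a value closer to 1.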
Below is a list of modifications you should make to your script. Give it a new name, like landsatexplore_NDVI.py or similar.
- Delete or comment out all usage of graphing and cartopy from Week09. This script will do the NDVI computation using the cluster job and save the results in a Pandas dataframe, nothing more.
- Put all the code of the script inside the if statement below. This ensures that the cluster resource request is made once, when the script is run directly, before the computations begin. Don't forget to indent:
if __name__ == '__main__':
- Add the code block from coreclock.py that requests and uses the cluster resources.
- Create a list of scenes (filenames) from the files ending in _B4.tif. This can be found in the text file called test.out, in the same folder /work/pi_csc593_uri_edu/Landsat.
- Define a module to compute the NDVI with the following operations. Your module should take a single element of the list of satellite filenames or scenes and it should return the average NDVI value from that item.
- The filenames are found in
# Get the path+filename for the NIR band. Do the same for the Red band.
# Use the skimage.io library to open and read the NIR band geotiff image. Do the same for the Red band.
# Compute the NDVI as (nir-red)/(nir+red). Note that zeros at some cells in the .tif will cause inf
# or undefined values. Use np.divide(), np.subtract(), and np.add() in order to tolerate
# the undefined values.
# Replace all undefined values (inf, -inf) with NaNs.
NDVI = np.where(np.isfinite(NDVI), NDVI, np.nan)
# Use np.nanmean() to compute and return the average NDVI value from this satellite item.
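The commented steps above could be filled in along the following lines. This is only a sketch under stated assumptions, not the reference solution: the names `mean_ndvi` and `scene_mean_ndvi` are hypothetical, each list element is assumed to be a bare `*_B4.tif` filename, and the matching `*_B3.tif` file is assumed to sit in the same folder. The bands are cast to float before the arithmetic because the subtraction would otherwise wrap around if the pixels are stored as unsigned integers.

```python
import os
import numpy as np

LANDSAT_DIR = "/work/pi_csc593_uri_edu/Landsat"  # folder named in this guide

def mean_ndvi(nir, red):
    """Return the NaN-aware mean NDVI of two equally shaped band arrays."""
    # Cast to float first: unsigned-integer pixels would wrap around on subtraction.
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    # Silence the divide-by-zero warnings; those cells are turned into NaN below.
    with np.errstate(divide="ignore", invalid="ignore"):
        ndvi = np.divide(np.subtract(nir, red), np.add(nir, red))
    ndvi = np.where(np.isfinite(ndvi), ndvi, np.nan)
    return float(np.nanmean(ndvi))

def scene_mean_ndvi(nir_name, folder=LANDSAT_DIR):
    """Hypothetical per-scene wrapper: takes one *_B4.tif filename (assumption)."""
    from skimage import io  # imported here so mean_ndvi() works without scikit-image
    nir_path = os.path.join(folder, nir_name)
    red_path = os.path.join(folder, nir_name.replace("_B4.tif", "_B3.tif"))
    return mean_ndvi(io.imread(nir_path), io.imread(red_path))
```

Keeping the array arithmetic in its own function makes it easy to try on a tiny hand-made array before running it on full scenes.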
Add the code block from coreclock.py that uses f = client.map() to distribute the computation amongst the cores and displays the progress. f = client.map() will take your module name and the satellite items list as inputs.
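The contents of coreclock.py are not reproduced in this guide, so the following is a sketch under assumptions rather than that code. It assumes `client` is a `dask.distributed.Client`, which is the API that `client.map()` and `client.gather()` match, and it uses a `LocalCluster` as a stand-in for the actual Unity job request. The helper names `whoami` and `run_distributed` are hypothetical.

```python
import os
import socket

def whoami(task_id):
    """Stand-in task: report which host and process handled this element."""
    return task_id, socket.gethostname(), os.getpid()

def run_distributed(func, items, n_workers=2):
    """Map func over items on a local Dask cluster and gather the results."""
    # Imported here so that whoami() can be tried on its own without Dask installed.
    from dask.distributed import Client, LocalCluster, progress
    with LocalCluster(n_workers=n_workers, threads_per_worker=1) as cluster:
        with Client(cluster) as client:
            f = client.map(func, items)   # one future per list element
            progress(f)                   # progress display while the tasks run
            return client.gather(f)       # results, in the same order as items
```

Calling `run_distributed(whoami, list(range(8)))` from inside the `if __name__ == '__main__':` block returns one tuple per element. Seeing more than one distinct process ID in the results is a sign that the work was spread across workers. Once that check passes, the NDVI function and the scene list can be passed in place of `whoami` and the test list.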
Write a for loop to extract the time stamp from the items list, and put it into a separate list, called 'NDVI_time' or similar. The datetime is stored in item.datetime.
Extract the list of average NDVI computations from your module using client.gather().
AvNDVI_list = client.gather(f)
Make a Pandas dataframe with a 'time' column and an 'NDVI' column. Put NDVI_time and AvNDVI_list into the Pandas dataframe (they should be the same length).
Save the Pandas dataframe for download. I recommend using df.to_pickle('file.pkl'), but you can also use df.to_csv('file.csv'). Depending on which you choose, you can then read the file with df = pd.read_pickle('file.pkl') or df = pd.read_csv('file.csv').
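The last few steps (collecting the time stamps, building the dataframe, saving and re-reading it) might look like the sketch below. The `items` objects and the NDVI numbers are placeholders invented so the example runs on its own; in the real script they would come from your items list and from `client.gather(f)`. The temporary output folder and file names are also just illustrative.

```python
import os
import tempfile
from datetime import datetime
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-ins: objects with a .datetime attribute, and made-up NDVI means.
items = [
    SimpleNamespace(datetime=datetime(2001, 1, 15)),
    SimpleNamespace(datetime=datetime(2001, 7, 15)),
]
AvNDVI_list = [0.10, 0.60]  # placeholder values, not real results

# Collect the time stamps in the same order as the NDVI values.
NDVI_time = []
for item in items:
    NDVI_time.append(item.datetime)

df = pd.DataFrame({"time": NDVI_time, "NDVI": AvNDVI_list})

# Write both formats to a scratch folder, then read them back.
out_dir = tempfile.mkdtemp()
pkl_path = os.path.join(out_dir, "ndvi.pkl")
csv_path = os.path.join(out_dir, "ndvi.csv")
df.to_pickle(pkl_path)
df.to_csv(csv_path, index=False)

df_pkl = pd.read_pickle(pkl_path)
df_csv = pd.read_csv(csv_path, parse_dates=["time"])  # CSV stores dates as text
```

One practical difference between the two formats: the pickle keeps the datetime column type as it was, while the CSV needs `parse_dates` on reading to turn the text back into dates.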
Step 2: Download and post-process your results.
- Write a short script as .py or .ipynb to load and graph the NDVI mean value as a function of time.
- Make a plot of the NDVI index with time, similar to the one below.
- You can improve the rendering of dates on the x-axis using the mdates formatter:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# read in .csv..
#
# Use pandas to create a running or 'rolling' mean of the timeseries. Note, this is related to the operations we did for the thermistor analysis. Create a running average and include in the graph. Explain the averaging interval you chose.
#
f, ax = plt.subplots(figsize=(10,5))
#
# Display only year and month on x-axis.
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
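One possible way to compute the running mean mentioned in the comments above is a time-based rolling window in pandas. The sketch below assumes the dataframe has the 'time' and 'NDVI' columns described in Step 1. The helper name `add_rolling_mean`, the new column name `NDVI_roll`, and the 90-day default window are arbitrary illustrative choices; the averaging interval is yours to pick and justify.

```python
import pandas as pd

def add_rolling_mean(df, window="90D"):
    """Return a copy of df, sorted by time, with a time-based rolling mean of NDVI."""
    # A time-based window needs the time stamps in increasing order.
    out = df.sort_values("time").reset_index(drop=True)
    # Each value is the mean of all scenes in the window ending at that time stamp.
    rolled = out.set_index("time")["NDVI"].rolling(window).mean()
    out["NDVI_roll"] = rolled.to_numpy()
    return out
```

The result can then be drawn on the same axes as the raw values, for example with `ax.plot(out['time'], out['NDVI_roll'])`. A time-based window such as '90D' averages over a fixed span of days even when the scenes are unevenly spaced, whereas an integer window averages over a fixed number of scenes.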
Summary of results:
Results: Report on your results.
- How many satellite scenes did you include? What span of time do they represent?
- How long did the analysis take to complete?
- What resources did you request?
- What trends, if any, can you discern in the data and in the running average?
Caveats:
- What other factors might be influencing the results, which might make it harder to achieve a clear interpretation?
Step 3: What to turn in?
- Modify this .ipynb to include your results from the NDVI calculation and summary comments.
- The python script you used to make the NDVI calculation.
- Answers to the questions above.