Using Pandas dataframes or R dataframes to process and analyze temperature time series data.¶

NOTES: This week we will analyze the at-home temperature time series you collected, both to assess the quality of the data and to see what story the time series tells. We will compare your at-home temperature with air temperature from a weather station to see whether the two are correlated and whether one lags the other.

First, download a record of NOAA 5-min weather station temperature from Kingston, RI (https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2024/).
Put both your temperature data and the NOAA data onto the same time base by using datetime as the row index for both data sets.
Merge the two data sets into one data frame, so you can use them for time series analysis.
Finally, compute the serial covariance to see how well the two temperatures are correlated and whether there is any lag between the outdoor temperature and your indoor temperature.

In [ ]:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime

Download and Read in NOAA Weather Station data.¶

In [ ]:

# Download 5-min temperature data for a nearby weather station. Look for the nearest 5-min weather stations in RI.
# These archives are available at:
# https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2024/

# Use the pandas function pd.read_csv() to import the data.  Hint: you can indicate whitespace delimiters with sep=r'\s+'.
# NOTE:  The NOAA file needs some special processing to create a datetime array.  See the Pandas intro for an example.

#Kin_DF.info()
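
A minimal sketch of the import step described above. It assumes the NOAA file is whitespace-delimited with no header row, and that the UTC date (YYYYMMDD) and UTC time (HHMM) sit in separate columns. The three sample rows are made up, and the shortened column list is an assumption: the real file has more columns, and every one of them needs a name, so check the README in the NOAA archive for the actual order.

```python
import io
import pandas as pd

# Made-up excerpt standing in for the downloaded NOAA file (assumed layout:
# station id, UTC date, UTC time, local date, local time, air temperature).
sample = io.StringIO(
    "99999 20240301 0005 20240229 1905 -0.4\n"
    "99999 20240301 0010 20240229 1910 -0.5\n"
    "99999 20240301 0015 20240229 1915 -0.7\n"
)
cols = ["WBANNO", "UTC_DATE", "UTC_TIME", "LST_DATE", "LST_TIME", "AIR_TEMPERATURE"]

# Read the date and time as strings so leading zeros in "0005" survive.
Kin_DF = pd.read_csv(
    sample,
    sep=r"\s+",
    header=None,
    names=cols,
    dtype={"UTC_DATE": str, "UTC_TIME": str},
)

# Glue date and time together and parse them into one datetime column.
Kin_DF["datetime"] = pd.to_datetime(
    Kin_DF["UTC_DATE"] + Kin_DF["UTC_TIME"], format="%Y%m%d%H%M"
)
Kin_DF.info()
```

For the real file, replace `sample` with the path to the downloaded text file.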

In [ ]:

# Load the csv data you saved from your overnight data collection.

# Use the parse_dates option to ensure a datetime column is created.

# Use df.info() function to confirm it has all the correct columns and data types.
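
A minimal sketch of loading the overnight record, assuming it was saved as a CSV with a header row. The column names `datetime` and `temp_C`, the dataframe name `Therm_DF`, and the sample values are all hypothetical; substitute whatever your logger wrote.

```python
import io
import pandas as pd

# Made-up stand-in for the saved overnight CSV file.
csv_text = io.StringIO(
    "datetime,temp_C\n"
    "2024-03-01 22:00:00,20.1\n"
    "2024-03-01 22:01:00,20.0\n"
    "2024-03-01 22:02:00,19.9\n"
)

# parse_dates converts the named column from text to datetime64.
Therm_DF = pd.read_csv(csv_text, parse_dates=["datetime"])
Therm_DF.info()
```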

In [ ]:

# Make both time series use the datetime column as their row index.
# Use the Pandas .set_index() function and apply your changes to the same dataframe using the inplace=True option.
# The merge is done on the index, so both time series need datetime as their index.

# Use the datetime column as the row index for the weather station data.

# Use the datetime column as the row index for your home temp.
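
A short sketch of the indexing step on a toy dataframe. The names `Therm_DF`, `datetime`, and `temp_C` are hypothetical; the same call applies to the weather station dataframe.

```python
import pandas as pd

# Toy dataframe with a datetime column (hypothetical names and values).
Therm_DF = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-03-01 22:00", "2024-03-01 22:01"]),
    "temp_C": [20.1, 20.0],
})

# Move the datetime column into the row index, modifying Therm_DF itself.
Therm_DF.set_index("datetime", inplace=True)
print(type(Therm_DF.index).__name__)
```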

Truncate the weather station data¶

In [ ]:

# The Kingston Weather data file covers the entire year up to the present.  We want to cut this down to the time period 
# that overlaps with your overnight temperature data.  
# Use boolean indices or the pandas .truncate() function to truncate the data to a time period that overlaps with
# your temperature data.
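
Both truncation approaches can be sketched on synthetic data as follows. The `start` and `end` values here are made up; in practice they could come from the first and last timestamps of your own record. Note that `.truncate()` expects a sorted index.

```python
import pandas as pd

# Synthetic 5-minute weather record from 00:00 to 01:55 (made-up values).
idx = pd.date_range("2024-03-01 00:00", periods=24, freq="5min")
Kin_DF = pd.DataFrame({"AIR_TEMPERATURE": range(24)}, index=idx)

# Hypothetical overlap window.
start = pd.Timestamp("2024-03-01 00:30")
end = pd.Timestamp("2024-03-01 01:00")

# Option 1: .truncate() keeps rows between the two bounds, inclusive.
Kin_trunc = Kin_DF.truncate(before=start, after=end)

# Option 2: a boolean mask on the index gives the same rows.
Kin_mask = Kin_DF[(Kin_DF.index >= start) & (Kin_DF.index <= end)]
```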

Visualize the two temperature records.¶

In [ ]:

# Make a single panel plot with datetime on the x-axis and temperature on the y-axis.  Use this plot to confirm that
# the truncated weather station data overlaps with your overnight time series.
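
One way to lay out the single-panel plot, shown here with synthetic series in place of the real records. The series names, values, and the °C unit are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-ins for the two temperature records (made-up values).
idx = pd.date_range("2024-03-01 22:00", periods=48, freq="5min")
outdoor = pd.Series(5 + 3 * np.cos(np.linspace(0, np.pi, 48)), index=idx)
indoor = pd.Series(20 + 0.5 * np.cos(np.linspace(0, np.pi, 48)), index=idx)

# One panel, datetime on the x-axis, both records on the same temperature axis.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(outdoor.index, outdoor.values, label="Weather station")
ax.plot(indoor.index, indoor.values, label="At home")
ax.set_xlabel("Datetime")
ax.set_ylabel("Temperature (°C)")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```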

In [ ]:

## Are there any bad data points?  If so, use boolean indices to remove them. 
# Hint: Don't make them NaNs or they will complicate the time series analysis.
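
A sketch of dropping bad points with a boolean mask. It assumes bad values show up as physically implausible numbers such as -9999.0; the thresholds of -50 and 60 are arbitrary illustration values, so pick limits that suit your own data.

```python
import pandas as pd

# Toy record with one implausible reading (made-up values).
idx = pd.date_range("2024-03-01 22:00", periods=5, freq="5min")
Kin_DF = pd.DataFrame(
    {"AIR_TEMPERATURE": [1.2, 1.1, -9999.0, 0.9, 0.8]}, index=idx
)

# True where the reading is plausible; the limits are assumptions.
good = (Kin_DF["AIR_TEMPERATURE"] > -50) & (Kin_DF["AIR_TEMPERATURE"] < 60)

# Keep only the good rows, so the bad ones are removed rather than set to NaN.
Kin_clean = Kin_DF[good]
```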

In [ ]:

# The time interval of the overnight temperature data has to match that of the NOAA weather station.  Use the Pandas 
# resample().mean() function to average your temperature time series into 5-minute bins, to match the 
# NOAA data.
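
A small sketch of the resampling step, assuming a 1-minute record indexed by datetime. The names `Therm_DF` and `temp_C` and the values are hypothetical.

```python
import pandas as pd

# Ten made-up 1-minute readings starting at 22:00.
idx = pd.date_range("2024-03-01 22:00", periods=10, freq="1min")
Therm_DF = pd.DataFrame(
    {"temp_C": [20.0, 20.1, 20.2, 20.3, 20.4, 19.0, 19.1, 19.2, 19.3, 19.4]},
    index=idx,
)

# Average all readings that fall inside each 5-minute bin.
Therm_5min = Therm_DF.resample("5min").mean()
print(Therm_5min)
```

By default each bin is labeled with its start time, so the ten readings collapse to two rows stamped 22:00 and 22:05.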

In [ ]:

# Merge the two data sets together using the pandas .join() function.
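
A sketch of the merge, assuming both dataframes are already indexed by datetime at 5-minute spacing. The names and values are hypothetical. Using `how="inner"` is a choice made here, not part of the assignment: it keeps only the timestamps present in both records, which avoids NaNs in the merged frame.

```python
import pandas as pd

# Two toy records whose time ranges only partly overlap (made-up values).
idx_home = pd.date_range("2024-03-01 22:00", periods=4, freq="5min")
idx_wx = pd.date_range("2024-03-01 22:05", periods=4, freq="5min")
Therm_5min = pd.DataFrame({"temp_C": [20.0, 19.9, 19.8, 19.7]}, index=idx_home)
Kin_trunc = pd.DataFrame({"AIR_TEMPERATURE": [1.0, 0.9, 0.8, 0.7]}, index=idx_wx)

# Join on the shared datetime index, keeping only common timestamps.
Both = Therm_5min.join(Kin_trunc, how="inner")
print(Both)
```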

Computing the serial covariance between your overnight and weather station data.¶

Check out the wiki figures and explanation of cross-correlation. https://en.wikipedia.org/wiki/Cross-correlation

In [ ]:

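# Make a lag vector, k, that spans from -(N-1) to N-1, where N is the number of elements in Therm_5min. This matches the 2N-1 values np.correlate() returns with mode='full' for two series of length N.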

# Plot the output from np.correlate() as a function of k to see how the correlation changes at negative lag (k < 0),
# zero lag (k = 0), and positive lag (k > 0).

# 1. Where are the two most correlated?
# 2. Are the series correlated or anti-correlated?
# 3. How long is the lag in minutes?  You can figure this out by multiplying k by your sampling interval.
# 4. Can you offer a theory why there is (or isn't) a lag between indoor and outdoor temperature?
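
To make the lag vector and its sign convention concrete, here is a tiny sketch with two made-up five-sample series, where the "indoor" spike arrives two samples after the "outdoor" spike. The array names and the 5-minute sampling interval `dt_min` are assumptions for illustration.

```python
import numpy as np

dt_min = 5  # assumed sampling interval in minutes

# Toy series: the indoor spike occurs two samples after the outdoor spike.
outdoor = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
indoor = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
N = len(indoor)

# mode='full' returns 2N-1 values, one for each possible shift.
xc = np.correlate(indoor, outdoor, mode="full")

# Matching lag vector: the first output value belongs to k = -(N-1).
k = np.arange(-(N - 1), N)

best_k = k[np.argmax(xc)]
print(best_k, best_k * dt_min)
```

With the arguments in this order, the peak lands at k = +2 (10 minutes), so a positive lag means the first argument trails the second. Swapping the arguments flips the sign of the lag.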

Subtract the mean from both time series and compute covariance on the temperature anomaly.¶

In [ ]:

# Subtract the mean from both time series to make both sets of measurements have zero mean.

# Use the function np.correlate(a, v, mode='full') in Python to compute the serial cross-correlation.
# Note, this Numpy function expects 1-D inputs, so pass a single column (a Pandas Series) rather than a whole dataframe.
# Use cf = ccf(T1,T2,type="correlation",plot=FALSE) in R to compute the serial cross-correlation.
#x = cf$lag; y = cf$acf

# Make the lag vector, k, that ranges from -(N-1) to N-1 with steps of 1.

# Make a 2-panel plot.  In the first panel, plot the Overnight and Weather temperature data. In the second panel, plot the 
# cross-correlation as a function of lag.  Use these plots to answer the questions above.
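
The effect of removing the mean can be sketched with synthetic data. Here both series are a smooth bump on top of a constant offset, and the "indoor" bump is built to arrive 6 samples late. All names, values, and the 5-minute interval are assumptions; the sketch covers only the computation, not the 2-panel plot.

```python
import numpy as np

dt_min = 5  # assumed sampling interval in minutes
n = np.arange(100)

# Synthetic records: a warm bump outdoors, and a smaller bump indoors
# that is delayed by 6 samples (made-up values).
outdoor = 5.0 + 3.0 * np.exp(-((n - 40) / 5.0) ** 2)
indoor = 20.0 + 0.5 * np.exp(-((n - 46) / 5.0) ** 2)

# Lag vector for mode='full': 2N-1 values from -(N-1) to N-1.
k = np.arange(-(len(n) - 1), len(n))

# Without removing the mean, the constant offsets dominate and the
# peak sits at zero lag regardless of the true delay.
raw_peak = k[np.argmax(np.correlate(indoor, outdoor, mode="full"))]

# Subtract the mean so each series has zero mean (temperature anomaly).
in_anom = indoor - indoor.mean()
out_anom = outdoor - outdoor.mean()

# Cross-correlate the anomalies and locate the peak.
xc = np.correlate(in_anom, out_anom, mode="full")
lag_samples = k[np.argmax(xc)]
lag_minutes = lag_samples * dt_min
print(raw_peak, lag_samples, lag_minutes)
```

In this synthetic case the anomaly cross-correlation recovers the built-in delay of 6 samples (30 minutes), while the raw series peak at zero lag.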

What to submit:¶

This .ipynb, with your code and your answers to the questions above. Write the answers in Markdown cells for practice.
An image of the 2-panel plot you made above.
Text files containing your overnight thermistor data and weather station data.
