Using Pandas dataframes or R dataframes to process and analyze temperature time series data.
NOTES: This week we will analyze the at-home temperature time series you have collected in order to see if we can observe something about heat flow in houses and buildings. We are going to compare your temperature time-series with air temperature from a weather station and see if there is a correlation, and/or a lag in the correlation.
Last week we:
- Downloaded a record of NOAA 5-min weather station temperature from Kingston, RI (https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2025/). This will be your OUTDOOR temperature.
- Resampled your temperature data to a 5 minute time interval by using datetime as the row index for both data sets and merging them. This will be your INDOOR temperature.
This week we will:
- Merge the two data sets into one data frame, so you can use them for time series analysis.
- Compute the temperature anomaly, that is, subtract the mean from each time series.
- Compute the serial covariance to observe how well the two temperatures are correlated and if there is any lag between the outdoor temperature and your indoor temperature.
In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime
Computing the serial covariance between your overnight and weather station data.
Check out the wiki figures and explanation of cross-correlation. https://en.wikipedia.org/wiki/Cross-correlation
1. Visualize the two temperature records.
In [ ]:
# Make a single panel plot with datetime on the x-axis and temperature on the y-axis. Use this plot to confirm that
# the truncated weather station data overlaps with your overnight time series.
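A minimal sketch of such a plot, assuming each record is a Pandas Series with a datetime index. The names `indoor` and `outdoor`, the synthetic values, and the temperature unit are placeholders, not your real data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-ins for the real records (hypothetical names and values).
t_out = pd.date_range("2025-03-01 18:00", periods=288, freq="5min")
outdoor = pd.Series(5 + 4 * np.sin(np.linspace(0, 2 * np.pi, 288)), index=t_out)
t_in = pd.date_range("2025-03-01 20:00", periods=144, freq="5min")
indoor = pd.Series(19 + np.sin(np.linspace(0, np.pi, 144)), index=t_in)

# Single panel: datetime on the x-axis, temperature on the y-axis.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(outdoor.index, outdoor.values, label="Outdoor (weather station)")
ax.plot(indoor.index, indoor.values, label="Indoor (overnight)")
ax.set_xlabel("Date and time")
ax.set_ylabel("Temperature (°C)")  # unit is an assumption
ax.legend()
fig.autofmt_xdate()

# Numerical overlap check: the station record should span the indoor record.
overlaps = (outdoor.index.min() <= indoor.index.min()
            and indoor.index.max() <= outdoor.index.max())
print("Station data covers the overnight record:", overlaps)
```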
In [ ]:
## Are there any bad data points? If so, use boolean indices to remove them.
# Hint: Don't make them NaNs or they will complicate the time series analysis.
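One way to drop bad points with a boolean index might look like the sketch below. The column name `T_out`, the -9999.0 missing-value flag, and the plausibility limits are assumptions for illustration; check your own files for the flag values they actually use.

```python
import pandas as pd

# Hypothetical record containing two flagged values (-9999.0 is assumed here).
idx = pd.date_range("2025-03-01 20:00", periods=6, freq="5min")
outdoor = pd.DataFrame(
    {"T_out": [4.1, 4.0, -9999.0, 3.8, 3.7, -9999.0]}, index=idx
)

# Boolean index: True where the value is physically plausible (limits assumed, deg C).
good = (outdoor["T_out"] > -50) & (outdoor["T_out"] < 60)

# Keep only the good rows. The bad rows are removed, not replaced with NaN.
outdoor_clean = outdoor[good]
print(outdoor_clean)
```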
In [ ]:
# The time interval of the overnight temperature data has to match that of the NOAA weather station. Use the Pandas
# resample().mean() function to average your temperature time series into 5-minute bins, to match the
# NOAA data. (This is a bin average, not a running mean.)
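As a small illustration of what `resample().mean()` does, the sketch below averages a made-up 1-minute record into 5-minute bins. The name `indoor_raw`, the column `T_in`, and the 1-minute sampling rate are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute indoor record: values 0, 1, ..., 9.
idx = pd.date_range("2025-03-01 20:00", periods=10, freq="1min")
indoor_raw = pd.DataFrame({"T_in": np.arange(10, dtype=float)}, index=idx)

# Average all samples that fall inside each 5-minute bin.
# This requires a DatetimeIndex on the dataframe.
indoor_5min = indoor_raw.resample("5min").mean()
print(indoor_5min)
```

Each output row is labeled with the start of its bin, so the first row (20:00) is the mean of the samples from 20:00 through 20:04.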
In [ ]:
# Merge the two data sets together using the pandas .join() or .merge() function.
# Name the merged dataframe Twotemps_5min
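A possible sketch of the merge, assuming both dataframes are indexed by datetime in the same time zone and already on the same 5-minute grid. The names `indoor_5min`, `outdoor_5min`, `T_in`, and `T_out` are placeholders.

```python
import pandas as pd

# Hypothetical 5-minute records that only partly overlap in time.
idx_in = pd.date_range("2025-03-01 20:00", periods=4, freq="5min")
idx_out = pd.date_range("2025-03-01 19:50", periods=8, freq="5min")
indoor_5min = pd.DataFrame({"T_in": [19.0, 19.1, 19.2, 19.1]}, index=idx_in)
outdoor_5min = pd.DataFrame(
    {"T_out": [5.0, 4.9, 4.8, 4.7, 4.6, 4.5, 4.4, 4.3]}, index=idx_out
)

# how="inner" keeps only timestamps present in both records,
# so the merged dataframe has no NaN rows from non-overlapping times.
Twotemps_5min = indoor_5min.join(outdoor_5min, how="inner")
print(Twotemps_5min)
```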
404 Students: Compute and plot the indoor and outdoor temperature anomaly.
What to submit:
- This .ipynb with code and answers to the below questions. Use Markdown cells for practice.
- An image of the 2-panel plot you made.
- Text files containing your overnight thermistor data and weather station data.
- Do you see a relationship between the indoor and outdoor temp? Please describe what you see if anything.
- Can you offer a theory why there is (or isn't) a relationship between indoor and outdoor temperature?
In [ ]:
# 404 Students:
# Compute the average of the indoor temps and the average of the outdoor temps. Each average should be a single value.
# Subtract the average from both indoor and outdoor temps and save to new variable names
# This makes new versions of the indoor and outdoor temp that have zero mean.
# These are called temperature anomalies.
# Make a plot of the temperature anomalies for indoor and outdoor temp on the same plot.
# Answer the questions above.
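The anomaly calculation could be sketched as follows. The small dataframe here is invented so the example runs on its own; with real data you would use your own `Twotemps_5min`, and the column names `T_in` and `T_out` are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical merged dataframe standing in for the real Twotemps_5min.
idx = pd.date_range("2025-03-01 20:00", periods=4, freq="5min")
Twotemps_5min = pd.DataFrame(
    {"T_in": [19.0, 20.0, 21.0, 20.0], "T_out": [4.0, 6.0, 2.0, 4.0]}, index=idx
)

# One mean per series (each is a single number).
T_in_mean = Twotemps_5min["T_in"].mean()
T_out_mean = Twotemps_5min["T_out"].mean()

# Anomaly = value minus the series mean, so each anomaly series has zero mean.
T_in_anom = Twotemps_5min["T_in"] - T_in_mean
T_out_anom = Twotemps_5min["T_out"] - T_out_mean

# Both anomalies on the same axes.
fig, ax = plt.subplots()
ax.plot(T_in_anom.index, T_in_anom.values, label="Indoor anomaly")
ax.plot(T_out_anom.index, T_out_anom.values, label="Outdoor anomaly")
ax.axhline(0, color="gray", linewidth=0.8)
ax.set_xlabel("Date and time")
ax.set_ylabel("Temperature anomaly")
ax.legend()
```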
593 Students: Compute serial cross-covariance between the temperature anomalies.
What to submit:
- This .ipynb with code and answers to the below questions (593 students). Use Markdown cells for practice.
- An image of the 2-panel plot you made above.
- Text files containing your overnight thermistor data and weather station data.
- At what lag k are the two series most correlated?
- Are the series correlated or anti-correlated?
- How long is the lag in minutes? You can figure this out by multiplying k by your sampling interval.
- Can you offer a theory why there is (or isn't) a lag between indoor and outdoor temperature?
In [ ]:
# 593 Students.
# Subtract the mean from both time series to make both sets of measurements have zero mean.
# Use the function np.correlate() with mode='full' to compute the serial cross-correlation.
# Note, np.correlate() expects 1-D inputs, so pass individual columns (Pandas Series) or their .to_numpy() arrays,
# not a whole dataframe.
# Plot the output from np.correlate() as a function of k to see how the correlation changes at negative lag (k < 0),
# zero lag (k = 0), and positive lag (k > 0).
# Make a 2-panel plot. In the first panel, plot the Overnight and Weather temperature data. In the second panel, plot the
# cross-correlation as a function of lag. Use these plots to answer the questions above.
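A rough sketch of the lag calculation is given below, using synthetic white noise rather than real temperatures so that the answer is known in advance: the made-up indoor series is the outdoor series delayed by 6 samples. The variable names, the 5-minute interval, and the test signal are all assumptions. For equal-length inputs of length N, `mode='full'` returns 2N - 1 values corresponding to lags k = -(N - 1), ..., N - 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic test signal (white noise, NOT realistic temperature data).
rng = np.random.default_rng(0)
x = rng.standard_normal(206)
outdoor = x[6:]           # 200 samples
indoor = 0.5 * x[:200]    # indoor[n] = 0.5 * outdoor[n - 6]: delayed by 6 samples

# Remove the means so both series are anomalies with zero mean.
indoor_anom = indoor - indoor.mean()
outdoor_anom = outdoor - outdoor.mean()

# np.correlate(a, v) computes c_k = sum_n a[n + k] * v[n].
# With a = indoor and v = outdoor, a peak at k > 0 means indoor lags outdoor.
xcorr = np.correlate(indoor_anom, outdoor_anom, mode="full")
N = len(indoor_anom)
lags = np.arange(-(N - 1), N)  # lag k for each element of xcorr

# Lag with the largest absolute correlation, converted to minutes.
k_peak = lags[np.argmax(np.abs(xcorr))]
sampling_interval_min = 5  # assumed 5-minute sampling
lag_minutes = k_peak * sampling_interval_min
print("Peak lag k =", k_peak, "samples, or", lag_minutes, "minutes")

# 2-panel plot: the two series on top, cross-correlation versus lag below.
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6))
ax1.plot(indoor_anom, label="Indoor anomaly")
ax1.plot(outdoor_anom, label="Outdoor anomaly")
ax1.set_xlabel("Sample number")
ax1.legend()
ax2.plot(lags, xcorr)
ax2.axvline(0, color="gray", linewidth=0.8)
ax2.set_xlabel("Lag k (samples)")
ax2.set_ylabel("Cross-correlation")
fig.tight_layout()
```

The sign of the peak value tells you whether the series are correlated (positive) or anti-correlated (negative) at that lag. Note that `np.correlate()` does not normalize its output, so the values are sums of products rather than correlation coefficients between -1 and 1.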