Timekeeping in Python and R¶
Clocks and timekeeping on computers are not trivial.¶
Remember Y2K? It was caused by storing years as two digits in computer systems.
Many computer systems reference time to 00:00:00 UTC on January 1, 1970, known as the Unix epoch. The number of seconds elapsed since that instant is known as POSIX time.
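As a quick check of the epoch described above, the following minimal sketch converts a POSIX timestamp of zero to a calendar date and converts a date back to a timestamp. It uses only the standard-library datetime module, and the variable names are illustrative.

```python
import datetime as dt

# POSIX time counts seconds from 1970-01-01 00:00:00 UTC (the epoch).
epoch = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)
print(epoch)  # 1970-01-01 00:00:00+00:00

# One day after the epoch corresponds to 86400 seconds of POSIX time.
one_day_later = dt.datetime(1970, 1, 2, tzinfo=dt.timezone.utc)
print(one_day_later.timestamp())  # 86400.0
```

The time zone is given explicitly here so that the result does not depend on the local settings of the machine running the code.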
Python has several timekeeping modules in its standard library, including time, datetime, and calendar. We'll focus on datetime.
Note: NumPy and pandas also have their own date and time types, but both are compatible with the standard-library datetime module.
In [ ]:
# PYTHON
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import time
import datetime as dt
In [ ]:
# R
#install.packages("lubridate")
library(lubridate)
Da = ymd("2019-12-13")
Db <- ymd_hms("2000-12-13 13:55:59")
print(Da)
In [ ]:
# A datetime object can incorporate year, month, day, hour, minute, second, and microsecond.
# At a minimum, year, month, and day must be specified.
Da = dt.datetime(
    year=2019,
    month=12,
    day=13,
    second=59
)
print(Da)
# Specify everything in the order year, month, day, hour, minute, second
Db = dt.datetime(2000, 12, 13, 13, 55, 59)
print(Db)
# Specify only the minimum (year, month, day); the time defaults to 00:00:00
Dc = dt.datetime(2020, 12, 1)
print(Dc)
Datetime objects support basic arithmetic:
In [ ]:
# PYTHON
# Subtracting two datetime objects
Dg = Da - Db
print(Dg)
# Dg is a timedelta object
print(type(Dg))
print(Dg.total_seconds())
# timedelta objects can be added to or subtracted from datetime objects
print(dt.datetime(1990, 10, 31) + dt.timedelta(days=33))
print(dt.datetime(1990, 10, 31) - Dg)
# Adding two datetime objects is not permitted, because Da and Db each represent an absolute point in time.
try:
    Df = Db + Da
except TypeError as err:
    print(err)
In [ ]:
Dg?
In [ ]:
# Datetime provides the current time stamp using now()
t0 = dt.datetime.now()
print(t0)
Time references:¶
- datetime uses the proleptic Gregorian calendar as its reference. Year 1 is the reference year, so year 2020 is stored as year 2020.
- This differs from POSIX time, which counts seconds from January 1, 1970.
- Unless a time zone is specified, a datetime object is naive, meaning it carries no time zone information.
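The points above can be illustrated with a short sketch that uses only the standard library. The variable names (`first_day`, `naive`, `aware`) are illustrative, and UTC is chosen here only as an example time zone.

```python
import datetime as dt

# January 1 of year 1 has ordinal 1 in the proleptic Gregorian calendar.
first_day = dt.date(1, 1, 1)
print(first_day.toordinal())  # 1

# The POSIX epoch expressed as a proleptic Gregorian ordinal.
print(dt.date(1970, 1, 1).toordinal())  # 719163

# A naive datetime carries no time zone information.
naive = dt.datetime(2020, 1, 1, 12, 0, 0)
print(naive.tzinfo)  # None

# An aware datetime has tzinfo attached (UTC here, as an example).
aware = naive.replace(tzinfo=dt.timezone.utc)
print(aware.utcoffset())  # 0:00:00
```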
In [ ]:
# R
# Sys.time() returns the current time stamp
t0 <- Sys.time()
print(t0)
t1 <- as.integer(t0) # Seconds elapsed since the POSIX epoch (1970-01-01 00:00:00 UTC).
print(t1)
In [ ]:
# Express t0 as the number of days since January 1 of year 1 (which has ordinal 1).
t0 = dt.datetime.now()
t1 = t0.toordinal() # Day count in the proleptic Gregorian calendar.
# t1 is an integer.
print(t1)
# Recover the timestamp, but note that hours, minutes, and seconds have been lost
print(dt.datetime.fromordinal(t1))
# Express t0 as the number of seconds elapsed since the POSIX epoch (January 1, 1970), not since year 1.
t0.timestamp() # This value is a float.
In [ ]:
# Include notes on tzinfo()
# datetime objects can be subtracted from one another. The difference is stored as a timedelta object.
dt.datetime.now().timestamp()
In [ ]:
# PYTHON
print(t0.tzinfo is None) # True when t0 is naive
# pytz provides the time zone information
import pytz
timezone = pytz.timezone("America/New_York")
ttz = timezone.localize(t0)
print(ttz) # The printed value includes the offset from UTC.
In [ ]:
# R
# Check if t0 has a time zone
print(is.null(attr(t0, "tzone")))
# Display t0 in the America/New_York time zone (with_tz keeps the same instant; force_tz would keep the clock time instead)
ttz <- with_tz(t0, "America/New_York")
# Print the datetime with time zone information
print(ttz)
In [ ]:
# PYTHON
# Datetime objects can be converted to strings.
print(t0)
T = t0.strftime('%m/%d/%y,%H:%M:%S')
print(T)
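The reverse conversion, from a string back to a datetime object, can be done with `strptime` and the same format codes. This is a minimal sketch that uses a fixed example date rather than `t0` so the result is reproducible; the names `fmt`, `stamp`, `text`, and `parsed` are illustrative. Note that `%y` keeps only two digits of the year.

```python
import datetime as dt

fmt = '%m/%d/%y,%H:%M:%S'
stamp = dt.datetime(2000, 12, 13, 13, 55, 59)

# datetime -> string
text = stamp.strftime(fmt)
print(text)  # 12/13/00,13:55:59

# string -> datetime, using the same format codes
parsed = dt.datetime.strptime(text, fmt)
print(parsed == stamp)  # True
```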
In [ ]:
# R
# Format the datetime as a string
print(t0)
T <- format(t0, "%m/%d/%y,%H:%M:%S")
# Print the formatted string
print(T)
Recording datetime to make a timeseries¶
Dataframes can store datetime objects directly as a column. In Python, pandas is one library built for working with dataframes. In R, dataframes are part of base R.
In [ ]:
# Python
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first.
rows = []
for i in range(0, 10):
    t = dt.datetime.now()
    rows.append({'time': t, 'iterator': i})
    time.sleep(0.5)
dd = pd.DataFrame(rows, columns=['time', 'iterator']) # Create a pandas dataframe with 2 columns.
print(dd)
dd.head()
In [ ]:
# Python
# Importantly, the datetime vector can be plotted in matplotlib.
plt.figure()
plt.plot(dd['time'],dd['iterator'],'.')
plt.xlabel('time'); plt.ylabel('iterator')
plt.show()
In [ ]:
# R
# Create an empty data frame with columns 'time' (POSIXct) and 'iterator'
dd <- data.frame(time = as.POSIXct(character()), iterator = numeric())
# Loop 10 times
for (i in 1:10) {
  # Get current datetime
  t <- Sys.time()
  # Append a new row with current time and iterator value
  dd <- rbind(dd, data.frame(time = t, iterator = i))
  # Sleep for 0.5 seconds
  Sys.sleep(0.5)
}
# Print the entire data frame
print(dd)
# Print the first 5 rows
head(dd, 5)
In [ ]:
# R
# NOTE: R stores date-times as POSIXct. If dd$time is POSIXct, ggplot2 draws a datetime axis.
# If the column was coerced to numeric, the axis shows seconds since 1970-01-01 instead.
library(ggplot2)
# Create a scatter plot
ggplot(dd, aes(x = time, y = iterator)) +
geom_point(color = "blue") + # Plot points with blue color
labs(x = "Time", y = "Iterator") + # Set axis labels
theme_minimal() # Apply a clean theme