Script to calculate Basic Statistics
Equation for the mean: $\mu_x = \sum_{i=1}^{N}\frac{x_i}{N}$
Equation for the standard deviation: $\sigma_x = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \mu_x \right)^2}$
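As a quick check of the two formulas, here is a minimal sketch that evaluates them with plain 'for' loops. The sample values are hypothetical and chosen only so the arithmetic is easy to follow by hand. It uses the sample form of the standard deviation, in which the sum of squared deviations is divided by $N-1$ before the square root is taken.

```python
# Hypothetical sample values, chosen only for illustration.
x = [2, 4, 4, 4, 5, 5, 7, 9]

# Count the samples and accumulate their sum in a single pass.
N = 0
total = 0.0
for value in x:
    N += 1
    total += value
mu = total / N

# Accumulate the squared deviations from the mean.
sum_sq = 0.0
for value in x:
    sum_sq += (value - mu) ** 2

# Sample standard deviation: divide by N - 1, then take the square root.
sigma = (sum_sq / (N - 1)) ** 0.5

print("mu =", mu)
print("sigma =", sigma)
```

For these values the sum is 40 and the sum of squared deviations is 32, so the mean is 5 and the standard deviation is $\sqrt{32/7}$, roughly 2.14.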
Instructions:
(1) Before you write code, write an algorithm that describes the sequence of steps you will take to compute the mean and standard deviation for your samples. The algorithm can be written in pseudocode or as an itemized list.
(2) Use 'for' loops to compute the mean and standard deviation.
(3) Use 'for' loops and conditional operators to count the number of samples within $1\sigma$ of the mean.
Note: For this exercise, it is not acceptable to use the pre-programmed routines for the mean and standard deviation, e.g. numpy.mean() or numpy.std().
Edit this box to write an algorithm for computing the mean and std. deviation in Markdown.
Write your code using instructions in the cells below.
# Put your Header information here. Name, creation date, version, etc.
# Import the matplotlib module here. No other modules should be used in this exercise.
# Create a 1D array called 'x' using data you read from a file. For example, you can use one
# column of data from the USGS earthquake data that you downloaded last week.
# Pretend you do not know how long x is; compute its length, N, without using functions or modules. Hint: loops love these jobs.
# Compute the mean of the elements in x without using premade libraries.
# Compute the std deviation, using the mean and the elements in x without using premade libraries.
# Use the 'print' command to report the values of average (mu) and std. dev. (sigma).
# Count the number of values that are within +/- 1 std. deviation of the mean.
# A normal distribution will have approx. 68% of the values within this range.
# Based on this criterion, is the list normally distributed?
# Use print() and if statements to report a message about whether the data is normally distributed.
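The counting and reporting steps above could be sketched as follows. The sample values, the helper name `count_within_one_sigma`, and the 5-percentage-point tolerance around 68% are all assumptions made for illustration; the exercise does not specify how close to 68% counts as "approximately".

```python
def count_within_one_sigma(x, mu, sigma):
    """Count the values v that satisfy mu - sigma <= v <= mu + sigma."""
    count = 0
    for value in x:
        if mu - sigma <= value <= mu + sigma:
            count += 1
    return count


# Hypothetical sample with its mean and sample standard deviation.
x = [2, 4, 4, 4, 5, 5, 7, 9]
mu = 5.0
sigma = (32 / 7) ** 0.5

# Length of x, counted with a loop.
N = 0
for _ in x:
    N += 1

inside = count_within_one_sigma(x, mu, sigma)
percent = 100.0 * inside / N

TOLERANCE = 5.0  # assumed cutoff in percentage points; not from the exercise
if abs(percent - 68.0) <= TOLERANCE:
    print(percent, "% within 1 sigma: consistent with a normal distribution")
else:
    print(percent, "% within 1 sigma: not consistent with a normal distribution")
```

With these values, 6 of the 8 samples fall inside the interval, so the reported fraction is 75% and the second message is printed. A sample this small is a weak test either way; real data with many more values gives a more meaningful comparison.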
# Use matplotlib.pyplot to make a histogram of x.
# Look up an equation for Skewness and write code to compute the skewness.
# Compute the skewness and report whether the sample is normally distributed.
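Several definitions of skewness are in use. One common choice is the moment-based coefficient $g_1 = m_3 / m_2^{3/2}$, where $m_k = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu_x\right)^k$ is the $k$-th central moment. The sketch below assumes that definition, uses hypothetical sample values, and applies an assumed cutoff of 0.5 for "roughly symmetric"; none of these choices come from the exercise itself.

```python
# Hypothetical sample values, chosen only for illustration.
x = [2, 4, 4, 4, 5, 5, 7, 9]

# Length and mean, computed with a loop.
N = 0
total = 0.0
for value in x:
    N += 1
    total += value
mu = total / N

# Second and third central moments (each divided by N).
m2 = 0.0
m3 = 0.0
for value in x:
    deviation = value - mu
    m2 += deviation ** 2
    m3 += deviation ** 3
m2 /= N
m3 /= N

# Moment-based skewness coefficient g1 = m3 / m2^(3/2).
skewness = m3 / m2 ** 1.5

SKEW_LIMIT = 0.5  # assumed cutoff for "roughly symmetric"; not from the exercise
if abs(skewness) <= SKEW_LIMIT:
    print("skewness =", skewness, "(roughly symmetric)")
else:
    print("skewness =", skewness, "(noticeably skewed)")
```

A normal distribution is symmetric, so its skewness is zero. A value far from zero argues against normality, while a value near zero is necessary but not sufficient evidence for it.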
Put all the code you wrote in a module that can compute the 'basic stats' on a 1D array of numbers that you pass into the function as an input.
Note: Recall that functions are constructed to receive inputs and deliver outputs. The inputs are defined on the function definition line. The outputs are defined by whatever is on the same line as the return.
basic_stats <- function(data_in) { # This is the function definition line in R.
# operations
# more operations
return(list(out1, out2, outn)) } # This line defines the outputs and ends the function in R. R's return() accepts a single value, so multiple outputs are wrapped in a list.
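The skeleton above is written in R, while the rest of this notebook uses Python. For comparison, here is a minimal Python sketch of the same input/output pattern. The function `first_and_last` is a hypothetical example, not part of the assignment; it only shows where the inputs and outputs appear.

```python
def first_and_last(data_in):  # function definition line: name and inputs
    """Hypothetical example: return the first and last elements of a list."""
    out1 = data_in[0]   # operations
    out2 = data_in[-1]  # more operations
    return out1, out2   # the outputs are whatever follows 'return'


# Calling the function: the two outputs are unpacked into two names.
first, last = first_and_last([10, 20, 30])
print(first, last)
```

In Python, listing several values after `return` packs them into a tuple, which the caller can unpack into separate variables as shown.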
Your basic_stats function should do the following:
- Take data in a 1D array
- Compute the length of the data
- Compute the mean, standard deviation, and skewness of the data
- Make a histogram of the data to display graphically
- Return the mean and standard deviation as outputs
# Create your module here.
# Call your module here to show the inputs and outputs.
What to turn in:
- A saved copy of this .ipynb with answers completed for each cell above.
- Save a version that shows your output, including figures and an interpretation of the data you analyzed.
- Be sure to include the data file you used with basic_stats so I can run your notebook.