A hallmark of science is conducting hypothesis testing to answer questions, and relying upon evidence from observations (the act of viewing the world around us) to find an answer or explanation to those questions. To do this, scientists gather experimental data to support or refute hypotheses. Eventually though, scientists need to communicate the results of their experiments either to each other, or to the greater public, generally by arranging their data into tables or graphs that can easily convey large amounts of information at a glance.
Graphing & Data Analysis
In many of the research sciences, you need to be able to not only perform and design scientific experiments, but also to know what to do with the data after you collect it. One of the tools in every scientists’ arsenal is Microsoft Excel (Google Sheets is another spreadsheet program with similar functions). Microsoft Excel (MS Excel) is available on all UB campus computers, and all classroom laptops, and is part of the Microsoft Office 365 Suite that is freely downloadable for all students.
Scientists often have their data sorted into tables. Specifically summary tables that give the highlights of your data (and not every data point) are a good method of displaying your information. However, other instances call for graphical displays of your data. As you might remember from math class, graphs come in multiple formats, and it is helpful to know when to use which kind of graph. Line graphs visualize changes over time among one or more groups. Bar charts work best when comparing quantities among different categories. Scatter plots data against two different axes, and easily shows patterns in a relationship between two things. Often, graphs more clearly illustrate trends or interesting finds in your results much better than raw data in a table. For example, a graph showing your height in inches between ages 1 and 18 might be a line graph. An experiment comparing the number of visits different fast food chains in the same shopping center get in a day might be a bar chart. And a good example of a scatter plot would be plotting height versus arm span. On most graphs, the horizontal axis is the x-axis, and the vertical axis is the y-axis. Generally, we plot the independent variable on the x-axis and the dependent variable on the y-axis.
In order for graphs to accurately convey all the information they contain, they must have a specific title, and often have a legend to indicate how to read the parts of your graph, as sometimes your graph will display more than one data series. Additionally, the x- and y-axes must be labeled with what they represent, including the units. Units make the different between a graph showing inches and a graph showing miles, so don’t forget to include them.
Today you will be given a dataset and asked to perform some basic statistics on that data using functions in Excel, and to also present your data as graphs and data tables. Functions are predefined formulas that perform calculations using specified values. For example, you can take the average (mean) of a dataset using the “average” function. Most of the statistics we use in this class are common summary statistics, consisting of measures such as mean, median, range, and standard deviation. Many of these measures should be familiar from previous math courses, and their corresponding Excel functions are given in parentheses. To use a function, you click in an empty cell, type “=”, type the function, and then select the cells on which to perform that function. The squares in an Excel sheet are called “cells”, and they are set up in a grid-fashion, where the columns are letters and the rows are numbered. So cell A1 would be the first row of column A. For example, if I had numbers 1 – 10 in cells A1 – A10 and I wanted to find the average of those numbers, I would select an empty cell, type “=AVERAGE(A1:A10)” and hit “Enter”, and Excel will calculate the average. You can either type in location of the cells if you know them (e.g. E34 or A293), or you can highlight those cells with your mouse. Excel will tell you what cells you are highlighting at the top of the screen in the function bar. There are many functions in Excel, but the most helpful to you will be the following (functions in parentheses):
- Addition/subtraction/multiplication/division à + , – , * , /
- Mean (=AVERAGE) – A measure of the center of your data. Can be greatly affected by extreme values.
- Median (=MEDIAN) – the value in the center of your data
- Maximum (=MAX) – the maximum (highest) value in your dataset
- Minimum (=MIN) – the minimum (lowest) value in your dataset.
- Maximum and minimum (=MAX & =MIN) – the greatest and least values in your data
- Range (=MAX(cell#)-MIN(cell#) – the difference between highest and lowest values. Can be greatly affected by extreme values. No direct function for range exists, so we calculate range by combining the MAX and MIN functions.
- Mode (=MODE) – the data value that occurs most frequently
- Standard deviation (=STDEV) – the average deviation from the mean; tells us how variable the data is
- Standard error – a measure of how precise our estimate is of the true mean; there is no direct function for standard error. However, you can calculate standard error by dividing the standard deviation by the square root of the sample size, which is known as n. If you had an example where the standard deviation was 58 and the sample size was 9, you would calculate the standard error by: . In excel, the function for square root is =SQRT. Make sure you use parentheses! For standard error (SE), your Excel function might look something like:
- o =(58/(SQRT(9)))
- o Notice that every parentheses needs to have a partner
- o Notice that you do not have to type “=” twice, or for each function, just the first time so that Excel will recognize that you are giving it a function command.
Another common tool researchers use, which are usable on computers generally and not just Excel, are keyboard shortcuts. Below are some keyboard shortcuts that may be helpful during today’s lab.
Useful keyboard shortcuts –
Ctrl + c = Copy
Ctrl + v = Paste
Ctrl + f = Search current page
Ctrl + z = Undo
Ctrl + – = Inserts new column or row
Ctrl + F6 = Next workbook
Alt + F2 = Save As
Alt + Shift + F2 = Saves current worksheet
In lab today, you will be looking at and graphing real data from the National Oceanic and Atmospheric Administration (NOAA), which is part of the Department of Commerce. NOAA studies oceans, major waterways, and the atmosphere. On place you might be familiar with NOAA is through the National Weather Service and their website www.weather.gov. The National Weather Service operates weather stations throughout the country, and has historical records going back to the 1800s. The data that NOAA collects gives us information about both weather and climate. Weather is defined as the atmospheric conditions at a place and time, and refers to day to day temperature and precipitation activity. People commonly confuse weather with climate, although they are very different. Climate is the term for the averaging of atmospheric conditions over longer periods of time, generally on the span of years or decades. While weather can change in a few hours, climate generally takes hundreds of years to change. Types of weather include rain, snow, fog, windy, sunny, etc.
Today, we will be graphing and looking at temperature and precipitation, or rainfall (inches) data for Baltimore, Maryland. Below, you will find the precipitation in inches for July for Baltimore, MD over multiple decades (Data from: https://w2.weather.gov/climate/xmacis.php?wfo=lwx)
Morris et al. 2016. Biology: How Life Works, 2nd ed.
Stufflebeam, R. 2008. Introduction to the Scientific Method. Consortium on Cognitive Science Instruction.