Two Variable Stats
Two Variable Statistics

Now you're a pro at analyzing individual data columns. Single-column statistics are very useful, but often you want to see the relationship between two different data columns. For instance, you might want to know how a city's average temperature relates to its longitude. We provide several tools for determining how strong these relationships are and what their equations are. Two variable statistics tell you how strongly two data columns are correlated (related) and, if they are, how you can use one to predict the other. Obviously, if the correlation is poor, the corresponding predictions are poor too.

Scatterplot

There is a scatterplot at the bottom of the sidebar. It plots the data column you clicked on along the x axis and the column you are hovering over along the y axis. By looking at the graph, you can judge qualitatively whether a relationship between the two columns exists. If the points seem to form a straight line or a curve, that may indicate a mathematical relationship between them. If they are scattered and do not follow any simple pattern, the columns are probably uncorrelated and likely unrelated.

Our program tries to draw the straight line that best fits all of the data points on the scatterplot. In the lingo of the field, this is known as a "linear regression line". The line is drawn in blue, while the data points are drawn in black. The blue regression line can be used to judge the relationship between the data in the two columns. If many points fall near the line, the line is probably a good approximation of the relationship. If the data follow some other pattern, the columns may still be related, but the straight line and its associated statistics are suspect. And if the data appear to be scattered without any obvious pattern, the columns are probably entirely unrelated.
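To make "the straight line that best fits the data" concrete, here is a minimal sketch, assuming the program uses ordinary least squares (the standard way a linear regression line is fitted); this page does not say which method the tool itself uses. The function name `fit_line` and the sample columns are hypothetical. The line it returns is the one that minimizes the sum of squared vertical distances between the points and the line.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length columns with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: how x and y vary together; Sxx: how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x column is constant; the best-fit line would be vertical")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical x and y columns whose points lie roughly along a line.
x_col = [1, 2, 3, 4, 5]
y_col = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(x_col, y_col)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```

For these sample columns the fitted line is y = 1.97x + 0.09. The formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x) are the closed-form least-squares solution, so no iterative search is needed.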
In the Statistics section below, we discuss quantitative measures of the regression line's accuracy, how to find its equation, and how to use it as a predictor.

Statistics
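As a sketch of the kind of quantitative metrics meant here, assuming they include the standard Pearson correlation coefficient r and its square r² (the tool's own statistics may be defined or labeled differently), the code below computes r, r², and the regression line's equation, then uses that equation as a predictor. The names `regression_summary` and `predict` and the sample data are hypothetical. r ranges from -1 to 1; values near -1 or 1 mean the points lie close to a straight line, and r² is the fraction of the variation in y that the line accounts for.

```python
import math


def regression_summary(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no meaningful correlation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Pearson r: the co-variation scaled by both spreads, so it always lies in [-1, 1].
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def predict(slope, intercept, x):
    """Use the regression line's equation y = slope*x + intercept as a predictor."""
    return slope * x + intercept


# Hypothetical columns that lie close to a straight line, so r is near 1.
slope, intercept, r = regression_summary([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.0])
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}, r^2 = {r * r:.3f}")
print("predicted y at x = 6:", round(predict(slope, intercept, 6), 2))

# A perfect but curved relationship (y = x^2) gives a straight-line r of 0.
_, _, r_curve = regression_summary([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print("r for y = x^2:", round(r_curve, 3))
```

The second example illustrates the caveat from the Scatterplot section: when the data follow some other pattern, r can be near 0 even though the columns are perfectly related, so it is worth looking at the plot before trusting the numbers.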
CAUTION: Although regression can tell you about correlations, it does not tell you whether one factor causes another. For instance, if you measured the price of food and the price of gas over the past 50 years, you would probably find an extremely strong correlation between them. However, this does not mean that gas prices directly affect food prices. Rather, both may have increased in step because of some third factor (e.g., inflation). As the old adage goes, "CORRELATION DOES NOT NECESSARILY IMPLY CAUSATION".
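The food-and-gas example can be imitated with a small sketch. The numbers below are invented for illustration and are not real price data: each price is built only from a shared, hypothetical inflation index plus its own small wobble, so neither price affects the other, yet their correlation comes out very strong. `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented yearly index standing in for the third factor (inflation).
inflation = [1.00, 1.03, 1.07, 1.10, 1.15, 1.21, 1.26, 1.30]

# Each price depends only on the index and its own small wobble, never on the other price.
food_wobble = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01, 0.02, -0.01]
gas_wobble = [-0.01, 0.02, 0.00, 0.01, 0.02, -0.02, 0.00, 0.01]
food = [3.00 * k + w for k, w in zip(inflation, food_wobble)]
gas = [1.50 * k + w for k, w in zip(inflation, gas_wobble)]

print("r(food, gas) =", round(pearson_r(food, gas), 3))
```

The high r says only that the two columns move together; by construction, the shared index is what drives both of them.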