Basic Statistics

Intro to Stats


The material below is meant to be a qualitative primer in statistics for our users. It is neither comprehensive nor a good substitute for a statistics text. There are many excellent statistics texts available, and we suggest that users with questions consult one for a more thorough treatment.

The Visible Statistics Data Model

Visible Statistics organizes data into data sets, each of which has one or more related data tables. For our analysis to work, each data table must be organized so that all cells in a column contain the same type of information. For example, a table might hold the prices of food in various cities. The columns might be the city name, the price of eggs, the price of bread, the price of milk and the price of a meal at a nice restaurant. Each row would represent a different city. Note that our software cannot understand a table in which each row holds one type of data and the columns vary. It attempts to flip such tables automatically so that each column has a single type, but it sometimes makes mistakes, so you may have to flip them manually in our table manager.
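To make the orientation requirement concrete, here is a minimal Python sketch of what "flipping" a table means. It is only an illustration, not the way Visible Statistics itself stores or flips tables; the function name flip and the city names and prices are made up for the example.

```python
def flip(table):
    """Transpose a rectangular table given as a list of equal-length rows."""
    return [list(column) for column in zip(*table)]


# A table stored the "wrong" way round: each ROW holds one kind of
# information (all city names, all egg prices, ...). Values are invented.
sideways = [
    ["city", "Springfield", "Shelbyville"],
    ["eggs", 2.50, 2.75],
    ["bread", 3.00, 2.80],
]

# After flipping, each COLUMN holds one kind of information and each
# row after the header describes a single city.
flipped = flip(sideways)

for row in flipped:
    print(row)
```

Flipping twice returns the original table, so a table that was flipped by mistake can always be flipped back.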

Categorical Data vs. Numerical Data

There are two basic types of data for statistical analysis: categorical data and numerical data. Categorical data classifies each item into one of several categories. For instance, each M&M is either red, green, blue, yellow or brown, so we can classify an M&M by putting it in exactly one of these categories. (See our M&M tutorial)
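As a small illustration of working with categorical data, the Python sketch below counts how many items fall into each category. The list of colors is invented for the example and does not come from real M&M data.

```python
from collections import Counter

# Hypothetical handful of M&Ms; each one belongs to exactly one color category.
colors = ["red", "green", "red", "blue", "brown", "yellow", "brown", "brown"]

# Counting items per category is the basic summary of categorical data.
counts = Counter(colors)

for color, count in sorted(counts.items()):
    print(color, count)
```

Note that the categories have no inherent order or arithmetic: it makes sense to count the red M&Ms, but not to average "red" and "blue".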

Numerical data consists of numbers that correspond to some property of an item, for instance the height of a person expressed in decimal notation. Unlike categorical data, numerical data usually has no particular "correct" way to be grouped into categories. For example, it is unclear whether a 6'5" man and a 6'1" man belong in the same category.
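The height example can be made concrete with a short Python sketch. It groups heights (converted to inches) into equal-width bins and shows that the answer to "same category?" depends entirely on the bin width chosen. The helper bin_index and the two bin widths are assumptions picked for illustration.

```python
def bin_index(value, width):
    """Return the index of the equal-width bin (starting at 0) containing value."""
    return int(value // width)


# The two heights from the text, converted to inches.
taller = 6 * 12 + 5   # 6'5" = 77 inches
shorter = 6 * 12 + 1  # 6'1" = 73 inches

# With 6-inch bins both men land in the 72-78 inch bin.
same_with_wide_bins = bin_index(taller, 6) == bin_index(shorter, 6)

# With 2-inch bins they land in different bins.
same_with_narrow_bins = bin_index(taller, 2) == bin_index(shorter, 2)

print(same_with_wide_bins, same_with_narrow_bins)
```

Neither bin width is more "correct" than the other, which is exactly why numerical data needs different handling from categorical data.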

Visible Statistics takes the data tables that you provide and attempts to determine automatically whether each table column holds categorical or numerical data. Our software is pretty good, but it sometimes makes mistakes on tricky tables. You can manually force it to treat a column as categorical or numerical by setting the column options.
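To see why automatic detection can go wrong, consider the following Python sketch of one very simple heuristic: call a column numerical if every non-empty cell parses as a number. This is an assumed rule for illustration only and is not the detection method Visible Statistics actually uses; the function name guess_column_type and the sample columns are hypothetical.

```python
def guess_column_type(cells):
    """Guess 'numerical' if every non-empty cell parses as a number,
    otherwise 'categorical'. A deliberately naive, illustrative rule."""
    non_empty = [cell for cell in cells if str(cell).strip() != ""]
    if not non_empty:
        return "categorical"
    for cell in non_empty:
        try:
            float(cell)
        except (TypeError, ValueError):
            return "categorical"
    return "numerical"


prices = ["2.50", "2.75", "", "3.10"]   # plainly numerical
colors = ["red", "green", "blue"]       # plainly categorical
zip_codes = ["02139", "10001"]          # digits, but really labels

print(guess_column_type(prices))
print(guess_column_type(colors))
print(guess_column_type(zip_codes))
```

The postal code column is the tricky case: it parses as numbers, so this naive rule labels it numerical, even though averaging postal codes is meaningless. Columns like this are where a manual override in the column options is useful.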

For the rest of this document, we will concentrate on analyzing data columns which contain numeric data.

