
Neat and approachable datasets


The iris dataset is the go-to dataset for pedagogical use. It’s not particularly interesting, but you should become familiar with it because many lessons depend on it.

The dataset is preloaded in R:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
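
Beyond `head()`, a few base R calls give a quick first look at the data. The following is a minimal sketch, assuming a standard R installation where the `datasets` package is attached by default. The variable names `species_counts` and `species_means` are illustrative only.

```r
# iris is part of the datasets package that loads with base R,
# so no download or install step is needed.
data(iris)

# Column types and dimensions
str(iris)

# Number of observations per species
species_counts <- table(iris$Species)
print(species_counts)

# Mean of each of the four measurements, grouped by species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

Grouped summaries like the last call are usually the first thing lessons built on this dataset ask for, because the three species separate clearly on the petal measurements.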



Description and link TBD



Description and link TBD


Survey datasets

American Community Survey

Like the Census, but conducted every year and much more detailed.


American Time Use Survey

Measures the amount of time Americans spend on various activities in a given day. Combination of survey and diary data.


The General Social Survey

A survey conducted since 1972 covering sociological and attitudinal trends in the United States. Topics include “civil liberties, crime and violence, intergroup tolerance, morality, national spending priorities, psychological well-being, social mobility, and stress and traumatic events.”


National Health and Nutrition Examination Survey

Designed to assess the health and nutritional status of adults and children in the United States. Combination of interviews, physical exams, and lab tests.


Great but messy data

NYC subway turnstile data

A large, messy dataset containing every entry to and exit from the NYC subway system. …/Turnstile


NYC cab data

Another huge dataset from the city. Contains every yellow and green cab ride. See blogger Todd Schneider’s post analyzing this dataset for ideas. …/trip record data