Visualizing Huge Data Sets with R: An Old Wives Tale from the U.S. Census
This white paper provides an example of quickly analyzing and visualizing a huge data set in R using RevoScaleR, the new R package from Revolution Analytics. The example uses a census data set with over 14 million observations (the 5% Public Use Microdata Sample (PUMS) of the 2000 United States Census) and examines patterns in the sex ratio by age. After first identifying an aberration in the aggregate data, we are able to quickly drill down and create plots conditioned on a variety of characteristics such as region, race, and marital status. In the process, errors in the data are graphically revealed; for example, 65-year-old men are more likely to have an “old wife” age 70 than a wife their own age. The power and flexibility of the R language and graphics, combined with the speed of RevoScaleR, make visualization an easy and critical first step in the analysis of huge data sets. Also included are scripts to recreate the analysis using Revolution R Enterprise 4, and a link to a video demonstration of the analysis.