You are here

Big Data Sets you can use with R

Written by revolution_analytics On the

The world may indeed be awash with data, however, it is not always easy to find a suitable data set when you need one. As the number of people becoming involved with R and data science increases so does the need for interesting data sets for creating examples, showcasing machine learning algorithms and developing statistical analyses. The most difficult data sets to find are those that would provide the foundation for impressive big data examples: data sets with a 100 million rows and hundreds of variables.The problem with big data, however, is that most of it is proprietary and locked away. Consequently, when constructing examples it is often necessary "make do" with data sets that are considerably smaller than an analyst is likely to be faced with in practice. To help with this problem, we have added some new data sets to lists of data sets on inside-r.org that we began keeping since almost two years ago. So, if you are looking for a sample data set or if you are the kind of person who enjoys browsing data repositories as some people enjoy browsing bookstores have a look at what is available there. The following presents some of the highlights.

Joseph Rickert