Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

Available "Small Data Sets" on the Web

Alteryx

Question 

 

What are some "Small Data Sets" available over the internet?

   

Small data is data that is small enough for a person to comprehend. Examples include a few thousand lines of credit data, sample marketing segmentation data, or a firm's B2B client contact history.

Answer

 

  • kaggle.com "Kaggle is a platform for predictive modeling and analytics competitions on which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modeling task and it is impossible to know at the outset which technique or analyst will be most effective." Several small datasets are available to test your skills on:
    https://www.kaggle.com/c/informs2010 - The goal of this contest is to predict short term movements in stock prices.
    https://www.kaggle.com/c/axa-driver-telematics-analysis - Use telematic data to identify a driver signature.
    https://www.kaggle.com/c/sf-crime - Predict the category of crimes that occurred in the city by the bay.
    You may find 202 more under the following link: https://www.kaggle.com/competitions/search?DeadlineColumnSort=Descending
  • Kaggle has also started a section called Kaggle Datasets, which hosts public datasets that you can use, because the datasets for the competitions were often restricted from use outside the competition. https://www.kaggle.com/datasets
  • Kaggle also has scripts for processing the given data sets (https://www.kaggle.com/scripts), usually written in R or Python. It can be instructive to look at those and work out which parts can be pulled into standard Alteryx tools and which parts are better left to a custom R call, for instance. The nice thing is that, once you've finished, you can submit your output to the relevant Kaggle competition (even after the fact) to see how it stacks up against the competition.
  • A "Small Data" set for testing your skills in duplicate detection, record linkage, and identity uncertainty: http://www.cs.utexas.edu/users/ml/riddle/data.html
  • Here is an addition from Europe: http://open-data.europa.eu/en/data/ "The European Union Open Data Portal is the single point of access to a growing range of data from the institutions and other bodies of the European Union (EU). Data are free for you to use and reuse for commercial or non-commercial purposes. By providing easy and free access to data, the portal aims to promote their innovative use and unleash their economic potential. It also aims to help foster the transparency and the accountability of the institutions and other bodies of the EU."
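
For anyone curious what the duplicate detection exercise mentioned above might look like outside of Alteryx, here is a minimal sketch in Python using only the standard library. It is an illustration rather than a recommended method: the sample records, the function names (normalize, similarity, find_likely_duplicates), and the 0.85 threshold are all assumptions made up for this example, and they are not taken from the linked data sets.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_likely_duplicates(records, threshold=0.85):
    """Compare every pair of records and keep pairs at or above the threshold.

    The threshold is an assumed value for illustration; tune it on real data.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Hypothetical sample records, invented for this illustration.
records = [
    "Acme Corporation, 12 Main Street",
    "ACME Corporation,  12 Main Street",
    "Globex Inc, 4 Elm Road",
]

for i, j, score in find_likely_duplicates(records):
    print(f"records {i} and {j} look alike (score {score})")
```

Comparing every pair grows quadratically with the number of records, which is fine for a few thousand rows but is one reason larger record linkage jobs usually group records into blocks first.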
Comments
Alteryx Partner

This topic has a still-growing collection under the original community post, which I started on 22-11-2015:

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Available-quot-Small-data-sets-quot-ov...