I'm opening this topic for everyone to list some small data* sets available over the net.
Best
Altan
* Small data is data that is small enough for a human to comprehend.
A few thousand lines of credit data, sample marketing segmentation data, or a firm's B2B client contact history are some examples...
I believe most people interested in advanced analytics and Alteryx know of kaggle.com.
Just for those who have yet to come across it:
"Kaggle is a platform for predictive modelling and analytics competitions on which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know at the outset which technique or analyst will be most effective."
There are multiple small datasets available that you can test your skills on. Here I link to a few of them:
1. https://www.kaggle.com/c/informs2010 - The goal of this contest is to predict short term movements in stock prices
2. https://www.kaggle.com/c/axa-driver-telematics-analysis - Use telematic data to identify a driver signature
3. https://www.kaggle.com/c/sf-crime - Predict the category of crimes that occurred in the city by the bay
You may find 202 more under the following link https://www.kaggle.com/competitions/search?DeadlineColumnSort=Descending
Here is a "Small Data" set to test your skills on
http://www.cs.utexas.edu/users/ml/riddle/data.html
Merry Christmas
Altan
Kaggle has started a section called Kaggle Datasets, which hosts public datasets that you can use freely. This matters because the datasets for the competitions were often restricted for use outside the competition.
Kaggle also has scripts for processing the given data sets: https://www.kaggle.com/scripts, which are usually in R or Python. It can be instructive to look at those and discern which parts can be pulled into standard Alteryx tools, and which parts left to a custom R call, for instance. The nice thing is that, once you've finished, you can submit your output to the relevant Kaggle competition (even after the fact) to see how your output stacks up to the competition.
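As a rough illustration of that last step, here is a minimal Python sketch that writes model output to a CSV file ready for upload. The column names `Id` and `Prediction` and the function `write_submission` are hypothetical; each competition defines its own submission format, so check that competition's sample submission file before relying on this layout.

```python
import csv


def write_submission(path, ids, predictions, header=("Id", "Prediction")):
    """Write one row per record; the header names are placeholders."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(zip(ids, predictions))
    return len(ids)


if __name__ == "__main__":
    # Made-up values, only to show the call.
    write_submission("submission.csv", [1, 2, 3], [0.12, 0.87, 0.45])
```

The same file could just as well come from an Alteryx Output Data tool; the sketch only shows the shape of the result.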
Another source, just for 'raw' data (as opposed to how to use it, although the site does have some interesting applications), is for weather data...
Here is an addition from Europe...
Until now I was not aware that the EU had an open data initiative. Here is the link:
http://open-data.europa.eu/en/data/
"The European Union Open Data Portal is the single point of access to a growing range of data from the institutions and other bodies of the European Union (EU). Data are free for you to use and reuse for commercial or non-commercial purposes.
By providing easy and free access to data, the portal aims to promote their innovative use and unleash their economic potential. It also aims to help foster the transparency and the accountability of the institutions and other bodies of the EU."
This time it's the USGS earthquake data set... It's not as big as all these social media data sets, IoT data, etc.
Data can be grouped by;
Available outputs are;
Here is the link: https://earthquake.usgs.gov/earthquakes/search/
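For anyone who would rather pull the earthquake data programmatically than through the search form, here is a minimal Python sketch. It assumes the USGS FDSN event web service at https://earthquake.usgs.gov/fdsnws/event/1/query with its `starttime`, `endtime`, `minmagnitude` and `format` parameters. The helper names `build_query_url`, `summarize` and `fetch` are my own, and the sample records are made up for illustration, not real events.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the USGS FDSN event web service.
BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_query_url(start, end, min_magnitude):
    """Build a GeoJSON query URL for events between two ISO dates."""
    params = {
        "format": "geojson",
        "starttime": start,
        "endtime": end,
        "minmagnitude": min_magnitude,
    }
    return BASE_URL + "?" + urlencode(params)


def summarize(feature_collection):
    """Return (magnitude, place) pairs from a GeoJSON FeatureCollection."""
    return [
        (f["properties"]["mag"], f["properties"]["place"])
        for f in feature_collection.get("features", [])
    ]


def fetch(url):
    """Download and decode the JSON response (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Made-up sample in the GeoJSON shape, so the parsing can be tried offline.
SAMPLE = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"mag": 5.1, "place": "hypothetical place A"}},
        {"type": "Feature", "properties": {"mag": 4.7, "place": "hypothetical place B"}},
    ],
}

if __name__ == "__main__":
    print(build_query_url("2016-01-01", "2016-01-31", 4.5))
    print(summarize(SAMPLE))
```

To work with live data, pass the URL from `build_query_url` to `fetch` and hand the result to `summarize`. Keeping the date range short and the minimum magnitude high keeps the result in "small data" territory.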