
Alteryx Designer Desktop Discussions


Available "Small data sets" over the internet...

Atabarezz
13 - Pulsar

I'm opening this topic for everyone to list some small data* sets available over the net.

 

  • Feel free to list competition data sets
  • Data journalism examples
  • Tutorial datasets from different analytics tools...

 

Best

 

Altan

 

*Small data is data that is small enough in size for human comprehension.

A few thousand lines of credit data, example marketing segmentation data, or a firm's B2B client contact history are some examples.

Atabarezz
13 - Pulsar

I believe most people interested in advanced analytics and Alteryx know of kaggle.com.

For those who have yet to come across it:

 

"Kaggle is a platform for predictive modelling and analytics competitions on which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know at the outset which technique or analyst will be most effective."

 

There are multiple small datasets available that you can test your skills on. Here are links to a few of them:

 

1. https://www.kaggle.com/c/informs2010 - The goal of this contest is to predict short term movements in stock prices

2. https://www.kaggle.com/c/axa-driver-telematics-analysis - Use telematic data to identify a driver signature

3. https://www.kaggle.com/c/sf-crime - Predict the category of crimes that occurred in the city by the bay

 

You may find 202 more under the following link: https://www.kaggle.com/competitions/search?DeadlineColumnSort=Descending

 

Atabarezz
13 - Pulsar

Here is a "Small Data" set to test your skills on:

 

  • Duplicate Detection,
  • Record Linkage, and
  • Identity Uncertainty

 

 

http://www.cs.utexas.edu/users/ml/riddle/data.html

 

 

Merry Christmas

 

Altan

 

KaneG
Alteryx Alumni (Retired)

Kaggle has started a section called Kaggle Datasets, which has public datasets that you can use freely, as the datasets for the competitions were often restricted for use outside the competition.

 

 
 
 
JohnJPS
15 - Aurora

Kaggle also has scripts for processing the given data sets: https://www.kaggle.com/scripts, which are usually in R or Python. It can be instructive to look at those and discern which parts can be pulled into standard Alteryx tools, and which parts left to a custom R call, for instance.  The nice thing is that, once you've finished, you can submit your output to the relevant Kaggle competition (even after the fact) to see how your output stacks up to the competition.

 

RodL
Alteryx Alumni (Retired)

Here is a source for 'raw' weather data (as opposed to how to use it, although the site does have some interesting applications):

https://www.ncdc.noaa.gov/data-access

Atabarezz
13 - Pulsar

Here is an addition from Europe...

I was not aware until now that the EU had an open data initiative. Here is the link:

 

http://open-data.europa.eu/en/data/

 


 

"The European Union Open Data Portal is the single point of access to a growing range of data from the institutions and other bodies of the European Union (EU). Data are free for you to use and reuse for commercial or non-commercial purposes.

By providing easy and free access to data, the portal aims to promote their innovative use and unleash their economic potential. It also aims to help foster the transparency and the accountability of the institutions and other bodies of the EU."

Atabarezz
13 - Pulsar

This time it's the USGS earthquake data set. It's not as big as all these social media data sets, IoT data, etc.

 

Data can be grouped by;

 

  • Date & time
  • Location and
  • Magnitude

 


 

 

Available outputs are;

  • Map & List
  • CSV
  • KML
  • QuakeML
  • GeoJSON

 

Here is the link: https://earthquake.usgs.gov/earthquakes/search/
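
For anyone who wants to try the GeoJSON output outside of Alteryx first, here is a minimal Python sketch of grouping events by date & time, location and magnitude. The sample records are made up for illustration; only the field layout (`properties.mag`, `properties.place`, `properties.time` in milliseconds since the epoch, and `geometry.coordinates` as longitude, latitude, depth) is meant to mirror what a GeoJSON download looks like. The function name `group_by_magnitude` is hypothetical, and real downloads may differ in detail, so treat this as a starting point rather than a reference.

```python
import json
import math
from collections import defaultdict
from datetime import datetime, timezone

# Made-up sample shaped like a GeoJSON feature collection of earthquakes.
# The places, magnitudes and times are illustrative only, not real events.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"mag": 4.6, "place": "example place A", "time": 1451606400000},
     "geometry": {"type": "Point", "coordinates": [28.9, 41.0, 10.0]}},
    {"type": "Feature",
     "properties": {"mag": 4.1, "place": "example place B", "time": 1451610000000},
     "geometry": {"type": "Point", "coordinates": [29.1, 40.8, 7.5]}},
    {"type": "Feature",
     "properties": {"mag": 5.2, "place": "example place C", "time": 1451613600000},
     "geometry": {"type": "Point", "coordinates": [27.4, 38.4, 12.0]}},
    {"type": "Feature",
     "properties": {"mag": null, "place": "example place D", "time": 1451617200000},
     "geometry": {"type": "Point", "coordinates": [30.0, 39.0, 5.0]}}
  ]
}
"""


def group_by_magnitude(geojson_text):
    """Group events into whole-number magnitude bands (4.x -> 4, 5.x -> 5)."""
    bands = defaultdict(list)
    for feature in json.loads(geojson_text)["features"]:
        props = feature["properties"]
        if props.get("mag") is None:
            continue  # skip events without a reported magnitude
        lon, lat, depth = feature["geometry"]["coordinates"]
        # Timestamps are milliseconds since the Unix epoch, in UTC.
        when = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
        bands[math.floor(props["mag"])].append(
            {"place": props["place"], "when": when, "lon": lon, "lat": lat,
             "depth": depth, "mag": props["mag"]}
        )
    return dict(bands)


if __name__ == "__main__":
    for band, events in sorted(group_by_magnitude(SAMPLE).items()):
        print(f"magnitude {band}.x: {len(events)} event(s)")
        for event in events:
            print(f"  {event['when']:%Y-%m-%d %H:%M} UTC  M{event['mag']}  {event['place']}")
```

To use it on a real download, replace `SAMPLE` with the contents of a GeoJSON file saved from the search page. The same grouping is easy to rebuild afterwards with standard Alteryx tools.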
