
Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
MeganBowers
Alteryx Community Team

You might need dummy data for various reasons—protecting data privacy, learning new software, creating portfolio projects, or asking for help on forums, to name a few.

 

There are ways you can mask sensitive data fields in Alteryx (like this), but sometimes you may need a whole new dataset.

 

Luckily, there are many (free) websites and repositories for data that you can use in your projects. Keep reading to learn more about creating and accessing the data you need.

 

Generating Dummy Data

 

If you need to create a dataset using certain parameters or need placeholder data to start a project, generating a new dataset (aka “dummy data”) may be the best way to proceed. Here are a few options:

 

Mockaroo: With the free plan, you can generate 1,000 rows of dummy data per download. One cool feature of Mockaroo is that you can generate fields using AI: describe your topic, and it will generate field names and types for you!

 


 

[Image: Social media data schema generated by AI]

 

The output format options are extensive as well, including Excel files.

 

Dataconstruct: Generate up to 1,000 rows of data by adding fields and selecting data types. There are many data types to choose from (e.g., Datetime, Countries, Currency codes, Street Addresses).

 

The output formats are more limited, but they are great for development projects.

 


 

Generatedata: You can generate up to 20 preview rows of data for free by selecting the fields you want and the output format. The site has a nice user interface and lots of output format options.

 

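If you prefer to script placeholder data instead of downloading it, the basic idea behind these generators (choose field names and types, then produce N rows) fits in a short Python script. The sketch below uses only the standard library; the column names, value lists, and the `generate_rows`/`to_csv` helpers are hypothetical examples, not anything taken from Mockaroo, Dataconstruct, or Generatedata.

```python
import csv
import io
import random
from datetime import date, timedelta

# Hypothetical value pools for illustration only; swap in whatever fits your project.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Eli"]
COUNTRIES = ["Canada", "France", "Japan", "Kenya", "Peru"]
CURRENCIES = ["CAD", "EUR", "JPY", "KES", "PEN"]


def generate_rows(n_rows: int, seed: int = 42) -> list[dict]:
    """Build n_rows of fake records (hypothetical schema): one dict per row."""
    rng = random.Random(seed)  # a fixed seed makes the "random" data reproducible
    start = date(2024, 1, 1)
    rows = []
    for i in range(1, n_rows + 1):
        rows.append({
            "id": i,                                  # integer key
            "first_name": rng.choice(FIRST_NAMES),    # categorical text
            "country": rng.choice(COUNTRIES),
            "currency": rng.choice(CURRENCIES),
            # a day somewhere in 2024 (a leap year), stored as YYYY-MM-DD text
            "signup_date": (start + timedelta(days=rng.randrange(366))).isoformat(),
            "amount": round(rng.uniform(5, 500), 2),  # decimal rounded to 2 places
        })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize a non-empty list of row dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(generate_rows(3)))
```

Seeding `random.Random` means the same seed produces the same rows every time, which is handy when you post a question on the forums and want others to see identical data. To save a file, write the string from `to_csv` using `open("dummy_data.csv", "w", newline="")` (the file name is just an example).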

 

Dataset Repositories

 

Sometimes, you want an existing dataset to plug and play with for your analysis. Much of the data on the sites below is real and can be helpful for portfolio projects, learning, and more.

 

Kaggle: You may have heard of Kaggle, as it is commonly used in data education. Its datasets page is extensive, and you can search by the type of analysis you want to complete (e.g., classification, NLP, data visualization). Kaggle also gives each dataset a usability score and shows Python projects from other users who analyzed the dataset.



 

UC Irvine Machine Learning Repository: The place to go if you want to find images of 13,611 grains of 7 different registered dry beans.

 

In all seriousness, this site houses amazing data for machine learning projects. If you are upskilling in machine learning, you can find rich datasets with all kinds of feature types and subject areas.

 


 

Public Dataset Repository (GitHub): A testament to the power of open source, this massive list on GitHub contains links to public data in many industries. Instead of scouring the web for datasets, take a look at this list first! It is well maintained, with indicators for broken dataset links.

 


 

Google Dataset Search: Did you know that Google has a separate search engine for datasets? It pulls useful information into the search results so you can see a preview of the data, know when it was updated, and understand whether it is openly accessible.

 


 

Conclusion

 

Hopefully, these resources will be useful for your next data project. Whether you generate data for a use case to build a prototype solution or need data to experiment with machine learning models, there is plenty out there for the taking. Or rather, the downloading.

 

If you get your dummy data somewhere else, let us know in the comments!

 

Megan Bowers
Sr. Content Manager

Hi, I'm Megan! I am a Sr. Content Manager at Alteryx. I work to make sure our blogs and podcast have high quality, helpful, and engaging content. As a data analyst turned writer, I am passionate about making analytics & data science accessible (and fun) for all. If there is content that you think the community is missing, feel free to message me. I would love to hear about it.


Comments
BS_THE_ANALYST
14 - Magnetar

Thanks @MeganBowers  there's some gems in here. 

 

I didn't know about Google Dataset Search either. 

 

Very cool stuff 😎.

CailinS
Alteryx

This is such a great compilation of resources and examples. Thank you @MeganBowers !