This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
The 2022.1.1.30569 Patch/Minor release has been removed from the Download Portal due to a missing signature in some of the included files. This causes the files to not be recognized as valid files provided by Alteryx and might trigger warning messages by some 3rd party programs.
If you installed the 2022.1.1.30569 release, we recommend that you reinstall the patch.
I'm opening this topic for everyone to list some Big data* sets available over the net.
Feel free to list competion/datathon data sets
Results of web scraping
Social media data
Anything bigger than 1 mio records (beyond excel and access)
* Big data is data that is usually with sizes beyond the ability of commonly used software tools to manage and process within a tolerable elapsed time. A year long credit card transaction history or CDR (Call data record) of a telecoms company for the last 9 months, behavioral credit data of a large financial institution are some examples...
This 3TB+ dataset comprises the largest released source of GitHub activity to date. It contains a full snapshot of the content of more than 2.8 million open source GitHub repositories including more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions.