I'm opening this topic for everyone to list some Big data* sets available over the net.
Best
Altan
* Big data is data whose size is usually beyond the ability of commonly used software tools to manage and process within a tolerable elapsed time. Some examples: a year-long credit card transaction history, the CDRs (call data records) of a telecoms company for the last 9 months, or the behavioral credit data of a large financial institution.
--------------
[Edited by a Moderator]
We've compiled the responses to this thread into the following Knowledge Base article: Available "Big Data Sets" on the Web
--------------
Here is a link to a list of data sources that I compiled a while back. Hope it helps!
https://www.linkedin.com/pulse/need-data-bob-wyman?trk=mp-author-card
The Government of Canada has an Open Data portal -- http://open.canada.ca/en/open-data -- it takes some digging to find the gems, but there are some.
There's also some open mapping data at http://open.canada.ca/en/open-maps.
I don't know how many of these qualify as "Big data sets"...but there are a few.
This 3TB+ dataset comprises the largest released source of GitHub activity to date. It contains a full snapshot of the content of more than 2.8 million open source GitHub repositories including more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions.
15.49TB of research data available.
http://academictorrents.com/
A scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.
Regards,
Cristian
Australia, New South Wales Open data
A mixture of data sets from different government departments. As Jason says, not all would qualify as big data.
The taxi dataset is what was used for IronViz at the Tableau conference in Nov 2016.
This site just opened up and has tons of data. It looks like the ability to download each set is "coming soon" as the site is in beta at the time of this posting.
https://public.enigma.com/ - Enigma Public states that they have "the world’s broadest collection of public data."
Vessel Traffic Data
https://marinecadastre.gov/ais/
11 billion rows of public ship AIS data to explore, spanning from 2009 to 2014