The one with all the Webscraping!! (Gotta love a Friends Reference!!)
WOW, what a fantastic turnout and a great afternoon at Sainsbury's HQ. The agenda was as follows, and according to Joe Lipski it was “HOT”.
I, @samanthaj_hughes, kicked off with the usual community updates and introduced everyone in the user group to Dr Tim Rains, who is our GRAND PRIX Driver this year at INSPIRE. He has accomplished so much this year, completing his PhD and successfully getting a place on the Grand Prix stage. I look forward to cheering him on with the User Group and Sainsbury’s colleagues on the night, so look out for us and join us. Also a shout out to Joe Serpis, who will be his pit crew. Follow Tim on Twitter: Dr Tim Rains (Twitter)
Then I started into the good stuff: Introduction to Webscraping. There is so much to share here, and I hope you had a great time learning the concepts and tools involved, as well as the demos. The presentation and workflows are attached here.
Useful resources from my presentation:
Weekly Challenges (F12)
XML Page Example
List of Presidents on Wiki
I covered a bit of ethics too. Webscraping generally falls within a grey area: some websites’ terms and conditions do not allow you to download their information (intellectual property). Do be careful and make sure you know where you stand for every page. APIs are more acceptable, as they expose the data the owner is happy to share and let them monitor what you are looking at. Remember that most information on the HTML front page is public information, so you should be safe, but if you go digging deep then you may be crossing that line.
Cue Joe Lipski @joe_lipski to introduce the audience to APIs. The two main examples were around the Spotify and Google APIs.
Both are great places to start. Remember, do not share your API key with anyone, especially if it’s hooked up to your personal details, credit card, etc.
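One way to avoid sharing a key by accident is to keep it out of the workflow or script altogether and read it from an environment variable. The snippet below is only a minimal Python sketch of that idea, not something shown on the day: the variable name `MY_API_KEY` and the `Bearer` header format are assumptions for illustration, so check the documentation of the API you are using for the exact format it expects.

```python
import os


def load_api_key(var_name: str = "MY_API_KEY") -> str:
    """Read an API key from an environment variable instead of hard-coding it.

    The variable name is a hypothetical placeholder; use whatever name you like.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


def auth_headers(key: str) -> dict:
    """Build request headers carrying the key.

    A Bearer token header is assumed here; some APIs expect the key in a
    query parameter or a differently named header instead.
    """
    return {"Authorization": f"Bearer {key}"}
```

Because the key never appears in the file itself, you can share the script or attach it to a post without giving away your credentials.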
Joe shared great tips around creating macros and using the download tool to access the wealth of APIs out there at the moment. It all starts with the API documentation.
Joe’s talk will be at INSPIRE later this year so the slides will be available online, post INSPIRE.
Then came Joe Serpis @JosephSerpis, who showed us how easy it is to webscrape with the most basic of tools within the hour, using Wetherspoon’s as his example. Within this challenge, Joe shared how often the rules changed during that hour, and yet he still managed to accomplish it. Please see his slides and give him a shout here: he is currently aiming to speed through the ranks on the community.
Joe's slides are in the attachment.
Finally we had @chris_love, our final ACE talk. Chris really brought together everything we had all been sharing today. If you are trying desperately to remember that awesome REGEX tip, fear no more, here it is.
This little beauty, placed between the repeatable patterns of text either side of the value you are looking for on a web page, will pull back the data you are after. Remember to make sure the surrounding text is always the same; the minute you make it unique you will only get one result.
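The expression from Chris's slide is not reproduced here, so the following is only a sketch of the general idea in Python, assuming a non-greedy capture group sitting between two pieces of fixed text. The HTML snippet, tag and class name are made up for illustration.

```python
import re

# Hypothetical HTML; the tag and class name are invented for this example.
html = (
    '<li class="name">George Washington</li>'
    '<li class="name">John Adams</li>'
    '<li class="name">Thomas Jefferson</li>'
)

# Fixed text on either side, with a non-greedy capture group in the middle.
# Because the surrounding text repeats for every item, every item matches.
pattern = re.compile(r'<li class="name">(.*?)</li>')

names = pattern.findall(html)
print(names)
```

If the text either side were made specific to one item (for example by including an id that appears only once), the same pattern would return a single result, which is the pitfall described above.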
Chris has a number of resources for you to consume, being a Zen Master with Tableau and an Alteryx ACE.
He also has a data blog which I will get a link for and post as soon as possible.
I just want to take a moment to thank you all for making the time to come along to this user group; it’s been great having a growing audience. Thank you @TuvyL for coming to Coventry and experiencing our User Group with jet lag, we really appreciate it. I hope to do a Xmas theme next time, along with personal sharing stories, so please get in touch if you would like to speak, host or see something specific. Remember, it’s our user group.
Date for your diaries 6th December 2019 - hope to see you all there for some festive fun.
Tim, Joe and I also have a blog which has recently been published onto the Alteryx Blog page (super proud) see Intersections and Overlaps.
Until the next time