The one with all the Webscraping!! (Gotta love a Friends Reference!!)
WOW, what a fantastic turnout and a great afternoon at Sainsbury's HQ. The agenda was as follows and, according to Joe Lipski, was “HOT”.
Our RED HOT Agenda :)
I, @Samanthaj_hughes, kicked off with the usual community updates and introduced everyone in the user group to Dr Tim Rains, who is our GRAND PRIX driver this year at INSPIRE. He has accomplished so much this year, completing his PhD and successfully getting a place on the Grand Prix stage. I look forward to cheering him on with the User Group and Sainsbury’s colleagues on the night, so look out for us and join us. Also a shout out to Joe Serpis, who will be his pit crew. Follow Tim on Twitter: Dr Tim Rains (Twitter)
Everyone ready to learn
Then I started on the good stuff: an introduction to webscraping. There is so much to share here; I hope you had a great time learning the concepts and tools involved, as well as watching the demos. The presentation and workflows are attached here.
Useful resources from my presentation:
Description | Link
W3 Schools |
Weekly Challenges (F12) |
XML Page Example |
List of Presidents on Wiki |
Some useful tools for Webscraping
I covered a bit of ethics too. Webscraping generally falls within a grey area: some websites' terms and conditions do not permit you to download their information (intellectual property). Do be careful and ensure you know where you stand for every page. APIs are more acceptable, as it’s the data the provider is happy to share and they can monitor what you are looking at. Remember that most information on the HTML front page is public information, so you should be safe, but if you go digging deep then you may be crossing that line.
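One machine-readable signal that can sit alongside (never instead of) reading a site's terms and conditions is its robots.txt file. The sketch below is only an illustration: the robots.txt content, the `my-scraper` user agent and the example.com URLs are all made up, and it uses Python's standard `urllib.robotparser` rather than anything shown on the day.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented purely for this illustration.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for page in ("https://example.com/index.html",
                 "https://example.com/private/data.html"):
        print(page, is_allowed(EXAMPLE_ROBOTS_TXT, "my-scraper", page))
```

A permissive robots.txt does not override a site's terms and conditions, so treat this as a first check only.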
Cue Joe Lipski @Joe_Lipski to introduce the audience to APIs. The two main examples were around the Spotify and Google APIs.
Getting APIs down!
Great places to start. Remember, do not share your API key with anyone, especially if it’s hooked up to your personal details, credit card, etc.
Joe shared great tips on creating macros and using the Download tool to access the wealth of APIs out there at the moment. It all starts with the API documentation.
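For anyone who wants to try the same idea outside of a workflow, here is a minimal Python sketch of keeping an API key out of your code. Everything named in it is an assumption for illustration: the `MY_API_KEY` environment variable, the api.example.com URL and the bearer-style `Authorization` header. Real APIs differ in how they expect the key to be sent, so the API documentation is still the place to start.

```python
import os
import urllib.parse
import urllib.request


def build_api_request(base_url: str, params: dict,
                      key_env_var: str = "MY_API_KEY") -> urllib.request.Request:
    """Build (but do not send) a GET request with the key read from the environment."""
    # Reading the key from an environment variable keeps it out of the script,
    # so the script can be shared without sharing the key.
    api_key = os.environ.get(key_env_var)
    if not api_key:
        raise RuntimeError(f"Set the {key_env_var} environment variable first")

    # Query parameters go in the URL; the key goes in a header (an assumption,
    # check the documentation of the API you are actually calling).
    url = f"{base_url}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
```

Passing the returned request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without a real key or endpoint.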
Everyone is really interested in understanding Google APIs.
Joe’s talk will be given at INSPIRE later this year, so the slides will be available online after INSPIRE.
Then came Joe Serpis @JosephSerpis, who showed us how easy it is to webscrape with the most basic of tools within the hour. Joe used Wetherspoon’s as his example. During the challenge the rules kept changing within that hour, and yet he still managed to accomplish it. Please see his slides and give him a shout; he is currently aiming to speed through the ranks on the community.
Joe's slides are in the attachment.
When ACEs get together
Finally we had our last ACE talk, from @chris_love. Chris really brought together everything we had all been sharing today. If you are trying desperately to remember that awesome REGEX tip, fear no more, here it is.
This little beauty, together with repeatable patterns of text on either side of what you are looking for on a web page, will pull back the data you are after. Remember to make sure the surrounding text is always the same; the minute you make it unique, you will only get one result.
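The idea of matching on the repeating text either side can be sketched in Python. The exact expression from the talk is not reproduced in this post, so the lazy capture group `(.*?)` used below is an assumption (it is a common choice for this job), and the HTML fragment and the `extract_between` helper are made up for illustration.

```python
import re

# Hypothetical HTML fragment, invented for this illustration.
HTML = """
<ul>
  <li class="name">George Washington</li>
  <li class="name">John Adams</li>
  <li class="name">Thomas Jefferson</li>
</ul>
"""


def extract_between(text: str, before: str, after: str) -> list:
    """Return every piece of text found between the literal markers before and after."""
    # re.escape treats the markers as plain text; (.*?) lazily captures
    # as little as possible, so each match stops at the nearest closing marker.
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    return re.findall(pattern, text, flags=re.DOTALL)


if __name__ == "__main__":
    # Repeating markers: one result per list item.
    print(extract_between(HTML, '<li class="name">', "</li>"))
    # A marker that only occurs once: only one result comes back.
    print(extract_between(HTML, '<li class="name">George', "</li>"))
```

The second call shows the warning in action: once the text on either side is unique to one item, only that item is returned.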
Chris has a number of resources for you to consume, being a Zen Master with Tableau and an Alteryx ACE.
He also has a data blog, which I will get a link for and post as soon as possible.
I just want to take a moment to thank you all for making the time to come along to this user group; it’s been great having a growing audience. Thank you @TuvyL for coming to Coventry and experiencing our User Group with jet lag, we really appreciate it. I hope to do a Xmas theme next time, along with personal sharing stories, so please get in touch if you would like to speak or host, or would like to see something specific. Remember, it’s our user group.
ACEs gotta be cool.
Date for your diaries: 6th December 2019. Hope to see you all there for some festive fun.
Tim, Joe and I also have a blog post which has recently been published on the Alteryx Blog page (super proud); see Intersections and Overlaps.
Until the next time
@chris_love @Samanthaj_hughes @Joe_Lipski @JosephSerpis Thank you so much for sharing your fantastic presentations yesterday. I regret I couldn't thank you in person, as I was running late for a conference call. The session was so good, it's got me online on a Saturday to have a play.