
Coventry-Northampton, UK

Welcome to the Coventry-Northampton User Group


2019 Q3 Meeting - The One with all the Webscraping - Meeting Recap

Samanthaj_hughes
ACE Emeritus

The one with all the Webscraping!!  (Gotta love a Friends Reference!!)

 

WOW, what a fantastic turnout and a great afternoon at Sainsbury's HQ. The agenda was as follows and, according to Joe Lipski, was “HOT”.

 

Our RED HOT Agenda :)

 

I, @Samanthaj_hughes, kicked off with the usual community updates and introduced everyone in the user group to Dr Tim Rains, who is our GRAND PRIX driver this year at INSPIRE. He has accomplished so much this year, completing his PhD and successfully getting a place on the Grand Prix stage. I look forward to cheering him on with the User Group and Sainsbury’s colleagues on the night, so look out for us and join us. Also a shout out to Joe Serpis, who will be his pit crew. Follow Tim on Twitter: Dr Tim Rains (Twitter)

 

Everyone ready to learn

 

Then I started into the good stuff: an introduction to webscraping. There is so much to share here. I hope you had a great time learning the concepts and tools involved, as well as seeing the demos. The presentation and workflows are attached here.

 

Useful resources from my presentation:

 

Description | Link
W3 Schools | https://www.w3schools.com/html/
Weekly Challenges (F12) | https://community.alteryx.com/t5/Weekly-Challenge/Weekly-Challenge-Index-amp-Welcome/td-p/48275
XML Page Example | https://www.w3schools.com/xml/cd_catalog.xml
List of Presidents on Wiki | https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States

 

Some useful tools for Webscraping

 

I covered a bit of ethics too. Webscraping generally falls within a grey area: some websites' terms and conditions do not allow you to download their information (intellectual property). Do be careful and ensure you know where you stand for every page. APIs are more acceptable, as they expose the data the owners are happy to share and let them monitor what you are looking at. Remember, most information on the HTML front page is public information, so you should be safe, but if you go digging deep then you may be crossing that line.
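
One way to start "knowing where you stand" for a page is to look at the site's robots.txt file. This was not part of the talk, so treat it as an assumption about what a sensible first check looks like; it does not replace reading the terms and conditions. The sketch below uses Python's standard library, and the rules, the example.com URLs and the agent name are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, written inline so the sketch runs offline.
# A real check would load the site's own file (for example with set_url/read).
EXAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(EXAMPLE_RULES)


def is_allowed(url, agent="user-group-demo"):
    """Return True if the parsed rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


print(is_allowed("https://example.com/index.html"))
print(is_allowed("https://example.com/private/report.html"))
```

A page being allowed here only means the site has not asked crawlers to stay away from it; the terms and conditions still apply.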

 

 

Cue Joe Lipski @Joe_Lipski to introduce the audience to APIs. The two main examples were the Spotify and Google APIs.

Getting APIs down!

 

https://developer.spotify.com/documentation/web-api/

https://console.developers.google.com/apis/dashboard

 

These are great places to start. Remember, do not share your API key with anyone, especially if it’s hooked up to your personal details, credit card, etc.
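
A minimal sketch of one way to keep a key out of anything you share: read it from an environment variable rather than typing it into the workflow or script. This is Python rather than anything shown in the session, and the variable name MY_API_KEY, the api.example.com endpoint and the "key" query parameter are all hypothetical. Real APIs differ (some expect the key in a header or use tokens instead), so check the API documentation.

```python
import os
from urllib.parse import urlencode


def build_request_url(base_url, params, key_env_var="MY_API_KEY"):
    """Build a request URL, taking the API key from the environment."""
    api_key = os.environ.get(key_env_var)
    if not api_key:
        # Fail loudly rather than sending a request without a key.
        raise RuntimeError(f"Set the {key_env_var} environment variable first")
    # "key" as the parameter name is an assumption for illustration only.
    query = urlencode({**params, "key": api_key})
    return f"{base_url}?{query}"


# Demo only: in practice the variable is set outside the script.
os.environ.setdefault("MY_API_KEY", "demo-key-123")
print(build_request_url("https://api.example.com/v1/search", {"q": "coventry"}))
```

Because the key never appears in the file itself, the script or workflow can be attached to a post like this one without giving the key away.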

 

Joe shared great tips on creating macros and using the Download tool to access the wealth of APIs out there at the moment. It all starts with the API documentation.

 

Everyone is really interested in how to understand Google APIs.

 

Joe’s talk will be at INSPIRE later this year so the slides will be available online, post INSPIRE.

 

Then came Joe Serpis @JosephSerpis, who showed us how easy it is to webscrape with the most basic of tools within the hour. Joe used Wetherspoon’s as his example. Within this challenge, Joe shared how often the rules changed during that hour, and yet he still managed to accomplish it. Please see his slides and give him a shout; he is currently aiming to speed through the ranks on the community.

 

Joe's slides are in the attachment.

 

When ACEs get together

 

 

Finally we had @chris_love with our final ACE talk. Chris really brought together everything we had all been sharing today. If you are trying desperately to remember that awesome REGEX tip, fear no more; here it is.

 

(.*?)

 

This little beauty, placed between the repeatable patterns of text that sit either side of what you are looking for on a web page, will pull back the data you are after. Remember to make sure the surrounding pattern is always the same; the minute you make it unique, you will only get one result.
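
To see why (.*?) works, here is a small Python sketch of the idea. It was not shown in the session; the HTML snippet and the helper name extract_between are made up for illustration. The ? makes the match "lazy", so it stops at the first closing pattern instead of running on to the last one.

```python
import re

# Hypothetical snippet with the same text either side of each value.
html = (
    '<li class="name">George Washington</li>'
    '<li class="name">John Adams</li>'
    '<li class="name">Thomas Jefferson</li>'
)


def extract_between(text, before, after):
    """Capture whatever sits between each repeated before/after pair."""
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    return re.findall(pattern, text, flags=re.DOTALL)


# Lazy match: one result per repeated pattern.
names = extract_between(html, '<li class="name">', "</li>")

# Greedy comparison: (.*) swallows everything up to the LAST </li>.
greedy = re.findall(
    re.escape('<li class="name">') + r"(.*)" + re.escape("</li>"), html
)

# Making the "before" text unique leaves only one match.
unique = extract_between(html, '<li class="name">John ', "</li>")

print(names)
print(len(greedy))
print(unique)
```

The lazy version returns one entry per list item, the greedy version returns a single long string, and the unique version finds only the one item whose surrounding text matches.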

 

As a Tableau Zen Master and an Alteryx ACE, Chris has a number of resources for you to consume.

https://public.tableau.com/profile/chrisluv#!/

 

He also has a data blog which I will get a link for and post as soon as possible.

 

I just want to take a moment to thank you all for making the time to come along to this user group; it’s been great having a growing audience. Thank you @TuvyL for coming to Coventry and experiencing our User Group with jet lag, we really appreciate it. I hope to do a Xmas theme next time, along with personal sharing stories, so please get in touch if you would like to speak, host, or see something specific. Remember, it’s our user group.

 

ACEs gotta be cool.

 

Date for your diaries: 6th December 2019 - hope to see you all there for some festive fun.

 

Tim, Joe and I also have a blog which has recently been published on the Alteryx Blog page (super proud); see Intersections and Overlaps.

 

Until the next time

#alteryxrocks

S*

#Alteryxrocks
3 REPLIES
Si-Pri
8 - Asteroid

@chris_love @Samanthaj_hughes @Joe_Lipski @JosephSerpis  Thank you so much for sharing your fantastic presentations yesterday, I regret I couldn't thank you in person as I was running late for a conference call. The session was so good, it's got me online on a Saturday to have a play.

 

#(.*?)

 

🙂

Samanthaj_hughes
ACE Emeritus
I love the fact you are playing on a Saturday because we have inspired you to have a go.

Thanks for letting us know Simon.

I agree. Can we get the following hashtag trending, do you reckon?

#(.*?)

S*


#Alteryxrocks
Samanthaj_hughes
ACE Emeritus

Chris's blog: https://medium.com/@databeats

I always keep a promise 🙂

#Alteryxrocks