Coventry-Northampton, UK

Welcome to the Coventry-Northampton User Group

Web Scraping newbie!

reyres
7 - Meteor

Hi everyone!

 

As a relative newcomer to Alteryx, I am exploring some of its capabilities, and one thing I would like to know more about is how to use Alteryx for web scraping. I believe that some of you in this user group already use Alteryx for web scraping, and I was wondering whether anyone would be willing to have a chat about it and/or demo what you have done?

 

Any help would be much appreciated!

 

Thanks,

Richard.

11 REPLIES
Samanthaj_hughes
ACE Emeritus
Absolutely Richard,

Welcome to web scraping! It's a fine art. What kind of websites are you looking to scrape?

S*

Samantha Hughes | Analytical System Developer | Property Data Analytics
Sainsbury's Supermarkets Ltd | Ansty Park
Samanthaj.hughes@sainsburys.co.uk | 02476 529165

www.sainsburys.co.uk

________________________________

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager (postmaster@sainsburys.co.uk) and delete it from your system.

Sainsbury's Supermarkets Ltd (3261722 England)
Registered Offices: 33 Holborn, London, EC1N 2HT

Sainsbury's Argos is a trading name of both:
1) Argos Limited, Registered office: 489-499 Avebury Boulevard, Milton Keynes, United Kingdom, MK9 2NW, registered number: 01081551 (England and Wales); and
2) Sainsbury's Supermarkets Limited, Registered office: 33 Holborn, London, EC1N 2HT, registered number: 03261722 (England and Wales).

All companies listed above are subsidiaries of J Sainsbury plc (185647).

________________________________
#Alteryxrocks
reyres
7 - Meteor

Hi Samantha,

 

Thank you for getting back to me so quickly! At the moment I don't have a specific website in mind; I'm really looking for some examples of what is possible and what other people are doing with the tool.

 

Thanks,

 

Richard.

Samanthaj_hughes
ACE Emeritus
OK, I'll try and find an example for you.

It's all about the websites really, as every one is different, so brush up on your HTML, JSON and XML knowledge.

Then dig deeper with the developer tools within your browser. Here you will find all your gems.
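
For anyone who thinks better in code, here is a minimal sketch (in Python rather than an Alteryx workflow) of pulling values out of the kind of markup the developer tools show you. The HTML fragment, the `LinkCollector` class and the `extract_links` helper are all made up for illustration; a real page will be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment, standing in for markup you might
# spot in the browser's developer tools.
SAMPLE_HTML = """
<ul id="stores">
  <li class="store"><a href="/stores/1">Coventry</a></li>
  <li class="store"><a href="/stores/2">Northampton</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return the collected (href, text) pairs."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

The same idea applies whatever tool you use: find the repeating element in the page source, then pull out the attribute or text you need from each one.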

S*

#Alteryxrocks
Si-Pri
8 - Asteroid

Hi Richard,

 

It's great for building catalogues or mapping locations. You may also be able to make use of internal data, such as extracting numbers from an Active Directory on your SharePoint to build a comprehensive phone directory on your mobile, etc.

 

@Joe_Lipski and the team at Javelin have an absolutely brilliant webinar on it here: https://pages.alteryx.com/Javelin-Webinar-Series-Master.html

You can even download their examples!

 

Enjoy!

Si

Samanthaj_hughes
ACE Emeritus

Hi Richard,

 

This is an example of where a website has JSON behind the scenes.
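
As a rough illustration outside Alteryx (this is not the attached example), a minimal Python sketch of flattening that kind of background JSON into rows might look like the following. The payload, its field names and the `flatten_results` helper are hypothetical; the real structure depends entirely on the site.

```python
import json

# Hypothetical payload: the sort of JSON a page might fetch in the
# background, as seen in the Network tab of the developer tools.
SAMPLE_JSON = """
{"results": [
  {"name": "Coventry", "location": {"lat": 52.41, "lng": -1.51}},
  {"name": "Northampton", "location": {"lat": 52.24, "lng": -0.9}}
]}
"""


def flatten_results(payload):
    """Turn the nested JSON text into a list of flat dictionaries (rows)."""
    data = json.loads(payload)
    rows = []
    for item in data.get("results", []):
        location = item.get("location", {})
        rows.append(
            {
                "name": item.get("name"),
                "lat": location.get("lat"),
                "lng": location.get("lng"),
            }
        )
    return rows


if __name__ == "__main__":
    for row in flatten_results(SAMPLE_JSON):
        print(row)
```

Each row then has one value per column, which is the shape you would want before mapping or joining the data.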

 

Check out the webinar series, if you haven’t already. It’s a great resource.

 

Enjoy,

 

Samantha

#Alteryxrocks
reyres
7 - Meteor

Thank you to Samantha for the sample file and to Si for the link. I'll definitely check out the webinars!

Do either of you use web scraping for sentiment analysis at all?

Samanthaj_hughes
ACE Emeritus
We have done some Twitter scraping with a view to looking into text sentiment, but we haven't actually got to the sentiment part yet. There is lots out there. Check this out if you haven't seen it yet. :)

https://community.alteryx.com/t5/Data-Science-Blog/Text-Analysis-in-Alteryx-with-the-Python-SDK-Gend...


#Alteryxrocks
Si-Pri
8 - Asteroid

Thanks for posting that article! Very interesting.

 

Richard, I've not web-scraped to get my data, as reviews are prepared for me by another department in Excel format. I do, however, run sentiment analysis on them using the following code in the R tool:

 

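# Load the sentimentr package
library(sentimentr)

# Read input #1, split the text into sentences and score the sentiment
sent <- sentiment_by(get_sentences(read.Alteryx("#1")))

# Send the scores to output anchor 1 of the R tool
write.Alteryx(sent, 1)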

 

Please note that, for the script to find the 'sentimentr' library, I have to use an older version of Alteryx (2018.3).

(Check out the sentiment webinar too!)

Si-Pri
8 - Asteroid

It may work in the 2019 releases. I will update mine and let you know!