Hello,
I admittedly don't have much experience parsing HTML, but I have been tasked with scraping the text for the various countries from this website:
https://www.omnipresent.com/global-employment-solutions/peo-albania
For example, I am trying to get all the text you see on the screen, like "Employee income Taxes in Albania - the rate of personal taxation varies depending on the income tax bracket the individual belongs to. This ranges between 0% - 23%". I want to build a matrix of these answers across countries, but figured I'd need to correctly parse the text / answers to these questions for one country before applying it to all the countries.
Is anyone able to take a stab at this using the Download tool and removing the various HTML tags, so that only the text shown on the website is left within the Alteryx workflow?
Thanks!
Hi, @taxguy33.
My post here may help: https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-Scrape-Branch-Details/m-p/...
One reason I might not want to use the Download tool is that it has some constraints, one of which is that it is synchronous: it sends requests one at a time, waiting for each response before sending the next. Python has asynchronous functions that let you send requests to many websites at once, and it has robust libraries for exactly this use case.
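To make the Python route a bit more concrete, here is a minimal sketch using only the standard library (Python 3.9+ for `asyncio.to_thread`). It is not the workflow from the linked post. The names `TextExtractor`, `html_to_text`, `download`, and `scrape_all` are made up for illustration, and it assumes the text you want is present in the raw HTML that the server returns.

```python
import asyncio
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html):
    """Return the text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def download(url):
    """Blocking download of a single page (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


async def scrape_all(urls, fetch=download):
    """Fetch every URL concurrently and return {url: extracted text}."""
    # Each blocking fetch runs in a worker thread, so the requests overlap
    # instead of running one after another.
    pages = await asyncio.gather(*(asyncio.to_thread(fetch, u) for u in urls))
    return {u: html_to_text(p) for u, p in zip(urls, pages)}
```

You would call it with something like `asyncio.run(scrape_all(["https://www.omnipresent.com/global-employment-solutions/peo-albania"]))` and add the other country pages to the list once you know their URLs. The `fetch` parameter is there so you can swap in a different downloader, or a fake one for testing.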
Hi @taxguy33
I am attaching a workflow that scrapes the information you need from the Albanian Global Employment Solutions & PEO web page you supplied. I will detail how some of the rules and formulas in the workflow were derived:
Albanian Global Employment Solutions & PEO inspection: (F12 in Google Chrome)
Albanian page processing, Alteryx Workflow:
The 0% - 23% range is returned by the workflow:
Comments and Conclusions:
Hope this helps,
Arnaldo
@ArnaldoSandoval This is awesome! Thank you so much - you spelled out everything you were doing perfectly, and it was easy to learn from.