Alteryx Designer Desktop Discussions

Web Scraping, selecting alternative url links based on a condition

Lili7891
7 - Meteor

Hi, 

 

I'm pretty new to web scraping, so I'll try to be as clear as I can; hopefully someone will be able to point me in the right direction.

I currently have a workflow that pulls the URL links from a website whose documents are published in different languages. Most pages have 3 documents (3 languages) each. What I would like to do is set up a formula in the workflow that recognises key words in the URL relating to the language, e.g. 'English', 'Arabic', 'French', 'Dutch', and selects an alternative link if the URL contains any of the languages I don't need.

 

The problem is that the links don't all offer the same languages: a page with Arabic text may only have links for French, German or Hindi, whilst a page with English text may only have links for Dutch, French, Arabic etc. So simply swapping the key word won't always work, as the languages I want may not be among those links.

 

I basically have a list of languages I don't require, so what I want is for the workflow to recognise the key word: e.g. if I don't want Arabic and the first link is Arabic, it should automatically go to the second or even third link.

Can anyone advise if this is possible?

Thanks!

dougperez
12 - Quasar

So you want to build an app or a macro to select the correct language and then run the workflow based on that selection?

 

Lili7891
7 - Meteor

@dougperez I just want something that says: if the URL for link 1 contains a key word, e.g. 'Dutch', then use the URL for link 2 or 3 instead. No user selection is required; I'd like this to run automatically, which is why I thought a formula might be needed.
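
To make the rule concrete, here is a minimal sketch in Python (for instance, something that could be adapted inside Designer's Python tool). It is only an illustration: the `pick_link` helper, the `EXCLUDED` keyword list and the example.com URLs are all made up, and it assumes the language name appears somewhere in each URL.

```python
# Hypothetical list of languages that are NOT wanted (lower-case keywords).
EXCLUDED = ["arabic", "dutch"]


def pick_link(urls, excluded=EXCLUDED):
    """Return the first URL containing none of the excluded keywords, else None."""
    for url in urls:
        lowered = url.lower()  # compare case-insensitively
        if not any(word in lowered for word in excluded):
            return url
    return None  # every link on this page was in an excluded language


# Made-up example: the three document links scraped from one page, in page order.
page_links = [
    "https://example.com/docs/report_Dutch.pdf",
    "https://example.com/docs/report_Arabic.pdf",
    "https://example.com/docs/report_French.pdf",
]

print(pick_link(page_links))
```

Because the check is "skip anything on the unwanted list" rather than "swap one key word for another", it does not matter which languages a given page happens to offer. In a pure Designer workflow the same idea could probably be approximated by filtering out rows where a `Contains()` test matches an unwanted key word and then keeping the first remaining row per page, though I have not tested that against real data.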

DawnDuong
13 - Pulsar

Hi @Lili7891

You can do that by using the Check Box interface tool in an app.

https://help.alteryx.com/current/designer/check-box-tool

 

The example workflow shows how the user selection "activates" different actions, which is the same as what you are trying to achieve here.
Dawn.

Lili7891
7 - Meteor

Hi @DawnDuong, I appreciate the help. I am familiar with the check box in an app; however, I want the workflow to scrape automatically based on the condition, as opposed to an end user selecting the language. Hope that makes sense.

DawnDuong
13 - Pulsar

Hi @Lili7891 

Thanks for clarifying. Unless there is an easy way to limit how far you need to scrape before detecting the language, it could mean scraping the entire page...

I think there is an R package to guess the language (auto-detect language), but I'm not as familiar with it. Very interesting topic - I'll watch this space to hear advice from the community gurus.

Cheers

Dawn.

dougperez
12 - Quasar

@Lili7891 can you provide some mock data?
