
SOLVED

Web Scraping with Alteryx + Selenium

imaran_669
8 - Asteroid

Hi community, I'm currently working on a project to move an existing Python process into an Alteryx workflow.

 

The process is very simple: it takes a code, types it into a webpage, and brings back the results, then moves on to the next one.

 

After reading the forums a bit, I managed to make it work in Alteryx, but I need to make 2 modifications that I can't seem to get my head around.

 

1) Instead of using an Excel file, I want my data to come from the workflow.

2) Instead of bulking all results into a single cell, I want the output to go in each row of the iterated data.

 

Can anyone guide me a bit here?

 

Thanks!!!

3 REPLIES
danilang
19 - Altair

Hi @imaran_669 

 

1a)   Use dfRUT = Alteryx.read("#1") to load the input into a data frame. 

1b)   Change the loop to for row in range(len(dfRUT)): since we're not looping over an Excel sheet anymore.

 

2a)   Create an empty dataframe before your loop with dfOut = pd.DataFrame(columns=['html_page']). 

2b)   Append each new row to this dataframe. dfOut = dfOut.append({"html_page":[html_page]},ignore_index=True)

2c)   Use Alteryx.write(dfOut,1) to output the resulting rows, with each element in its own row.
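
Steps 1a to 2c can be sketched roughly as follows. This is only an illustration under assumptions, not the original poster's script: scrape_rows, fake_fetch and the "code" column name are hypothetical, and the Selenium lookup is replaced by a stand-in so the sketch runs outside Designer. Because DataFrame.append was removed in pandas 2.0, the sketch collects the rows in a plain list and leaves building the DataFrame to a single call at the end, which also works on older pandas versions.

```python
def scrape_rows(codes, fetch_page):
    """Return one dict per input code, keeping each code next to its result."""
    rows = []
    for code in codes:
        # One output row per iterated code, instead of one big cell.
        rows.append({"code": code, "html_page": fetch_page(code)})
    return rows


def fake_fetch(code):
    """Hypothetical stand-in for the Selenium lookup of a single code."""
    return f"<html>result for {code}</html>"


rows = scrape_rows(["111", "222"], fake_fetch)
print(rows[0]["html_page"])

# Inside the Alteryx Python tool the glue would look roughly like this
# (assumes the incoming column is named "code"):
#   from ayx import Alteryx
#   import pandas as pd
#   dfRUT = Alteryx.read("#1")
#   rows = scrape_rows(dfRUT["code"].tolist(), fetch_page)
#   Alteryx.write(pd.DataFrame(rows), 1)
```

Keeping the input code in each output row makes it easy to join the scraped results back to the rest of the workflow.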

 


 

Dan

 

imaran_669
8 - Asteroid

Thanks!!! Exactly what I was looking for!!!

imaran_669
8 - Asteroid

Dan, the process works great, but I have a problem due to the number of codes I have to process. The list has over 500,000 cases. I made it a bit more efficient by running headless, but I wonder if it is possible to split the list into sets of 100,000 cases and run the queries in parallel Chrome instances.

 

Any ideas?

 

Thanks and regards!
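
One possible way to approach the split-and-parallelize idea, sketched with Python's standard concurrent.futures module. Everything here is an assumption rather than a tested solution for this workflow: split_into_chunks, process_chunk, scrape_parallel, make_driver and lookup are hypothetical names, and FakeDriver stands in for a headless Chrome instance so the sketch runs without Selenium. The idea is that each worker thread owns exactly one browser and processes one contiguous slice of the codes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous, nearly equal slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk, make_driver, lookup):
    """Open one browser, look up every code in the chunk, then close it."""
    driver = make_driver()
    try:
        return [{"code": c, "html_page": lookup(driver, c)} for c in chunk]
    finally:
        driver.quit()


def scrape_parallel(codes, make_driver, lookup, n_workers=5):
    """Run one chunk per worker thread; results come back in input order."""
    chunks = split_into_chunks(codes, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda ch: process_chunk(ch, make_driver, lookup), chunks))
    return [row for part in parts for row in part]


class FakeDriver:
    """Hypothetical stand-in for a headless Chrome driver."""
    def quit(self):
        pass


def fake_lookup(driver, code):
    """Hypothetical stand-in for typing a code and reading the result page."""
    return f"page-{code}"


rows = scrape_parallel([str(i) for i in range(10)], FakeDriver, fake_lookup, n_workers=3)
print(len(rows))
```

With real Selenium, make_driver would create a headless Chrome instance and lookup would type the code and return the page source. The number of workers is limited in practice by memory and by how many simultaneous requests the target site tolerates.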

 
