Alteryx Designer Desktop Discussions

SOLVED

Web Scraping with Alteryx + Selenium

imaran_669
8 - Asteroid

Hi community, I'm currently working on a project to migrate a Python process of mine into an Alteryx workflow.

The process is very simple: it selects a code, types it into a webpage, retrieves the results, and then moves on to the next one.

After reading the forums a bit, I managed to get it working in Alteryx, but I need to make two modifications that I can't quite get my head around:

1) Instead of using an Excel file, I want my data to come from the workflow.

2) Instead of dumping all the results into a single cell, I want each result to go into its own row, alongside the data being iterated over.

Can anyone guide me a bit here?

Thanks!!!

danilang
19 - Altair

Hi @imaran_669

1a)   Use dfRUT = Alteryx.read("#1") to load the input into a data frame (this needs from ayx import Alteryx at the top of the script).

1b)   Change the loop to for row in range(len(dfRUT)): since we're no longer looping over an Excel sheet.
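A minimal sketch of steps 1a and 1b, assuming the codes arrive on input anchor #1 in a column named `code` (a hypothetical name; use whatever your field is called). Because the `ayx` package only exists inside Designer's Python tool, the loop is wrapped in a function that takes the DataFrame, and the Selenium part is passed in as a hypothetical `lookup` callable, so the structure can be tried outside Alteryx:

```python
import pandas as pd


def iterate_codes(dfRUT, lookup):
    """Yield one lookup result per input row, in input order.

    dfRUT:  DataFrame read from input anchor #1 (assumes a 'code' column)
    lookup: hypothetical callable that enters one code on the page
            (e.g. with Selenium) and returns the scraped result
    """
    # Step 1b: loop over the DataFrame's rows instead of an Excel sheet
    for row in range(len(dfRUT)):
        code = dfRUT["code"].iloc[row]
        yield lookup(code)


# Inside Designer's Python tool the input would come from:
#   from ayx import Alteryx
#   dfRUT = Alteryx.read("#1")
# A small stand-in frame and lookup are used here instead.
dfRUT = pd.DataFrame({"code": ["A1", "B2", "C3"]})
results = list(iterate_codes(dfRUT, lambda code: f"<html>{code}</html>"))
print(results)
```

If you don't need the row index, `for code in dfRUT["code"]:` iterates over the same values more directly.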

 

2a)   Create an empty dataframe before your loop with dfOut = pd.DataFrame(columns=['html_page']).

2b)   Append each new row to this dataframe with dfOut = pd.concat([dfOut, pd.DataFrame({"html_page": [html_page]})], ignore_index=True). (DataFrame.append also works on older pandas versions, but it was removed in pandas 2.0.)

2c)   Use Alteryx.write(dfOut,1) to output the resulting rows, with each element in its own row.
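Growing a DataFrame one row at a time works, but each concat builds a new copy of the whole frame, which gets slow with a large input. A sketch of an alternative, assuming a `code` input column and a hypothetical `lookup` callable standing in for the Selenium steps: collect the pages in a plain list, then attach them as a new column so every result stays on the same row as the code that produced it.

```python
import pandas as pd


def build_output(dfRUT, lookup):
    """Return a copy of dfRUT with an 'html_page' column, one result per row."""
    html_pages = []
    for row in range(len(dfRUT)):
        # Collect results in a plain list; growing a DataFrame row by row
        # copies the whole frame on every append
        html_pages.append(lookup(dfRUT["code"].iloc[row]))
    # Build the output once, aligned with the input rows
    dfOut = dfRUT.copy()
    dfOut["html_page"] = html_pages
    return dfOut


dfRUT = pd.DataFrame({"code": ["A1", "B2"]})
dfOut = build_output(dfRUT, lambda code: f"<html>{code}</html>")
print(dfOut)
# In Designer, the last step would then be:
#   Alteryx.write(dfOut, 1)
```

Keeping the input columns in `dfOut` also means each result is written out next to its code, which is what the second modification asks for.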

 

Dan

 

imaran_669
8 - Asteroid

Thanks!!! Exactly what I was looking for!!!

imaran_669
8 - Asteroid

Dan, the process works great, but I have a problem due to the number of codes I have to process: the list has over 500,000 cases. I made it somewhat more efficient by running headless, but I wonder if it is possible to split the list into sets of 100,000 cases and run the queries in parallel Chrome instances.

Any ideas?

 

Thanks and regards!
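A sketch of the split-into-batches idea, assuming a `code` column and two hypothetical helpers: `start_browser` (which would create one headless Chrome WebDriver) and `lookup(driver, code)` (the per-code Selenium steps). Each batch runs in its own thread with its own browser, since a single WebDriver shouldn't be shared between threads; threads are enough here because the work is mostly waiting on the browser and the site rather than Python computation. With 500,000 codes and `batch_size=100_000` this gives five batches, and `workers=5` would run five Chrome instances at once. A stub driver stands in for Chrome so the structure can be checked without a browser:

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def scrape_batch(codes, start_browser, lookup):
    """Scrape one batch of codes in its own browser session.

    start_browser: hypothetical factory, e.g. one returning a headless
                   Chrome WebDriver; each batch gets its own instance
    lookup:        hypothetical callable (driver, code) -> scraped result
    """
    driver = start_browser()
    try:
        return [lookup(driver, code) for code in codes]
    finally:
        driver.quit()  # close the browser even if a lookup raises


def scrape_parallel(dfRUT, start_browser, lookup, batch_size=100_000, workers=5):
    codes = dfRUT["code"].tolist()
    # Split the codes into consecutive batches of at most batch_size
    batches = [codes[i:i + batch_size] for i in range(0, len(codes), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields batch results in submission order, so rows stay aligned
        results = pool.map(lambda b: scrape_batch(b, start_browser, lookup), batches)
        html_pages = [page for batch in results for page in batch]
    dfOut = dfRUT.copy()
    dfOut["html_page"] = html_pages
    return dfOut


class FakeDriver:
    """Stand-in for a WebDriver so the sketch runs without a browser."""

    def quit(self):
        pass


dfRUT = pd.DataFrame({"code": [f"C{i}" for i in range(7)]})
dfOut = scrape_parallel(dfRUT, FakeDriver, lambda driver, code: code.lower(),
                        batch_size=3, workers=2)
print(dfOut)
```

As written, an exception in any lookup propagates out of `scrape_parallel` and that run's results are lost; with batches of 100,000 codes it may be worth catching errors per code inside `lookup` instead.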

 
