Alteryx Designer Desktop Discussions

SOLVED

Web Scraping with Alteryx + Selenium

imaran_669
8 - Asteroid

Hi community, I'm currently working on a project to move an existing Python process into an Alteryx workflow.

 

The process is very simple: it selects a code, types it into a webpage, brings back the results, then moves on to the next one.

 

After reading the forums a bit, I managed to make it work in Alteryx, but I need to make 2 modifications that I can't seem to get my head around.

 

1) Instead of using an Excel file, I want my data to come from the workflow.

2) Instead of bulking all results into a single cell, I want the output to go in each row of the iterated data.

 

Can anyone guide me a bit here?

 

Thanks!!!

3 REPLIES
danilang
19 - Altair

Hi @imaran_669 

 

1a)   Use dfRUT = Alteryx.read("#1") to load the input into a data frame. 

1b)   Change the loop to for row in range(len(dfRUT)): since we're not looping over an Excel sheet anymore.

 

2a)   Create an empty dataframe before your loop with dfOut = pd.DataFrame(columns=['html_page']). 

2b)   Append each new row to this dataframe with dfOut = dfOut.append({"html_page":[html_page]},ignore_index=True)

2c)   Use Alteryx.write(dfOut,1) to output the resultant rows with each element in its own row
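
Put together, the steps above could look roughly like the sketch below. It is only an illustration under some assumptions: the column name "RUT" and the fetch_page callable (standing in for the existing Selenium lookup) are hypothetical, since the original script is not shown. It also collects rows in a plain list and builds the dataframe once at the end, because DataFrame.append was removed in pandas 2.0.

```python
def scrape_rows(codes, fetch_page):
    """Return one output row per input code.

    fetch_page is a hypothetical callable that takes a code and returns
    the page HTML (in the real workflow this would wrap the Selenium steps).
    """
    rows = []
    for code in codes:
        rows.append({"code": code, "html_page": fetch_page(code)})
    return rows


def run_in_alteryx(fetch_page, column="RUT"):
    """Read input #1, scrape each code, write one row per code to output 1.

    Only callable inside the Alteryx Python tool; "RUT" is an assumed
    column name, so adjust it to match the incoming data.
    """
    from ayx import Alteryx
    import pandas as pd

    dfRUT = Alteryx.read("#1")
    rows = scrape_rows(dfRUT[column].tolist(), fetch_page)
    Alteryx.write(pd.DataFrame(rows), 1)
```

Keeping the code next to its result means each output row can be joined back to the original data downstream in the workflow.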

 


 

Dan

 

imaran_669
8 - Asteroid

Thanks!!! Exactly what I was looking for!!!

imaran_669
8 - Asteroid

Dan, the process works great, but I have a problem due to the amount of numbers I have to process. The list has over 500,000 cases. I made it a bit more efficient by running headless, but I wonder if it is possible to split it into sets of 100,000 cases and run the queries in parallel Chrome instances.

 

Any ideas?

 

Thanks and regards!
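
For reference, the batching idea described above could be sketched as follows. This is not from the thread and is untested against a real site; make_session and fetch_page are hypothetical callables. In practice make_session would start one headless Chrome driver per batch and fetch_page would run the existing lookup with that driver.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_chunk(chunk, make_session, fetch_page):
    """Scrape one chunk with its own browser session.

    make_session and fetch_page are hypothetical: the first creates a
    session (e.g. a headless Chrome driver), the second looks up one code.
    """
    session = make_session()
    try:
        return [{"code": code, "html_page": fetch_page(session, code)}
                for code in chunk]
    finally:
        # Selenium drivers expose quit(); skip it for objects that do not.
        quit_session = getattr(session, "quit", None)
        if quit_session is not None:
            quit_session()


def scrape_in_parallel(codes, chunk_size, make_session, fetch_page, max_workers=5):
    """Run the chunks concurrently and return rows in the original order."""
    chunks = split_into_chunks(codes, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda chunk: process_chunk(chunk, make_session, fetch_page), chunks
        )
        return [row for chunk_rows in results for row in chunk_rows]
```

With 500,000 codes and chunk_size=100000 this gives five batches, one browser each. Whether it actually speeds things up depends on the machine's memory and on how many simultaneous requests the target site tolerates.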

 
