Python Code Tool - Web Scraping Dynamic Websites Using Selenium

Reggie1995
8 - Asteroid

Reggie1995_0-1593466647481.png

Hey David - could you possibly help me interpret the error that I am receiving here? @DavidM

DavidM
Alteryx
Can you please share the full error message that the Python Code tool shows when you run the code interactively?

What is the portion of the code causing this?

What are you trying to achieve?

David Matyas | Sales Engineer
Alteryx Prague, Czech Republic
Mobile: +420 725 919 975
Email: dmatyas@alteryx.com | www.alteryx.com

Reggie1995
8 - Asteroid

I am trying to use your code to extract all of the inspect elements from a site similar to the one in your original post. The problem is that when I run it (even with the site you were using), it errors out and shows the error I presented.

 

Reggie1995_0-1593468309186.png

 

DavidM
Alteryx

@Reggie1995 just by looking at your code, what catches my eye is that you are not specifying the path to the driver correctly. As in my original post, use / instead of \ in the path definition, because Python treats \ inside a regular string as the start of an escape sequence.
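As a small illustration of the path issue described above, the sketch below shows how a backslash path can be silently altered and three ways to write it safely. The `broken` path and the `to_forward_slashes` helper are hypothetical and exist only for this example; the chromedriver location is the one quoted later in this thread and may differ on your machine.

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only to show the problem: in a regular string,
# "\t" becomes a tab character and "\n" becomes a newline.
broken = "C:\tools\new_driver.exe"

# Three equivalent, safe ways to write a Windows path in Python.
forward = "C:/ProgramData/Alteryx/chromedriver.exe"        # forward slashes
raw = r"C:\ProgramData\Alteryx\chromedriver.exe"           # raw string
escaped = "C:\\ProgramData\\Alteryx\\chromedriver.exe"     # doubled backslashes


def to_forward_slashes(path):
    """Hypothetical helper: convert a Windows path to its forward-slash form."""
    return PureWindowsPath(path).as_posix()
```

Any of the three safe forms can be passed as the driver path; `to_forward_slashes(raw)` yields the same string as `forward`.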

sraheja21
5 - Atom

@DavidM

 

I need your help with the Python tool I am using in my workflow. I am using Selenium to connect to a site and download data from it. Clicking the URL downloads the .csv file to the Downloads folder on my desktop. Could you please help me with code that handles the CSV file in Python and outputs it so that it can be used in my workflow, instead of downloading it to my Downloads folder?

 

Below is the code that downloads the file to my downloads folder -  

 

 

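from selenium.common.exceptions import ElementClickInterceptedException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

try:
    # Wait up to 20 seconds for the download link; a timeout raises TimeoutException
    download_output = WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.ID, 'ctl00_cphMain_lnkDownload')))
    driver.execute_script("window.scrollBy(0, -150);")
    # Click through JavaScript instead of a native mouse click
    driver.execute_script("arguments[0].click();", download_output)
except ElementClickInterceptedException:
    print("Mouse click is observed.")
except TimeoutException:
    print("DOWNLOAD OUTPUT: Download option of EXCEL is not available.")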
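One possible way to get the downloaded file into the workflow, sketched under assumptions: point Chrome at a known download folder, wait for the file to finish downloading, then read it with pandas and write it to an output anchor. The helper `wait_for_csv` and the folder `C:\Temp\ayx_downloads` are hypothetical names for this example. The Selenium, pandas, and Alteryx calls appear only as comments because they need a live browser and Designer, and they have not been tested against this site.

```python
import time
from pathlib import Path


def wait_for_csv(download_dir, timeout=30.0, poll=0.5):
    """Return the newest finished .csv in download_dir, or raise TimeoutError."""
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Chrome keeps a *.crdownload file while a download is still in progress.
        partial = list(folder.glob("*.crdownload"))
        finished = sorted(folder.glob("*.csv"), key=lambda p: p.stat().st_mtime)
        if finished and not partial:
            return finished[-1]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No finished .csv in {folder} after {timeout}s")
        time.sleep(poll)


# Assumed usage inside the Python tool (not run here):
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option(
#       "prefs", {"download.default_directory": r"C:\Temp\ayx_downloads"})
#   ... create the driver with these options and click the download link ...
#   csv_path = wait_for_csv(r"C:\Temp\ayx_downloads")
#   Alteryx.write(pandas.read_csv(csv_path), 1)
```

Using a dedicated, otherwise empty folder keeps the "newest .csv" rule from picking up an unrelated file.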

Shannila
6 - Meteoroid

Amazing guide! I managed to pick up how to use Python + Selenium, tweaked the code (after hours of googling), and got my workflow working!

 

Using the original workflow, I was getting this error: 

WebDriverException: Message: unknown error

I had to add the two lines marked below to fix it. If anyone else gets the same error, try this.

 

from selenium import webdriver

options = webdriver.ChromeOptions()  # added
options.add_argument("--no-sandbox")  # added
driver = webdriver.Chrome(options=options, executable_path="C:/ProgramData/Alteryx/chromedriver.exe")

 

Thank you!

FlorianWalter
5 - Atom

Thanks, David, for your great support. This is the first time I have used the Python Code tool, so excuse my dumb question.
I am getting a NameError: Alteryx is not known. What mistake did I make?

DavidM
Alteryx

Hi @FlorianWalter ,

 

I think you are not running the section of the code that imports the Alteryx package:

 

from ayx import Alteryx

#install the selenium and urllib3 packages
#you may need to elevate privileges for running Designer to admin for doing this
Alteryx.installPackages("urllib3")
Alteryx.installPackages("selenium")
FlorianWalter
5 - Atom

Alright - I thought I had to run the code sections one after the other. It works well now, but the data load seems to take too long. Any recommendations? Thanks, David.

sohamkale
5 - Atom

I have the same problem. I am trying to download some HTML content that is rendered using viewer.js and pdf.js, so I was thinking of using this script to get that exact element, but I keep getting this exception:

Loading took too much time!

There are no other errors, and I am able to download the page source, but since the rendering is done in JS, I don't get the element that I need.

I have tried the sample code you provided, but I get the same exception even then. I have also tried increasing the delay, but to no avail. Please help me out. I am fairly new to Python and Alteryx, but this is a really important task that I need to complete as soon as possible. I am attaching my workflow below. Any help is really appreciated. Thanks
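For JavaScript-rendered content like this, one thing worth trying is to poll the page source until a known marker from the rendered viewer appears, rather than relying on a fixed delay. The sketch below is a plain-Python polling loop, not Selenium's own `WebDriverWait`. The helper `wait_for_marker` and the marker string are hypothetical, and the approach assumes the rendered element shows up in `driver.page_source` of the current frame.

```python
import time


def wait_for_marker(get_html, marker, timeout=30.0, poll=0.5):
    """Call get_html() repeatedly until marker appears; return that HTML.

    Raises TimeoutError if the marker has not appeared within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found after {timeout}s")
        time.sleep(poll)


# Assumed usage with an existing Selenium driver (not run here):
#   html = wait_for_marker(lambda: driver.page_source, 'id="viewer"', timeout=60)
# If the viewer is rendered inside an iframe (an assumption about this page),
# the top-level page source will not contain it; switch into the frame first
# with driver.switch_to.frame(...) before polling.
```

If the marker never appears even with a long timeout, checking whether the content sits in an iframe is a reasonable next step.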
