Selenium Web Scrape works correctly within Python Tool but not when I run Alteryx workflow

yalteryx
6 - Meteoroid

Hi Community - I have a use case where I need to automatically download files from a link within an email.

 

Thanks to @DavidM's great guide (https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Python-Code-Tool-Web-Scraping-Dynamic-...), I was able to open a download link from an email, scrape the webpage, click the checkbox, and click Download to save the file to our shared drive.

 

However, this only worked when I ran the code inside the Python tool (the file was saved to the shared drive).

When I ran the Alteryx workflow itself, the Python code stopped at the download step (no file was saved to the shared drive).

It opened the webpage and clicked the checkbox, then the browser closed automatically without downloading anything (no error message).

Hoping for some advice, as I am new to Python! It may be my slashes, but why does the code work inside the Python tool and not when I run the Alteryx workflow? I used UNC paths for both the new download directory and ChromeDriver because I would like to publish the workflow to Alteryx Server.

 

 

from ayx import Alteryx
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.common.by import By

# Read in URL to download files from
dat = Alteryx.read("#1")
print(dat)
URL = dat['TextBody'].iloc[0]

# Change download location
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {
  "download.default_directory": r"\\UNC\path\directory",
  "download.prompt_for_download": False,
  "download.directory_upgrade": True,
  "safebrowsing.enabled": True
})

# Start the WebDriver and load the page
driver = webdriver.Chrome("//UNC//path//chromedriver.exe", options=options)

# Enter URL you want to scrape
driver.get(URL)

# Wait 10 seconds until checkbox is clickable
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Select All']"))).click()

# Wait 5 seconds until download button is clickable
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Download']"))).click()
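
On the slashes question: a minimal sketch, assuming hypothetical server and share names (`fileserver`, `team_share`), of building both UNC paths with `pathlib.PureWindowsPath` so the download directory and the ChromeDriver path use the same backslash form. Whether the mixed slash styles are related to the behavior described here is not established in this thread.

```python
from pathlib import PureWindowsPath

# Hypothetical server and share names, used only for illustration.
share = PureWindowsPath(r"\\fileserver\team_share")

# Build both paths from the same root so they share one slash style.
download_dir = share / "downloads"
driver_path = share / "tools" / "chromedriver.exe"

# str() renders backslash-separated UNC paths regardless of the host OS.
print(str(download_dir))
print(str(driver_path))
```

The resulting strings could then be used for the "download.default_directory" preference and for the ChromeDriver location in the script above.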

 

[Attached images: workflow.jpg, results.jpg]

 

 

DavidM
Alteryx

Hi @yalteryx,

 

Thanks for posting this.

Just my two cents:

- My understanding is that this is being run against an internal system within your company

- It is going to be hard for anyone external to help with how the Selenium actions are taken without access to your internal system to test it out properly

- If the webpage opens, that means your ChromeDriver and everything else are set up correctly

- So the problem will be in this part of the code, which calls the elements:

# Wait 10 seconds until checkbox is clickable
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Select All']"))).click()

# Wait 5 seconds until download button is clickable
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Download']"))).click()

- I would suggest contacting the account executive who manages your account to find out whether there is an Alteryx partner who could help you write the code in a more project-like fashion, as building those calls with Selenium can get quite tricky

- Another approach may be to automate the website clicking with an RPA tool and pick up the data afterward with Alteryx

- If you would rather not use a partner or RPA, try doing some more research on Selenium online

- One more thing: below is some sample code one of my colleagues once wrote for clicking, which may help give the general direction

 

Selenium with form and button



from ayx import Alteryx

Alteryx.installPackages("urllib3")
Alteryx.installPackages("selenium")

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
import pandas
import time

# Start the WebDriver and load the page
driver = webdriver.Chrome("C:/ChromeDriver/chromedriver")
driver.get("https://www.fincen.gov/reports/sar-stats")
#time.sleep(4)

# For dynamically generated websites wait for a specific ID tag
iframe = driver.find_element_by_xpath("//*[@id='block-sarstatsiframe']/div/p/iframe")
driver.switch_to.frame(iframe)

searchForm = driver.find_element_by_id("formInstitution_input").click()
element = driver.find_element_by_class_name("select2-group").click()

searchForm2 = driver.find_element_by_id("formSectionYear").click()
element2 = driver.find_element_by_xpath("//*[text()='2019']").click()

element3 = driver.find_element_by_id("formButtonGenerate").click()
time.sleep(4)
element4 = driver.find_element_by_id("formButtonCsv").click()
time.sleep(4)
driver.quit()
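
The sample above pauses for a fixed `time.sleep(4)` after clicking the CSV button before quitting the driver. As a hedged alternative, the sketch below polls the download folder until a new, finished file appears. It assumes Chrome marks in-progress downloads with a `.crdownload` (or `.tmp`) suffix; `wait_for_download` and its parameters are hypothetical names introduced here, not part of Selenium or the Alteryx Python tool. Whether a missing wait explains the behavior in the original question is not confirmed in this thread.

```python
import os
import time

# Suffixes Chrome is assumed to use for downloads that are still in progress.
TEMP_SUFFIXES = (".crdownload", ".tmp")


def wait_for_download(directory, before, timeout=60.0, poll_interval=0.5):
    """Return the path of a new, finished file in `directory`.

    `before` is the set of file names present before the download was started.
    Raises TimeoutError if no finished new file shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = set(os.listdir(directory))
        in_progress = {n for n in names if n.endswith(TEMP_SUFFIXES)}
        new_files = names - set(before) - in_progress
        if new_files and not in_progress:
            # Pick the most recently modified new file.
            paths = (os.path.join(directory, n) for n in new_files)
            return max(paths, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"No completed download in {directory!r} after {timeout} seconds"
            )
        time.sleep(poll_interval)
```

In a script like the ones in this thread, one would record `before = set(os.listdir(download_dir))` ahead of the Download click, call `wait_for_download(download_dir, before)` right after it, and only then call `driver.quit()`.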

 

 

David Matyas
Sales Engineer
Alteryx