Hi Community - I have a use case where I need to automatically download files from a link within an email.
Thanks to @DavidM's great guide here, https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Python-Code-Tool-Web-Scraping-Dynamic-... , I was able to open a download link from an email, scrape the webpage, click the checkbox, and click Download to save the file to our shared drive.
However, this only worked when I ran the code inside the Python tool itself (the file was saved to the shared drive).
When I ran the Alteryx workflow, the Python code stopped at the download step (the file was not saved to the shared drive).
It opened the webpage and clicked the checkbox, but then the browser closed automatically without downloading anything, and there was no error message.
Hoping for some advice as I am new to Python! It's probably my slashes too, but why does the code work inside the Python tool and not when I run the Alteryx workflow? I used UNC paths for both the new download directory and ChromeDriver because I would like to put the workflow on Alteryx Server.
from ayx import Alteryx
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.common.by import By
# Read in URL to download files from
dat = Alteryx.read("#1")
print(dat)
URL = dat['TextBody'].iloc[0]
# Change download location
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {
    "download.default_directory": r"\\UNC\path\directory",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
})
# Start the WebDriver and load the page
driver = webdriver.Chrome("//UNC//path//chromedriver.exe", options=options)
# Enter URL you want to scrape
driver.get(URL)
# Wait 10 seconds until checkbox is clickable
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Select All']"))).click()
# Wait 5 seconds until download button is clickable
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Download']"))).click()
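On the slashes question, here is a minimal sketch of building UNC paths with Python's standard pathlib module instead of typing the separators by hand. The server, share, and folder names are hypothetical placeholders, and this only shows how the strings are formed; it is not a confirmed fix for the download problem.

```python
from pathlib import PureWindowsPath

# Hypothetical server and share names, for illustration only
share = PureWindowsPath(r"\\server\share")

# Joining with "/" inserts the separators, so none are typed by hand
download_dir = share / "downloads"
driver_path = share / "tools" / "chromedriver.exe"

# Single forward slashes between components parse to the same UNC path
same_dir = PureWindowsPath("//server/share/downloads")

# Plain strings, as expected by the Chrome prefs and the driver constructor
download_dir_str = str(download_dir)
driver_path_str = str(driver_path)
print(download_dir_str)
print(driver_path_str)
```

The resulting strings could then be passed as "download.default_directory" and as the ChromeDriver location, so both paths use one consistent style.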
Hi @yalteryx,
Thanks for posting this.
Just my two cents:
- My understanding is that this is being run against an internal system within your company
- It is going to be hard for anyone external to help with the Selenium actions without access to your internal system to test them properly
- If the webpage opens, that means your ChromeDriver and everything else are set up correctly
- So the problem is most likely in this part of the code, where the elements are located and clicked
# Wait 10 seconds until checkbox is clickable
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Select All']"))).click()
# Wait 5 seconds until download button is clickable
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Download']"))).click()
- I would suggest contacting the account executive who manages your account to find out whether there is an Alteryx partner who could help you write the code in a more project-like fashion, as building these calls with Selenium can get quite tricky
- Another approach may be to automate the website clicking with an RPA tool and pick up the data afterward with Alteryx
- If you prefer neither a partner nor RPA, try doing some more research on Selenium online
- One more shout: below is some sample code one of my colleagues once wrote for clicking, which may help give the general direction
Selenium with form and button
from ayx import Alteryx
# Install the required packages before importing them
Alteryx.installPackages("urllib3")
Alteryx.installPackages("selenium")
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import time
# Start the WebDriver and load the page
# (Selenium 3 style: the ChromeDriver path is passed positionally; newer releases expect a Service object)
driver = webdriver.Chrome("C:/ChromeDriver/chromedriver")
driver.get("https://www.fincen.gov/reports/sar-stats")
# The form sits inside a dynamically generated iframe: wait up to 10 seconds for it, then switch into it
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//*[@id='block-sarstatsiframe']/div/p/iframe")))
# Click through the form: institution first, then the year 2019
driver.find_element(By.ID, "formInstitution_input").click()
driver.find_element(By.CLASS_NAME, "select2-group").click()
driver.find_element(By.ID, "formSectionYear").click()
driver.find_element(By.XPATH, "//*[text()='2019']").click()
# Generate the report and give the page time to build it
driver.find_element(By.ID, "formButtonGenerate").click()
time.sleep(4)
# Click the CSV button and give the download time to finish before closing the browser
driver.find_element(By.ID, "formButtonCsv").click()
time.sleep(4)
driver.quit()
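One more thought on the fixed time.sleep(4) calls above: a fixed pause can be too short for a large file or a slow shared drive, and if the script ends or the browser closes before Chrome finishes writing, the file never appears. I can't confirm that this is what happens in the original workflow, so treat it as a guess. Below is a minimal sketch of a polling helper that waits until a new file shows up and no partial download remains. The function name wait_for_download and its parameters are made up for this example; the only Chrome-specific assumption is that in-progress downloads carry a .crdownload suffix.

```python
import time
from pathlib import Path


def wait_for_download(directory, already_present=(), timeout=60.0, poll_interval=0.5):
    """Wait until `directory` holds a new, fully written file.

    `already_present` lists file names that existed before the click.
    Returns the sorted names of the new files, or raises TimeoutError.
    """
    folder = Path(directory)
    known = set(already_present)
    deadline = time.monotonic() + timeout
    while True:
        # Files that appeared since the snapshot was taken
        new_names = sorted(
            p.name for p in folder.iterdir() if p.is_file() and p.name not in known
        )
        # Chrome marks in-progress downloads with a .crdownload suffix
        partial = [n for n in new_names if n.endswith(".crdownload")]
        if new_names and not partial:
            return new_names
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"download not finished after {timeout} s: {partial or 'no new files'}"
            )
        time.sleep(poll_interval)
```

To use it, take a snapshot such as before = {p.name for p in Path(download_dir).iterdir()} ahead of the Download click, then call wait_for_download(download_dir, before) in place of the final time.sleep and before driver.quit().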