
Alteryx Designer Discussions


Python Code Tool - Web Scraping Dynamic Websites Using Selenium

bwiegert

Hi David,

Thank you for this post. I have been able to use the information you presented here to set up some workflows that successfully pull daily numbers from a few different public health websites for COVID-19 tracking. My question is about scheduling these workflows on Alteryx Server. Since the web driver (chromedriver, in my case) has to be in an "executable path", how is that set up on the server? Should an Alteryx Server admin be able to install a web driver in an executable path that only the server can access?

Thanks again!

DavidM

Hi @bwiegert,

Yeah, pretty much. The setup on the Server will be the same as, or similar to, the setup on your machine.

I would just suggest that you agree with the admin on which path to put the EXE in.

It could even be a shared drive with a UNC path, I think.

That way, when you push the workflow to Server, the code will not have to change to find chromedriver in the Server environment.

d

David Matyas
Sales Engineer
Alteryx
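David's suggestion of agreeing on one driver location with the admin can be sketched in Python for the Python tool. This is a minimal sketch, not an Alteryx convention: the UNC path `\\fileserver\tools\chromedriver\chromedriver.exe` and the `CHROMEDRIVER_PATH` environment variable are hypothetical placeholders for whatever you and your admin actually agree on. The function checks the environment variable, then the shared path, then the system PATH, and returns the first chromedriver it finds.

```python
import os
import shutil

# Hypothetical shared location agreed with the Server admin (placeholder, not a real share).
SHARED_DRIVER_PATH = r"\\fileserver\tools\chromedriver\chromedriver.exe"


def find_chromedriver(shared_path: str = SHARED_DRIVER_PATH) -> str:
    """Return the first chromedriver executable found.

    Lookup order (an assumption made for this sketch):
    1. the CHROMEDRIVER_PATH environment variable (hypothetical name),
    2. the shared path agreed with the admin,
    3. any 'chromedriver' executable on the system PATH.
    """
    for candidate in (os.environ.get("CHROMEDRIVER_PATH"), shared_path):
        # Skip unset values and paths that do not point at an existing file.
        if candidate and os.path.isfile(candidate):
            return candidate

    # shutil.which returns None when no matching executable is on PATH.
    on_path = shutil.which("chromedriver")
    if on_path:
        return on_path

    raise FileNotFoundError(
        "chromedriver not found; check the location agreed with your Server admin"
    )
```

The result of `find_chromedriver()` would then replace the hard-coded driver path in the workflow code, so the same code can run on the desktop and on Server as long as both machines can reach the agreed location.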
Asanchez77

Hello,

First of all, thanks for the guide; it is very useful.

After spending some time getting the code to run, and after solving a few problems along the way, I got stuck on the following:

from selenium import webdriver

print("before")
# Start the WebDriver and load the page
wd = webdriver.Chrome("C:/Users/*****/AppData/Local/Alteryx/bin/Plugins/chromedriver")
print("after ")
# Enter URL you want to scrape
wd.get("https://www.six-group.com/exchanges/bonds/security_info_en.html?id=DE000A19W2L5EUR4")

 


When I execute the code, the browser appears, but the address bar shows "data:," and nothing else happens. With the print statements added, only "before" is printed.

When I close the browser, this message appears:

SessionNotCreatedException                Traceback (most recent call last)
<ipython-input-12-efb5d50a84f6> in <module>
     11 # Start the WebDriver and load the page
     12 # Using Chromium Driver here, need to change path to match youe env
---> 13 wd = webdriver.Chrome("C:/Users/*******/AppData/Local/Alteryx/bin/Plugins/chromedriver")
     14 # Enter URL you want to scrape
     15 wd.get("https://www.six-group.com/exchanges/bonds/security_info_en.html?id=DE000A19W2L5EUR4")

c:\users\*********\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py in __init__(self, executable_path, port, options, service_args, desired_capabilities, service_log_path, chrome_options, keep_alive)
     79                     remote_server_addr=self.service.service_url,
     80                     keep_alive=keep_alive),
---> 81                 desired_capabilities=desired_capabilities)
     82         except Exception:
     83             self.quit()

c:\users\*******\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\remote\webdriver.py in __init__(self, command_executor, desired_capabilities, browser_profile, proxy, keep_alive, file_detector, options)
    155             warnings.warn("Please use FirefoxOptions to set browser profile",
    156                           DeprecationWarning, stacklevel=2)
--> 157         self.start_session(capabilities, browser_profile)
    158         self._switch_to = SwitchTo(self)
    159         self._mobile = Mobile(self)

c:\users\********\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\remote\webdriver.py in start_session(self, capabilities, browser_profile)
    250         parameters = {"capabilities": w3c_caps,
    251                       "desiredCapabilities": capabilities}
--> 252         response = self.execute(Command.NEW_SESSION, parameters)
    253         if 'sessionId' not in response:
    254             response = response['value']

c:\users\*******+\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
    319         response = self.command_executor.execute(driver_command, params)
    320         if response:
--> 321             self.error_handler.check_response(response)
    322             response['value'] = self._unwrap_value(
    323                 response.get('value', None))

c:\users\*********\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
    240                 alert_text = value['alert'].get('text')
    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243 
    244     def _value_or_default(self, obj, key, default):

SessionNotCreatedException: Message: session not created
from disconnected: Unable to receive message from renderer
  (Session info: chrome=86.0.4240.193)



I was wondering if someone could help me with this.

Thank you very much, and best regards.

Shannila

I placed chromedriver in the System32 folder and explicitly pointed to chromedriver.exe.

This worked when I tested it.

from ayx import Alteryx

from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver

# Start the WebDriver and load the page
# Using ChromeDriver here; change the path to match your env
options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument("start-maximized")

driver = webdriver.Chrome(options=options, executable_path="C:\\Windows\\System32\\chromedriver.exe")

# Enter URL you want to scrape
driver.get("https://www.six-group.com/en/products-services/the-swiss-stock-exchange/market-data/bonds.html?id=DE...")
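Shannila's fix combines an explicit chromedriver.exe path with the `--no-sandbox` argument. The sketch below packages the same idea under two assumptions that do not come from the thread: that you are on Selenium 4, where the driver path is passed through a `Service` object instead of the Selenium 3 `executable_path` keyword used above, and that an optional `--headless` flag is acceptable for unattended runs. The function names `chrome_arguments` and `scrape_page` are hypothetical. The driver is always closed with `quit()` so that a failed run does not leave Chrome processes behind.

```python
from typing import List


def chrome_arguments(headless: bool = False) -> List[str]:
    """Build the Chrome arguments from the post above.

    '--no-sandbox' and 'start-maximized' come from the post; '--headless' is an
    optional assumption for unattended runs such as scheduled Server jobs.
    """
    args = ["--no-sandbox", "start-maximized"]
    if headless:
        args.append("--headless")
    return args


def scrape_page(url: str, driver_path: str, headless: bool = False) -> str:
    """Open url in Chrome and return the rendered page source.

    Assumes Selenium 4, where the driver path is passed via a Service object.
    """
    # Imported here so chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)

    driver = webdriver.Chrome(service=Service(driver_path), options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always shut down Chrome and chromedriver, even if get() raises.
        driver.quit()
```

Calling `scrape_page(url, r"C:\Windows\System32\chromedriver.exe", headless=True)` would mirror the setup in the post with headless mode switched on.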

 

Shannila

I am having the same issue. It works on the desktop, but when I schedule it on the Server, it throws lots of errors:

FileNotFoundError                         Traceback (most recent call last)
t:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\common\service.py in start(self)
     75                                             stderr=self.log_file,
---> 76                                             stdin=PIPE)
     77         except TypeError:

t:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    728                                 errread, errwrite,
--> 729                                 restore_signals, start_new_session)
    730         except:

t:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1016                                      os.fspath(cwd) if cwd is not None else None,
-> 1017                                      startupinfo)
   1018             finally:

FileNotFoundError:
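A FileNotFoundError raised from `subprocess` while Selenium starts its service, as in this traceback, generally means the chromedriver executable was not found at the path the code used when the job ran. A scheduled job may run on a different machine, and possibly under a different account, than your desktop session, so a path under a user profile (like the C:/Users/.../AppData/Local/... path earlier in the thread) may not exist for it. A quick pre-flight check that reports what the job actually sees can help narrow this down. This is a minimal diagnostic sketch; the function name `driver_preflight` and the fields it reports are hypothetical choices, not part of Alteryx or Selenium.

```python
import getpass
import os
import platform
from typing import Dict, Union


def driver_preflight(driver_path: str) -> Dict[str, Union[str, bool, None]]:
    """Report where the code is running and whether driver_path points at a file."""
    try:
        user = getpass.getuser()  # account the Python process runs under
    except Exception:  # getuser() can fail if no user name can be determined
        user = "unknown"

    return {
        "user": user,
        "machine": platform.node(),  # host name of the machine running the job
        "driver_path": driver_path,
        "exists": os.path.exists(driver_path),
        "is_file": os.path.isfile(driver_path),
        # LOCALAPPDATA is a per-user Windows variable; None if it is not set.
        "local_appdata": os.environ.get("LOCALAPPDATA"),
    }


# Example: check the System32 location used in the earlier post.
for key, value in driver_preflight(r"C:\Windows\System32\chromedriver.exe").items():
    print(f"{key}: {value}")
```

Running the check once manually and once as a scheduled job, then comparing the two sets of results, shows whether the path, the account, or the machine differs between the runs.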

jxjx

Thank you for sharing!

I was able to create a workflow that extracts data from the website. It works when I run it manually, but it fails when I schedule the job. The image below shows the error. I would love to hear any advice. Thank you!

 

[Screenshot of the error: jxjx_0-1619495287254.png]

andre_arellano

Hi,

Exact same problem here: chromedriver crashes.

SessionNotCreatedException: Message: session not created
from disconnected: Unable to receive message from renderer
  (Session info: chrome=90.0.4430.93)