
Alteryx Designer Discussions


Python Code Tool - Web Scraping Dynamic Websites Using Selenium

DavidM
Alteryx

Hi @ibesmond,

 

It seems you were able to install ChromeDriver correctly, but the very last screenshot shows that you are experiencing some network issues.

 

I would suggest revisiting this with your IT team: explain what you are trying to do and point them to https://chromedriver.chromium.org/security-considerations

 

Another option is to try a different driver, for instance Firefox with geckodriver.

 

I would also be curious to see the full exception/error message from the Python tool alone (interactive mode). You have only shared a small portion of the exception, so I can't see the whole thing.

 

David

David Matyas
Sales Engineer
Alteryx
ibesmond
8 - Asteroid

Hi @DavidM .

 

I come from an accounting background and haven't written HTML since Myspace first came out. 20 years later, I still haven't done much coding.

 

I downloaded geckodriver and tried to use Sublime Text to run the Python code. I got these errors with both methods. Would it help to re-run the code in PyCharm? I don't know what you mean by the Python tool alone (interactive mode). Can I run that?

 

[Attached screenshot: sublime text.png]

DavidM
Alteryx

Hi @ibesmond,

 

Can you please run the workflow in Alteryx? Once it is done, click the Python Code tool icon in that workflow, then scroll through the code on the left part of the screen until you find the error message. It would be great if you could share that.

 

david

ibesmond
8 - Asteroid

Here is what I can see @DavidM 

 

---------------------------------------------------------------------------
SessionNotCreatedException                Traceback (most recent call last)
<ipython-input-2-fcccb8dfbe42> in <module>
      8 # Start the WebDriver and load the page
      9 # Using Chromium Driver here, need to change path to match youe env
---> 10 wd = webdriver.Chrome("C:/webdrivers/chromedriver")
     11 
     12 # Enter URL you want to scrape

c:\users\ibesmond\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py in __init__(self, executable_path, port, options, service_args, desired_capabilities, service_log_path, chrome_options, keep_alive)
     79                     remote_server_addr=self.service.service_url,
     80                     keep_alive=keep_alive),
---> 81                 desired_capabilities=desired_capabilities)
     82         except Exception:
     83             self.quit()

c:\users\ibesmond\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\remote\webdriver.py in __init__(self, command_executor, desired_capabilities, browser_profile, proxy, keep_alive, file_detector, options)
    155             warnings.warn("Please use FirefoxOptions to set browser profile",
    156                           DeprecationWarning, stacklevel=2)
--> 157         self.start_session(capabilities, browser_profile)
    158         self._switch_to = SwitchTo(self)
    159         self._mobile = Mobile(self)

c:\users\ibesmond\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\remote\webdriver.py in start_session(self, capabilities, browser_profile)
    250         parameters = {"capabilities": w3c_caps,
    251                       "desiredCapabilities": capabilities}
--> 252         response = self.execute(Command.NEW_SESSION, parameters)
    253         if 'sessionId' not in response:
    254             response = response['value']

c:\users\ibesmond\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
    319         response = self.command_executor.execute(driver_command, params)
    320         if response:
--> 321             self.error_handler.check_response(response)
    322             response['value'] = self._unwrap_value(
    323                 response.get('value', None))

c:\users\ibesmond\appdata\local\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
    240                 alert_text = value['alert'].get('text')
    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243 
    244     def _value_or_default(self, obj, key, default):

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 83


 

You need to install ChromeDriver to support Selenium: http://chromedriver.chromium.org/downloads

Adjust the path to the driver in the webdriver portion above to point to your ChromeDriver executable.

Change the URL you want to scrape in the code above.

To scrape dynamically generated websites (which take some time to load fully), you need to specify the ID of an HTML tag to wait for. Here "SIP_OV_ClosingPrice" is used; change it to match your use case.
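
For readers adapting that last step, here is a minimal sketch of the waiting idea. It is not the workflow's actual code: the helper name `wait_for_element_id` and its `timeout`/`poll` parameters are hypothetical, and the hand-rolled polling loop stands in for Selenium's own `WebDriverWait`, which is the usual tool for this, purely so the sketch has no dependencies. It only assumes the driver object offers `find_elements(by, value)` and `page_source`, as Selenium WebDriver objects do.

```python
import time


def wait_for_element_id(driver, element_id, timeout=30.0, poll=0.5):
    """Poll until an element with the given ID exists, then return the page HTML.

    `driver` is expected to behave like a Selenium WebDriver: it must offer
    find_elements(by, value) and a page_source attribute.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "id" is the locator strategy string held by Selenium's By.ID constant.
        # find_elements returns an empty list (instead of raising) when nothing matches.
        if driver.find_elements("id", element_id):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"element with id {element_id!r} did not appear within {timeout} s"
            )
        time.sleep(poll)
```

With the `wd` driver from the traceback above, this would be called after `wd.get(...)`, for example `html = wait_for_element_id(wd, "SIP_OV_ClosingPrice")`.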

DavidM
Alteryx

Hi @ibesmond ,

 

I think the part that says 

This version of ChromeDriver only supports Chrome version 83

is crucial here.

 

I'm not sure whether you may be using an older version of Chrome.

 

You may need to check which version of Chrome is installed and re-download the matching version of ChromeDriver:

 

http://chromedriver.chromium.org/downloads
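
The mismatch can also be checked in code before starting the browser. The following is a minimal sketch and not part of the original workflow: the helper names (`major_version`, `supported_chrome_major`, `versions_match`) are hypothetical, and it assumes that comparing major versions is enough, which is what the error text quoted above suggests.

```python
import re
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading number of a dotted version string, e.g. 83 for '83.0.x.y'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    return int(match.group(1))


def supported_chrome_major(error_message: str) -> Optional[int]:
    """Extract the Chrome major version named in a ChromeDriver 'only supports' error."""
    match = re.search(r"only supports Chrome version (\d+)", error_message)
    return int(match.group(1)) if match else None


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """True when the browser and the driver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)
```

For example, feeding the message from the traceback into `supported_chrome_major` yields 83, and `versions_match` returns False for a Chrome 81 build paired with a ChromeDriver 83 build (illustrative version strings such as "81.0.4044.138" and "83.0.4103.39").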

ibesmond
8 - Asteroid

That did it, @DavidM. I could have sworn I checked and Chrome was on 83. Downloaded ChromeDriver 81. Boom. Can't believe I mixed that up. Thank you a hundred times over!

PragyaChouksey
5 - Atom

Hi,

I'm trying to install Selenium via the Alteryx Python tool using the code below, but it gives an error. It looks like a permission issue. I'm not sure whether I need to raise a request to have Selenium installed at the specified path in the Alteryx folder (c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages). Can you please advise?

 

# List all non-standard packages to be imported by your
# script here (only missing packages will be installed)
from ayx import Package
Package.installPackages(['Selenium'])

 

ERROR -

 

Collecting Selenium
Using cached https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98...
Requirement already satisfied: urllib3 in c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages (from Selenium) (1.25.2)
Installing collected packages: Selenium
ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\Lib\\site-packages\\selenium'
Consider using the `--user` option or check the permissions.
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-5-57531c9622dd> in <module>
2 # script here (only missing packages will be installed)
3 from ayx import Package
----> 4 Package.installPackages(['Selenium'])

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
200 print(pip_install_result["msg"])
201 if not pip_install_result["success"]:
--> 202 raise pip_install_result["err"]

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
118
119 try:
--> 120 result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
121 if debug:
122 print("[Subprocess success!]")

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
354
355 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 356 **kwargs).stdout
357
358

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
436 if check and retcode:
437 raise CalledProcessError(retcode, process.args,
--> 438 output=stdout, stderr=stderr)
439 return CompletedProcess(process.args, retcode, stdout, stderr)
440

CalledProcessError: Command '['c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\python.exe', '-m', 'pip', 'install', 'Selenium']' returned non-zero exit status 1.

DavidM
Alteryx

Hi @PragyaChouksey 

 

This looks like a permissions issue. You may need to run Alteryx in admin mode to install the package.

 

Plan B is to try installing it using Conda:

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Manage-packages-with-Conda-for-the-Pyt...
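
The pip output in the error also hints at a third route, the `--user` option. A minimal sketch of that, assuming the same `python -m pip install` command shown in the traceback: the helper names `pip_install_command` and `install` are hypothetical, and whether the Python tool's environment then picks up packages from the per-user site-packages folder is an assumption that would need to be verified.

```python
import subprocess
import sys


def pip_install_command(package, user=True, python=sys.executable):
    """Build the pip command line; --user targets the per-user site-packages folder."""
    command = [python, "-m", "pip", "install"]
    if user:
        command.append("--user")
    command.append(package)
    return command


def install(package, user=True):
    """Run pip and return (success, combined output) instead of raising on failure."""
    result = subprocess.run(
        pip_install_command(package, user),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0, result.stdout
```

Calling `install("selenium")` from the Python tool would then print nothing by itself; inspect the returned output to see whether pip still reports an access error.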

 

dm

sraheja21
5 - Atom

Hi Guys,

 

I need some help here. 

 

I have already established a connection to the website I need to download data from, using ChromeDriver and packages like Selenium. When I run the code, it opens a browser window and goes through several steps to download an Excel file, but the file currently lands on my desktop. My main goal is to get the Excel file into Alteryx so that I can use it as an input in my Alteryx workflow.

 

Any help here would be much appreciated. 

 

Gist: my code runs perfectly fine in Jupyter and downloads the Excel file from the website. The same code in the Python tool in Alteryx also downloads the file, but to my desktop, whereas I want that file to be used as an input to Alteryx.

DavidM
Alteryx

@sraheja21 I would suggest you download the file to your file system and then pick it up with an INPUT DATA tool.

 

This will be much simpler than coding the Excel parsing from scratch directly in the Python Code tool.
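
One way to make that hand-off predictable is to send the browser's downloads to a known folder and wait until the file is complete before the workflow reads it. A minimal sketch under stated assumptions: the helper names `chrome_download_prefs` and `wait_for_download` are hypothetical, the two preference keys are Chrome download preferences meant to be passed to Selenium's `ChromeOptions.add_experimental_option("prefs", ...)`, and the check for leftover `*.crdownload` files relies on how Chrome names partial downloads.

```python
import time
from pathlib import Path


def chrome_download_prefs(download_dir):
    """Chrome preferences that save downloads to download_dir without a prompt."""
    return {
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
    }


def wait_for_download(download_dir, pattern="*.xlsx", timeout=60.0, poll=0.5):
    """Return the newest file matching pattern once no partial download remains."""
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Chrome keeps a *.crdownload file next to the target while it is still writing.
        in_progress = list(folder.glob("*.crdownload"))
        finished = sorted(folder.glob(pattern), key=lambda p: p.stat().st_mtime)
        if finished and not in_progress:
            return finished[-1]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished {pattern} file in {folder} after {timeout} s")
        time.sleep(poll)
```

The folder passed to both helpers (any path you choose) is then the location to point the Input Data tool at.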
