Dear All,
Do we have any workflows to validate the VAT ID pertains to UK using below website.
https://www.tax.service.gov.uk/check-vat-number/known
as this was previously used to validate the status of VAT ID in VIES web portal.
Regards
Mahadeva Swamy
An API based approach is always preferable, but I was able to leverage Alteryx + Python + Selenium to remote control my web browser, load the value, hit enter, and scrape the resulting page. I took my inspiration from this page: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Python-Code-Tool-Web-Scraping-Dynamic-...
Here is my Python script that I used for the process:
# Import necessary requirements and install if does not exist
from ayx import Package
Package.installPackages(['pandas','numpy','selenium', 'urllib3'])
from ayx import Alteryx
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from time import sleep
import pandas
#Read in VAT Number
VATdf = Alteryx.read("#1")
VAT = str(VATdf.iloc[0, 0])
# Start the WebDriver and load the page
wd = webdriver.Chrome("C:/Program Files/Alteryx/bin/Plugins/chromedriver")
wd.get("https://www.tax.service.gov.uk/check-vat-number/enter-vat-details")
# VAT variable to be sent in to target element
#VAT = "GB243510593"
#Identify target element, send VAT, and hit return
elem = wd.find_element_by_id("target")
elem.send_keys(VAT)
elem.send_keys(Keys.RETURN)
# let page load for a few seconds before scraping
time.sleep(5)
# And grab the page whole HTML source
html_page = wd.page_source
# Attempt to close chromium instance
wd.quit()
# Turn the variabe with html page into Pandas' DF
df = pandas.DataFrame({"html_page":[html_page]})
# Write the data frame to Alteryx workflow for downstream processing
Alteryx.write(df,1)
You can either turn this workflow into a macro where you add a control parameter and action tool to the text input and a macro output at the end, or you could modify the python scrip to loop through every value to output a row to then be parsed. Hopefully this is a helpful start! I have attached my workflow to this post.
Dear @BrandonB ,
thanks for the time and effort on this.
However, when I check in my C drive I did not fine "chromedriver", can you please advise where do I download the same in order to run the scrip given in the workflow.
Correct, that’s because it is an additional download found in the first link. You will need the correct chrome driver for whatever version of chrome that you have. http://chromedriver.chromium.org/downloads
You will want to put the chromedriver.exe file in the C drive location mentioned in the script so that it can be referenced.
I am getting the below error when I run the above attached Workflow
Error: Python (2): ---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-1-1634d9cc6e05> in <module>
1 # Import necessary requirements and install if does not exist
2 from ayx import Package
----> 3 Package.installPackages(['pandas','numpy','selenium', 'urllib3'])
4 from ayx import Alteryx
5 from selenium import webdriver
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
200 print(pip_install_result["msg"])
201 if not pip_install_result["success"]:
--> 202 raise pip_install_result["err"]
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
118
119 try:
--> 120 result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
121 if debug:
122 print("[Subprocess success!]")
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
409 kwargs['input'] = '' if kwargs.get('universal_newlines', False) else b''
410
--> 411 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
412 **kwargs).stdout
413
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
510 retcode = process.poll()
511 if check and retcode:
--> 512 raise CalledProcessError(retcode, process.args,
513 output=stdout, stderr=stderr)
514 return CompletedProcess(process.args, retcode, stdout, stderr)
CalledProcessError: Command '['c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\python.exe', '-I', '-m', 'pip', 'install', 'pandas', 'numpy', 'selenium', 'urllib3']' returned non-zero exit status 1.
This Python script uses additional packages so Alteryx must be launched as administrator the first time and then run this workflow so that it can install the required packages. Then the workflow will work as expected
@BrandonB thanks for the help, however what do you mean by run as administrator, since this License is procured and given by our company and we are also using the same as administrator only, is there any specific changes required while launching this. please help on this.
Regards
Mahadev
You will want to hold down the shift button and right click the Alteryx Designer icon and then click “Run as administrator”
Hi @BrandonB , I did as advised but still getting the below error.
Error: Python (2): ---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\common\service.py in start(self)
71 cmd.extend(self.command_line_args())
---> 72 self.process = subprocess.Popen(cmd, env=self.env,
73 close_fds=platform.system() != 'Windows',
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
853
--> 854 self._execute_child(args, executable, preexec_fn, close_fds,
855 pass_fds, cwd, env,
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
1306 try:
-> 1307 hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
1308 # no special security
PermissionError: [WinError 5] Access is denied
During handling of the above exception, another exception occurred:
WebDriverException Traceback (most recent call last)
<ipython-input-2-07b44840375d> in <module>
1 # Start the WebDriver and load the page
----> 2 wd = webdriver.Chrome("C:/Program Files/Alteryx/bin/Plugins/chromedriver")
3 wd.get("https://www.tax.service.gov.uk/check-vat-number/enter-vat-details")
4
5 # VAT variable to be sent in to target element
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py in __init__(self, executable_path, port, options, service_args, desired_capabilities, service_log_path, chrome_options, keep_alive)
71 service_args=service_args,
72 log_path=service_log_path)
---> 73 self.service.start()
74
75 try:
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\selenium\webdriver\common\service.py in start(self)
84 )
85 elif err.errno == errno.EACCES:
---> 86 raise WebDriverException(
87 "'%s' executable may have wrong permissions. %s" % (
88 os.path.basename(self.path), self.start_error_message)
WebDriverException: Message: 'chromedriver' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home
Hi All
Do you have maybe the workflow in Alteryx?
Thanks, Kinga