Hello,
Is there any way to use a field from an input connection in the Python tool? I'm using a Python tool for web scraping. Currently, I embed the URL to scrape directly in the Python tool. However, I would like to scrape multiple pages on the website dynamically.
I tried to read the connected tool:
data = Alteryx.read("#1")
Alteryx.read(Alteryx.getIncomingConnectionNames()[0])
data_url = data['file']
wd = webdriver.Chrome("C:/Program Files/Alteryx/bin/Plugins/chromedriver.exe")
wd.get(data_url)
However, data_url is an "object" not a "string". wd.get(data_url) command gives an error message: # For dynamically generated websites wait for a specific ID tag.
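For reference, Alteryx.read returns a pandas DataFrame, so data['file'] is a whole column (a Series with dtype "object") rather than a single string. Below is a minimal sketch of pulling one value out of that column. The DataFrame is a stand-in for the incoming connection, and the URL is a made-up placeholder, not something from the original workflow.

```python
import pandas as pd

# Stand-in for Alteryx.read("#1"); the URL below is a hypothetical placeholder.
data = pd.DataFrame({"file": ["https://example.com/page1"]})

# Selecting a column returns a pandas Series, which is why it shows up as "object".
column = data["file"]

# .iloc[0] picks the value in the first row, which is a plain Python str.
data_url = data["file"].iloc[0]

print(type(column).__name__)    # type name of the whole column
print(type(data_url).__name__)  # type name of the single value
```

A value extracted this way is an ordinary string, so it can be passed to a call that expects a URL, such as wd.get(data_url).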
Thank you! Yes, it read the first record. Do you know if we can read multiple records from the incoming connection for scraping multiple pages in the python tool? In this case, I wonder if I need to use an iterative macro.
I'm not a Python expert, but maybe using slicing such as data_url[0:2] could work!
Hi @knozawa,
@Jean-Balteryx is on the right track!
You can definitely reference specific fields from the input. If 'data' is the variable name from input anchor #1, you can do
data[rowNumber]['columnName' or columnNumber]
to reference a specific column from a specific row.
If you want to iterate through all the rows in the input, Python has a built-in construct for that.
I'd use a "for loop". More detailed information here: https://automatetheboringstuff.com/chapter2/. Search "in range" to skip straight to for loops.
It'll be something like:
for x in range(0, len(data)):
    variableYouChoose = data[x]['columnName']
    otherCodeHere...
I hope this helps! If it does, please consider marking it as a solution so others may find it.
Thank you! I added the following code, but webpage = data[x]['file'] raises a KeyError:
KeyError
69
70 for x in range(0, len(data)):
---> 71 webpage = data[x]['file']
72 if __name__ == "__main__":
73 main()
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2897 if self.columns.nlevels > 1:
2898 return self._getitem_multilevel(key)
-> 2899 indexer = self.columns.get_loc(key)
2900 if is_integer(indexer):
2901 indexer = [indexer]
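The traceback above goes through pandas' DataFrame.__getitem__, which suggests the cause: on a DataFrame, data[x] looks up a column labelled x, not row number x, and there is no column named 0. A minimal sketch of one way around it is to select the column first and then the row by position. The DataFrame and URLs here are hypothetical stand-ins for the incoming connection.

```python
import pandas as pd

# Stand-in for Alteryx.read("#1"); the URLs are hypothetical placeholders.
data = pd.DataFrame({"file": ["https://example.com/a", "https://example.com/b"]})

# data[0] asks for a COLUMN labelled 0, which does not exist here.
try:
    data[0]["file"]
except KeyError as err:
    print("KeyError:", err)

# Select the column first, then the row by position with .iloc.
webpages = []
for x in range(len(data)):
    webpage = data["file"].iloc[x]
    webpages.append(webpage)

# Equivalent and a little simpler: iterate over the column's values directly.
for webpage in data["file"]:
    print(webpage)
```

Iterating over the column directly avoids index arithmetic altogether, which also sidesteps problems when the DataFrame's index is not a simple 0..n-1 range.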
Thank you! It successfully iterated and read rows one at a time. I will close this case now.
However, the output contains only the result of the last iteration. I created another case since it's a slightly different question: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/how-to-append-data-frame-output-in-pyt...
If you can give me some suggestions, that would be helpful. Thanks!
Sure, will do.
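On the "only the last iteration" symptom mentioned above: a common cause is reassigning the result variable on each pass of the loop, so that only the final value survives. A minimal sketch of one way to keep every result is to collect the per-page DataFrames in a list and combine them once with pd.concat. The scrape_page function, the column names, and the URLs are all hypothetical stand-ins, since the actual scraping code is not shown in this thread.

```python
import pandas as pd


def scrape_page(url):
    """Hypothetical stand-in for the real scraping code; returns one row per page."""
    return pd.DataFrame({"url": [url], "url_length": [len(url)]})


# Stand-in for Alteryx.read("#1"); the URLs are hypothetical placeholders.
data = pd.DataFrame({"file": ["https://example.com/a", "https://example.com/bb"]})

# Collect each iteration's result instead of overwriting a single variable.
frames = []
for webpage in data["file"]:
    frames.append(scrape_page(webpage))

# Combine everything once, after the loop, with a fresh 0..n-1 index.
result = pd.concat(frames, ignore_index=True)
print(result)

# Inside the Python tool, the combined frame could then be sent to an output
# anchor, e.g. Alteryx.write(result, 1) (not run here).
```

Concatenating once after the loop is generally cheaper than appending to a DataFrame on every iteration, because each append would copy the accumulated data.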