Alteryx Designer Desktop Discussions

knozawa · ‎07-26-2021

Hello,

Is there any way to use a field from an input connection in the Python tool? I'm using a Python tool for web scraping. Currently, I'm embedding a URL directly in the Python tool to scrape. However, I would like to dynamically scrape multiple pages on the website.

I tried to read the connected tool:

data = Alteryx.read("#1")
Alteryx.read(Alteryx.getIncomingConnectionNames()[0])
data_url = data['file']

wd = webdriver.Chrome("C:/Program Files/Alteryx/bin/Plugins/chromedriver.exe")

wd.get(data_url)

However, data_url is an "object", not a "string". The wd.get(data_url) command gives an error message: # For dynamically generated websites wait for a specific ID tag.

Jean-Balteryx · ‎07-26-2021

Hi @knozawa ,

Does data_url[0] return something?

knozawa · ‎07-26-2021

@Jean-Balteryx

Thank you! Yes, it read the first record. Do you know if we can read multiple records from the incoming connection for scraping multiple pages in the python tool? In this case, I wonder if I need to use an iterative macro.

Jean-Balteryx · ‎07-26-2021

I'm not a Python expert, but maybe slicing such as data_url[0:2] could work!

clmc9601 · ‎07-26-2021

Hi @knozawa,

@Jean-Balteryx is on the right track!

You can definitely reference specific fields from the input. If 'data' is the variable name from input anchor #1, you can do

data[rowNumber]['columnName' or columnNumber]

to reference a specific column from a specific row.

If you want to iterate through all the rows in the input, Python has a built-in construct for that.

I'd use a "for loop". More detailed information here: https://automatetheboringstuff.com/chapter2/. Search "in range" to skip straight to for loops.

It'll be something like:

for x in range(0, len(data)):
   variableYouChoose = data[x]['columnName']
   otherCodeHere...

I hope this helps! If it does, please consider marking it as a solution so others may find it.

knozawa · ‎07-26-2021

@clmc9601

Thank you! I added the following code, but webpage = data[x]['file'] raises a KeyError:

KeyError                                  
     69 
     70 for x in range(0, len(data)):
---> 71     webpage = data[x]['file']
     72     if __name__ == "__main__":
     73         main()

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2897             if self.columns.nlevels > 1:
   2898                 return self._getitem_multilevel(key)
-> 2899             indexer = self.columns.get_loc(key)
   2900             if is_integer(indexer):
   2901                 indexer = [indexer]

clmc9601 · ‎07-26-2021

Hi @knozawa,

Sorry, I wrote rows and columns in the wrong order. Try this instead:

data['file'][x]
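
Here is a minimal sketch of why the order matters, assuming the incoming data is a pandas DataFrame with a 'file' column and a default 0-based index. The small DataFrame below is only a stand-in for what Alteryx.read("#1") would return, and the URLs are made-up placeholders:

```python
import pandas as pd

# Stand-in for Alteryx.read("#1"); the URLs are hypothetical placeholders.
data = pd.DataFrame({"file": ["https://example.com/a", "https://example.com/b"]})

# data["file"] selects the column (a pandas Series); [x] then picks one row.
urls = []
for x in range(len(data)):
    webpage = data["file"][x]
    urls.append(webpage)

# Iterating over the column directly avoids the index bookkeeping.
urls_direct = [webpage for webpage in data["file"]]

# data[0] is looked up as a column label, which is why data[x]['file'] fails.
try:
    data[0]["file"]
    row_first_failed = False
except KeyError:
    row_first_failed = True
```

Each value pulled out this way is a plain string, so it can presumably be passed straight to wd.get(webpage).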

knozawa · ‎07-26-2021

@clmc9601

Thank you! It successfully iterated and read rows one at a time. I will close this case now.

However, the output contains only the result of the last iteration. I created another case since it's a slightly different question: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/how-to-append-data-frame-output-in-pyt...

If you can give me some suggestions, that would be helpful. Thanks!
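
One possible cause of the last-iteration-only output (the scraping code itself is not shown in this thread, so this is a guess) is that the result variable is overwritten on every pass of the loop. A rough sketch of collecting one result per URL in a list instead; scrape_page and its returned fields are hypothetical stand-ins for the real Selenium logic:

```python
def scrape_page(url):
    """Hypothetical placeholder for the real scraping logic."""
    return {"url": url, "length": len(url)}


def scrape_all(urls, scrape=scrape_page):
    """Call scrape once per URL and keep every result, not just the last."""
    results = []
    for url in urls:
        results.append(scrape(url))
    return results


# Placeholder URLs; in the Python tool these would come from the 'file' column.
rows = scrape_all(["https://example.com/a", "https://example.com/bb"])
```

In the Python tool, the list of dictionaries could then be turned into a single table with pandas.DataFrame(rows) and sent to an output anchor with Alteryx.write, assuming the scraped fields fit a flat table.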

clmc9601 · ‎07-26-2021

Sure, will do.
