Alteryx Designer Desktop Discussions

SOLVED

how to append data frame output in python tool

knozawa
11 - Bolide

Hello,

 

Is it possible to append data frame output in the Python tool?

I'm writing a web scraping program using the Python tool. The tool contains a loop that scrapes multiple pages from the website.

 

df = pandas.DataFrame.append({"html_page":[html_page]}) --> TypeError: append() missing 1 required positional argument: 'other'

for i in range (0,2):
page += i

webpage = webpage + str(page)
if __name__ == "__main__":
    main()

Alteryx.write(df,1)
Jean-Balteryx
16 - Nebula

Hi @knozawa ,

 

What do you mean by "append data frame"?

 

Your line Alteryx.write(df,1) should already output a data frame!

knozawa
11 - Bolide

@Jean-Balteryx 

 

When I ran the workflow, the output only contained the last iteration result.  I would like to append outputs from all the iterations in the python tool.

Jean-Balteryx
16 - Nebula

It looks like the statement where you concatenate the pages isn't inside your for loop, so it only uses the last page.
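
A minimal sketch of what moving that statement into the loop could look like. The base URL and the 1-based page numbering are hypothetical, since the original snippet does not show them:

```python
# Hypothetical base URL; the real site is not shown in the question.
base_url = "https://example.com/listing?page="

webpages = []
for i in range(0, 2):
    page = i + 1                    # assumed 1-based page numbering
    webpage = base_url + str(page)  # rebuilt from the base on every iteration
    webpages.append(webpage)        # keep every page, not just the last one

print(webpages)
```

Building each URL from the unchanged base avoids the pages piling up in one string across iterations.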

clmc9601
13 - Pulsar

Hi @knozawa,

 

You need to create your output variable before the for loop and then add to the output variable during each iteration.

 

I'm having a hard time following exactly what your code is doing (especially since the indentations are not visible), but here's an example of what I mean:

 

import pandas as pd

outputdf = pd.DataFrame() # create your dataframe like you did earlier

for i in range(0,2):
   # append returns a new dataframe, so store the result on EACH iteration
   outputdf = outputdf.append({'colName':i}, ignore_index=True)
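
One caveat: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a newer pandas the same idea needs pd.concat. A rough sketch, assuming pandas is imported as pd and reusing the colName column from the example above:

```python
import pandas as pd

frames = []  # collect one small frame per iteration

for i in range(0, 2):
    frames.append(pd.DataFrame({"colName": [i]}))

# concat returns a new frame, so the result has to be assigned to a name
outputdf = pd.concat(frames, ignore_index=True)

print(outputdf)
```

Concatenating once after the loop also avoids copying the growing frame on every iteration.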

 

I hope this helps! 

If not, could you please specify what exactly you're trying to output and put the indentations back in your code?

knozawa
11 - Bolide

@clmc9601 

 

Thank you!  Sorry that the indentations were not visible. I was not sure where to put the append statement, so I placed it within the main() function, which is called on every iteration.  When I placed Alteryx.write(df,1) outside of the loop, it gave me a TypeError.

...
data = Alteryx.read("#1")
df = pandas.DataFrame()

def main():
    wd = webdriver.Chrome("C:/Program Files/Alteryx/bin/Plugins/chromedriver.exe")
    wd.get(webpage)
...
    # And grab the page whole HTML source
    html_page = wd.page_source

    # Attempt to close chromium instance
    wd.quit()
    
    df.append({"html_page":[html_page]},ignore_index=True)
    
for x in range(0, 2):
    webpage = data['file'][x]
    print(webpage)
    if __name__ == "__main__":
        main()

#Write the data frame to Alteryx workflow for downstream processing
Alteryx.write(df,1) --> TypeError: [Datafile.writeData]: metadata arg is required for yxdb and expected to be dict like {'Field1': {'type': 'FixedDecimal', 'length': (8, 3), 'source': 'PythonTool:', 'description': 'my description'}, 'Field2': {...}}

 

clmc9601
13 - Pulsar

Hi @knozawa,

 

Thanks for the indentations and longer code! Much easier to follow.

 

OK, so that TypeError is misleading: it actually means that you're trying to output an empty dataframe. Alteryx & Python refuse to output an empty dataframe. I think the dataframe is empty because you're not storing the changes made to df: df.append returns a new dataframe instead of modifying df in place, so the result is lost. Simply assigning to df inside main() would not help either, because any variable assigned within a function is a local variable by default.
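
To illustrate both points without pandas, here is a small sketch that uses a plain list as a stand-in for the dataframe. The function names are made up for the illustration:

```python
results = []  # module-level ("global") name

def discards_result(item):
    # Builds a new list and throws it away, much like calling
    # df.append(...) without assigning the return value.
    results + [item]

def rebinds_locally(item):
    # Assignment creates a new local name; the global list is untouched.
    results = [item]

def returns_value(item):
    # Hand the value back and let the caller decide where to store it.
    return item

discards_result("a")
rebinds_locally("b")
after_calls = list(results)  # snapshot: still empty at this point

results = results + [returns_value("c")]  # store the result outside the function
print(after_calls, results)
```

Only the last approach changes the global list; the first two leave it empty.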

 

Since you're defining main() in your code, here are a couple of changes I'd make:

 

data = Alteryx.read("#1")
df = pandas.DataFrame()

def main(): 
    wd = webdriver.Chrome("C:/Program Files/Alteryx/bin/Plugins/chromedriver.exe")
    wd.get(webpage)
...
    # And grab the page whole HTML source
    html_page = wd.page_source

    # Attempt to close chromium instance
    wd.quit()
    
    return html_page # by returning the results, you can append it to a global dataframe external to the function
    
for x in range(0, 2):
    webpage = data['file'][x]
    print(webpage)
    if __name__ == "__main__":
        # reassign df: append returns a new dataframe, so a bare df.append(...) builds the result and then discards it
        df = df.append({'html_page':main()}, ignore_index = True)

#Write the data frame to Alteryx workflow for downstream processing
Alteryx.write(df,1)
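
If your pandas version no longer has DataFrame.append (it was removed in pandas 2.0), one possible variant is to collect the rows in a list and build the dataframe once after the loop. This is only a sketch: fetch_page is a hypothetical stand-in for the Selenium code in main(), and the input URLs are made up so that it runs outside of Alteryx.

```python
import pandas as pd

def fetch_page(url):
    # Hypothetical stand-in for the Selenium code in main();
    # it returns fake HTML so the sketch runs without a browser.
    return "<html><body>" + url + "</body></html>"

# Made-up input; in the workflow this would come from Alteryx.read("#1")
data = pd.DataFrame({"file": ["https://example.com/a", "https://example.com/b"]})

rows = []
for webpage in data["file"]:
    # Passing the URL as an argument avoids relying on a global variable
    rows.append({"html_page": fetch_page(webpage)})

df = pd.DataFrame(rows)  # build the dataframe once, after the loop

if df.empty:
    print("Nothing scraped; skipping the write")
else:
    # Inside the Python tool, this is where Alteryx.write(df, 1) would go
    print(df.shape)
```

The empty check is there because writing an empty dataframe is what triggered the metadata TypeError earlier in this thread.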

 

I hope this helps! I tested as much as I could without having those exact packages, but there might still be syntax errors.

I look forward to hearing from you.

knozawa
11 - Bolide

@clmc9601 

 

Thank you very much!!  It solved the data frame issue and successfully appended multiple iteration results in the global data frame.

It's good to know that variables created within a function are local by default, that I need to add a return statement to the function, and that I have to reassign the df variable when appending.  It was really helpful and I learned a lot from this.  Thank you again!

 

clmc9601
13 - Pulsar

Hi @knozawa,

 

I'm glad to hear it worked!

 

It is possible to reassign global variables within a function by using a "global statement" within the function. Search this page for "global statement" for more details. In my opinion, it's generally more flexible not to use global statements. Instead, just alter the global variables outside the function, as in the above solution.
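
For comparison, a small sketch of both styles. The counter variable and the function names are made up for the illustration:

```python
counter = 0

def increment_with_global():
    # Without this declaration, the assignment below would make counter
    # local and raise UnboundLocalError when it is read.
    global counter
    counter = counter + 1

def incremented(value):
    # No global state: take a value in, hand a new value back.
    return value + 1

increment_with_global()          # counter is now 1
counter = incremented(counter)   # counter is now 2
print(counter)
```

The second style keeps the function independent of the variable names used outside it, which makes it easier to reuse.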

 

Happy solving!

knozawa
11 - Bolide

@clmc9601 

 

It's very helpful that you gave me detailed explanations and hyperlink references for more information.  Thank you again!
