Hello,
Would it be possible to append data frame output in the Python tool?
I'm writing a web scraping program using the Python tool. The tool contains a loop that scrapes multiple pages from the website.
df = pandas.DataFrame.append({"html_page":[html_page]}) --> TypeError: append() missing 1 required positional argument: 'other'
for i in range (0,2):
page += i
webpage = webpage + str(page)
if __name__ == "__main__":
main()
Alteryx.write(df,1)
Hi @knozawa ,
What do you mean by "append data frame"?
Your line Alteryx.write(df,1) should already output a data frame!
When I ran the workflow, the output only contained the last iteration result. I would like to append outputs from all the iterations in the python tool.
It looks like the statement where you concatenate the pages isn't inside your for loop, so only the last page is used.
Hi @knozawa,
You need to create your output variable before the for loop and then add to the output variable during each iteration.
I'm having a hard time following exactly what your code is doing (especially since the indentations are not visible), but here's an example of what I mean:
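import pandas as pd
outputdf = pd.DataFrame() # create your dataframe like you did earlier
for i in range(0,2):
    # append returns a new dataframe, so assign the result back; this runs for EACH iteration, not just once
    outputdf = outputdf.append({'colName':i}, ignore_index=True) # DataFrame.append only exists in pandas < 2.0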
I hope this helps!
If not, could you please specify what exactly you're trying to output and put the indentations back in your code?
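For readers on a recent pandas: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the same "create the output before the loop, add to it on each iteration" idea can be sketched with `pd.concat` instead. This is only a minimal sketch; `colName` is the placeholder column name from the example above, and the values are just the loop counter.

```python
import pandas as pd

# Create the output dataframe once, before the loop.
outputdf = pd.DataFrame()

for i in range(0, 2):
    # Build a one-row dataframe for this iteration.
    row = pd.DataFrame({"colName": [i]})
    # pd.concat returns a NEW dataframe, so assign it back to outputdf.
    outputdf = pd.concat([outputdf, row], ignore_index=True)

print(outputdf)
```

The important part is the assignment back to `outputdf`; without it, each iteration's result is discarded.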
Thank you! Sorry that the indentations were not visible. I was not sure where to put the append statement, so I placed it within the main() function, which is called at every iteration. When I placed Alteryx.write(df,1) outside of the loop, it gave me a TypeError.
...
data = Alteryx.read("#1")
df = pandas.DataFrame()
def main():
    wd = webdriver.Chrome("C:/Program Files/Alteryx/bin/Plugins/chromedriver.exe")
    wd.get(webpage)
    ...
    # And grab the page whole HTML source
    html_page = wd.page_source
    # Attempt to close chromium instance
    wd.quit()
    df.append({"html_page":[html_page]},ignore_index=True)
for x in range(0, 2):
    webpage = data['file'][x]
    print(webpage)
    if __name__ == "__main__":
        main()
#Write the data frame to Alteryx workflow for downstream processing
Alteryx.write(df,1) --> TypeError: [Datafile.writeData]: metadata arg is required for yxdb and expected to be dict like {'Field1': {'type': 'FixedDecimal', 'length': (8, 3), 'source': 'PythonTool:', 'description': 'my description'}, 'Field2': {...}}
Hi @knozawa,
Thanks for the indentations and longer code! Much easier to follow.
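OK, so that TypeError is misleading: it actually means that you're trying to output an empty dataframe, which the Alteryx Python tool refuses to write. I think the dataframe is empty because you're not storing the result of df.append. append does not modify df in place; it returns a new dataframe, and that result is thrown away unless you assign it. Using df as a global variable makes this harder to fix inside main(), because any variable assigned within a function is local by default, so writing df = df.append(...) inside the function would not update the global df (without a global statement it raises UnboundLocalError).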
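To make the "change is computed and then lost" point concrete, here is a small sketch. It uses `pd.concat` as a stand-in because `DataFrame.append` is no longer available in pandas 2.0 and later; both return a new dataframe rather than modifying the original. The HTML string is made-up sample data.

```python
import pandas as pd

df = pd.DataFrame()
row = pd.DataFrame({"html_page": ["<html>example</html>"]})

# The result is computed and then thrown away: df itself is untouched.
pd.concat([df, row], ignore_index=True)
still_empty = df.empty

# Assigning the result back is what actually updates df.
df = pd.concat([df, row], ignore_index=True)

print(still_empty)
print(df.empty)
```

Checking `df.empty` just before the write step is a cheap way to catch this situation before the less obvious metadata error appears.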
Since you're defining main() in your code, here are a couple of changes I'd make:
data = Alteryx.read("#1")
df = pandas.DataFrame()
def main():
    wd = webdriver.Chrome("C:/Program Files/Alteryx/bin/Plugins/chromedriver.exe")
    wd.get(webpage)
    ...
    # And grab the page whole HTML source
    html_page = wd.page_source
    # Attempt to close chromium instance
    wd.quit()
    return html_page # by returning the results, you can append it to a global dataframe external to the function
for x in range(0, 2):
    webpage = data['file'][x]
    print(webpage)
    if __name__ == "__main__":
        df = df.append({'html_page':main()}, ignore_index = True) # you have to redefine the df variable. if you just do df.append, the change is executed and then lost
#Write the data frame to Alteryx workflow for downstream processing
Alteryx.write(df,1)
I hope this helps! I tested as much as I could without having those exact packages, but there might still be syntax errors.
I look forward to hearing from you.
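A variation on the same idea that does not depend on `DataFrame.append` at all is to collect one dict per page in a plain list and build the dataframe once after the loop. The sketch below is only an illustration: `fetch_page` and `urls` are hypothetical stand-ins for the Selenium call and for `data['file']`, so it can run without Alteryx or a browser.

```python
import pandas as pd

# Hypothetical stand-in for the Selenium call; returns fake HTML for a URL.
def fetch_page(url):
    return f"<html><body>{url}</body></html>"

# Hypothetical stand-in for data['file'] read from the Alteryx input.
urls = ["https://example.com/page1", "https://example.com/page2"]

# Collect one dict per page, then build the dataframe a single time.
rows = []
for url in urls:
    rows.append({"html_page": fetch_page(url)})

df = pd.DataFrame(rows)

# Guard against an empty result before handing it to a writer.
if df.empty:
    print("No pages scraped; nothing to write.")
else:
    print(df.shape)
```

In the real workflow, the final `print` would be replaced by the `Alteryx.write(df,1)` call from the code above.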
Thank you very much!! It solved the data frame issue and successfully appended multiple iteration results in the global data frame.
It's good to know that variables created within a function are local by default, that I need to add a return statement within the function, and that I need to reassign the df variable when appending. It was really helpful and I learned a lot from this. Thank you again!
Hi @knozawa,
I'm glad to hear it worked!
It is possible to reference global variables within a function by using a "global statement" within the function. Search this page for "global statement" for more details. In my opinion, it's generally more dynamic to not use global statements. Instead, just alter the global variables outside the function like in the above solution.
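As a rough illustration of the difference, the sketch below contrasts a function that uses a global statement, one that tries to assign without it, and one that simply returns a new value. All names here (`counter`, `bump_with_global`, and so on) are made up for the example.

```python
counter = 0

def bump_with_global():
    # 'global' makes the assignment below rebind the module-level name.
    global counter
    counter = counter + 1

def bump_without_global():
    # Without 'global', assigning to counter makes it local to this function,
    # so reading it before it has a value raises UnboundLocalError.
    try:
        counter = counter + 1
    except UnboundLocalError as exc:
        return type(exc).__name__
    return "ok"

def next_value(current):
    # Alternative: take the value in, return the new value, no global needed.
    return current + 1

bump_with_global()
outcome = bump_without_global()
counter = next_value(counter)
print(counter, outcome)
```

The last form is the one used in the solution above: the function returns its result and the caller decides what to do with it.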
Happy solving!
It's very helpful that you gave me detailed explanations and hyperlink references for more information. Thank you again!