
Alteryx Designer Desktop Discussions

SOLVED

Parsing Tables using Python within Alteryx

NeilFisk
9 - Comet

I have a workflow, developed outside of Alteryx, that parses all the tables in a PDF file. I want to incorporate it into Alteryx so I can do downstream cleansing there. Unfortunately, I'm having trouble getting it to work. The error I get says that no such file or directory exists, so I think I'm missing some syntax I'm not familiar with.

 

Attached are the workflow (the path will need to be changed for anyone else to run it) and the PDF file.

 

Any assistance would be greatly appreciated.

 

Neil

5 REPLIES
apathetichell
18 - Pollux

Two things:

1) You are passing in a dataframe and converting the whole thing to a string. Try: FullPath = str(Alteryx.read("#1").iloc[0,0])

2) Before you send the path in, swap the "/" characters using the Replace function in a Formula tool: replace([field1],"/","\\")
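To illustrate point 1, here is a minimal sketch of the difference between converting a whole dataframe to a string and pulling out a single cell. The dataframe below is only a stand-in for what Alteryx.read("#1") might return; the column name and the path are hypothetical.

```python
import pandas as pd

# Stand-in for Alteryx.read("#1"): one row, one column holding the path.
# The column name and the path value are made up for illustration.
incoming = pd.DataFrame({"FullPath": ["C:/data/report.pdf"]})

# str() on the dataframe gives its printed form: header, index and value.
as_string = str(incoming)

# .iloc[0, 0] picks the first row, first column, i.e. just the path.
as_value = str(incoming.iloc[0, 0])

print(repr(as_string))
print(repr(as_value))
```

The first form is not a usable file path, which would be consistent with a "no such file or directory" error.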

 

PhilipMannering
16 - Nebula

As @apathetichell mentions, the key here is the `.iloc[0,0]`. You also need to make sure that Java is in your PATH environment variable (though I assume this is already done, since it was working outside of Alteryx). You don't need to do anything with the forward/back slashes. Here's the full code I used to get it to work:

 

 

import pandas as pd
from ayx import Alteryx
from tabula.io import read_pdf

# read in filepath
FullPath = Alteryx.read("#1").iloc[0,0]

# read in the PDF and create a dataframe list
dfs = read_pdf(FullPath, pages='all')

# combine dataframes within list to one complete file   
df_tables = pd.concat(dfs) 

# and then send it to one of the output anchors
Alteryx.write(df_tables, 1)
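The two side points in this reply (Java on the PATH, and slashes not mattering) can be checked with the standard library alone. This is a rough sketch, not part of the original workflow; the example path is hypothetical.

```python
import shutil
from pathlib import PureWindowsPath

# shutil.which returns the full path of the executable if it can be found
# on PATH, or None if it cannot.
java_path = shutil.which("java")
print("java found at:", java_path)

# Windows paths compare equal whichever slash is used (hypothetical path).
same_path = PureWindowsPath("C:/data/report.pdf") == PureWindowsPath(r"C:\data\report.pdf")
print("slashes equivalent:", same_path)
```

If java_path comes back as None when run inside the Python tool, the PATH seen by Alteryx is probably the thing to look at.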

 

 


 

NeilFisk
9 - Comet

Thank you both. The iloc[0,0] was all that was needed. As a follow-on question: when I run this in Python, I get index numbers that reset for each new table. How do I get the same index numbers when running the code in Alteryx? They are valuable information and will help me parse the tables more quickly within Alteryx.

PhilipMannering
16 - Nebula

Hey @NeilFisk 

 

You can use

df_tables = df_tables.reset_index()

just before writing out. This moves the existing index into a regular column named index.
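As a small illustration of what reset_index() does after pd.concat, here is a sketch with two made-up tables standing in for the list returned by read_pdf. The column name and values are hypothetical.

```python
import pandas as pd

# Two hypothetical tables standing in for the dataframes parsed from the PDF.
tables = [
    pd.DataFrame({"item": ["a", "b"]}),
    pd.DataFrame({"item": ["c", "d", "e"]}),
]

# pd.concat keeps each table's own index, so the numbering restarts per table.
combined = pd.concat(tables)
print(list(combined.index))

# reset_index() moves that per-table numbering into a column called "index"
# and gives the combined frame a fresh running index.
with_index = combined.reset_index()
print(with_index)
```

Once the per-table numbering is an ordinary column, it travels with the rest of the data when the dataframe is written to an output anchor.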

NeilFisk
9 - Comet

Perfect!
