Hi @esridhar126, I think you may need to ask your IT team to give you full Windows permissions on the Alteryx folder, including %alteryx%\bin\miniconda3\pythontool_venv and all of its subdirectories and files.
Awesome! But in my case, I want to extract from several PDF files in one directory, and the steps I tried didn't work.
I used the Directory tool with a wildcard, but this didn't work.
How do I go about this?
Hi @tochy,
I would suggest creating a batch macro that contains the Python tool reading the PDF input.
A control parameter would then reconfigure the macro on each run, once for every PDF file you are trying to read.
You can refer to
https://community.alteryx.com/t5/Alteryx-Knowledge-Base/The-Ultimate-Input-Data-Flowchart/ta-p/20480
and
https://www.youtube.com/watch?v=YIAbQGQ_Hkg
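As a rough alternative to the batch macro, the same per-file loop could be done inside a single Python tool. The sketch below is only an assumption about how that might look: `find_pdfs` and `read_tables_from_folder` are hypothetical helper names, the folder path is a placeholder, and camelot is imported lazily so the file listing works without it.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def read_tables_from_folder(folder, pages="1-end"):
    """Map each PDF file name to the list of dataframes camelot finds in it."""
    import camelot  # imported here so find_pdfs() works without camelot installed

    results = {}
    for pdf_path in find_pdfs(folder):
        # pages="1-end" asks camelot to scan every page, not just the first
        tables = camelot.read_pdf(str(pdf_path), pages=pages)
        results[pdf_path.name] = [table.df for table in tables]
    return results


# Example call (hypothetical folder):
# frames_by_file = read_tables_from_folder(r"C:\temp\pdfs")
```

Keeping the file discovery separate from the parsing makes it easy to check which files were picked up before running camelot on them.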
cheers,
d
I have an 87-page document and each page contains a table. I tried to use the iteration below, but it only ever extracts the table on the first page. Any ideas?
Thanks David for your time.
Do all 80-odd pages share the same PDF schema?
83 of the 87 pages have the same schema. The PDF contains only tables.
Does it also fail to go beyond page 1 on shorter documents that have the same schema?
Yes.
Did you try the improvement to the code I suggested a few posts back for reading multiple tables?
Yes, but it still reads only page 1.
I have sent you an email. Let me know what you think.
Thanks a bunch!
Hi @tochy,
Yeah, I think I have it. I could not test this on your PDF (it got caught by a spam filter), but I tested it on my foo.pdf with multiple tables across multiple pages.
I needed to loop through the tables list and explicitly specify the page range in the camelot.read_pdf call.
Without the pages argument, camelot only reads the first page.
Something like this should fix the problem:
#Parse the tabular data
import camelot
#Specify the path to your PDF document
#The pages parameter is needed to go beyond page 1
tables = camelot.read_pdf('foo-more-tables.pdf', pages='1-2')
#Start with tool output number 1
output_number = 1
#Loop through the tables and output all of them
for table in tables:
    #Get the dataframe from the PDF table data
    df = table.df
    #print(df)
    #Write the dataframe with tabular data to the current tool output
    #Alteryx.write(df, output_number)
    output_number += 1
That should get you something like this
from a PDF like this
And if you just want your program to run through all pages without specifying the last page number, you can replace it with 'end':
tables = camelot.read_pdf('foo-more-tables.pdf', pages='1-end')
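Since most pages in a long document may share one schema while a few do not, it could help to group the extracted tables by column count before stacking them. This is only a sketch: `group_by_column_count` and `read_all_pages` are hypothetical helpers, and it assumes that tables with the same number of columns share a schema, which may not hold for your PDF.

```python
from collections import defaultdict


def group_by_column_count(tables):
    """Group table dataframes by their number of columns.

    `tables` is any iterable of objects with a `.df` dataframe attribute,
    such as the table list returned by camelot.read_pdf.
    """
    groups = defaultdict(list)
    for table in tables:
        n_cols = table.df.shape[1]
        groups[n_cols].append(table.df)
    return dict(groups)


def read_all_pages(pdf_path):
    """Read every page of the PDF and group the tables by column count."""
    import camelot  # imported lazily; requires camelot to be installed

    tables = camelot.read_pdf(pdf_path, pages="1-end")
    return group_by_column_count(tables)


# Example (assumes pandas is installed alongside camelot):
# import pandas as pd
# groups = read_all_pages("foo-more-tables.pdf")
# main_df = pd.concat(max(groups.values(), key=len), ignore_index=True)
```

The largest group would then hold the pages with the common schema, and the odd ones out can be inspected or written to a separate output.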