Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Using dynamic tools to pick up pdf file names and use as sheet name in excel file

LFLee
8 - Asteroid

I've done up a pdf extraction macro using to parse the data in the pdf files in a directory and output the results in excel. Currently, the extracted data of all files sit on one excel sheet.

 

I would like for each file's data to be output to separate sheet on the excel output file by using the pdf file name. Is there any way this can be done using the dynamic tools or any other tools in Alteryx to achieve this?

 

I've attached my macro workflow here.

 

Any assistance on this would be much appreciated.

8 REPLIES 8
CarlDi
Alteryx
Alteryx

Hi @LFLee

 

I tried running your macro but it error'd out. Do you have a column that maps the PDF file's name? If so, in the Output data tool, click on the 'Take File/Table Name From Field' checkbox. In The dropdown below, select 'Change File/Table Name' and in the 2nd dropdown, select the column with PDF filenames. See the screenshot below:

 

2019-10-28_20-53-17.jpg

Hope that helps!

LFLee
8 - Asteroid

Hi @CarlDi , unfortunately there is no column for the PDF file's name. I've attached the output below for your reference on how it looks like, hence, was hoping that there is a way to pick up the input file name from the selected file. I've also attached the work flow again that contains the batch macro.

 

Any assistance on this would be much appreciated.

ganesamurthy
8 - Asteroid

cond.install <- function(package.name){
options(repos = "http://cran.rstudio.com") #set repo
#check for package in library, if package is missing install
if(package.name%in%rownames(installed.packages())==FALSE) {
install.packages(package.name)}else{require(package.name, character.only = TRUE)}}

cond.install("pdftools")
cond.install("tesseract")

files <- list.files(path="C:\\Users\\ganes\\Desktop\\Training\\AlterYX\\Alteryx Solution\\PDF Read Batch Macro", pattern="*.pdf", full.names=TRUE, recursive=FALSE)
for (file in files){
pngfile <- pdftools::pdf_convert(file,dpi = 200)
text <- tesseract::ocr(pngfile)
x <- data.frame(text,file)
write.Alteryx(x,1)

 

Use this additional information in the output of your R code, then you will get the full path of the file name. From this you shall extract the Filename that you can use in Output Data tool to write into separate sheet, as suggested.

LFLee
8 - Asteroid

Hi @ganesamurthy , thanks for sharing this helpful tip. I was just running the macro and realise that the same file was being extracted twice and I'm thinking it is due to the specification of the input directory twice, one on the workflow interface and the other on the macro interface. 

 

Is there any way to update the R script to ensure that it is not extracting it twice? 

 

Any assistance on the updated R coding will be much appreciated. I've attached the workflow for reference.

LFLee
8 - Asteroid

Hi @ganesamurthy , thanks for sharing this useful tip. While I can view the file name in the results section when I click on the macro in my workflow which embeds the macro, I can't seem to extract the file name using the output data tool. I've tried the dynamic tools as well but can't seem to get it.

 

Any idea how this can be done? Any help on this is much appreciated.

ganesamurthy
8 - Asteroid
If you see same file name multiple times, it could mean that particular PDF file has multiple pages that is produced as multiple rows of output. Can you please validate this?

I didn't understand your other question, so please me some screenshots of what you are talking about.

Thanks
LFLee
8 - Asteroid

Hi @ganesamurthy , the first pdf file has 7 pages however, the same output is duplicated over two rows for the same page of data extracted. I've attached a copy of the test output for your reference.

 

For the other question on extracting the file name from the output from R and use as sheet name in excel file, I have tried to use the output tool configuration (as seen in the screenshot below) but it is not picking up the file name but the value of what was extracted as shown in the attachment (test 3.xlsx) and it also shows an error message as well (refer to file attached).

 

I've also attached a screenshot on the results preview when I click on the pdf extraction macro.

 

Any suggestions or idea on how to map the file name to the output file?

 

Apologies for the numerous questions as I'm new to Alteryx.

ganesamurthy
8 - Asteroid

This Macro should not be batch, it should be standard macro. I am attaching the workflow that you need to acheive your result.

 

Hope it works well.

Labels