SOLVED

Error in read.Alteryx when converting PDF to Excel

organicchocolate
6 - Meteoroid

Hi community,

 

I am getting the read.Alteryx error shown in the screenshot below after following the instructions (pasted below the screenshot) for converting PDF to Excel. Any tips? Thank you in advance!

 

organicchocolate_1-1598049129559.png

 

 

install.packages("Rcpp",  dependencies = TRUE, repos = "http://cran.us.r-project.org")

install.packages("pdftools",  dependencies = TRUE, repos = "http://cran.us.r-project.org")

 

Note: The Rcpp package is a dependency, so installing it separately is not strictly necessary, but I do so to prevent issues that occur with other R GUIs.
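Since the R tool runs its whole script every time the workflow runs, it may be worth installing only what is missing. A minimal sketch, assuming the same CRAN mirror as above; `missing_packages` and `install_if_missing` are hypothetical helper names, not part of Alteryx or pdftools:

```r
# Hypothetical helper: returns the names of packages that cannot be loaded
# from the current R library (i.e. are not installed yet).
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: installs only the packages that are missing.
install_if_missing <- function(pkgs, repos = "http://cran.us.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE, repos = repos)
  }
  invisible(to_install)
}

# In the R tool, before any pdftools call:
# install_if_missing(c("Rcpp", "pdftools"))
```

The call is left commented out so that nothing is installed until you decide to run it.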

 

Now define your data input (the file path to your PDF, found using the Directory tool):

 

data <- read.Alteryx("#1", mode="data.frame")

 

Finally change the format of your data:

         1                2            3           4          5  $    6            7

write.Alteryx(pdftools::pdf_text(file.path(data$FullPath)), 1)

 

Breakdown of the code:

1 & 7 = Alteryx-specific R code that defines the output (here, output anchor 1)

2 = calls the package we will be using

3 = the command that will convert the pdf to text

4 = used to reformat the cell in our data frame as a file path

5 = the data frame we defined earlier

$ = extracts a single field (column) from the data frame by name

6 = the field name of the cell from the directory tool

 

There it is: a very simple solution that allows us to convert a PDF to a usable format within Alteryx.
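For reference, pdftools::pdf_text returns a character vector with one element per page, so the one-liner above can also be unpacked into a data frame with one row per page. This is only a sketch: `pdf_to_frame` is a hypothetical helper name, and wrapping the text in a data frame before write.Alteryx is an assumption made to keep the output field names predictable.

```r
# Hypothetical helper: turn one PDF into a data frame with one row per page.
# `convert` defaults to pdftools::pdf_text, which returns a character vector
# with one element per page; it is a parameter so it can be swapped out.
pdf_to_frame <- function(path, convert = pdftools::pdf_text) {
  pages <- convert(path)
  data.frame(
    FullPath = rep(path, length(pages)),
    Page = seq_along(pages),
    Text = pages,
    stringsAsFactors = FALSE
  )
}

# Inside the Alteryx R tool (read.Alteryx / write.Alteryx exist only there):
# data <- read.Alteryx("#1", mode = "data.frame")
# write.Alteryx(pdf_to_frame(as.character(data$FullPath[1])), 1)
```

The as.character call guards against the FullPath field arriving as a factor rather than as plain text.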

ImadZidan
12 - Quasar

Hello @organicchocolate ,

 

Just stating the obvious, but it may help.

 

ta <- read.Alteryx("#1", mode="data.frame")

 

The above will only run if you have connected your data input to input anchor #1 of the R tool.

 

Please check.

organicchocolate
6 - Meteoroid

Hi @ImadZidan

 

Thank you for reaching out. Unfortunately, this suggestion led to the same error (please see below). Any other ideas? Thank you in advance.

 

Photo 2.png

ImadZidan
12 - Quasar

@organicchocolate ,

 

Another, also obvious, observation.

 

Your input will be converted to a data frame. It could be that Alteryx is trying to do that and failing.

 

Is your ta variable required? If not, I would just create a Text Input with two columns and one row, a fake input if you like (see attached).


ImadZidan
12 - Quasar

Hello @organicchocolate ,

 

See attached, and consider the code as a skeleton. It reads a PDF and converts it to text. You will need to do further processing based on your PDF format, I guess.

 

It's just a start. Have a look and let's build on it.

 

 

organicchocolate
6 - Meteoroid

Thanks. This serves as a good skeleton. Is the Field the go-to place for tailoring the code to keep moving toward a full PDF-to-Excel conversion?

 

organicchocolate_0-1598102448575.png

 

ImadZidan
12 - Quasar

Hello @organicchocolate ,

 

First, you need an input for the RTool. So, this would be an input. 

Second, if you have different file names and different paths, this can become useful.

 

Example:

File1 - resides in directory1 and has a name file1

File2 - resides in directory2 and has a name file2

 

So you would loop through the data frame and do the conversion for each pdf.
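A minimal sketch of that loop, assuming the incoming data frame has a FullPath field as in the original post. `convert_all` is a hypothetical helper name, and recording failures in an Error column instead of stopping the run is a design choice of this sketch, not something the attached workflow does:

```r
# Hypothetical helper: convert every PDF listed in `paths`, one row per file.
# All pages of a file are joined into a single Text value. A failed
# conversion is recorded in the Error column instead of stopping the run.
convert_all <- function(paths, convert = pdftools::pdf_text) {
  rows <- lapply(as.character(paths), function(p) {
    tryCatch(
      data.frame(FullPath = p,
                 Text = paste(convert(p), collapse = "\n"),
                 Error = NA_character_,
                 stringsAsFactors = FALSE),
      error = function(e) {
        data.frame(FullPath = p,
                   Text = NA_character_,
                   Error = conditionMessage(e),
                   stringsAsFactors = FALSE)
      }
    )
  })
  do.call(rbind, rows)
}

# Inside the Alteryx R tool:
# data <- read.Alteryx("#1", mode = "data.frame")
# write.Alteryx(convert_all(data$FullPath), 1)
```

With this shape, one unreadable PDF shows up as a row with a message in Error, and the remaining files are still converted.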

 

I would say, start with one file initially and hardcode the path as in the workflow. Once you are happy with that, implement the loop logic.

 

The Field's main objective is to store the file configuration. All your conversion will happen after you have read the PDF and converted it to text.

 

If you provide a pdf and let us know what you are after, together we will work it out.

 

To be comprehensive, I am assuming you don't want to use the two PDF Tools available.

 

One requires a license and one doesn't. To have a look at them, type PDF in the search bar.

 

I hope this helps. 

 

PDF.PNG
