Alteryx Designer Desktop Discussions

organicchocolate · ‎08-21-2020

Hi community,

I am getting an error in read.Alteryx after following the instructions below for converting PDF to Excel. Any tips? Thank you in advance!

install.packages("Rcpp", dependencies = TRUE, repos = "http://cran.us.r-project.org")

install.packages("pdftools", dependencies = TRUE, repos = "http://cran.us.r-project.org")

Note: Rcpp is a dependency of pdftools, so installing it separately is not strictly necessary, but I do it to prevent issues that occur with other R GUIs.

Now define your data input (the file path to your PDF, found using the Directory tool):

data <- read.Alteryx("#1", mode="data.frame")

Finally change the format of your data:

1 2 3 4 5 $ 6 7

write.Alteryx(pdftools::pdf_text(file.path(data$FullPath)), 1)

Breakdown of the code:

1 & 7 = Alteryx specific R code that defines the output

2 = calls the package we will be using

3 = the command that will convert the pdf to text

4 = used to reformat the cell in our data frame as a file path

5 = the data frame we defined earlier

$ = extracts a field (column) from the data frame

6 = the field name of the cell from the directory tool

There it is: a very simple solution that allows us to convert a PDF to a usable format within Alteryx.
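For readers who find the nested one-liner hard to follow, here is a minimal sketch of the same steps unpacked into named pieces. It is an illustration under assumptions, not the original author's code: `pdf_to_text_frame` and its `extract_text` argument are hypothetical names, and the result is wrapped in a data frame (one row per page) on the assumption that this is a convenient shape to hand to write.Alteryx. The demonstration uses a fake extractor so it runs outside Alteryx; inside the R tool you would pass pdftools::pdf_text instead.

```r
# Sketch only: the nested one-liner unpacked into named steps.
# `extract_text` is a hypothetical stand-in for pdftools::pdf_text.
pdf_to_text_frame <- function(data, extract_text) {
  # Steps 4, 5, $ and 6: take the FullPath field of the first row as a path
  path <- file.path(as.character(data$FullPath[1]))
  # Steps 2 and 3: extract the text, one character string per page
  pages <- extract_text(path)
  data.frame(Page = seq_along(pages), Text = pages, stringsAsFactors = FALSE)
}

# Inside the Alteryx R tool only (these functions do not exist elsewhere):
# data <- read.Alteryx("#1", mode = "data.frame")
# write.Alteryx(pdf_to_text_frame(data, pdftools::pdf_text), 1)

# Stand-alone demonstration with a fake extractor and a made-up path
demo <- pdf_to_text_frame(
  data.frame(FullPath = "C:/docs/report.pdf", stringsAsFactors = FALSE),
  function(path) c("page one text", "page two text")
)
print(demo)
```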

ImadZidan · ‎08-21-2020

Hello @organicchocolate ,

Just stating the obvious, but it may help.

ta <- read.Alteryx("#1", mode="data.frame")

The above will run, assuming that you have connected your data input to input anchor #1 of the R tool.

Please check.

organicchocolate · ‎08-21-2020

Hi @ImadZidan

Thank you for reaching out. Unfortunately, this led to the same error. Any other ideas? Thank you in advance.

ImadZidan · ‎08-21-2020

@organicchocolate ,

Another, equally obvious, observation.

Your input will be converted to a data frame. It could be that Alteryx is trying to do that and failing.

Is your ta variable required? If not, I would just create a text input with two columns and one row. A fake input, if you like (see attached).
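As a rough illustration of the fake-input idea, the sketch below builds the kind of one-row, two-column data frame the R tool would receive, and checks that the expected column is present before it is used. The column names `Label` and `FullPath`, the path, and the helper `require_fields` are all assumptions made for this example.

```r
# Sketch only: a fake one-row input, as a Text Input tool might supply it.
# Column names and values are hypothetical.
fake_input <- data.frame(
  Label = "test file",
  FullPath = "C:/docs/report.pdf",
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop with a readable message if a field is missing.
require_fields <- function(df, fields) {
  missing <- setdiff(fields, names(df))
  if (length(missing) > 0) {
    stop("Input is missing field(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

require_fields(fake_input, "FullPath")
path <- as.character(fake_input$FullPath[1])
print(path)
```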

organicchocolate · ‎08-21-2020

@ImadZidan

To clarify, my objective is to convert PDFs into Excel. What is the ta variable, and what form does the input need to be in prior to conversion to Excel?

ImadZidan · ‎08-21-2020

Hello @organicchocolate ,

See the attached workflow and consider the code a skeleton. It reads a PDF and converts it to text. You will need to do further processing based on your PDF's format, I guess.

It's just a start. Have a look and let's build on it.
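To give an idea of what "further processing" could look like, here is a minimal sketch that turns one page of extracted text into a table. It assumes the page holds a simple table whose columns are separated by two or more spaces, that every row has the same number of fields, and that the first row is the header; real PDFs often break these assumptions. `page_to_table` is a hypothetical helper, and the sample page is made up.

```r
# Sketch only: split one page of extracted text into rows and columns.
# Assumes columns are separated by 2+ spaces and the first row is a header.
page_to_table <- function(page_text) {
  lines <- trimws(strsplit(page_text, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]            # drop blank lines
  cells <- strsplit(lines, " {2,}")        # split each line on runs of spaces
  mat <- do.call(rbind, cells)             # requires equal field counts
  out <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(out) <- mat[1, ]
  out
}

# Made-up sample page in the shape pdf_text returns (one string per page)
page <- "Item      Qty   Price\nWidget    2     3.50\n\nGadget    1     9.99\n"
tbl <- page_to_table(page)
print(tbl)
```

A data frame like this could then be passed to write.Alteryx and written to Excel by an Output Data tool further along the workflow.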

organicchocolate · ‎08-22-2020

Thanks. This serves as a good skeleton. Is the Field the go-to place for tailoring the code as I continue working toward full PDF-to-Excel conversion?

ImadZidan · ‎08-22-2020

Hello @organicchocolate ,

First, you need an input for the R tool, so the Field serves as that input.

Second, if you have different file names and different paths, this can become useful.

Example:

File1 - resides in directory1 and has the name file1

File2 - resides in directory2 and has the name file2

So you would loop through the data frame and do the conversion for each PDF.

I would say, start with one file initially and hardcode the path, as in the workflow. Once you are happy with that, implement the loop logic.
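The loop logic could be sketched roughly as follows. This is an illustration only: the column names `Directory` and `FileName`, the helper `convert_all`, and the `extract_text` argument are assumptions, and a fake extractor stands in for pdftools::pdf_text so the example runs without any PDF files.

```r
# Sketch only: convert every file listed in a data frame, one row at a time.
# `extract_text` is a hypothetical stand-in for pdftools::pdf_text.
convert_all <- function(files, extract_text) {
  per_file <- lapply(seq_len(nrow(files)), function(i) {
    path <- file.path(files$Directory[i], files$FileName[i])
    pages <- extract_text(path)            # one string per page
    data.frame(
      File = rep(path, length(pages)),
      Page = seq_along(pages),
      Text = pages,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, per_file)                 # stack the per-file results
}

# Made-up file list mirroring the example above
files <- data.frame(
  Directory = c("directory1", "directory2"),
  FileName = c("file1.pdf", "file2.pdf"),
  stringsAsFactors = FALSE
)
fake_extract <- function(path) paste("text of", path)
all_text <- convert_all(files, fake_extract)
print(all_text)
```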

The Field's main objective is to store the file configuration. All your conversion will happen after you have read the PDF and converted it to text.

If you provide a PDF and let us know what you are after, we will work it out together.

To be comprehensive, I am assuming you don't want to use the two PDF Tools available.

One requires a license and one doesn't. To have a look at them, type PDF in the search bar.

I hope this helps.

Error in read.Alteryx when converting PDF to Excel