
Alteryx Designer Desktop Discussions


Running an R script in an Alteryx workflow to convert PDF to Excel

Ekta

Dear All, 

I am trying to convert a PDF to Excel and then use the converted data from the output anchor in downstream Alteryx tools.

Currently I run an R script outside of Alteryx to convert the PDF to Excel, and then use that converted Excel file as the input to my Alteryx workflow.

Now I am trying to add this conversion step to the Alteryx workflow itself.

The expected Alteryx workflow is:

Input PDF -> R script -> other tools to process the converted data

The R script to configure is:

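# Install packages once from the R console, not inside the script:
# install.packages(c("pdftools", "stringr", "openxlsx"))
library(pdftools)
library(stringr)
library(openxlsx)  # used instead of xlsx/rJava; both packages define write.xlsx

# Read each PDF page as one character string
tx <- pdf_text("C:/Users/Combined file.pdf")
# Split the pages into individual lines
tx2 <- unlist(str_split(tx, "[\\r\\n]+"))
# Split each line into at most 5 columns on runs of two or more spaces
tx3 <- str_split_fixed(str_trim(tx2), "\\s{2,}", 5)

# Write the parsed matrix to Excel
write.xlsx(as.data.frame(tx3), file = "C:/Users/Combined file.xlsx")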

Please help me with this.

 

TIA

1 REPLY
BrandonB

Is there a reason that you are trying to write it to Excel as part of your script? You can write a data frame right back into an Alteryx workflow so the Excel step isn't necessary. This is an example of a macro that leverages R to bring in PDFs that seems in line with what you are attempting: https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa
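
As a rough sketch of what that could look like in the R tool, the parsing from the original script can be wrapped in a function and the result passed to write.Alteryx() instead of write.xlsx(). The helper name parse_pdf_text and the column names Col1 to Col5 are made up for illustration, and the path is hardcoded as in the original post. The write.Alteryx() call is guarded so the same code can also be run outside Alteryx without failing.

```r
library(stringr)

# Hypothetical helper (not part of any package): split raw page text into
# lines, then split each line into up to n_cols columns on runs of 2+ spaces
parse_pdf_text <- function(pages, n_cols = 5) {
  lines <- unlist(str_split(pages, "[\\r\\n]+"))
  lines <- str_trim(lines)
  lines <- lines[lines != ""]                      # drop empty lines
  cells <- str_split_fixed(lines, "\\s{2,}", n_cols)
  df <- as.data.frame(cells, stringsAsFactors = FALSE)
  names(df) <- paste0("Col", seq_len(n_cols))      # assumed column names
  df
}

# write.Alteryx() only exists inside the Alteryx R tool, so guard the call
if (exists("write.Alteryx")) {
  pages <- pdftools::pdf_text("C:/Users/Combined file.pdf")
  write.Alteryx(parse_pdf_text(pages), 1)          # send to output anchor 1
}
```

With this in place, the tools connected to output anchor 1 of the R tool receive the parsed rows directly, so the intermediate Excel file is no longer needed. If the PDF path should come from an upstream tool rather than being hardcoded, it could be read with read.Alteryx("#1").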

 

However, if you are looking for additional advanced features from drag and drop tools, I would also recommend taking a look at the Alteryx Intelligence Suite that was recently released: https://www.alteryx.com/products/alteryx-platform/intelligence-suite

 

It not only allows you to bring in PDFs, but you can use templates to specify regions to extract across multiple PDFs which helps you avoid needing to use regex or a bunch of parsing rules to get to what you need. It also has a variety of text analysis tools and assisted modeling functionality that comes with it. 
