Alteryx Designer Desktop Discussions

SOLVED

Split PDF File into multiple PDFs

jmedinat
5 - Atom

Hi,

 

Looking for some help here. I'm trying to split one PDF file into multiple files based on employee ID and save them as separate PDFs. Usually each employee has only one page in the input file, but there could be more than one, so I'd need Alteryx to read the input file and detect changes in employee ID to know when to start a new file (if an employee has multiple pages, they are all placed consecutively).

 

Any ideas on how to get this accomplished?

 

Thanks,

Jose M.

10 REPLIES
mceleavey
17 - Castor

Hi @jmedinat ,

 

Yes, but to achieve this we'll need two things.

First, we need the PDF you're reading in so we know how to build in the separators, and second, you will need the Intelligence Suite tools to read and parse PDF files.

If you don't have IS, there are a few PDF reader tools on the Community you could try, but I can't vouch for how well they work, if at all.

 

Without the original pdf (or one with mock data but representative of exactly how it looks) we can't help you.

 

M.




jmedinat
5 - Atom

Hi Mceleavey,

 

Thanks for the quick response. I don't currently have a license for IS. The issue is that the input files I'm working with contain employees' sensitive information, so I wouldn't be able to share them. I could provide a screenshot with the sensitive information hidden if that would help. If you can point me in the direction of the PDF reader tools, I can take a peek and see what I can find out.

 

Thanks again for your help!

Jose M.

mceleavey
17 - Castor

Hi @jmedinat ,

 

Parsing a PDF is like parsing unstructured text, so I'm afraid we'd need the exact PDF; without it we can't help you on the input side.

To output multiple PDF files, you simply need the user ID as a column, then use the Render tool with the User ID column as the grouping field. You can then output to a specific file and select PDF as the format.

 

Hope this helps with the output side of things.

 

M.




PhilipMannering
16 - Nebula

@jmedinat I don't suppose you could share a redacted PDF or something similar to what you're trying to split?

jmedinat
5 - Atom

@PhilipMannering @mceleavey I was able to create a file with dummy employee data. The attached file contains two employees: one has two pages and the other has one page. The field that shows the EE ID is called "Empl. no."

 

Thank you for your help with this!

mceleavey
17 - Castor

Hi @jmedinat ,

 

I've built this using the Intelligence Suite tools, and it does exactly what you need:

 

[Image: mceleavey_0-1634648921495.png]

 

This loads the PDF and parses the text. It then uses regex to determine the Employee ID, which is then associated with the image of each individual page. I then wrapped the output in a macro so that it groups the images under the correct employee and outputs them accordingly.
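
For readers following along outside Alteryx, the parse-and-group step described above can be sketched in Python. This is only an illustration, not the workflow itself: it assumes the text of each page has already been extracted into a list of strings, that the ID follows the label "Empl. no." as in the dummy file, and that the ID is numeric. The function name, the regex, and the sample page text are all hypothetical. A page with no ID is assumed to belong to the previous employee.

```python
import re
from collections import OrderedDict

# Assumed pattern: the label "Empl. no." followed by a numeric ID.
# Adjust it to match the real layout of the extracted PDF text.
EMPLOYEE_ID_PATTERN = re.compile(r"Empl\.\s*no\.?\s*:?\s*(\d+)")


def group_pages_by_employee(page_texts):
    """Map each employee ID to the 0-based indices of the pages that belong to it."""
    groups = OrderedDict()
    current_id = None
    for index, text in enumerate(page_texts):
        match = EMPLOYEE_ID_PATTERN.search(text)
        if match:
            # A page that shows an ID sets (or repeats) the current employee.
            current_id = match.group(1)
        if current_id is None:
            # Leading pages with no ID cannot be assigned to anyone; skip them.
            continue
        groups.setdefault(current_id, []).append(index)
    return groups


# Made-up page text: two pages for one employee, one page for another.
sample_pages = [
    "Payslip  Empl. no. 1001  Page 1",
    "Payslip (continued)",
    "Payslip  Empl. no. 1002  Page 1",
]
page_groups = group_pages_by_employee(sample_pages)
print(dict(page_groups))
```

The resulting mapping plays the same role as the grouping field: each key would become one output file containing only that employee's pages.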

You then get the attached PDF files.

 

Without the Intelligence Suite tools you will need to find a macro that has been built that still works (I can't find one) or an R package that will do it for you.

You will then need to change the flow to convert the text back into pdf format.
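
One possible route without Intelligence Suite is to do the split in Python, for example from Alteryx's Python tool. The sketch below is an assumption and has not been tested in this thread: it uses the third-party pypdf package (`PdfReader`, `PdfWriter`) in place of the R package mentioned above, and it copies the original pages directly, so nothing has to be converted back from text. The function names, the file-name scheme, and the example page mapping are hypothetical.

```python
import os


def plan_output_files(page_groups, output_dir):
    """Return {output_path: [page indices]} with one PDF path per employee ID."""
    return {
        os.path.join(output_dir, f"employee_{employee_id}.pdf"): list(pages)
        for employee_id, pages in page_groups.items()
    }


def write_split_pdfs(input_path, page_groups, output_dir):
    """Copy each employee's pages from input_path into a separate PDF."""
    # Imported here so the planning helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    os.makedirs(output_dir, exist_ok=True)
    for output_path, pages in plan_output_files(page_groups, output_dir).items():
        writer = PdfWriter()
        for page_index in pages:
            writer.add_page(reader.pages[page_index])
        with open(output_path, "wb") as handle:
            writer.write(handle)


# Hypothetical mapping of employee ID to 0-based page indices.
example_groups = {"1001": [0, 1], "1002": [2]}
plan = plan_output_files(example_groups, "split_output")
```

Calling `write_split_pdfs("input.pdf", example_groups, "split_output")` (with a placeholder input path) would then write one file per employee. The page text needed to build the mapping could come from pypdf's `page.extract_text()`, though how cleanly the employee ID comes out depends on the PDF.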

 

M.




jmedinat
5 - Atom

Hi @mceleavey 

 

Thanks for your help with this! I don't have IS, but I'll take a look and see if I can get it. What I noticed in the output files you sent is that they both contain the same number of pages as the original (3). What I was trying to accomplish is getting each employee's pages isolated into an individual PDF file so they can be uploaded to their individual profiles. Does that make sense?

mceleavey
17 - Castor

@jmedinat ,

 

Oops! My mistake. I meant to embed the image in the macro.

It's fixed now.

The one for Employee 1 should have two pages, and the other should have one.

 

M.




rosslogie
5 - Atom

Really interesting stuff! Has anyone found a way to do this without Intelligence Suite? 
