Alteryx Designer Desktop Discussions

SOLVED

PDF separation problem

Dave
8 - Asteroid

Hi All,

I have an interesting problem that I hope you can shed some light on.

I have a workflow that separates PDF files into individual files based on employee number. It works fine and is easily adaptable for different file layouts etc. 
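
The core of a split like this can be sketched in Python (for example in Alteryx's Python tool or as a standalone script). This is a minimal sketch, not the workflow from the thread. It assumes the page text has already been extracted into a list of strings, one per page. The "Employee ID" label and the 6-digit ID format are hypothetical stand-ins for the real layout. Pages are grouped by the first ID found on each page.

```python
import re

# Hypothetical label and ID format: "Employee ID", an optional colon, then a
# 6-digit number. Adjust both to match the real PDF text.
EMPLOYEE_ID = re.compile(r"Employee ID\s*:?\s*(\d{6})\b")


def group_pages_by_employee(page_texts):
    """Map each employee ID to the indices of the pages that carry it.

    Only the first ID on each page is used (re.search). Pages with no ID,
    such as a shared cover sheet, are skipped.
    """
    groups = {}
    for index, text in enumerate(page_texts):
        match = EMPLOYEE_ID.search(text)
        if match:
            groups.setdefault(match.group(1), []).append(index)
    return groups


pages = [
    "Employee ID: 123456 ...",
    "Employee ID 123456 (page 2) ...",
    "Common information sheet",
    "Employee ID: 654321 ...",
]
print(group_pages_by_employee(pages))
# {'123456': [0, 1], '654321': [3]}
```

Each group of page indices could then be written to its own file with a PDF library (pypdf's PdfWriter, for instance). That step is not shown here.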

[Screenshots: the current workflow]

However, I have now been asked to adjust the workflow to separate some Canadian year-end files. The problem is that each employee's PDF contains the same information twice: the page is essentially split in half, with the data mirrored. This breaks the split because no identifier appears only once on the page; everything appears twice.

Is there a way to take only the FIRST instance of a string (say, RPC/RRQ) on each page and split on that instance?
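
The difference between "every instance" and "the first instance" shows up clearly in a small regular-expression sketch. In Python, re.findall returns every match on a mirrored page, so each value comes back twice, while re.search stops at the first match. The pattern assumes that the value of interest is a standalone 9-digit number somewhere after the RPC/RRQ label, and the sample page text is made up. Both are assumptions, not details taken from the real form.

```python
import re

# Hypothetical pattern: the "RPC/RRQ" label, then (lazily) anything up to the
# first standalone 9-digit number, which is captured.
PATTERN = re.compile(r"RPC/RRQ.*?\b(\d{9})\b", re.DOTALL)


def all_instances(page_text):
    """Every captured number on the page, duplicates included."""
    return PATTERN.findall(page_text)


def first_instance(page_text):
    """Only the first captured number on the page, or None if there is none."""
    match = PATTERN.search(page_text)
    return match.group(1) if match else None


# Made-up page text whose left and right halves carry the same data.
half = "RPC/RRQ 123456789 Box 14 52000.00"
mirrored_page = half + "\n" + half

print(all_instances(mirrored_page))   # ['123456789', '123456789']
print(first_instance(mirrored_page))  # 123456789
```

Applied page by page, the first-match approach yields one identifier per page, and that identifier can then drive the split.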

Any help would be appreciated. 

Dave

Treyson
13 - Pulsar

@Dave 

First of all, congrats on getting the username "Dave"; that's pretty awesome. First-name usernames are the best!

Can you give us some sample data showing what goes into the Regex tools? I can help build out the regex expression, but I need to see what the input looks like.

Treyson Marks
Managing Partner
DCG Analytics
Dave
8 - Asteroid

Awesome, thank you!

Here is the current regex expression:

[Screenshot: the current Regex tool expression]

This is what is connected to it:

[Screenshot: the tools connected to the Regex tool]

This is the form. The green box is the first instance, where I need the split (it's currently returning the social security number in that first instance, which is perfect).

The red box is the repeated instance on the same page.

[Screenshot: the form, with the first instance outlined in green and the repeat in red]

Treyson
13 - Pulsar

Can you show me an example of what Alteryx pulls out of one of these files? That's the part I think we need to adjust the regex for. For example, if it's a single cell with two RPC/RRQ numbers in it, then we just have to parse out the first one.
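
If both RPC/RRQ numbers land in one cell, a common reason for capturing the wrong one is a greedy ".*" at the start of the expression. It first consumes all of the text and then backtracks only as far as it must, so the capture lands on the last occurrence. A lazy ".*?" stops at the first one. The same greedy/lazy distinction applies in Perl-style regex syntax generally. This Python sketch uses a made-up cell with two different numbers so the difference is visible; the pattern is an illustration, not the expression from the screenshots.

```python
import re

# Made-up cell containing two different 9-digit numbers after the label.
cell = "RPC/RRQ 111111111 ... RPC/RRQ 222222222"

# Greedy ".*" first runs to the end of the text, then backtracks only as far
# as needed, so the capture lands on the LAST occurrence.
greedy = re.match(r".*RPC/RRQ (\d{9})", cell)

# Lazy ".*?" grows one character at a time, so the capture lands on the FIRST.
lazy = re.match(r".*?RPC/RRQ (\d{9})", cell)

print(greedy.group(1))  # 222222222
print(lazy.group(1))    # 111111111
```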

Dave
8 - Asteroid

Sure

[Screenshot: the values the Regex tool extracts]

It's capturing some of the SS numbers (the longer strings), plus nulls and some additional data. The nulls are fine; they come from a common sheet that I don't need in the final split. The two-digit numbers are the concerning values: I'm not sure why it picks those up in some iterations and the long string in others.
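
One plausible explanation (not confirmed in the thread, since the exact expression is only in a screenshot) is a pattern that accepts any run of digits. In that case, whichever number the PDF text extraction happens to place right after the label wins, and on some pages that is a two-digit box number. Pinning the capture to the exact length of the ID with word boundaries avoids this. The sketch below assumes a 9-digit ID and uses made-up text for two pages whose extraction order differs.

```python
import re

# Made-up extracted text. The PDF's text order differs between the two pages,
# so on page_b a two-digit box number comes right after the label.
page_a = "RPC/RRQ 123456789 16 2898.00"
page_b = "RPC/RRQ 16 2898.00 123456789"

# Loose: skip non-digits, then capture whatever run of digits comes first.
loose = re.compile(r"RPC/RRQ\D*(\d+)")

# Strict: capture only a standalone number of exactly nine digits.
strict = re.compile(r"RPC/RRQ.*?\b(\d{9})\b")

loose_hits = [loose.search(page).group(1) for page in (page_a, page_b)]
strict_hits = [strict.search(page).group(1) for page in (page_a, page_b)]

print(loose_hits)   # ['123456789', '16']
print(strict_hits)  # ['123456789', '123456789']
```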

Dave
8 - Asteroid

Hi all, I am still searching for an answer to this, but I have decided to simplify it a little. How do I use Alteryx to rename a PDF file based on the first instance of an Employee ID on the page?

Basically, I would like this PDF file to be called 123456.T4.2021

Note that the page has mirrored data (it's a Canadian year-end form).

[Screenshot: the form page, with the same data mirrored on both halves]
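
The renaming step can be sketched in Python as well. This minimal sketch assumes the page text has already been extracted (for example with a PDF library such as pypdf). It also assumes that the employee ID is a 6-digit number after a hypothetical "Employee ID" label, and that the form type and year are passed in rather than read from the page. Because the page is mirrored, re.search is used so only the first of the two identical IDs is taken. Path.rename returns the new path on Python 3.8 and later.

```python
import re
from pathlib import Path

# Hypothetical label and ID format; adjust to the real form layout.
EMPLOYEE_ID = re.compile(r"Employee ID\s*:?\s*(\d{6})\b")


def target_name(page_text, form="T4", year="2021"):
    """Build a name such as '123456.T4.2021' from the first ID on the page.

    re.search stops at the first match, so the mirrored second copy of the
    ID is ignored.
    """
    match = EMPLOYEE_ID.search(page_text)
    if match is None:
        raise ValueError("no employee ID found on the page")
    return f"{match.group(1)}.{form}.{year}"


def rename_pdf(path, page_text):
    """Rename the file in place, keeping its extension; return the new path."""
    source = Path(path)
    return source.rename(source.with_name(target_name(page_text) + source.suffix))


mirrored_page = "Employee ID: 123456 ... Employee ID: 123456 ..."
print(target_name(mirrored_page))  # 123456.T4.2021
```

Keeping the original suffix means a file such as scan.pdf becomes 123456.T4.2021.pdf, so it still opens as a PDF.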

Dave
8 - Asteroid

I will close this. It seems the structure of the file is the issue, not the workflow.
