PDF separation problem
Hi All,
I have an interesting problem that I hope you might be able to shed some light on.
I have a workflow that separates PDF files into individual files based on employee number. It works fine and is easily adaptable to different file layouts.
However, I have now been asked to adjust the flow to separate some Canadian year-end files. The problem is that each employee's PDF page contains the same information twice: the page is essentially split in half, with the data mirrored. This causes a problem when splitting because there are no unique identifiers on the page; everything appears twice.
Is there a way to only take the FIRST instance of a string (say RPC/RRQ) from each page and split on that first instance?
Any help would be appreciated.
Dave
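For the "first instance only" question, here is a minimal sketch of the idea in Python rather than as an Alteryx workflow. It assumes the text of each page has already been extracted into a string, and the nine-digit pattern and sample text are hypothetical stand-ins for the real identifier. The point is that a search which stops at the first match ignores the mirrored copy on the same page.

```python
import re

# Hypothetical pattern: a nine-digit number, optionally grouped in threes.
# Swap in whatever actually identifies an employee on the real form.
ID_PATTERN = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b")

def first_id_per_page(pages):
    """Return one ID per page: the first match, or None if the page has none."""
    result = []
    for text in pages:
        match = ID_PATTERN.search(text)  # search() returns only the first match
        result.append(match.group(0) if match else None)
    return result

# Made-up page text: the first page carries the same data twice (mirrored),
# the second is a common sheet with no employee data.
pages = [
    "Employee 123 456 789 Box 14 50000.00    Employee 123 456 789 Box 14 50000.00",
    "Summary sheet with no employee data",
]
print(first_id_per_page(pages))  # ['123 456 789', None]
```

Pages that return None can be dropped or routed separately before the split.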
First of all, congrats on getting the username "Dave"; that's pretty awesome. First-name usernames are the best!
Can you give us some sample data of what is going into the regex tools? I can help build out the REGEX expression, but need to see what things are going to look like.
Managing Partner
DCG Analytics
Awesome - thank you
Here is the regex expression currently
This is what is linked to it
This is the form. The green box marks the first instance, which is where I need the split (it's currently returning the social security number from the first instance, which is perfect).
The red box is the repeat of the instance on the same page
Can you show me an example of what Alteryx pulls out of one of these files? That's the part I think we need to adjust the REGEX for. Like if it's a single cell with two RPC/RRQ numbers in it then we just have to parse that first one out?
Managing Partner
DCG Analytics
Sure
It's capturing some of the SS numbers (the longer strings), then nulls and additional data. The nulls are fine; they refer to a common sheet that I do not need in the final split. The two-digit numbers are the concern: I'm not sure why it picks those up in some iterations and the long string in others.
Hi all - I am still searching for an answer to this; however, I have decided to simplify a little. How do I use Alteryx to rename a PDF file based on the first instance of an Employee ID on the page?
Basically I would like this PDF file to be called 123456.T4.2021
Note the page has mirrored data (it's a Canadian year-end form).
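As a rough illustration of the renaming step, here is a Python sketch (not an Alteryx solution). It assumes the page text has already been extracted by some other means, and the six-digit ID pattern, the `target_name` and `rename_pdf` helpers, and the `.pdf` suffix are assumptions made for the example. Only the first ID on the page is used, so the mirrored copy has no effect.

```python
import re
from pathlib import Path

# Hypothetical pattern: a six-digit employee ID, as in the 123456 example.
EMPLOYEE_ID = re.compile(r"\b\d{6}\b")

def target_name(page_text, form="T4", year=2021):
    """Build '<id>.<form>.<year>.pdf' from the first ID found in the page text."""
    match = EMPLOYEE_ID.search(page_text)  # first match only
    if match is None:
        return None
    return f"{match.group(0)}.{form}.{year}.pdf"

def rename_pdf(pdf_path, page_text):
    """Rename the file in place; leave it untouched if no ID is found."""
    pdf_path = Path(pdf_path)
    new_name = target_name(page_text)
    if new_name is None:
        return pdf_path
    return pdf_path.rename(pdf_path.with_name(new_name))

# Made-up mirrored page text: the ID appears twice, the first one wins.
print(target_name("Employee 123456 Year 2021    Employee 123456 Year 2021"))
# 123456.T4.2021.pdf
```

The four-digit year is not mistaken for an ID because the pattern requires exactly six digits between word boundaries.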
I will close this - it seems the structure of the file is the issue, not the process flow
