General Discussions

myl9713 · ‎07-19-2022

Hi,

The Excel file that I'm working with is converted from PDF, so the format is really off for certain lines. I'm trying to match up the columns by concatenating everything in Excel and then using text-to-columns in Alteryx. However, I have some columns that are not filled for every entry (Ex: A and F), which can lead to column mismatches when I concatenate and then text-to-columns. I'm thinking of using RegEx to take out the 4 in A2 and 113121 in F3, but I'm not sure what the right equation might be.

The first picture is the raw data imported from PDF, the second picture is what I'll be uploading to Alteryx. If anyone has a better way to clean up the data, please let me know too!

Luke_C · ‎07-20-2022

Hi @myl9713

This looks like a tricky ask. You might get more input if you post the data rather than screenshots. As it stands someone would have to retype all of that to help prototype a solution and there appears to be varying amounts of spaces in your data that would easily be mistyped.

BrianR · ‎07-20-2022

Hi @myl9713 - as @Luke_C mentioned, definitely a tricky ask, as there are multiple patterns involved. The attached workflow should work, with a caveat, that there will only be these 4 patterns observed in the data. As more data flows into the workflow, you might uncover more patterns, which of course would have to be accounted for. If by chance your data is structured and appears similar, perhaps the attached workflow might work. Others here on Community might have some more efficient ideas on regex parsing as well, so keep an eye out for more potential solutions from others.

General Discussions

Using RegEx to Format Data

Re: Euleryx Project 8 - Largest Product in a Serie...

Re: Euleryx Project 7 - 10 001st Prime

Re: Euleryx Project 7 - 10 001st Prime

Re: Euleryx Project 6 - Sum Square Difference

AYX TV Cooking Up Excel Outputs workflow