Regex trouble...
See attached workflow.
I am trying to extract the table fields from a table input (using computer vision tools in a batch macro).
Unfortunately, the source PDF was of poor quality and Alteryx was not always able to delineate the fields.
In my workflow....
If RQST and BILL are BOTH NOT NULL, then the import was good. If one or both are NULL, then the import was bad. The screenshot depicts all 3 cases.
The field to split is RAW.
I only care about
How can I extract this?
@danilang Sorry. There now
I did as much cleanup as possible in Alteryx (e.g., removing unnecessary punctuation). But I ended up exporting to a text editor and manually fixing as much as possible before reintroducing the data into Alteryx and then Tableau. It took a few cycles until everything totaled up correctly.
Please find my solution done using Regex -
Regular expression - (\d+).+\|(.+)\|(.+)
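To see what this expression captures, here is a minimal Python sketch. It uses Python's re module as a stand-in for the Alteryx RegEx tool, and the sample rows are hypothetical, made up to resemble a "number, description, two pipe-delimited amounts" layout; they are not taken from the attached workflow.

```python
import re

# Pattern copied verbatim from the post above.
PATTERN = re.compile(r"(\d+).+\|(.+)\|(.+)")

def extract(raw):
    """Return the three captured groups, or None if the row does not match."""
    m = PATTERN.search(raw)
    return m.groups() if m else None

# Hypothetical rows (assumed layout, not real data from the thread).
good_row = "12 abc xyz|10,000|2,500"
bad_row = "12 abc xyz|10,000 2,500"   # second pipe lost during OCR

print(extract(good_row))
print(extract(bad_row))
```

The pattern needs two pipe characters after the leading number, so a row that has lost one of them returns no match at all.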
@EN6924 Thank you. In theory that should work. In practice, hundreds of lines lack the correct delimiters due to the less-than-perfect quality of the original PDF/images processed by the Computer Vision tools.
Hi @hellyars
There may be a way to handle all this in a single regex statement, but my regex-fu is not that good. Instead, I decided to break it down into manageable chunks.
The first pattern is \d+.*?\|.*?(\d+[,|\.|\s]*\d+)[\||\s](\d*[,|\.|\s]*\d+) and matches 12|abc..yz|xx,xxx|x,xxx with various possibilities for the delimiters and thousands separator. After reversing all the rows that didn't match, I used (\d*[,|\.]*\d+)?[\|]+(\d*[,|\.]*\d+)?.* where the BILL and RQST groups are now optional, which handles the case where only one of them is present. This left me with only the rows that have a single number at the front, separated from the rest of the string by a space. I then unioned these, reversed the B and R values, and finally unioned the result with the first set of matches.
In the sample output, the R and B columns correspond to your RQST and BILL columns
Of course you'll have to validate all the rows, but this should get you at least 99% of the way there.
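For anyone who wants to try the two-pass idea outside Alteryx, here is a rough Python sketch. It assumes Python's re module behaves like the Alteryx RegEx tool for these patterns, and the sample rows are hypothetical. The function name parse_row is made up for illustration, and the sketch does not decide which captured number is RQST and which is BILL. Rows with only a single leading number are not handled here.

```python
import re

# Both patterns are copied verbatim from the post above.
FIRST = re.compile(r"\d+.*?\|.*?(\d+[,|\.|\s]*\d+)[\||\s](\d*[,|\.|\s]*\d+)")
SECOND = re.compile(r"(\d*[,|\.]*\d+)?[\|]+(\d*[,|\.]*\d+)?.*")

def _unreverse(text):
    """Undo the string reversal for a captured group; keep None as None."""
    return text[::-1] if text else None

def parse_row(raw):
    """Return (first_number, second_number); either may be None."""
    m = FIRST.search(raw)
    if m:
        return m.group(1), m.group(2)
    # Second pass: match against the reversed row, so the trailing
    # numbers sit at the start of the string.
    m = SECOND.match(raw[::-1])
    if m:
        # In the reversed row the last number is captured first,
        # so swap the groups and un-reverse each value.
        return _unreverse(m.group(2)), _unreverse(m.group(1))
    return None, None

# Hypothetical rows (assumed layout, not real data from the thread).
for row in ["12|abc xyz|10,000|2,500",
            "12|abc xyz|10,000 2,500",
            "12|abc xyz|10,000|",
            "12|abc xyz||2,500"]:
    print(row, "->", parse_row(row))
```

Reversing the row lets the second pattern anchor on the end of the original string, where the two amounts live, instead of depending on the description text in the middle.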
Dan