Conditions: We need to look for 'Report-47' and then 'Monetary Instrument Sales'. If both are present, extract the values for 'Purchaser Account Number', 'Cash In', 'Reason Code', and 'STAT', as highlighted. The flat file is just 1% of the daily report, so we have to read through the complete file for each day and then consolidate the other daily files to get a monthly report. I am a new Trifacta user. Any advice on this will be highly appreciated! Thanks in advance!
Hey Sundar!
Attached is a recipe to parse this file. Here is the output with the recipe panel opened.
So the essence of working with unstructured data like this is to find patterns and positions that you can use to start to break the data out into rows and columns. I'm not going to walk through every step in the recipe in depth because it is extensive, but to summarize the steps taken:
Thanks to @Alon Bartur, who came up with this solution. Shameless pitch, but the great part about using Trifacta to solve this was that seeing a preview of every step as we created it really allowed us to validate that we were moving in the right direction. All in all it didn't take too long to solve (I'd be hopeless trying to do this with Python 😅).
Go ahead and give it a shot and let us know if it helps!!
Wow, outstanding solution, @David McNamara!
The common strategy with this and similar “structured” data is to tear it all down and rebuild.
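For anyone who wants to try that strategy outside Trifacta, here is a minimal Python sketch. It is not the attached recipe. It assumes a hypothetical layout in which every report header contains "Report-<number>", the section title sits on its own line, and each value appears as "Label: value" ending at two or more spaces or the end of the line. The names `parse_daily` and `consolidate` are made up for illustration, so adjust the patterns to the real file.

```python
import re
from typing import Dict, Iterable, List

REPORT_MARKER = "Report-47"
SECTION_MARKER = "Monetary Instrument Sales"
FIELDS = ["Purchaser Account Number", "Cash In", "Reason Code", "STAT"]

# Assumption: values look like "Label: value" and end at 2+ spaces or end of line.
FIELD_PATTERNS = {
    name: re.compile(re.escape(name) + r"\s*:\s*(.*?)(?=\s{2,}|$)")
    for name in FIELDS
}
# Assumption: every report header contains "Report-<number>".
ANY_REPORT = re.compile(r"Report-\d+")


def parse_daily(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Scan a whole daily file and keep only the wanted section's fields."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    in_report = in_section = False
    for raw in lines:
        line = raw.rstrip("\n")
        header = ANY_REPORT.search(line)
        if header:
            # A new report starts: track whether it is the one we want.
            in_report = header.group(0) == REPORT_MARKER
            in_section = False
            current = {}
        if in_report and SECTION_MARKER in line:
            in_section = True
            continue
        if not in_section:
            continue
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(line)
            if match:
                current[name] = match.group(1).strip()
        # Emit a record once all four fields have been collected.
        if len(current) == len(FIELDS):
            records.append(current)
            current = {}
    return records


def consolidate(daily: Dict[str, Iterable[str]]) -> List[Dict[str, str]]:
    """Merge parsed daily files (keyed by a date label) into one monthly list."""
    monthly: List[Dict[str, str]] = []
    for day in sorted(daily):
        for record in parse_daily(daily[day]):
            monthly.append({"Date": day, **record})
    return monthly
```

In practice you would pass an open file object for each day to `consolidate` and write the result out with the `csv` module. The state flags do the "tear down" part (ignoring everything outside the wanted report and section), and the per-field patterns do the "rebuild" part.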
Excellent! Thank you so much, @David McNamara, for the solution. I really appreciate all your efforts and such a quick turnaround. I am preparing the recipe now using release 4.2 and will post an update very soon. Thanks again!
Hi David, can you post an image of the remaining steps of the recipe? I can see up to step 17 and want to see the remaining steps for validation. Thanks, @David McNamara!
Hey @Sundar Jagan, no problem!
Here are the last few steps:
I've also attached the flow in version 4.2 to this comment. You should be able to import the flow (go to flow view, select more options (...), click Import, and choose the attached .zip file) and then simply replace the source file with your flat file example. That way you don't have to recreate the recipe from scratch 🙂
Hope this helps!
David
Amazing! Thanks a lot, @David McNamara. I really appreciate you attaching the flow for version 4.2; it's very helpful. Hats off again to both of you, @David McNamara and @Alon Bartur. Just wondering if you would recommend any resources for extracting data in the same way from a PDF?
Hi Sundar,
No resources that I know of, unfortunately. I can try to pull together some information and send it along to you. I will add it to my to-do list for next week.
Have a nice weekend :)
David