Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Parse only certain components of .txt file to Excel

Hi,

I am very new to Alteryx and have a side project I am working on. I have a single .pdf file containing multiple tax invoices, which I have converted to a .txt document. I am trying, and completely failing, to extract only the required information from that document. All that is required is the Cell Number, the Invoice Number, the billed items for each invoice, and the amount of each item. The number of billed items varies from invoice to invoice. I have attached a small section of the .txt document and the desired Excel output. If anyone could help with this problem, it would be greatly appreciated.

Pulsar

Perhaps there's a more elegant way, but here's my solution:

cellular.png

Pulsar

It looks like the three amounts on the itemised lines are Net, VAT and Gross, but you've only asked for Net. Let me know if you want the other two as well.

Quasar

Hi @Christopher_Waspe,

@DavidP's solution looks great. Another approach is to start by filtering down to only the lines you need and then combining them into a single row with a Summarize tool. From there, use a few RegEx tools (one to parse out the cellular and invoice numbers, one to split the multiple billed items into rows, and a final one to parse each billed item into columns), plus a Record ID tool to add a unique identifier to each row.
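For anyone who wants to prototype the same logic outside Alteryx, here is a rough Python sketch of the idea: keep only the relevant lines, carry the cell and invoice numbers down to each billed item, and split each item into columns. It is not the attached workflow, and it reads the text line by line rather than using a Summarize tool. The sample text, the "Cell Number:" and "Invoice Number:" labels, and the item layout (a description followed by amounts) are all assumptions, since the real file is only available in the attachment.

```python
import re

# Hypothetical sample text; the real invoice layout is not shown in the thread.
SAMPLE = """\
TAX INVOICE
Cell Number: 0821234567
Invoice Number: INV001
Monthly subscription 100.00 15.00 115.00
Data bundle 50.00 7.50 57.50
Thank you for your business
TAX INVOICE
Cell Number: 0837654321
Invoice Number: INV002
Call charges 20.00 3.00 23.00
"""

# Assumed header labels; adjust to whatever the real document uses.
HEADER = re.compile(r"^(Cell Number|Invoice Number):\s*(\S+)$")
# Description, then the first two amounts (billed and VAT).
ITEM = re.compile(r"^(.*?)\s(\d+\.\d+)\s(\d+\.\d+)")


def parse_invoices(text):
    """Return one row per billed item, tagged with its cell and invoice number."""
    rows = []
    current = {}
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            if header.group(1) == "Cell Number":
                current = {}  # a new invoice starts here (assumption)
            current[header.group(1)] = header.group(2)
            continue
        item = ITEM.match(line)
        # Lines that are neither headers nor items are simply skipped (the "filter").
        if item and "Invoice Number" in current:
            rows.append({
                "RecordID": len(rows) + 1,  # unique identifier per row
                "Cell Number": current.get("Cell Number"),
                "Invoice Number": current["Invoice Number"],
                "Item": item.group(1),
                "Billed": float(item.group(2)),
                "VAT": float(item.group(3)),
            })
    return rows


for row in parse_invoices(SAMPLE):
    print(row)
```

The resulting list of rows could then be written to Excel with whatever tool you prefer; that step is left out here to keep the sketch self-contained.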

Give it a whirl, and let me know if it works for you. I parsed each item into billed and VAT amounts, in case you needed both. If you only need the amount without the VAT, change the regular expression in the last RegEx tool to: (.*?)\s(\d+\.\d+)
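To see what that expression captures, here is a small Python check. The sample line is made up (a description followed by three amounts), and Python's re module stands in for the Alteryx RegEx tool, so treat it as an illustration of the pattern rather than of the tool itself.

```python
import re

# The "amount without VAT" pattern from the post above.
NET_ONLY = re.compile(r"(.*?)\s(\d+\.\d+)")


def parse_net(line):
    """Return (description, first amount) or None if the line has no amount."""
    m = NET_ONLY.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)


# Hypothetical itemised line: description, Net, VAT, Gross.
print(parse_net("Monthly subscription 100.00 15.00 115.00"))
# -> ('Monthly subscription', '100.00')
```

The lazy (.*?) grows only until the first decimal number can be matched, so the first group is the item description and the second group is the first amount on the line. Any further amounts are simply left unmatched.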

 

image.png

Thank you very much for the assistance. It just required a tiny bit of tweaking, but it did exactly what was required. I don't need the other two amounts, but I can get there from what you provided. Thanks again.
