I am very new to Alteryx and have a side project I am working on. I have a single .pdf file containing multiple tax invoices, which I have parsed into a .txt document. I am trying, and completely failing, to extract only the required information from that document. All I need from it is the cell number, the invoice number, the billed items for each invoice, and the amount for each of those items. The number of billed items varies from invoice to invoice. I have attached a small section of the .txt document and the desired Excel output. If anyone could help with this problem, it would be greatly appreciated.
@DavidP's solution looks great. Another approach is to start by filtering down to only the pieces you need, and then use a Summarize tool to combine all of those pieces into one row. From there, a few RegEx parses do the rest: one to parse out the cellular and invoice numbers, one to split the multiple billed items into rows, and a final one to split each billed item into columns. A Record ID tool then adds a unique identifier to each row.
Give it a whirl, and let me know if it works for you. I parsed the amounts into billed and VAT columns, in case you need both. If you only need the amount without the VAT, change the regular expression in the last RegEx tool to: (.*?)\s(\d+\.\d+)
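If it helps to see what that last pattern captures outside of Alteryx, here is a minimal Python sketch. The sample lines are hypothetical (made up for illustration, not taken from the attached file), and so is the helper name parse_item. It assumes each billed-item line is a description, then the billed amount as a decimal, then the VAT.

```python
import re

# The pattern quoted in the reply above: a lazy description group,
# one whitespace character, then a decimal amount such as 199.00.
ITEM_PATTERN = re.compile(r"(.*?)\s(\d+\.\d+)")


def parse_item(line):
    """Return (description, amount) for one billed-item line, or None.

    Hypothetical helper for illustration only; it is not part of the
    Alteryx workflow described in the thread.
    """
    match = ITEM_PATTERN.match(line)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


# Hypothetical sample lines (assumed layout: description, billed, VAT).
sample_lines = [
    "Monthly Subscription 199.00 27.86",
    "Data Bundle 1GB 85.50 11.97",
    "Total due",  # no decimal amount, so this line is skipped
]

for line in sample_lines:
    parsed = parse_item(line)
    if parsed is not None:
        description, amount = parsed
        print(description, amount)
```

Because the first group is lazy, it stops at the first whitespace that is followed by a decimal number, so the trailing VAT figure is simply left unmatched. Amounts written with thousands separators (for example 1,299.00) would not match this pattern as-is.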