I have a .txt file where the data is somewhat unstructured as shown in the two attachments, especially page 2. Highlighted in both are the data I would like to bring into Alteryx as a table with the field names (e.g. PAYT GRP ID/NAME) and their corresponding values (e.g. 24/MONTHLY AFIS). How is the best way to trap this information and bring it in. The other issue is this txt document can have a page range of anywhere between 3 - 15 but I'm only needing the information from pages 1 & 2. Thanks.
Solved! Go to Solution.
Felipe - I'm trying to accomplish a similar objective with the attached .txt file, but I don't understand how to adapt your Regex formula to work with this data set. When running it, I can't get the expected values within the Key and Value fields as I don't understand how this formula works. Also attached is a screencapture (.png) of the fields I'm trying to make use of along with the workflow. Are you able to take a look and advise how to get this regex to work? Thanks.
@Jake5 for the Field 1 error, you need to uncheck the box in the input configuration that says "First Row Contains Field Names"
Thank you Felipe. That achieved the desired results, but can you share what update you made compared to the workflow I sent you? The regex formula looks the same as what I included in mine.
Thank you, Felipe. So something I just realized is for this input, there can be instances where page 1 layout can repeat in subsequent pages, the difference from one page to the next being the field value for INCLUDE/EXCLUDE GROUP ID/NAME:. Using the attached txt file, page 1 has INCLUDE/EXCLUDE GROUP ID/NAME: = 5/QUARTERLY FI - A RKCONNECT, page 5 = 11/QUARTERLY TPA - R RKCONNECT and so forth. I need the workflow to identify all pages where this layout is present in order to render the attached, desired output. My challenge is trying to trap the unique value for INCLUDE/EXCLUDE GROUP ID/NAME from each page. Are you able to assist?
Hi, Felipe. The output seems to work with that input file but I just tried running the workflow against this version (see attached Sample file_R00848_v3.txt.) There seems to be an issue with the page numbers that are being assigned as the highest page number the multi-row formula generates is 9 but this is a 37 page document. In this version the values I"m after are on pages 1, 17, 29, and 33. Sorry to continue to ask for your help - you've been amazing! Can you look over and advise? Thank you!
Hi @Jake5
You are right, there is an error unable to parse page numbers above the page 10. Please replace the expression of the multi row formula by this one, it should work now.
if CONTAINS([Field_1], 'PACKET PAGE:') THEN TONUMBER(REGEX_REPLACE([Field_1], '(.*?)(\d+)(.*)', '$2')) ELSE [Row-1:Page] ENDIF
(Basically, i just added the red ? to the previous version)
Yes - that worked - thank you!
Hi, Felipe
In the attached workflow you provided, you used \s{4,}(.*?)\s{4,}(.*?)\s{4,}.* as the Regex formula (Regex tool). Can you help me understand how this formula is able to effectively parse Field_1 into the Key and Value fields? I'm just wanting to learn for my own understanding. Thanks!
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
6 |