Using Parsing and Multi-Field to format document
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I have a file that comes out of my tax software that need to break out into columns and apply some kind of multi-field condition to.
The basic flow is data is stacked in the following order:
- IRS Form Number 1120 Pg. 1 Line number - Line Description
- Software's TRC No - TRC Description
- Software's TCC No - TCC Descripion
- Our Company's GL A/C No - G/L Description
Some inherent data flow considerations are:
- Form Line Numbers are alphanumeric and will include letters and numbers and will not be the same number of characters
- Form Line Descriptions while generally alpha numeric, can contain special characters (":", "&", "%", and "/" being the most common)
- TRC No are /d{2}-/d{3}
- TRC Descriptions while generally alpha numeric, can contain special characters (":", "&", "%", and "/" being the most common)
- TCC No are /d{3}
- TCC Descriptions while generally alpha numeric, can contain special characters (":", "&", "%", and "/" being the most common)
- G/L AC No.s are /w{6}
- G/L Descriptions are Descriptions while generally alpha numeric, can contain special characters (":", "&", "%", and "/" being the most common)
- Not every Line Number has a TRC Number
- If it does not have a TRC Number, it will not have a TCC Number, or G/L Number
- Not every TRC Number has a TCC Number
- If it does not have a TCC Number, it will not have a G/L Number
- Not every TCC Number has a G/L Number
- If a Line Number has a TRC Number, it can have 1 or more TRC Numbers
- If a TRC Number has a TCC Number, it can have 1 or more TCC Numbers
- If a TCC Number has a G/L No., it can have 1 or more G/L Numbers
- Each G/L No can have ONLY ONE Line No, TRC No., and TCC No.
End Result would be to organize the G/L accounts by Line No and Description; TRC No. and Description; TCC No. and Description. Then I would concatenate the TRC No and TCC No. in a separate column.
GL Acct No. | GL Desc | 1120 Pg. 1 Ln No. | 1120 Pg. 1 Ln Desc | TRC/TCC Combo | TRC No. | TRC Desc | TCC No. | TCC Desc |
400000 | Sales | 1a | Gross Receipts or Sales | 30-100 010 | 30-100 | Gross Receipts or Sales | 010 | Gross Receipts/Sales |
400001 | Product Sales-Manual | 1a | Gross Receipts or Sales | 30-100 010 | 30-101 | Gross Receipts or Sales | 010 | Gross Receipts/Sales |
Ideally, I would be able to have a filter near the end where True would be !Null G/L and False I still would be able to see the Line No/TRC/TCCs that do not have an account, but that still has the TCC on the same line with its TRC and Line No on the same row on the F.
I have tried working with Parse and for the life of me I can't get the parse to work. I can text to columns, but because each is slightly different it creates more columns than I want and separates the TRC Number as it contains I dash.
Solved! Go to Solution.
- Labels:
- Best Practices
- Common Use Cases
- Parse
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @davidlocke,
(@others: Feel free to use my parsing as a starting point - I am not sure if I can figure out the whole thing)
I won't have a full solution for you, but a first parsing attempt:
Result looks like this:
Is the parsing right or do we need to adapt something? Workflow is attached.
Best
Alex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey @grossal,
That didn't quite work for me. However, I was able to find my parsing error. I was using / instead of \ on my metacharacters.
So I have my data parsed. Now I need an answer to turn that into my final product. Maybe, Cross-tab/multi-field?
@TonyA you helped me on something on the same general lines earlier, do you have an idea?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I'll take a look at this tonight. It won't be a simple crosstab -- we'll need to fill down the line numbers, TRC's and TCC's according to a set of rules.
Just to verify, is it correct to say that the only possibilities for output records is:
- Line Number only
- Line Number and TRC
- Line Number, TRC, and TCC
- Line Number, TRC, TCC, and GL?
Questions:
- How do I identify line numbers? If a line number always includes alphabetic characters, then I can distinguish it from TRC and TCC codes, but if GL accounts are and line numbers are both alphanumeric and line numbers can have any number of characters, how do I distinguish between 6 character line number and a GL code?
- If I have a new TCC number, I assume the line number and TRC carry down. If I get a new TRC and it is not followed by a TCC, is the TCC now null or does the previous TCC carry down? If the TCC is null, then I assume the TRC must be followed by either a TCC or a line number.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Your whole reply didn't show up in the community for some reason. So I am pasting here with my answers in bold:
Just to verify, is it correct to say that the only possibilities for output records is:
- Line Number only Line Number and TRC Line Number, TRC, and TCC Line Number, TRC, TCC, and GL?
I would expect the only outputs to be:
1120 Pg. 1 Line No. | 1120 Pg. 1 Line Desc | TRC No. | TRC Desc | TCC No. | TCC Desc | GL AC No. | GL AC Desc | |
Result 1 | 1120 Pg. 1 Line No. | 1120 Pg. 1 Line Desc | Null | Null | Null | Null | Null | Null |
Result 2 | 1121 Pg. 1 Line No. | 1121 Pg. 1 Line Desc | TRC No. | TRC Desc | Null | Null | Null | Null |
Result 3 | 1122 Pg. 1 Line No. | 1122 Pg. 1 Line Desc | TRC No. | TRC Desc | TCC No. | TCC Desc | Null | Null |
Result 4 | 1123 Pg. 1 Line No. | 1123 Pg. 1 Line Desc | TRC No. | TRC Desc | TCC No. | TCC Desc | GL AC No. | GL AC Desc |
Questions:
- How do I identify line numbers? If a line number always includes alphabetic characters, then I can distinguish it from TRC and TCC codes, but if GL accounts and line numbers are both alphanumeric and line numbers can have any number of characters, how do I distinguish between 6 character line number and a GL code?
Line Numbers are alpha numeric. They are no more than 3 characters, which would be either just a one to two digit number possibly followed by a letter (subline) (just so I can use this on other forms, range from (1a-1z)-(99a-99z)) To get a better understanding you can look at the form in question f1120, which I have attached. Basically our tax software maps GL ACs to a form line number using a TRC/TCC code combination. If we didn't have any activity this year, but had activity in a prior year on a line there would be no GLs, but a TRC and TCC. If we have never had any activity on a line there would be only a TRC (we create the TCC, the tax software creates the TRC). If the line cannot be mapped to (lines on the tax form that total or subtotal) then there would not be a GL, a TRC, or a TCC.
- If I have a new TCC number, I assume the line number and TRC carry down.
Correct. See example below:
Excerpt of Alteryx Data at Present | |||||||||
ID No | ID Desc | NOTE NOT IN DATA | |||||||
4 | Additional section 263A costs | Line Number | |||||||
40-140 | COGS: Additional Sec. 263A Costs | TRC Number | |||||||
010 | Additional Section 263A Costs | TCC Number | |||||||
550103 | Additional 263A Costs | GL | |||||||
5 | Other costs | Line Number | |||||||
40-150 | COGS: Other Costs | TRC Number | |||||||
020 | Advertising | TCC Number | |||||||
557130 | Other Advertising | GL | |||||||
025 | Allocation & Settlement - Ledge | TCC Number | |||||||
558990 | Settlement Account | GL | |||||||
559991 | CC Alloc In/Out | GL | |||||||
559992 | Order Settlement | GL | |||||||
559999 | FI-CO Recon. | GL | |||||||
Alteryx Expected Result | |||||||||
GL Acct No. | GL Desc | 1120 Pg. 1 Ln No. | 1120 Pg. 1 Ln Desc | TRC/TCC Combo | TRC No. | TRC Desc | TCC No. | TCC Desc | |
550103 | Additional 263A Costs | 4 | Additional section 263A costs | 40-140 010 | 40-140 | COGS: Additional Sec. 263A Costs | 010 | Additional Section 263A Costs | |
557130 | Other Advertising | 5 | Other costs | 40-150 020 | 40-150 | COGS: Other Costs | 020 | Advertising | |
558990 | Settlement Account | 5 | Other costs | 40-150 025 | 40-150 | COGS: Other Costs | 025 | Allocation & Settlement - Ledge | |
559991 | CC Alloc In/Out | 5 | Other costs | 40-150 025 | 40-150 | COGS: Other Costs | 025 | Allocation & Settlement - Ledge | |
559992 | Order Settlement | 5 | Other costs | 40-150 025 | 40-150 | COGS: Other Costs | 025 | Allocation & Settlement - Ledge | |
559999 | FI-CO Recon. | 5 | Other costs | 40-150 025 | 40-150 | COGS: Other Costs | 025 | Allocation & Settlement - Ledge |
- If I get a new TRC and it is not followed by a TCC, is the TCC now null or does the previous TCC carry down? If the TCC is null, then I assume the TRC must be followed by either a TCC or a line number.
If a Line Number is not followed by a TRC, TRC, TCC, GL would be Null
If a new TRC is not followed by a TCC, TCC & GL would be Null
If a TCC is not followed by GL, GL would be Null
IMPORTANT
If a GL is followed by TCC and NOT a TRC then the TRC should be the last TRC (see example above for TRC 40-150)
Let me know if you have additional questions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Take a look at this. You'll need to check my logic carefully, I did this kind of quickly. Especially the last three tools where I decided when to keep rows that didn't have GL values.
I focused on getting everything into rows and getting rid of unneeded rows. So there is a little cleanup left to do including creating the concatenated column.
Also, since I did do this quickly you might be able to tighten it up.
Edit:
One more thing. I did not do any error checking. For example, I'm only checking to see if a change in line number is followed by a change in TRC and keeping the line if it isn't. I'm not checking to see if a change in line number is followed by, for example a new GL, which would be an error.
I just looked through your example data and that does happen in one place at the end of the data set, so you might want to account for that. It would just require some more complex logic in the last Multi-Row Formula tool.
Edit (2): I found an error that I just fixed. If you downloaded before 8am Pacific on May 1, please download again.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I compared the result against what we had for 2018 and it all tied out. Thanks for your help.
