Hi,
The data source is a daily email of announcements. The data is trapped in one initial output field. I need to reshape it. I am thinking transpose, parse, multi-role formula, etc. But, I don't know where to begin. Thanks
My first step is to import the email as a TXT file. I have Delimiter import options set to \n. I can filter some of the header information using the Start Data Import on Line setting. The data table below represents the initial output of the Input Tool.
I need to parse, transpose the data into columns and rows. The tables below depict 1) the data after import and 2) the desired output.
A few things...
There are 3, sometimes 4 categories. They are always named the same (CATEGORY A, CATEGORY B, CATEGORY C, CATEGORY D). The number of items under each CATEGORY (represented by the Lorem Ipsum) can vary from 0 to N on a day-to-day basis. And, I want to ignore everything after "END" statement, which in the real world is *small business".
For Release AT |
5 p.m. ET |
No. CR-246-18 |
Dec 29, 2018 |
[Null] |
Dec. 29, 2018 |
[Null] |
Announcements For Dec. 29, 2018 |
[Null] |
CATEGORY A |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. |
CATEGORY B |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. |
CATEGORY C |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. |
End |
gibberish |
[Null] |
gibberish |
gibberish |
gibberish |
Desired Output
Date | Category | Description |
Dec. 29, 2018 | CATEGORY A | Lorem ipsum dolor sit amet, consectetur... |
Dec. 29, 2018 | CATEGORY A | Lorem ipsum dolor sit amet, consectetur... |
Dec. 29, 2018 | CATEGORY A | Lorem ipsum dolor sit amet, consectetur... |
Dec. 29, 2018 | CATEGORY A | Lorem ipsum dolor sit amet, consectetur... |
Dec. 29, 2018 | CATEGORY B | Lorem ipsum dolor sit amet, consectetur... |
Dec. 29, 2018 | CATEGORY B | Lorem ipsum dolor sit amet, consectetur.. |
Dec. 29, 2018 | CATEGORY C | Lorem ipsum dolor sit amet, consectetur.. |
Dec. 29, 2018 | CATEGORY C | Lorem ipsum dolor sit amet, consectetur.. |
Dec. 29, 2018 | CATEGORY C | Lorem ipsum dolor sit amet, consectetur.. |
Solved! Go to Solution.
Okay.
I found one way to breakout and fill-in for Category.
1. I created a new field using the Formula tool.
if [Field_1]='A' then 'A' elseif [Field_1]='B' then 'B' elseif [Field_1]='C' then 'C' elseif [Field_1]='D' then 'D' else Null() endif
2. I then used the Multi-Row Formula tool to 'fill-down'. Check out this Tool Mastery | Multi Row article.
iif(isnull([Category]),[Row-1:Category],[Category])
Based on the sample you provided I suggest:
1. Use a multi row formula to create the date column. I used REGEX_Match to look for a date and then DateTimeParse to parse:
IIF(Regex_Match([Field1],"[A-Z][a-z][a-z] \d{2},\s+\d{4}"), DateTimeParse([Field1],"%b %d,%Y"), [Row-1:Date])
2. Use a second multi row formula to make the category. Again I chose Regex to match the category text:
IIF(Regex_Match([Field1],"^CATEGORY .*"), [Field1], IIF([Field1]="End",NULL(),[Row-1:Category]))
3. Finally filter to rows with a Category and then a select to tidy up
Sample attached.
I love the simplicity of this approach. I just came up with my own solution, but I think this approach provides more flexibility should any anomalies creep up in the source data. It also requires 1/3 the steps.
Many Thanks
How might you modify the 2nd Multi-Row/Regex formula if Category was A, B, C, D and NOT Category A, Category B, Category C, and Category D?
Thanks
Assuming you know the list of categories I would suggest:
IIF([Field1] IN ("A","B","C","D"), [Field1], IIF([Field1]="End",NULL(),[Row-1:Category]))
Otherwise, you will need to come up with some rule to identify the category cells. Feel free to DM me if you want me to look at real list in private
User | Count |
---|---|
18 | |
14 | |
13 | |
9 | |
8 |