community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Parsing single column of text into fields using empty rows to distinguish between fields

Highlighted
Fireball

Okay.  My data is trapped in a single field at import.  I need to parse the text into fields.   I assume a Multi Row Formula tool is needed.  And, I know I can use the empty rows to distinguish between different parts or groupings of text that correspond to a given field.  The empty rows can be 1-n.  

 

The first row is the field  ORGANIZATION.  In the example Coffee Company.

The next row of text is the field DOCUMENT TYPE.  In the example below it is News Release.

The next row of text is the field NOTICE ID.  In the example below it is 10-55.

The next row of text is the field ARTICLE TITLE.

This next row of text always starts with Washington.  Instead of one line there are 1-n lines of text before the next empty row.  These rows need to be merged under the field INTRODUCTION.

The remaining fields of text fall under the fields.   Instead of one line there may be 1-n lines of text before the next empty row.  These rows need to be merged under each respective grouping as BODY 1, BODY 2, BODY 3, etc. 

The -30- needs to be dropped. 

 

Coffee Company
 
NEWS RELEASE
 

On the web oursite

 
 
Notice No. 10-55
 
 
THIS IS THE TITLE OF THE ARTICLE
 
 

WASHINGTON, October 4, 2019 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
 
Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed
 
 
Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.
 
 
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
 
-30-
 
 

 

Quasar

Hi @hellyars ,

Multi row formula

iif(isnull([Row-1:Field1]) , 1 ,                   // first row
iif(isempty([Field1]), [Row-1:row] ,          // empty rows 
iif(isempty([Row-1:Field1]), [Row-1:row] + 1 , [Row-1:row])    //  new field  
)

)

 

 

clipboard_image_0.png

 

Meteor

I'm sure someone will have a pretty elegant solution, but I'd tackle it by parsing it on the newline characters if it isn't coming in that way already.  You could parse the fields by relative position to get everything except the "Body" sections because it seems like that would be the only variable number of fields.  Since it's an unknown number, you could make it a macro that outputs the Body sections as 1,2,3 etc. etc.  then join it back to the regular fields.  You could also do it as I have below and just pivot it back.  If you already have the data set up as you have below, you should get the right number of body segments, mine added more due to the /n when I saved.  

Screen Shot 2019-10-18 at 12.23.14 PM.png

Are you reports coming in one at a time or a long import?  If individual, you should be able to do the above and union records together.  If one long report, you're right, you'd could to do a multi-row formula to identify the start/stop of each record and could pass it through a macro doing the same parsing as above then union the results back together with a batch macro.  to get rid of the -x- (if that format stays the same, you can do a Regex to identify that row and remove them or drop the last "body" segment if you when the batch macro way.  

 

Hope this helps

 

Fireball

@Bdonahue  Nice.  But, would it be possible to attach the workflow.  Otherwise I can't tell how you setup each tool.  I just see the tools you used and the outcome, which appears to be exactly what I need. 

Fireball

@benakesh  The name of the organization might change.  But the format structure will always be the same. 

Bolide
Bolide

@hellyars , i believe you are looking for something like the attached. it groups the blocks of text and then concatenates them into the "fields" you described, however, you missed the "On the web oursite" (line 3 in your example). I added that in as 'WEBSITE LINK' as a place holder, or you can just filter that field out. 

Labels