Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

REGEX Help - String data and Text-to-Columns - Replace some delimiters based on condition

ryanhartman32
7 - Meteor

Hello, I have string data in the following format:

 

"• General     The general responsibilities of the role are XYZ • Physical Demands     The Physical Demands of this role are XYZ • Education Requirements     The education requirements for this role are • Bachelor's Degree • Appropriate Certification • Experience     5 years of experience is required"

 

Here, "General" is a heading, and the text that follows it is the description for that heading.

 

What I have done is used "•" as a delimiter, and replaced "     " with "_". (these are supposed to be line breaks I believe, but in the text I have they are represented by 5 spaces). I need to basically put these in their own columns, where I could have:

General_The general responsibilities of this role are XYZPhysical Demands_The Physical Demands of this role are XYZEducation Requirements_The education requirements for this role are • Bachelor's Degree • Appropriate Certification• Experience_5 years of experience is required

I then want to isolate the Heading to pull it out so I can have a Heading Column and a Description column. I wrote code to make a column for the text before the "_" and after the "_" to make these columns. However, the issue lies in the "•" being within a description, in this example, the text "• Bachelor's Degree • Appropriate Certification". 

 

Because of this, I receive the following result: 

General_The general responsibilities of this role are XYZPhysical Demands_The Physical Demands of this role are XYZEducation Requirements_The education requirements for this role are Bachelor's Degree Appropriate CertificationExperience_5 years of experience is required

Because of the bullet point, the Text-to-Columns tool creates a new column for Bachelor's Degree and Appropriate Certification, but I need it to be contained in the Education Requirements column.

 

I have been trying to find a way to delete or replace the "•" if the word/phrase it precedes does not have a "_" before the next "•". In other words, if there is a list of bullet points, but not an "_" (a line break) then I want to contain that in the same column, and not split that into its own column, as it should not be treated as a Heading. 

 

Is there a way to do this with REGEX? I am a novice in REGEX and nothing I have been trying has seemed to work. Please note that the Headings can be anything - not just General, Physical Demands, Education Requirements, and Experience. Thank you so much. 

3 REPLIES 3
BS_THE_ANALYST
14 - Magnetar

One way using RegEx parsing:

BS_THE_ANALYST_0-1676402240553.png

 

Note, this is only dealing with the fact there are 4 groups that you are seeking. 

 

ryanhartman32
7 - Meteor

Thank you, @BS_THE_ANALYST! That works perfectly for the above example - would you know what RegEx expression could be used to generalize this? Meaning, if there is an unknown number of headers and situations like the "• Bachelor's Degree • Appropriate Certification" situation can be anywhere in the cell? 

 

I think it could work if I could find a way to only recognize a delimiter if it is in the format "• Header_" - is there a way to do that with RegEx? 

 

Or, could I maybe do a RegEx replace where I can replace text in the format of "• Header_" with, say, "- Header_" ? Then I could use "-" as a delimiter and achieve that same result?

BS_THE_ANALYST
14 - Magnetar

@ryanhartman32 we now have a dynamic way! I pushed your problem further along the line. There's an indepth explanation about how it works using Positive Lookaheads. Definitely more complex RegEx needed for this particular problem. 

 

BS_THE_ANALYST_0-1676595190458.png

 

So your new dynamic worfklow is now attached. This will now account for more headers coming through:

BS_THE_ANALYST_1-1676595223775.png

 

 

 

Labels