Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

REGEX Help - String data and Text-to-Columns - Replace some delimiters based on condition

ryanhartman32
7 - Meteor

Hello, I have string data in the following format:

 

"• General     The general responsibilities of the role are XYZ • Physical Demands     The Physical Demands of this role are XYZ • Education Requirements     The education requirements for this role are • Bachelor's Degree • Appropriate Certification • Experience     5 years of experience is required"

 

Here, "General" is a heading, and the text that follows it is the description for that heading.

 

What I have done is used "•" as a delimiter, and replaced "     " with "_". (these are supposed to be line breaks I believe, but in the text I have they are represented by 5 spaces). I need to basically put these in their own columns, where I could have:

General_The general responsibilities of this role are XYZPhysical Demands_The Physical Demands of this role are XYZEducation Requirements_The education requirements for this role are • Bachelor's Degree • Appropriate Certification• Experience_5 years of experience is required

I then want to isolate the Heading to pull it out so I can have a Heading Column and a Description column. I wrote code to make a column for the text before the "_" and after the "_" to make these columns. However, the issue lies in the "•" being within a description, in this example, the text "• Bachelor's Degree • Appropriate Certification". 

 

Because of this, I receive the following result: 

General_The general responsibilities of this role are XYZPhysical Demands_The Physical Demands of this role are XYZEducation Requirements_The education requirements for this role are Bachelor's Degree Appropriate CertificationExperience_5 years of experience is required

Because of the bullet point, the Text-to-Columns tool creates a new column for Bachelor's Degree and Appropriate Certification, but I need it to be contained in the Education Requirements column.

 

I have been trying to find a way to delete or replace the "•" if the word/phrase it precedes does not have a "_" before the next "•". In other words, if there is a list of bullet points, but not an "_" (a line break) then I want to contain that in the same column, and not split that into its own column, as it should not be treated as a Heading. 

 

Is there a way to do this with REGEX? I am a novice in REGEX and nothing I have been trying has seemed to work. Please note that the Headings can be anything - not just General, Physical Demands, Education Requirements, and Experience. Thank you so much. 

3 REPLIES 3
BS_THE_ANALYST
14 - Magnetar
14 - Magnetar

One way using RegEx parsing:

BS_THE_ANALYST_0-1676402240553.png

 

Note, this is only dealing with the fact there are 4 groups that you are seeking. 

 

All the best,
BS

LinkedIN

Bulien
ryanhartman32
7 - Meteor

Thank you, @BS_THE_ANALYST! That works perfectly for the above example - would you know what RegEx expression could be used to generalize this? Meaning, if there is an unknown number of headers and situations like the "• Bachelor's Degree • Appropriate Certification" situation can be anywhere in the cell? 

 

I think it could work if I could find a way to only recognize a delimiter if it is in the format "• Header_" - is there a way to do that with RegEx? 

 

Or, could I maybe do a RegEx replace where I can replace text in the format of "• Header_" with, say, "- Header_" ? Then I could use "-" as a delimiter and achieve that same result?

BS_THE_ANALYST
14 - Magnetar
14 - Magnetar

@ryanhartman32 we now have a dynamic way! I pushed your problem further along the line. There's an indepth explanation about how it works using Positive Lookaheads. Definitely more complex RegEx needed for this particular problem. 

 

BS_THE_ANALYST_0-1676595190458.png

 

So your new dynamic worfklow is now attached. This will now account for more headers coming through:

BS_THE_ANALYST_1-1676595223775.png

 

 

 

All the best,
BS

LinkedIN

Bulien
Labels
Top Solution Authors