Help on splitting column data using Regex and Text to Column tools?
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi there, I have been looking for a solution to split the column data I have into 5 different columns as shown in the image I have attached. It would be great if I can get some help on converting this string "GaaEe 2014 Death Only Z countries. Non smoker. Standard" into an expression that Alteryx understands to then be split into 5 columns. Any help would be greatly appreciated!
Solved! Go to Solution.
- Labels:
- Custom Tools
- Tips and Tricks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Please provide relevant data to this use case, and kindly provide your criteria in as much detail as possible. If you have a workflow built halfway, kindly export that over as well.
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@jakemiroquai
I think we need more information as mentioned by @caltang .
And maybe we can use the period "." as delimeter with tool of Text to Column?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I have attached a sample workflow showing the 2 rows of data which I wish to separate into its relevant columns; Name: GaaEe 2014, Type: Death Only, Country Type: A Countries etc etc...If you believe there is a better method, do advise! My idea was to use Regex to identify the string before inputting it into the Text to Columns tool to output the data into its correct fields. Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Currently, I am able to use the expression "(\S+ \d{4}) (.+?\s.+?) (.+?\s.+?) (.+?\s.+) (.+?.+)" for some rows of data but when it comes to data that vary such as the ones with paratheses, it is unable to read the expression. I was thinking there has to be a more dynamic way such that it can capture this variety and output the field data accordingly.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Due to the nature of your data, a straightforward solution via REGEX or Text-to-columns is not recommended.
I have done an example above where I had to go through many steps and processes to get the data in order.
It will help if you have a master list of variables from each column, then you can better prep the data.
Hope this gives you some ideas @jakemiroquai
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This looks like something I could work with! It seems like I have to separate this 'separation of column data' process into different parts with the data that I have. Thank you for your insight!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
No problem! If it helps, kindly like / mark as accepted solution if it helps.
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Sorry I have got 1 more question, would it be possible to just match 'Smoker' or 'non smoker' from the data and output it to a new column? There is just too much variation in my data and I think I would want to just focus on certain key words
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Can you give some sample data? Make it more varied... then I can have a better solution. Please get it as closely as your data as possible...
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
