Cell value segregation
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Team,
I have a column named "Title" which has some text content where each row ends with some 5 character or 10 character or 15 character or 20 character code. I want to segregate the text alone from this code which comes at the end of the text.
Have attached the Sample input and required output.
INPUT | OUTPUT | |
Title | Title 1 | Title 2 |
Soap packet ADFGH | Soap packet | ADFGH |
Freshener AP3PP/PIYTR | Freshener | AP3PP/PIYTR |
Disinfectant AAQQ3/MNJKK/TT0UU | Disinfectant | AAQQ3/MNJKK/TT0UU |
Floor Cleaner KJYTR/WWWQA2/MNBBV/KK9KK | Floor Cleaner | KJYTR/WWWQA2/MNBBV/KK9KK |
Solved! Go to Solution.
- Labels:
- Data Investigation
- Datasets
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@Gsiva3 One way of doing this
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
If you're interested in trying RegEx, you could use an expression like the following in a regular Formula tool:
REGEX_Replace([Title], "(.*)\s(.*)", "$1")
Basically, it's looking for two sections of any length of any characters (.*), separated by a space \s. In this case, since we aren't specifying otherwise, it will use the last space found in the text. Then, it will keep the first group $1. This will get you Title 1.
Try this out and give it a go to find the Title 2 you're wanting.
Alternatively, if you're looking for a non-RegEx answer, you could use add a RecordID tool first then use the Text to Columns tool with \s as the delimiter, split to rows and keep only the last record.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
The regex is working fine. But the think is in my input in some of the cases there is no space at the end between the title and the code.
Please refer the below sample input. How to proceed in this case
INPUT |
Title |
Soap packet ADFGH |
Freshener AP3PP/PIYTR |
Disinfectant AAQQ3/MNJKK/TT0UU |
Floor Cleaner KJYTR/WWWQA2/MNBBV/KK9KK |
WasherMMNHJ |
OUTPUT | |
Title 1 | Title 2 |
Soap packet | ADFGH |
Freshener | AP3PP/PIYTR |
Disinfectant | AAQQ3/MNJKK/TT0UU |
Floor Cleaner | KJYTR/WWWQA2/MNBBV/KK9KK |
Washer | MMNHJ |
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@Gsiva3 Since we cannot use a space, what logic would you like to use to identify the separation that would work for all of your records? 3+ capital letters in a row, perhaps?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
We cab use 4+ Capital letters in a row
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @Gsiva3 just checking that this will work as in the Freshener example above, the code starts with AP3, only 2 capital letters in a row
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
So we might need 2 conditions
1. 2Capital letters + 3rd character as "Number"
2. 1st 4 characters as capital letters
The reason is I do have few title which has 3Capital letters in it which I dont want to seggregate
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Here is an updated expression that should get you the case for 4 upper case letters as the condition:
trim(REGEX_Replace([Title], "(.*?)([A-Z]{4}.*)", "$1",0))
You'll see this is pretty similar to the prior expression, but rather than the space \s, we have the second group as [A-Z]{4}.*
This where you're telling Alteryx to look for 4 capital letters in a row [A-Z]{4} plus anything after that .*
We must put the question mark in the first group to ensure that we get the first instance of this match for the second group.
I also added the ,0 at the end to let Alteryx know to be case sensitive.
Use this as a starting point and modify it slightly to create your other rule of 2 capital letters then a number. Note digits can be identified using \d.
