I have two sets of data I'm working with. Set #1 is as below, a "data" column with thousands of rows and multiple columns with one of 3 values, "yes," "no" or "-" as below:
Data | column 1 | column 2 | column 3 | column 4 | column 5 | column 6 | column 7 | column 8 | column 9 | column 10 |
aaa1 | yes | no | no | no | no | no | no | no | no | - |
aaa2 | yes | - | yes | yes | yes | no | yes | yes | no | no |
aaa3 | no | - | - | - | - | no | - | - | no | no |
aaa4 | - | no | no | no | no | yes | no | no | yes | no |
aaa5 | - | yes | yes | yes | no | - | yes | no | - | yes |
aaa6 | no | - | - | - | - | no | - | - | no | yes |
aaa7 | no | no | no | no | - | yes | no | - | yes | - |
aaa8 | yes | yes | no | no | no | - | no | no | - | no |
aaa9 | - | - | no | no | yes | no | no | yes | no | yes |
aaa10 | no | no | yes | yes | - | no | yes | - | no | no |
aaa11 | yes | no | - | yes | no | no | yes | no | no | - |
Set #2 is a simple key which will have the same column names and similar values as set#1 but with an added suffix after the yes/no value that will vary:
column 1 | column 2 | column 3 | column 4 | column 5 | column 6 | column 7 | column 8 | column 9 | column 10 |
yes ab | yes ba | yes aa | yes ab | yes ab | yes ab | yes ab | yes aa | yes ab | yes ab |
no ab | no aa | no aa | no aa | no aa | no ab | no aa | no aa | no ab | no aa |
So if set #1 shows "yes" in column 1 this should convert to "yes ab" from set #2, and a "yes" in column 2 converts to "yes ba," and so on.
I need a conversion so that columns 1-10 in set #1 will show the values from columns 1-10 in set #2. I need this to be a robust flow because there are way more than 10 columns, data #2 will periodically update with new suffixes, and at times the data in set #1 may increase from "yes, no, -" to something like "yes, no, maybe, sometimes, often, rarely, -". Set #2 would update accordingly so the values would mirror set #1 but again with variable suffexes. Appreciate any help, thanks.
Hi AGilbert, thanks for this. I'm weak on REGEX formulae, can you explain what's happening in the formula tool here:
REGEX_Replace([Value], "(\w+)\s(\w+)", "$1")
reason I ask is I oversimplified my sample data, the two data sets are better represented here:
Set #1:
Data | column 1 | column 2 | column 3 |
aaa1 | yes | no | - |
aaa2 | yes | - | no |
Set #2:
column 1 | column 2 | column 3 |
yes widget ab | yes widget ab | yes widget aa |
no widget ab | no widget aa | no widget aa |
Set #1 is the same but Set #2 has added characters after the "yes / no" and the REGEX formula is dropping the "widget" when it creates the key column, so it's returning "yes xx" and I need "yes widget xx."
The "_widget_" characters are consistant so the variable suffix is the "aa/ab/xx" characters. Can we alter the REGEX formula to get the desired return?
Sure thing. At their core, regex patterns are used to match string data. The basic elements can define what character to match and how many of those characters to match.
For example, 'a' will match an 'a' and 'b' will match 'b'. To match one of many letters character classes can be defined with square brackets (example: [ab] or [a-z]). Because these are so common that predefined classes such as \w, \d, and \s, were created as a shorthand to match any alpha, any digit, and any whitespace characters respectively.
As for the match quantity, the elements above can be followed by a + for "one or more", a * for "zero or more", or ? for "zero or one".
So, the expression I gave is looking for one or more letters, a space, and another set of one or more letters.
Wrapping certain elements in parenthesis is called grouping. These capture the text an element matches, to be returned with "$1" (indicating the first captured group).
You should try out https://regex101.com/ . It's a great tool which give immediate, visual feedback for an expression.
The regex rabbit hole goes deep, and you can make some very strict expressions. Alternatively, like the example I gave you, expressions can be somewhat forgiving. You just need to consider your use case and how variable your input text may be.
In this case, if the text and schema is standardized, you could try "(\w+)\s(\w+)\s(\w+)" where "$1" = yes/no, "$2" = widget, and "$3" = aa/bb/ab. But this is keeping things pretty simple.