Hi all,
Seeking guidance once again on how best to go about extracting a piece of information, I'm thinking with RegEx.
Below are a couple of examples of the sorts of strings that will be formatted. I'm unable to share the real sample but it will contain tens of thousand of rows. Each important piece of information is between brackets with the word "and" being the separator between these pieces.
My aim is extract all of the information relating to Safety Rating. Information on Safety Rating can appear twice such as in points 1-4 below. So in example 1 below, I would want it to take out:
In some strings information on Safety Rating will only appear once, as in points 5 and 6 below. So in example 5 below, I would only want to extract:
(Safety Rating (Low, Multi, F/I, Latest) is greater than 149)
Important to note, the Safety Rating piece can appear in a different position in different strings, e.g. not the final bracketed piece of information as in example 1.
1. (Housing Type in (Flat)) and (Country Of Property in (GERMANY)) and (Purchase Currency in (Euro)) and (Safety Rating (Low, Multi, F/I, Latest) = 149 or higher) and (Safety Rating (Low, Multi, F/I, Latest) = 350 or lower) and (Days Until Review Date is less than 10950)
2. (Housing Type in (House)) and (Country Of Property in (UK)) and (Purchase Currency in (Sterling)) and (Days Until Review Date = 7300 or lower) and (Safety Rating (Low, Multi, F/I, Latest) is greater than 150) and (Safety Rating (Low, Multi, F/I, Latest) = 349 or lower)
3. (Housing Type in (House)) and (Country Of Property in (UK)) and (Purchase Currency in (Sterling)) and (Days Until Review Date = 7300 or lower) and (Safety Rating (Low, Multi, F/I, Latest) = 149 or higher) (Safety Rating (Low, Multi, F/I, Latest) is less than 350)
4. (Housing Type in (House)) and (Country Of Property in (UK)) and (Purchase Currency in (Sterling)) and (Days Until Review Date = 7300 or lower) and (Safety Rating (Low, Multi, F/I, Latest) is greater than 149) and (Safety Rating (Low, Multi, F/I, Latest) is less than 350)
5. (Housing Type in (House)) and (Country Of Property in (UK)) and (Purchase Currency in (Sterling)) and (Days Until Review Date = 7300 or lower) and (Safety Rating (Low, Multi, F/I, Latest) is greater than 149)
6. (Housing Type in (Flat)) and (Country Of Property in (GERMANY)) and (Purchase Currency in (Euro)) and (Safety Rating (Low, Multi, F/I, Latest) = 350 or lower) and (Days Until Review Date is less than 10950)
Any help with this is greatly appreciated, please let me know if I can add any more colour to the above. Thank you!
Solved! Go to Solution.
Glad to help! Good luck with the testing.
This is a great use case for RegEx, and the structure in your examples is really helpful. You could use a pattern like the one below to extract all occurrences of the Safety Rating expressions regardless of their position:
This will match any string that begins and ends at the first closing parenthesis, capturing each instance even if there are two in one string.
If you're working in Alteryx, try using the RegEx Tool in Tokenize or Parse mode with that pattern. Set it to output multiple matches so it captures all relevant entries.
Let me know if you need help integrating that into your workflow!