Hi Alteryx experts,
I am noob on using regex to parse within strings and got the following issue when developing a workflow.
My sample input is as below. I know the patterns of strings are really messy and have multiple formats.
intervention_description |
Dose Level 1 = 1 x 10e6 T cells/kg; Dose Level 2 = 3 x 10e6 T cells/kg; Dose Level 3 = 1 x 10e7 T cells/kg. |
9 x 10^10 vector genomes (vg) of AAV-hAADC-2 in a single dose of 200 µL bilaterally infused over 4 striatal targets |
Participant assigned to a dose level of NK cells based on when joined study. Starting dose level of NK cells 1 x 10^7 NK cells/kg given by vein on Day 0. |
CERE-120 5.4 x 10^11 vg |
What i wanted to do is to parse out 10-related numbers from these strings. Desired output should be something like below:
intervention_description | Output |
Dose Level 1 = 1 x 10e6 T cells/kg; Dose Level 2 = 3 x 10e6 T cells/kg; Dose Level 3 = 1 x 10e7 T cells/kg. | 1 x 10e6,3 x 10e6,1 x 10e7 |
9 x 10^10 vector genomes (vg) of AAV-hAADC-2 in a single dose of 200 µL bilaterally infused over 4 striatal targets | 9 x 10^10 |
Participant assigned to a dose level of NK cells based on when joined study. Starting dose level of NK cells 1 x 10^7 NK cells/kg given by vein on Day 0. | 1 x 10^7 |
The only "common" thing i found across the strings are that my desired output is something relating to "10". So i was trying to write a syntax that has "10" in it and trying to parse out numbers around "10", not sure if this should be the right direction and how exactly the regex expression should look like. Any guidance would be really appreciated!
Thanks in advance for your help!
Best, XO
|
Gelöst! Gehe zu Lösung.
Hi @apoptoticbody,
Based on your desired output, it looks like the same REGEX_Replace expression can be used for each record except for your first one starting with "Dose Level". Thus, I think this formula should do the trick:
IIF(left([input], 10) = "Dose Level", REGEX_Replace([input], "(.*)(\d+\sx\s\d+e\d)(.*)(\d+\sx\s\d+e\d)(.*)(\d+\sx\s\d+e\d)(.*)", "$2" + ", " + "$4" + ", " + "$6"), REGEX_Replace([input], "(.*)(\d+\sx\s\d+\^\d+)(.*)", "$2"))
This formula is giving me these results for your sample data:
Note: If your data to be parsed ever includes decimals (e.g., "9.5 x 10^10" for record 2, for example), you would need to tweak the syntax a bit more to accommodate.
Attached the workflow as well for your reference. Hope this helps!
PS - I would highly recommend using regexr.com when building RegEx syntax for Alteryx functions. It lets you provide sample text to be parsed and instantly highlights the character(s) specified by your expression as you build it, making it a lot easier to see what you are doing.
Hi @joshbennett
Thank you a ton! You rock! and also thank you for the suggestion on checking on the regex.com for syntax. i am now trying them.