Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing a set of numbers from a string

apoptoticbody
7 - Meteor

Hi Alteryx experts,

 

I am noob on using regex to parse within strings and got the following issue when developing a workflow.

 

My sample input is as below. I know the patterns of strings are really messy and have multiple formats.

intervention_description
Dose Level 1 = 1 x 10e6 T cells/kg; Dose Level 2 = 3 x 10e6 T cells/kg; Dose Level 3 = 1 x 10e7 T cells/kg.
9 x 10^10 vector genomes (vg) of AAV-hAADC-2 in a single dose of 200 µL bilaterally infused over 4 striatal targets
Participant assigned to a dose level of NK cells based on when joined study. Starting dose level of NK cells 1 x 10^7 NK cells/kg given by vein on Day 0.
CERE-120 5.4 x 10^11 vg

 

What i wanted to do is to parse out 10-related numbers from these strings. Desired output should be something like below:

intervention_descriptionOutput
Dose Level 1 = 1 x 10e6 T cells/kg; Dose Level 2 = 3 x 10e6 T cells/kg; Dose Level 3 = 1 x 10e7 T cells/kg.1 x 10e6,3 x 10e6,1 x 10e7
9 x 10^10 vector genomes (vg) of AAV-hAADC-2 in a single dose of 200 µL bilaterally infused over 4 striatal targets9 x 10^10
Participant assigned to a dose level of NK cells based on when joined study. Starting dose level of NK cells 1 x 10^7 NK cells/kg given by vein on Day 0.1 x 10^7

 

The only "common" thing i found across the strings are that my desired output is something relating to "10". So i was trying to write a syntax that has "10" in it and trying to parse out numbers around "10", not sure if this should be the right direction and how exactly the regex expression should look like. Any guidance would be really appreciated!

 

Thanks in advance for your help!

 

Best,

XO

 

 

 
2 REPLIES 2
joshbennett
11 - Bolide
11 - Bolide

Hi @apoptoticbody,

 

Based on your desired output, it looks like the same REGEX_Replace expression can be used for each record except for your first one starting with "Dose Level". Thus, I think this formula should do the trick:

 

IIF(left([input], 10) = "Dose Level", REGEX_Replace([input], "(.*)(\d+\sx\s\d+e\d)(.*)(\d+\sx\s\d+e\d)(.*)(\d+\sx\s\d+e\d)(.*)", "$2" + ", " + "$4" + ", " + "$6"), REGEX_Replace([input], "(.*)(\d+\sx\s\d+\^\d+)(.*)", "$2"))

 

 

This formula is giving me these results for your sample data:

 

DextersLaboratory_0-1596537571769.png

 

Note: If your data to be parsed ever includes decimals (e.g., "9.5 x 10^10" for record 2, for example), you would need to tweak the syntax a bit more to accommodate. 

 

Attached the workflow as well for your reference. Hope this helps!

 

PS - I would highly recommend using regexr.com when building RegEx syntax for Alteryx functions. It lets you provide sample text to be parsed and instantly highlights the character(s) specified by your expression as you build it, making it a lot easier to see what you are doing. 

apoptoticbody
7 - Meteor

Hi @joshbennett 

 

Thank you a ton! You rock! and also thank you for the suggestion on checking on the regex.com for syntax. i am now trying them.

Labels