Search String for a Substring and return the preceding number value
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi,
I have a table in and two of the columns have a lot of text. I'm trying to get specific information from those columns that might come in different forms.
Here is an example of the text in 1 cell:
AHU-1 (Basement floor) - 4,280 CFM's w/VFD, The 15 HP VFD was added in 2014, 10HP on the return side. heating and cooling coils, ducted distribution, diffusers and plenum return. Poor access to unit.
I need to search for CFM and HP then I need the preceding number values. There may or may not be a space between the number and the acronym.
This particular example has two HP values. In this case I would want the larger number. So I would like two resulting fields, "CFM" with a value of 4,280 and "HP" with a value of 15.
Any help would be appreciated.
Thank you
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @ChrisB_dup_72
Great challenge here!
First step is replace commas with nothing (I did this with a formula tool. There are many other ways)
Then, we need to separate into columns what you'd like to extract. A RegEx tool, Tokenizing into rows, works really well here. I used the following as my statement: (\d+\s*(?:CFM|HP))
Find a number, which may or may not be followed by a space, then followed by CFM or HP. The ?: is an unmarked group, which is telling alteryx to look for either "CFM" or "HP" as a group of letters
With the example you gave, that gives 3 rows. Another RegEx tool extracts the number and text into separate columns. You could also use a formula tool and regex_replace to pull the numbers into one column, and the text into another. Just be sure to set the number column as Integer type.
A summarize tool finds the greatest number per text qualifier. If you'd like that to be in columns, add a Cross Tab tool afterwards.
Let me know if this helps.
Cheers!
Esther
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Ester,
I just got back on this workflow tonight. You're solution worked out beautifully. Thank you so much!
I'm new the RegEx tool. Is there a good source to interpret statements like (\d+\s*(?:CFM|HP))? I'd like to learn more about that because I'm going to need to use in on datasets similar to the one in this example.
Thanks again,
Chris
