Regex Special Criteria Extract
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I am trying to have 3 Columns built using Regex. but currently I can only have 2 at a time in one regex
as shown below
How Can I have all the 3 column together without having having the issue of the percentage column disappear when I add currency.
- Labels:
- Regex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Jake,
I'm hoping I've solved both problems for you, but I'm not certain as I've made some assumptions about the data.
Firstly, use the following regex in your tool: (\d+%)|((?:\d+,)?\d+\.\d{2}\s.*?)\s And change the name value to joint currency (or something else, or leave it the same, it's up to you). Next, add a split to columns tool on Joint Currency and select 2 as the number of columns. Finally use a select to create more sensible names.
Secondly, my assumptions are as follows:
1. the percentage appears in place of the currency information (i.e. you never have both a percentage and currency information)
2. the currency information is always NUMBER [SPACE] [CURRENCY CODE] (i.e. currency code appears AFTER the number)
Please see the attached workflow with my suggested configurations and let me know if the workflow doesn't solve the problem
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
For the first issue my code is working. I am just trying to make it more efficient by having only one regex tool instead of 2. I need help Combining the first 2 ( %, $) ---> (\d+%)|(\w\d,\d+.\d+|\s\d,\d+.\d+) and Currency column --> \d(\s\u+) to have 3 columns pop up without messing the data so I just need a way to add (\d+%)|(\w\d,\d+.\d+|\s\d,\d+.\d+) |\d(\s\u+) without currency picking up wrong criteria and without giving null for 45% in % column
How did you solve the second issue ?
can you break down explanation for this
(\d+%)|((?:\d+,)?\d+\.\d{2}\s.*?)\s
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
The image shows a Regular Expression (Regex) configuration for data extraction. The Regex pattern being used is:
Breakdown of the Regex Pattern:
(\d+%) → Captures percentage values (e.g., "45%").
(\d{1,3}(,\d{3})*(\.\d+)?) → Captures numeric values, allowing:
Thousands separator (e.g., "10,216.00").
Decimal values (e.g., "400.50").
(\s+) → Captures whitespace between the number and the currency.
(\w+) → Captures the currency (e.g., "USD", "Bcus").
Extracted Fields in the Output:
% → Percentage values.
value → The extracted numerical values.
currency → The extracted currency text.
Observations from Output Table:
The regex correctly extracts:
"45%" from "Bcus - Susie1 Bcus Plank 45% Alllllal".
"400.50 USD" from "Bcus - automobil Bcus, 400.50 USD Alllsali".
"10,216.00 USD" from "Bcus - submit Bcas, 10,216.00 USD Adddad: dfkl...".
Possible Enhancements:
Allow Spaces in Currency (if needed):
Modify (\w+) to ([A-Za-z\s]+) to accommodate multi-word currencies like "Hong Kong Dollar".
Handle Edge Cases Like Negative Values:
Update numeric pattern to (-?\d{1,3}(,\d{3})*(\.\d+)?) to capture negative numbers.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Jake,
My approach makes 2 columns to ensure that 45% doesn't give null, and then I split the 2nd column by a space to create the 3rd column.
The Regex does the following:
(\d+%)|((?:\d+,)?\d+\.\d{2}\s.*?)\s
(\d+%) - 1st capture group is capturing the percentage
| - or
((?:\d+,)?\d+\.\d{2}\s.*?)\s - 2nd capture group, (?:\d+,)?\d+\.\d{2} has a non-capture group and the regex checks if the currency value has a comma in it, \s.*?)\s is matching everything in between the space after the currency and the next space that appears after that, so the 2nd capture group captures the currency value, a space, and the information after the space and before the next space.
