Converting multiple fields of string text containing multiple decimal places into double
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello!
I need some help with cleaning & prepping a dataset that has multiple decimal point symbols and that is formatted as string into float/double datatype.
I have many fields I need to clean in my dataset (over 50 different test types) and there are other fields which are formatted as strings which I want to keep (e.g. test ID)
Sample Dataset
pH | Test Assay 1 | [...] | Test Assay 20 |
'7.421 | '99.334234.222 | [...] | '99.643987.0 |
'10.288 | '102.326771.0 | [...] | '106.156298.0 |
'6.222 | '108.124 | [...] | '100.523 |
'9.231 | '98.19299 | [...] | '99.7144 |
What I have done for now is to put in a multifield formula with the expression, Replace(_CurrentField_,',','') to remove the commas leading the numbers. I think I am supposed to use RegEx or another multifield formula to select the columns where I want to perform this transformation and keep the numerals to 3 "decimal place" (they are still formatted as strings), use a select tool to convert the columns of interest into doubles then round to 2d.p. using another multifield formula tool.
I'm completely new to RegEx and can't seem to figure out how to do this. Could anyone help please? Thanks!
Expected output:
pH | Test Assay 1 | [...] | Test Assay 20 |
7.42 | 99.33 | [...] | 99.64 |
10.29 | 102.33 | [...] | 106.16 |
6.22 | 108.12 | [...] | 100.52 |
9.23 | 98.19 | [...] | 99.71 |
Solved! Go to Solution.
- Labels:
- Datasets
- Expression
- Preparation
- Regex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
You should be able to get rid of the replace function you currently have and use the below formula in a Multi-Field formula. Select all of the fields you want to replace at the top of the configuration. See the output below.
Regex_REPLACE([_CurrentField_],"^(')(\d+\.\d{2})(.+)",'$2')
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I just realized you also want to round the value in the string to the nearest hundredth. The below formula should fix that and round the number correctly. You can also update the Dropdown from V_String to Double if you want those fields to be converted over to a number.
round(tonumber(Regex_REPLACE([_CurrentField_],"^(')(\d+\.\d{3})(.+)",'$2')),.01)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thank you! I had a crack at it myself yesterday to try and understand RegEx a bit better.
I found that this works:
REGEX_Replace(_CurrentField_,"^([0-9]+)\.([0-9]+)\.(.*)","$1.$2")
Followed by a check tool and another multifield formula tool to round. Appreciate the solution as it provides what I want with fewer tools :)
