Hello!
I need some help with cleaning & prepping a dataset that has multiple decimal point symbols and that is formatted as string into float/double datatype.
I have many fields I need to clean in my dataset (over 50 different test types) and there are other fields which are formatted as strings which I want to keep (e.g. test ID)
Sample Dataset
pH | Test Assay 1 | [...] | Test Assay 20 |
'7.421 | '99.334234.222 | [...] | '99.643987.0 |
'10.288 | '102.326771.0 | [...] | '106.156298.0 |
'6.222 | '108.124 | [...] | '100.523 |
'9.231 | '98.19299 | [...] | '99.7144 |
What I have done for now is to put in a multifield formula with the expression, Replace(_CurrentField_,',','') to remove the commas leading the numbers. I think I am supposed to use RegEx or another multifield formula to select the columns where I want to perform this transformation and keep the numerals to 3 "decimal place" (they are still formatted as strings), use a select tool to convert the columns of interest into doubles then round to 2d.p. using another multifield formula tool.
I'm completely new to RegEx and can't seem to figure out how to do this. Could anyone help please? Thanks!
Expected output:
pH | Test Assay 1 | [...] | Test Assay 20 |
7.42 | 99.33 | [...] | 99.64 |
10.29 | 102.33 | [...] | 106.16 |
6.22 | 108.12 | [...] | 100.52 |
9.23 | 98.19 | [...] | 99.71 |
Solved! Go to Solution.
You should be able to get rid of the replace function you currently have and use the below formula in a Multi-Field formula. Select all of the fields you want to replace at the top of the configuration. See the output below.
Regex_REPLACE([_CurrentField_],"^(')(\d+\.\d{2})(.+)",'$2')
I just realized you also want to round the value in the string to the nearest hundredth. The below formula should fix that and round the number correctly. You can also update the Dropdown from V_String to Double if you want those fields to be converted over to a number.
round(tonumber(Regex_REPLACE([_CurrentField_],"^(')(\d+\.\d{3})(.+)",'$2')),.01)
Thank you! I had a crack at it myself yesterday to try and understand RegEx a bit better.
I found that this works:
REGEX_Replace(_CurrentField_,"^([0-9]+)\.([0-9]+)\.(.*)","$1.$2")
Followed by a check tool and another multifield formula tool to round. Appreciate the solution as it provides what I want with fewer tools :)