Hello,
I am trying to parse the data below into several fields based on a specific condition.
Data:
024084W ofnfj
004p hdhdjkhd
008078g kdjhdkkd
032055W kldf
I want to be able to say if a specific line of data follow a certain pattern then treat it a certain way. For example, if "024084W ofnfj" follows this pattern (\d\d\d\d\d\d[A-z]\s) then Field #1 = left([data],3) and Field #2 = substring([Data],3,3). If "004p hdhdjkhd" follows this pattern (\W\d\d\d[A-z]\s[A-z]) then Field #1 = left([Data],3) and Field #2 = null
Desired Results:
Data Field #1 Field #2
024084W ofnfj 024 084
004p hdhdjkhd 004 null()
008078g kdjhdkkd 008 078
Any advice on the issue would be greatly appreciated. Thanks in advance!
Solved! Go to Solution.
Here's another option using RegEx (modified a bit from the examples you provided in your submission)... I created the various RegEx expressions that needed to be matched to, then appended them to the original data... I then used RegEx_Match formula to determine which data matched which expressions, then created Fields 1 & 2 based on the expressions matched.
Hope that helps to have another option!
Cheers,
NJ
That worked perfectly, thank you so much! One more question: If there is a standard pattern that the data needs to follow, can I reformat the lines of data that do not follow that pattern based on the information within that line?
Example:
Proper pattern: (\d{6}[A-z]\s).*
Data: Match:
024084W ofnfj Y
004p hdhdjkhd N
008078g kdjhdkkd Y
032055W kldf Y
For the line that does not match, would I use a regex to reformat it? And if so, how would that look?
I found REGEX_Replace(String, pattern, replace) in the formula bar, but I don't know if using this would help
In the instance you have highlighted how would the resulting reformat look?
It would be very easy to create a tag column using the formula tool a formula like...
if left(field,3)="\d\d\d" and substring(field,3,3)="\d\d\d" then "Y" else "N" endif
Hi Ben,
In the instance of the line that doesn't match, I would want to reformat it to be like "004000P jhbkbsfbsk". I guess I could tag it like you said and use a regular formula to say:
if [tag]='Y' then left([field],3)+'000'+substring([field],4,30). This is not the most efficient way I'm sure, but if I am thinking about it correctly, this should result in what I am looking for.
Basically, what I want to accomplish from this is taking data that follows the appropriate format and parsing it into separate fields. The data that is not in the correct format needs to be fixed and then parsed into the fields specified earlier.
That is how I would approach it too!
I wouldn't worry about efficiency unless it's a super large dataset you are working with. Solve the problem in the way with which you think about it!
Ben