Alteryx Designer Desktop Discussions

lharris17 · ‎03-06-2018

Hello,

I am trying to parse the data below into several fields based on a specific condition.

Data:

024084W ofnfj
004p hdhdjkhd
008078g kdjhdkkd
032055W kldf

I want to be able to say if a specific line of data follow a certain pattern then treat it a certain way. For example, if "024084W ofnfj" follows this pattern (\d\d\d\d\d\d[A-z]\s) then Field #1 = left([data],3) and Field #2 = substring([Data],3,3). If "004p hdhdjkhd" follows this pattern (\W\d\d\d[A-z]\s[A-z]) then Field #1 = left([Data],3) and Field #2 = null

Desired Results:

Data Field #1 Field #2

024084W ofnfj 024 084
004p hdhdjkhd 004 null()
008078g kdjhdkkd 008 078

Any advice on the issue would be greatly appreciated. Thanks in advance!

BenMoss · ‎03-06-2018

Here is how I would approach the problem with the regex match function.

Ben

MarqueeCrew · ‎03-06-2018

Step 1:
Regex_replace([field],"(\d{3)\d{3})\w\s\w+",'$1|$2')

Step 2:
Text to columns

I don't know how to detect the null placement in front of the digits based on your post.

This should at least help.

Cheers,
Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

NicoleJohnson · ‎03-06-2018

Here's another option using RegEx (modified a bit from the examples you provided in your submission)... I created the various RegEx expressions that needed to be matched to, then appended them to the original data... I then used RegEx_Match formula to determine which data matched which expressions, then created Fields 1 & 2 based on the expressions matched.

Hope that helps to have another option!

Cheers,

NJ

lharris17 · ‎03-06-2018

That worked perfectly, thank you so much! One more question: If there is a standard pattern that the data needs to follow, can I reformat the lines of data that do not follow that pattern based on the information within that line?

Example:

Proper pattern: (\d{6}[A-z]\s).*

Data: Match:

024084W ofnfj Y
004p hdhdjkhd N
008078g kdjhdkkd Y
032055W kldf Y

For the line that does not match, would I use a regex to reformat it? And if so, how would that look?

I found REGEX_Replace(String, pattern, replace) in the formula bar, but I don't know if using this would help

BenMoss · ‎03-06-2018

In the instance you have highlighted how would the resulting reformat look?

It would be very easy to create a tag column using the formula tool a formula like...

if left(field,3)="\d\d\d" and substring(field,3,3)="\d\d\d" then "Y" else "N" endif

lharris17 · ‎03-06-2018

Hi Ben,

In the instance of the line that doesn't match, I would want to reformat it to be like "004000P jhbkbsfbsk". I guess I could tag it like you said and use a regular formula to say:

if [tag]='Y' then left([field],3)+'000'+substring([field],4,30). This is not the most efficient way I'm sure, but if I am thinking about it correctly, this should result in what I am looking for.

lharris17 · ‎03-06-2018

Basically, what I want to accomplish from this is taking data that follows the appropriate format and parsing it into separate fields. The data that is not in the correct format needs to be fixed and then parsed into the fields specified earlier.

BenMoss · ‎03-06-2018

That is how I would approach it too!

I wouldn't worry about efficiency unless it's a super large dataset you are working with. Solve the problem in the way with which you think about it!

Ben

Alteryx Designer Desktop Discussions

Using RegEx to Parse Fields