
Alteryx Designer Desktop Discussions

SOLVED

Help Parsing Nondelimited Variable Length Data - Not All Records Parsing with RegEx

mwarsh
5 - Atom

Hello, I am trying to parse string data into multiple columns at various byte positions, but I cannot get all rows of data to parse.

 

Example data:

ABC   DEF   GHI          JK             LMNO      P

QRSTU VWX YZ

ABC   DEF   GHI          JK             LMNO      P

 

RegEx:

(.{3})(.{3})(.{3})(.{3})(.{2})(.{37})

 

My example data is 51 byte positions wide. The P is at position 51 in the first and third rows, and these rows parse properly. The second row is shorter and parses into all Null values. I want it to parse whatever data is there, and any trailing columns with no data (spaces/null) should simply be left unparsed.

 

My parsed output should be:

 

Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6
ABC      | _ _ _    | DEF      | _ _ _    | GH       | I          JK             LMNO      P
QRS      | TU_      | VWX      | _YZ      | null     | null
ABC      | _ _ _    | DEF      | _ _ _    | GH       | I          JK             LMNO      P

 

( _ denotes a space/blank)

 

But instead the second row does not parse at all. I get null for all columns any time a row/string of data does not extend to the farthest byte position.

 

Thoughts?

Thableaus
17 - Castor

Hi @mwarsh 

 

Try this in your REGEX:

 

(.{0,3})(.{0,3})(.{0,3})(.{0,3})(.{0,2})(.{0,37})
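For anyone who wants to check the pattern outside Alteryx, here is a minimal sketch using Python's `re` module. It assumes Python's regex behaviour is close enough to the Alteryx RegEx tool for this pattern; the helper name `parse_row` is made up for illustration, and the sample row is rebuilt from the spacing described in the question (51 characters, P at position 51).

```python
import re

# Pattern from the reply above: every field may be shorter than its full width.
PATTERN = re.compile(r"(.{0,3})(.{0,3})(.{0,3})(.{0,3})(.{0,2})(.{0,37})")

def parse_row(row: str) -> tuple:
    """Split one row into six fields (hypothetical helper, not an Alteryx API).

    Because every group accepts zero characters, the pattern always matches;
    fields past the end of a short row come back as empty strings.
    """
    return PATTERN.match(row).groups()

# Build the 51-character sample row explicitly so the spacing is unambiguous.
full_row = ("ABC" + " " * 3 + "DEF" + " " * 3 + "GHI" + " " * 10
            + "JK" + " " * 13 + "LMNO" + " " * 6 + "P")
short_row = "QRSTU VWX YZ"  # the 12-character row that failed before

for row in (full_row, short_row):
    print(parse_row(row))
```

In Python the two missing fields of the short row are empty strings; whether Alteryx displays them as blank or null may differ.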

 

Cheers,

mwarsh
5 - Atom

Thanks so much. That did the trick. Do the zeros essentially mean: parse the record even if there's nothing there?

Thableaus
17 - Castor

@mwarsh 

 

Yeah, {0,3} basically says: match a character (represented by the dot (.)) anywhere from 0 up to 3 times.

Since the quantifier is greedy, each group always grabs as many characters as it can. So when a row runs out of characters before a group is filled, that group just comes back blank instead of the whole match failing.
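The difference between the two patterns can be seen side by side in a short Python sketch (again assuming Python's `re` behaves like the Alteryx engine for these quantifiers; the variable names are illustrative only). The exact-width pattern needs all 51 characters, so a 12-character row gives no match at all, which is what shows up as nulls in every column. The ranged pattern still matches and fills the groups from left to right.

```python
import re

# Original pattern: every group demands its exact width (51 characters total).
EXACT = re.compile(r"(.{3})(.{3})(.{3})(.{3})(.{2})(.{37})")
# Suggested pattern: every group accepts anything from zero up to its width.
RANGED = re.compile(r"(.{0,3})(.{0,3})(.{0,3})(.{0,3})(.{0,2})(.{0,37})")

short_row = "QRSTU VWX YZ"  # only 12 characters

exact_match = EXACT.match(short_row)    # too short, so there is no match
ranged_match = RANGED.match(short_row)  # matches; trailing groups are empty

print(exact_match)
print(ranged_match.groups())

# Greedy in action: the first group takes its full 3 characters
# before the second group gets whatever is left.
greedy_demo = re.match(r"(.{0,3})(.{0,3})", "ABCD").groups()
print(greedy_demo)
```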

 

Cheers,
