Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Solution for Parsing Large Non-columnar Text Input

Swest4190
6 - Meteoroid

Good Morning, 

 

I'm in need of some assistance. I have a text input that is 19,012 characters long that does not have any line breaks. Screenshot provided below and example attached:

 

Capture.PNG

 

The words "FROM", "TO", & "FOR" need to be replaced with a column break and the word "HOURS" needs to be replaced with a line break in order to properly format this information. 

 

Problem #1

-How should i input this data? i have tried using the input tool but it always asks to parse the data first. I have tried ingesting as a csv with a \0 delimiter as suggested in some threads but i get a field truncation warning when i do this and it cuts a majority of my data out. 

 

Capture2.PNG

 

Problem #2 

-Im assuming the most efficient way of parsing the data is going to be REGEX. I am not very well versed in REGEX and am curious how one would go about properly parsing this data via REGEX. 

 

Thanks!

Steve

6 REPLIES 6
MarqueeCrew
20 - Arcturus
20 - Arcturus

@Swest4190,

 

i will take a look at this for you, but I'm sure that I'll have company.  I'll likely read it as CSV with \0 as a delimiter, but I'll also change the default 254 length to 20,000.  Then I will avoid RegEx and use replace() commands to get a PIPE in as a delimiter.  Then figure out how to handle line breaks.

 

If when I look at the data I find a simple fixed width solution, I'll switch to that path.

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
BenMoss
ACE Emeritus
ACE Emeritus
Problem 1 is resolvable really easy; on the input tool one of the options is the field length, which by default is set to 254, change this to 20000 and your data will stream in fine.

Problem 2 is a bit more complex and I dont have a laptop by my side atm so let's see if others can help with this one!

Ben
BenMoss
ACE Emeritus
ACE Emeritus
Problem 2 is a bit more simpler than I thought.

1. Replace the word hours with a unique definite, say |

Use the text to columns tool and set the | as the delimiter, then choose split to rows.

2. In your now many row list, replace your other key words with |; use the text to columns tool again but this time split to columns as is the default.

Ben
MarqueeCrew
20 - Arcturus
20 - Arcturus

@Swest4190,

 

I converted the terms to | and HOURS to ^. I was then able to simply use the text to columns tool.  You'll see that in the input tool I set the field length and set the first row to data.

 

capture.jpg

 

No RegEx formulas were used.

 

Cheers,


Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
MarqueeCrew
20 - Arcturus
20 - Arcturus

@BenMoss,

 

This time I have a computer handy to solve along with you.  Hopefully the pic is something that you can put your seal of approval on and we can share in this solve.

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Swest4190
6 - Meteoroid

Excellent! this perfect. Thank you so much!

Labels