Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing string by character limit without breaking mid-word

jlwhittaker
6 - Meteoroid

Hello, 

 

I have a data set with customer names, some of them as long as 200 character lengths. I want to create 3 new columns from the customer name. Column 1 character limit 40, Column 2 character limit 40, Column 3 remaining characters. I've tried using a substring but i'm running into the issue of the columns breaking mid-word. 

 

For Example: 

Data Field: I have a data set with customer names, some of them as long as 200 character lengths

Column 1: I have a data set with customer names, (39 characters)

Column 2: some of them as long as 200 character (37 characters)

Column 3: lengths (remaining characters)

 

Is there a way to break on the delimiter (space) <40 characters? 

 

I would provide a better example of a field i'm actually using but for security purposes I cannot. Any help would be greatly appreciated. I've tried looking into the Regex tool but i haven't been able to wrap my head around it. 

 

Thanks,

 

J

8 REPLIES 8
ZacharyM
Alteryx Alumni (Retired)

Would it work to break on the delimiter (space), and then pare down the columns into the necessary lengths afterwards?

 

For example;

 

[50 character first name] [60 character last name] [15 remaining characters] -> Break into three columns based on the space

 

Then in step two, throw out 10 characters from the First name, 20 from the Last Name, and 12 from the 15 remaining?

jlwhittaker
6 - Meteoroid

How would I go about doing that? Apologies, I'm fairly new to Alteryx and watching the tutorials on Regex to see if I can find a solution. 

ZacharyM
Alteryx Alumni (Retired)

Please dont apologize! I've mapped it out in the attachment.

 

Basically you would use the Split to Columns to split into 2 columns, then you would use a combination of LENGTH(), LEFT(), and RIGHT() functions to take off anything over 40 and put it into another field for the leftovers.

 

See the attached and let me know if this is what you're looking for - if it is, please mark this as the accepted solution so people know you've been helped!

 

Cheers!

Zak

jlwhittaker
6 - Meteoroid

Thanks for this! This is getting me closer to my solution. My only issue is some of the customer names aren't your standard (First Name, Last Name). Ideally what I'm trying to do is take my main data and count 40 characters from the left. If 40 characters from the left is in the middle of a word, I'd like to stop at the previous space delimiter and create column output #1. Then count 40 characters from where it left off and create column output #2, same rule as if 40 characters from that spot lands in the middle of another text, stop at the previous space delimiter. Then take the rest of the text and put it into column #3, no character limitations. 

 

I'm having issues with this as I'm trying to analyze about 900 or so fields and apply a solution that wouldn't require me. But now that I think of it, I'm going to try and filter the data to records >40 characters and see if I can slim down the number of fields. That might make the task a little less daunting. 

 

Thank you for your help! 

jdunkerley79
ACE Emeritus
ACE Emeritus

This can be done using a Regex tool in Parse mode.

 

This expression should work

^(.{0,40}) (.{0,40}) (.*)$

I attach a sample with fake data.

It will fail if names are greater than 40 characters long as the regex can't then break it up.

 

The attached version will put the whole string in third field if it can't be parsed as first name > 40 chars

Claje
14 - Magnetar

I found a few issues/inconsistencies with regular expressions for names that had really long segments - at least based on how I interpreted the business rules.

 

My understanding:

If there is a name in the first or second name which is longer than 40 characters, that whole name should be in the third field.

 

As such, I built out a workflow (which can probably be optimized more than this is) which will split out the data.

To do this, I split the data into rows using space as a delimiter, and then concatenated these strings together, and calculated the length each step along the way.  I then assigned each of these a column number based on overall length, and took the LAST value per record based on column number.


I've attached a workflow for this.

 

Hope this helps!

jlwhittaker
6 - Meteoroid

Thanks! This worked perfectly. 

jlwhittaker
6 - Meteoroid

I tested both of these solutions and they worked on my data set. Thanks everyone! This was incredibly helpful and saved me tons of time and stress! 

Labels