Alteryx Designer Desktop Discussions

jlwhittaker · ‎12-05-2018

Hello,

I have a data set with customer names, some of them as long as 200 character lengths. I want to create 3 new columns from the customer name. Column 1 character limit 40, Column 2 character limit 40, Column 3 remaining characters. I've tried using a substring but i'm running into the issue of the columns breaking mid-word.

For Example:

Data Field: I have a data set with customer names, some of them as long as 200 character lengths

Column 1: I have a data set with customer names, (39 characters)

Column 2: some of them as long as 200 character (37 characters)

Column 3: lengths (remaining characters)

Is there a way to break on the delimiter (space) <40 characters?

I would provide a better example of a field i'm actually using but for security purposes I cannot. Any help would be greatly appreciated. I've tried looking into the Regex tool but i haven't been able to wrap my head around it.

Thanks,

J

ZacharyM · ‎12-05-2018

Would it work to break on the delimiter (space), and then pare down the columns into the necessary lengths afterwards?

For example;

[50 character first name] [60 character last name] [15 remaining characters] -> Break into three columns based on the space

Then in step two, throw out 10 characters from the First name, 20 from the Last Name, and 12 from the 15 remaining?

jlwhittaker · ‎12-05-2018

How would I go about doing that? Apologies, I'm fairly new to Alteryx and watching the tutorials on Regex to see if I can find a solution.

ZacharyM · ‎12-05-2018

Please dont apologize! I've mapped it out in the attachment.

Basically you would use the Split to Columns to split into 2 columns, then you would use a combination of LENGTH(), LEFT(), and RIGHT() functions to take off anything over 40 and put it into another field for the leftovers.

See the attached and let me know if this is what you're looking for - if it is, please mark this as the accepted solution so people know you've been helped!

Cheers!

Zak

jlwhittaker · ‎12-05-2018

Thanks for this! This is getting me closer to my solution. My only issue is some of the customer names aren't your standard (First Name, Last Name). Ideally what I'm trying to do is take my main data and count 40 characters from the left. If 40 characters from the left is in the middle of a word, I'd like to stop at the previous space delimiter and create column output #1. Then count 40 characters from where it left off and create column output #2, same rule as if 40 characters from that spot lands in the middle of another text, stop at the previous space delimiter. Then take the rest of the text and put it into column #3, no character limitations.

I'm having issues with this as I'm trying to analyze about 900 or so fields and apply a solution that wouldn't require me. But now that I think of it, I'm going to try and filter the data to records >40 characters and see if I can slim down the number of fields. That might make the task a little less daunting.

Thank you for your help!

jdunkerley79 · ‎12-05-2018

This can be done using a Regex tool in Parse mode.

This expression should work

^(.{0,40}) (.{0,40}) (.*)$

I attach a sample with fake data.

It will fail if names are greater than 40 characters long as the regex can't then break it up.

The attached version will put the whole string in third field if it can't be parsed as first name > 40 chars

Claje · ‎12-05-2018

I found a few issues/inconsistencies with regular expressions for names that had really long segments - at least based on how I interpreted the business rules.

My understanding:

If there is a name in the first or second name which is longer than 40 characters, that whole name should be in the third field.

As such, I built out a workflow (which can probably be optimized more than this is) which will split out the data.

To do this, I split the data into rows using space as a delimiter, and then concatenated these strings together, and calculated the length each step along the way. I then assigned each of these a column number based on overall length, and took the LAST value per record based on column number.

I've attached a workflow for this.

Hope this helps!

jlwhittaker · ‎12-05-2018

Thanks! This worked perfectly.

jlwhittaker · ‎12-05-2018

I tested both of these solutions and they worked on my data set. Thanks everyone! This was incredibly helpful and saved me tons of time and stress!

Alteryx Designer Desktop Discussions

Parsing string by character limit without breaking mid-word