Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to parse words from numbers

nrubner1
7 - Meteor

I have extracted a dataset from a PDF which leaves everything in the first cell of each line. The data can easily be delimited the text to columns function using a space a space as the delimiter. However, in the middle of the dataset we have the company name, and then further numbers. When I delimit using a space, it splits the company name up also so the numbers become unaligned. Is it possible to use the regex tool to parse a word from a number to prevent this from happening? Example below. Thanks!

 

100 200 300 Alteryx 400 500 
100 200 300 The Alteryx Community 400 500
100 200 300 Alteryx Resources 400 500
5 REPLIES 5
DavidP
17 - Castor
17 - Castor

If you know the 1st 3 and last 2 columns are numbers, you can use Regex parse like this:

 

regex parse.png

Claje
14 - Magnetar

Hi,


Try the following RegEx

 

[\d\s]*([^\d]+)[\d\s]*

This will look for any number of numbers and spaces, until it finds a non-numeric value.  Then it will include all characters until a digit is found again, and set this as a Marked Group.

Use the Parse output method in the RegEx tool to see if this works for your use case.

 


From there you have a few options:

 

You can make the regular expression more complicated (leverage the "Tokenize" method and build an expression that delimits on spaces only if they are next to a digit)

 

You can use the Replace function in a regular formula, to replace the company name with nothing, and then leverage text to columns like you were before on the remaining values.

 

Hope this helps!

nrubner1
7 - Meteor

Thanks for replying so quickly. Unfortunately this has created several new columns but they are all [Null]. Some of the numbers are in the thousands with , (e.g. 1,924) - could this affect the regex formula? Or does the field need to be in a certain data format to process perhaps?

Claje
14 - Magnetar

Yes - having other characters in the numeric values could impact how this script works.


Here's an example that also looks at commas and decimals as being part of numbers (as long as they are not within a company name)

 

[\d\s,\.]*([^\d]+)[\d\s,\.]*

the only change between this and the prior script I had suggested is the addition of ",\." before and after the company name parse.  the "\." says to look for a decimal character, as "." is a RegEx special character.

nrubner1
7 - Meteor

Thanks @Claje, with a couple of extra find/replaces and formulas to remove some punctuation it's pretty much working. Only thing left is to stop the word picking up the brackets of negative numbers that follow the company name, but hopefully I can fix this with something else!

 

Thanks very much :)

Labels