Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
ALTER.NEXT:

Join us on Dec 2 for a half-day virtual analytics + data science event!
US & CA customers only

SAVE YOUR SPOT
It's the most wonderful time of the year - Santalytics 2020 is here! This year, Santa's workshop needs the help of the Alteryx Community to help get back on track, so head over to the Group Hub for all the info to get started!

Data cleansing

Highlighted
7 - Meteor

Hi All,

 

I would need some guidance on the following question of data preparation.

 

I have a text file which is in the below format. How do i need to use the RegEx or anyother method to split them into 6 columns ? The Text files looks like tabbed format but doesnt read out Tab.

 

Example:

 

John,Rambo                   9999111119        100.12    12345 xx1111    123456

 

Thanks in advance.

 

 

Regards,

VR

Highlighted
12 - Quasar

@vrbravo While you could use regex to do this, there's no need. It appears the data is split by spaces, and so a text to columns tool would accomplish the same task using \s as the delimiter. Since you mentioned tabs, you could also include \t in the delimiter as well, just using \s\t.

 

 

Given there are multiple spaces below, I would also check the "ignore empty columns" option.

 

Good luck!

Highlighted
6 - Meteoroid

Attached workflow for your reference. I hope this helps

Highlighted
7 - Meteor

@neilgallen - Thank you for the response. It worked well until there are some names such as megan jr, fox. Then the columns are shifting to next columns. I think the name column is kind of random. Is there any other way to split columns ?

Highlighted
7 - Meteor

Thank you a lot Santhosh, This is a great solution. Could you please help me with the logic you have used ?

Highlighted
12 - Quasar

@vrbravo in that situation using RegEx may be your best option. There are different ways to go about this depending on the complexity of your input data.

 

Can you post a larger example input dataset?

Highlighted
6 - Meteoroid

We used RegEx “Parse” output method. Based on the input pattern, the below expression was built and each expression in bracket () will be separated into new fields and  the non-bracket expression will be ignored in this case space\s+

 

(\w*\W*\w+)\s+(\d+)\s+(\d+\W\d+)\s+(\d+)\s+(\w+)\s+(\d+)

 

  • \w - Matches any letter, digit or underscore. Equivalent to [a-zA-Z0-9_]
  • \s - Matches any space, tab or newline character
  • \W -Matches anything other than a letter, digit or underscore
  • \d -Matches any decimal digit. Equivalent to [0-9]
  • * - Matches zero or more consecutive characters
  • + Matches one or more consecutive `a` characters
Labels