We have extended our Early Bird Tickets for Inspire 2023! Discounted pricing goes until February 24th. Save your spot!

Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer and Intelligence Suite.
SOLVED

Urgent Data cleansing

Anniefang
7 - Meteor
Hi there, I had an issue trying to cleanse a data for comparison. The data have multiple columns that contain concatenates, e.g. some rows in column C show “ Argentina|2010|March|Economy” and some rows in column D show “Spain|2023|June|growth”.. I did multiple text to column bit then it created a lot of blank fields, and sometimes the columns don’t match their respect headers (e.g need to delete some of the blank cells). Just wondering if there’s any quicker way of doing this cleansing in Alteryx?
11 REPLIES 11
NJT
11 - Bolide

Would need a bit more information to know if this would help you out, but if you're already using Text to Column I'm guessing there is more to the story here. Can you put up some sample data? For me using the example you describe here two text to column tools set to parse "|" in column C and then in Column D seems to do the trick if you're talking about the same row of data. 

Column C                                       Column D

Argentina|2010|March|Economy   Spain|2023|June|growth

If it's two rows of data then it gets more complicated and I think you can handle it with a multi-field formula tool or Regex, but I'm still trying to figure that one out.

Two text to column tools gives me

 

Capture.PNG

 

My guess is you're trying to get column C and D to parse into the same columns? (i.e. Country, Year, Month, Measure)

 

 

 

Anniefang
7 - Meteor

Hey!

 

Thanks for the reply! 

 

The data set was quite big, and I did three times of text to column as the concatenates are showing in both column B, C, D. However, as you have done, it ended up with many null cells. I was only looking for the year, month, country and the long number, so was wondering if I could extract these using Regex but am not sure how to use the expressions.

 

Uploaded samples with concatenates showing in each column. 

jasperlch
12 - Quasar

Hi @Anniefang

 

A combination of Transpose and Cross Tab together with Regex_Match function in Formula should do the job.

 

Capture.PNG

estherb47
15 - Aurora
15 - Aurora

Hi @Anniefang!

 

Assuming the first row is actually data, and not headings, here's my solution. Easiest way to deal with things being in different columns is to use the transpose tool to get them all into one column. A filter tool can get rid of the extra rows, including nulls, that result.
image.png

Cheers!!

 

Esther

Sushantmutreja1
7 - Meteor

Hi @Anniefang,

 

I just used the combination of Formula Tool and Text to Column Tool, please have a look, if this solves your problem

 

Data Cleansing.png

 

Thanks 

Sushant

Anniefang
7 - Meteor

@EstherB47

 

It worked! But it seems to have just parsed the very long column? Any way e.g. GDP/CPI/M/XD can be singled out as well?  

Anniefang
7 - Meteor

Thanks this worked too! 

 

Same question though - so this only parsed out the year, month and long number, any chance to extract the country and measure (e.g. GDP/M/CPI) as well? 

Sushantmutreja1
7 - Meteor

Yes you can, just need to increase the "Number of Columns" from 3 to the number you want your desired result (you just need to click on Text to Column tool and on the left side change the number 3 to what is required).

Anniefang
7 - Meteor

Worked amazingly! Thanks a lot! 

Labels