Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Best way to approach cleaning up a file with multiple similar columns

irmb
7 - Meteor

HI newbie here. Hello, team members. I am working on a file that should be easy to clean up but I am trying to find the most efficient way to do so.  The file has multiple columns but this are the critical ones (in this format):

Employee idlast namefirst nameCourse A title- statusCourse A title- completion dateCourse B title - statCourse B title- completion date Course C title- stCourse C- completion date
1234blueivycompleted12/8/19In progress   
1234blueivy  completed12/8/19  
1234blueivy    In progress 

 

 

 

I want it cleaned like this:

Employee idlast namefirst nameCourse titlestatuscompletion date

 

The biggest issue are these:

a) there are  about 10-15 columns with the completion status and an additional 10-15 columns with the title of the course and the completion date. The course title columns are not uniform. For example, sometimes the status column is named "Course title" stat, sometimes "Course title" status", sometimes the column has the course title but no "status" in the column's name. Same with the "completion" heading. I tried using Regex to first cleaning up the titles so they would be uniform and then join every thing using the column name. But that did not work because I could not figure out how to use regex to remove all unwanted characters and have all columns the same name.

 

b) The other issue is, there are several records for the one person. So, sometimes we end up with having about 30 records for the person to indicate their status for the 10 courses, because each record indicates the completion of the course.

 

I hope this is clear. I tried using transpose and formula using regex but I am a bit stuck now.

 

Thanks

 

15 REPLIES 15
MichalM
Alteryx
Alteryx

@irmb 

 

Would it be safe to assume that

 

  • hyphen is used consistently to separate a course title and the rest of the field (status/completion date)?
  • status is always followed by completion date for the given course?
MichalM
Alteryx
Alteryx

If the above is the case, this is how I'd go around approaching the problem - workflow attached

 

course-data.png

dho111
7 - Meteor

I created Record ID for each row and did not remove empty rows.

 

course_solution.JPG

irmb
7 - Meteor

Thanks, I think the last approach may work. It is similar to what I started. Let me try it out.

estherb47
15 - Aurora
15 - Aurora

Hi @irmb 

 

OK, so with the similar columns, I wanted to create a workflow that can easily adjust based on number of Courses. So transpose is definitely the way to go. You can also leverage the dynamic tools to help you. I separated the course status from the completion dates to easily summarize, and remove where there are too many rows per course respondent.

 

image.png

In the dynamic rename tool, I used a Regex Replace to take only the "Course X" string from the header. Assuming that the pattern is always course title status column, followed by the completion date, this will rename the status column as Column X, and the completion date column as Column X2.

Dynamic Select separates the fields representing status from completion date, so that we can transpose and summarize, and return only one row for each course for each respondent (if someone was in progress and then completed, the formula tool changes the value to Completed. I threw in the Data Cleansing tool to standardize case in the status, and clean up extra whitespace.

 

Join everything back together, rename your columns, and hopefully this all works for you.

 

image.png

 

Let me know if this helps. I tried to make it as dynamic as possible.

 

Cheers!

Esther

irmb
7 - Meteor

HI guys, thanks for your great ideas. I was testing some of the ideas out and realized something.  The column names were not standardized "course name - status" "course name - completion date" because when the Excel is imported into Alteryx, some of the column names are truncated. How do I adjust the input tool so that it does not truncate the column names? I searched on the group and saw that I need to change the field length. Looks pretty straightforward but I don't get that option when I see the input tool's options. Screenshot below.

inputtool.JPG

MichalM
Alteryx
Alteryx

@irmb 

 

I don't think this is possible with Excel files. Also, the field length applies to the actual values rather than Header names. Could you try moving the headers into the first row of data by selecting the option below? This will hopefully prevent them from being truncated after which you can process them and move them back into the original position by Dynamic Rename

 

field-names.png

 

 

estherb47
15 - Aurora
15 - Aurora
Hi @irmb

If the pattern of headers is status followed by date, the method I proposed will work even if the column names are truncated. It’s only looking for the word course and the corresponding letter.

Please do try it and see if it solves your challenge.

Cheers!
Esther
irmb
7 - Meteor

@MichalM  I'm testing out your solution and it seems to work pretty well except using the - delimiter. Unfortunately, some of the course names use the - several times and not only to separate the status or completion names from the course name. Is there a way to set the delimiter to be - followed by the initial letters of the words status and completion?

Labels