Challenge #35: Data Cleansing Practice
The link to the solution for the previous challenge, #34, is HERE. For this challenge, let’s practice some data cleansing.
Use Case: There is a series of data cleansing processes we need to do on our data. Please solve each per the instructions.
Objective:
- Remove leading zeroes
- Trim leading zeroes and/or descriptive text at the end
- If the data value ends with ID, remove the ID
- If the value is longer than 8 characters, remove everything after the 8th. If it has only 6, add “SC” to the front.
Labels: Basic, Core, Data Analysis, Parse, Preparation
Here's a solution:
My solutions.
1. Remove Leading Zeros.
Since the codes we are looking for are 4-digit numeric codes, I simply converted the field to a numeric type. This automatically drops the leading zeros.
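For readers following along outside Alteryx, here is a minimal Python sketch of the same idea: converting the text to a number and back discards the leading zeros. The function name and the sample values are hypothetical, not taken from the challenge data.

```python
def strip_leading_zeros(code: str) -> str:
    """Drop leading zeros by converting to an integer and back to text.

    Assumes the value contains digits only; int() raises ValueError otherwise.
    """
    return str(int(code))


# Hypothetical sample values, for illustration only.
samples = ["0001234", "005678", "9012"]
cleaned = [strip_leading_zeros(s) for s in samples]
print(cleaned)  # ['1234', '5678', '9012']
```

Note that a value made up entirely of zeros comes back as "0", which may or may not be what you want for a code field.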
2. Isolate 4-digit numeric code.
Here, I used the Text to Columns tool to split the field on ":". I then dropped the superfluous columns.
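The split-and-keep step could be sketched in Python roughly as follows. This is only an illustration: the function name is hypothetical, and it assumes the wanted piece is whichever part of the split is exactly four digits, since the post does not say which side of the ":" holds the code.

```python
def isolate_code(value: str, delimiter: str = ":") -> str:
    """Split on the delimiter and return the first piece that is a 4-digit number.

    If no piece qualifies, the original value is returned unchanged.
    """
    for part in value.split(delimiter):
        part = part.strip()
        if len(part) == 4 and part.isdigit():
            return part
    return value


# Hypothetical sample values, for illustration only.
print(isolate_code("1234: Some descriptive text"))  # 1234
print(isolate_code("Some descriptive text: 5678"))  # 5678
```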
3. Remove "SD" from the end of any records on which it appears.
This was achieved by using the TrimRight() function to remove any instances of "SD".
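One thing worth checking with this approach: as far as I know, Alteryx's TrimRight() treats its second argument as a set of characters rather than a literal string, much like Python's `str.rstrip`. The sketch below contrasts that behaviour with a strict suffix removal. Both function names and the sample values are hypothetical.

```python
def remove_suffix(value: str, suffix: str = "SD") -> str:
    """Remove the suffix once, and only if the value actually ends with it."""
    if suffix and value.endswith(suffix):
        return value[: -len(suffix)]
    return value


def trim_right_chars(value: str, chars: str = "SD") -> str:
    """Strip any run of the given characters from the end (character-set trim)."""
    return value.rstrip(chars)


# Hypothetical sample values, for illustration only.
print(remove_suffix("AB12SD"))     # AB12
print(trim_right_chars("AB12SD"))  # AB12
print(remove_suffix("ABCDSD"))     # ABCD
print(trim_right_chars("ABCDSD"))  # ABC  (the trailing D of the code is trimmed too)
```

If none of the codes end in "S" or "D" before the suffix, the two approaches give the same result, which is presumably why the trim works on the challenge data.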
4. Return an 8-character string, adding "US" to the start of strings that are shorter than 8 characters.
I used a simple IF statement to identify the records that were shorter than 8 characters and, for those, added the characters "US" to the beginning. This was coupled with a Left() function to return the 8-character string.
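The IF-plus-Left() logic described here might look like this in Python. It is a sketch of the rule as stated in this reply (prefix when shorter than 8, then keep the first 8 characters); the function name, parameter names, and sample values are hypothetical.

```python
def normalise_length(value: str, prefix: str = "US", width: int = 8) -> str:
    """Prefix short values, then keep only the first `width` characters."""
    if len(value) < width:
        value = prefix + value
    return value[:width]


# Hypothetical sample values, for illustration only.
print(normalise_length("ABCDEF"))      # USABCDEF
print(normalise_length("ABCDEFGHIJ"))  # ABCDEFGH
print(normalise_length("ABCDEFGH"))    # ABCDEFGH
```

Applying the truncation after the prefix means a 7-character value also ends up at exactly 8 characters, at the cost of losing its last character.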
5. Return 5-digit code rather than descriptions on given codes.
Here, I used the Join tool to join "SBU Code" in the input data stream to the "Description" field in the source table. The left output returns the three records whose codes are already correct. The join output contains the records where "SBU Code" holds a description instead of a code. Rename the code field on this output and union it back with the left output to return all records correctly.
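The join-then-union pattern amounts to a lookup with a fallback: values that match a description are replaced by the corresponding code, and everything else passes through untouched. A minimal Python sketch, assuming the source table is a list of (code, description) pairs; the function name and all sample data are made up for illustration.

```python
from typing import Iterable, List, Tuple


def resolve_codes(values: Iterable[str], lookup: Iterable[Tuple[str, str]]) -> List[str]:
    """Replace any value that matches a description with its code.

    Values with no matching description (the 'left' side of the join)
    are kept as they are, which plays the role of the union step.
    """
    code_by_description = {description: code for code, description in lookup}
    return [code_by_description.get(v, v) for v in values]


# Hypothetical lookup table and input values, for illustration only.
lookup = [("10001", "Alpha Division"), ("20002", "Beta Division")]
values = ["10001", "Beta Division", "30003", "Alpha Division"]
print(resolve_codes(values, lookup))  # ['10001', '20002', '30003', '10001']
```

Unlike a Join followed by a Union, this version keeps the records in their original order.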
Here's my solution for this challenge.
Hi Syfer,
You're close.
I realigned your columns in the union, which returned the correct result.
- Matt