Challenge #2: Preparing Delimited Data

GeneR

We hope you enjoyed last week's challenge. The solution has been posted here. For the second challenge, let's look at removing characters and splitting data into columns based on delimiters.

Many products will export textual data with delimiters such as quotes. This is done so that strings can contain delimiters or control characters within them. Having more than one type of delimiter can be hard for ETL programs to interpret. In the input text file, there are two different delimiters (double quotes, single quotes) and they surround different data types.

Use Alteryx to strip out the delimiters as superfluous and format the data as represented in the output.
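For readers who want to see the idea outside Alteryx, here is a minimal Python sketch of the same clean-up. The sample row and the function name `strip_and_split` are made up for illustration; the real input is in the attached start file. The split is deliberately naive and assumes no field contains a comma.

```python
def strip_and_split(line: str) -> list[str]:
    """Split on commas, then strip surrounding double or single quotes.

    Naive on purpose: assumes no comma appears inside a field.
    """
    return [field.strip().strip("\"'") for field in line.split(",")]


# Hypothetical row, not the real challenge data.
row = "\"Mary had a little lamb\",'sample text','23-Nov-15'"
print(strip_and_split(row))
```

If a field can contain the separator itself, this approach breaks and a quote-aware parser is needed instead.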

You may notice that we have started classifying the exercises as beginner, intermediate, and advanced. This classification is used by Alteryx internally to sequence exercises as users advance.

Update 11/23/2015:

The solution has been uploaded.

challenge_2_start_file.yxmd

challenge_2_solution.yxmd

Parse

Preparation

Data Preparation

Beginner

All comments

kevinbird15

Interesting solution. I did something similar, but I used the DateTime converter to get into a date format and parsed out all of the quotes and commas on the front end. Is it possible to have people post solutions once the solution is put up so that we can see how other people are solving the issues?

TaraM

Great idea @kevinbird15! Our ultimate goal with these exercises is to give users the ability to try to solve a myriad of problems and practice using Alteryx. We will have a training section soon that will encourage more participation and collaboration. In the meantime, feel free to post in one of the discussion boards (this one would fit in Data Prep and Blending) and link to the article you are working on. I am sure other users in the community would be very interested in how their peers are solving these problems.

marapatricia

Hi! I cannot open this file because I'm using a previous version. Do you have other versions of this file? Can you post it too? Thanks!

TaraM

@marapatricia - what version are you on? I'll see what we can do.

marapatricia

It's 9.5.11. Attaching the image. Thanks so much Tara! Really appreciate!

mix_pix

Hi,

I've used the Text to Columns tool before and definitely see why it makes sense for this exercise. How would you configure the Text to Columns tool if there weren't a defined number of fields you wanted to produce? In other words, in this example we know that there are three distinct components of the text string we want to parse out and those three components are separated by commas. What if there were a variable number of components in each row (i.e. one had 3, one had 4, etc)? Is there a way for Alteryx to determine how many components exist in each row (based on a known delimiter) and then feed that number into the # of Columns parameter?

-Mike
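This does not show how to configure the Text to Columns tool itself, but the general idea in the question (measure each row first, then size the output to the widest row) can be sketched in Python. The function name `split_to_widest` and the sample rows are hypothetical.

```python
def split_to_widest(rows: list[str], delimiter: str = ",") -> list[list[str]]:
    """Split each row on the delimiter and pad short rows to the widest row."""
    split_rows = [row.split(delimiter) for row in rows]
    # The column count is taken from the data rather than fixed in advance.
    width = max((len(fields) for fields in split_rows), default=0)
    return [fields + [""] * (width - len(fields)) for fields in split_rows]


# Hypothetical rows with 3 and 4 components.
rows = ["a,b,c", "d,e,f,g"]
for fields in split_to_widest(rows):
    print(fields)
```

Padding with empty strings keeps every row the same width, which is what a fixed "# of Columns" setting would otherwise have to guarantee.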

MarkN

Just did this exercise. I believe that by definition if you have a delimited file, all of the delimiters should be present for each row. If you have a delimited file that does not have all of its delimiters per row, that's an issue that would occur when exporting or creating the data.

unknown

Can Alteryx generate a new file with special delimiters?

I have an Excel file with lots of special characters: URLs, apostrophes, parentheses, commas, etc. When I send it to others, some people say their systems cannot read my file because those characters are treated as delimiters, so they suggest I create a new file with different delimiters. How can I use Alteryx to do that?

TaraM

Hi @Inactive User - this article explains using alternative characters as delimiters:

http://community.alteryx.com/t5/Alteryx-Knowledge-Base/The-How-to-Guide-to-Writing-Delimited-Files-comma-pipe-other/ta-p/31641
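The linked article covers the Alteryx side. As a general illustration of the same idea, here is a minimal Python sketch that writes rows with a pipe delimiter so commas, apostrophes, and parentheses in the data are left alone. The function name `write_delimited` and the sample rows are assumptions made for the example.

```python
import csv
import io


def write_delimited(rows, delimiter="|"):
    """Write rows with a chosen delimiter and return the text.

    Fields that contain the delimiter, the quote character, or a newline
    are quoted automatically (csv.QUOTE_MINIMAL).
    """
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data with commas, apostrophes, and parentheses.
rows = [
    ["name", "url"],
    ["Macy's, Inc. (retail)", "http://example.com/a?b=1,2"],
]
print(write_delimited(rows))
```

Choosing a delimiter that rarely occurs in the data keeps most fields unquoted, which tends to be easier for downstream systems to read.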

unknown

@TaraM thank you, the article is very useful!

KOBoyle

I don't see how this solution retains delimiters or control characters that are present within a field. If I change the first input from "Mary had a little lamb whose fleece was white as snow" to "Macy's, Inc. had a little lamb whose fleece was white as snow", where an apostrophe (text qualifier delimiter) and comma (field separator delimiter) are present in the text, the output is not correct. Can someone clarify how to implement a comparable solution when the delimiters are not superfluous, and are used for their intended purpose? Does Alteryx make a distinction between what spreadsheet software refers to as text qualifiers and field separators? Thanks.

JoeM

@KOBoyle, that's a great thought. In your example, where we transform the underlying data value: "Mary had a little lamb whose fleece was white as snow" to "Macy's, Inc. had a little lamb whose fleece was white as snow", the solution will fail. However, there are some options in the text to columns tool to help with cases like this. In the advanced options, we can choose when to ignore the specified delimiter when it is in 1) quotes, 2) single quotes 3) parentheses and 4) brackets. These options essentially operate as text qualifiers via the text to columns tool. If you reconfigure your solution in the text to column tool to appear as the following, your example will work:

Also, note that this functionality is available for certain file types in the input data tool. For example, if I import a .txt file, I will have the following as a configuration option:

kconner

Great exercise. I really like these challenges to help learn the tool and get familiar with its abilities.

My solution is below. Basically I ran Text to columns on the ',', then stripped the quotes from the two fields. Converted the date and then selected the values for the final display.

mceleavey

A simple solution:

I used the Text to Columns tool to separate on commas, ensuring "Ignore delimiters in quotes" is checked. This will ignore the commas in the text.
Then simply remove the quotes and convert the date.
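The "ignore delimiters in quotes" behavior has a close analogue in Python's `csv` module, sketched below. The function name `split_respecting_quotes` and the sample row are hypothetical; the sentence is the one used earlier in the thread.

```python
import csv


def split_respecting_quotes(line: str, quotechar: str = '"') -> list[str]:
    """Split on commas, ignoring commas that sit inside quoted text."""
    return next(csv.reader([line], delimiter=",", quotechar=quotechar))


# Hypothetical row: the comma and apostrophe inside the quotes are data.
row = '"Macy\'s, Inc. had a little lamb whose fleece was white as snow",23-Nov-15'
print(split_respecting_quotes(row))
```

The `csv` reader accepts only one quote character per pass, so a file that mixes double-quoted and single-quoted fields would still need a separate step to strip the second kind of quote.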

challenge_2_solution.yxmd

Max06270

Pretty simple, thank you for the challenge!

Solution attached!

Thank you for the exercise - mostly the same as the other responders except:

- Used the new Alteryx 11 custom date capability in the DateTime component just to see how it works
- Used select tools throughout to remove unused data at each step (not needed for 2 rows, but a good habit)
- Added a simple tester to do a field-for-field check on the results vs. expectations
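The tester itself lives in the attached workflow. As a loose sketch of what a field-for-field check can look like, here is a Python version; the function name `field_mismatches` and the sample rows are assumptions, not a description of the attached workflow.

```python
def field_mismatches(actual: list[dict], expected: list[dict]) -> list[str]:
    """Return a description of every field that differs between two row sets."""
    problems = []
    if len(actual) != len(expected):
        problems.append(f"row count: got {len(actual)}, expected {len(expected)}")
    for i, (got, want) in enumerate(zip(actual, expected)):
        # Compare the union of field names so missing fields are reported too.
        for name in sorted(set(got) | set(want)):
            if got.get(name) != want.get(name):
                problems.append(
                    f"row {i}, field {name!r}: got {got.get(name)!r}, expected {want.get(name)!r}"
                )
    return problems


# Hypothetical rows; an empty result means the output matches expectations.
actual = [{"Poem_Read_Date": "2015-11-23"}]
expected = [{"Poem_Read_Date": "2015-11-23"}]
print(field_mismatches(actual, expected))
```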

challenge_2_SeanSolution.yxmd

JoshK_dup_167

Going for gold.

challenge_2_my_solution.yxmd

MarqueeCrew

@JoeM,

2 badges today!

challenge_2_MarqueeCrew.yxmd

NicoleJohnson

My solution.

challenge_2_NicoleJohnson.yxmd

jjc42

Mostly the same as everyone else. I could have eliminated an extra step.

challenge_2_start_file_complete.yxmd

IJH34

Laurap1228

Challenge #2 complete!

challenge_2_LP.yxmd

SeanAdams

Hey @Laurap1228 - it's so exciting to see someone starting on the journey of getting all the weekly challenges done in order. There's a few of us who are on this journey, and quite a few have gone through 30 or even 50 of them, and for me this was one of the best ways to learn the toolset!

Keep on cracking through these - the learning is invaluable, especially if you take a look (only after you've completed your version :-)) at the solutions from some of the community greats and see how they have tackled them. There's so many ways to solve a problem, and by looking at other folk's approaches to each challenge (and trying to replicate them if they strike you as new learning) it really does accelerate learning.

:-) 2 more to go (since you've already done numbers 1-3) and you get your Khumbu Icefall badge (for 5 completed challenges) !

Good luck @Laurap1228 - it's a very exciting expedition you've undertaken.

Anurag

Nice Challenge,

A. I used three Text to Columns tools in series with different delimiters: (,), ("), and ('). After this step, I used the Select tool to rename and select the required columns.

B. The solution provided in the challenge is better and shorter as compared to my solution.

Challenge #2 solved

similar to other solutions. This one really helped me understand the DateTimeParse function

MarvinPinto

I used the RegEx, DateTime converter, and Parse tools to resolve this challenge, which was slightly different from the solution provided. Does my solution have any potential disadvantage compared with the solution provided by Alteryx?

WC2_Preparing delimited data.yxmd

DE0413

Late to the game, but attempting to complete the challenges without reviewing the solutions. Attached is my solution.

Best wishes,

Denise

AlteryxWeeklyChallengeWeek2.yxmd

Phil_L57

Hi GeneR,

Used the data cleansing tool to clean the punctuation from the first 2 columns then used the following custom date format in a datetime tool:

'dd-Mon-yy'

Phil

challenge_2_start_file vPL.yxmd
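For comparison, the same 'dd-Mon-yy' pattern can be expressed in Python as `%d-%b-%y`. This is a minimal sketch; the function name `parse_dd_mon_yy` and the sample date are assumptions, and `%b` matches English month abbreviations only under an English or default C locale.

```python
from datetime import datetime


def parse_dd_mon_yy(text: str) -> str:
    """Parse a 'dd-Mon-yy' string and return an ISO date (yyyy-mm-dd)."""
    return datetime.strptime(text, "%d-%b-%y").date().isoformat()


# Hypothetical value in the 'dd-Mon-yy' layout.
print(parse_dd_mon_yy("23-Nov-15"))
```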

LordNeilLord

Great opportunity to use the Date Time Tool

ydmuley

Here we go for the second one. I found this one very easy, except that I had to research the date format, which was new learning for me. I also used the Data Cleansing tool, which is just awesome and easy to use. Comments? Suggestions?

2 Down....

challenge_2_start_file_Yug_Solution.yxmd

mceleavey

So far so good.

The data cleansing tool saves a lot of time.

SeanAdams

nicely done Yug - I've not used the "StripQuotes" formula before, so that's fun to see.

2 down, 67 to go :-)

sagarb

Late to the party!!

challenge_2_SB_Solution file.yxmd

LordNeilLord

workflow

challenge_2_LNL.yxmd

JORGE4900

Kudos @Max06270; your approach really simplified the solution.

santiesteban

For some reason, the double quotes (") are not deleted from the text by the Text to Columns tool. I had to use a formula.

challenge_2_start_file.yxmd

Kdpalmer

This was very helpful -- thank you!

Question - There are a lot of plausible solutions to this (Formula with Trim v. Data Cleansing Tool v. RegEx or DateTime function v. Formula tool on date) and also potential order of functions (e.g. selecting/renaming before/after tackling character or date formatting). Is there an optimal solution to this for computational effort? Or generally a list on which functions in Alteryx are more taxing on the system than others?

Thank you!

CC @LaurenU

gdell

I really like this solution, but how would you change the Poem_Read_Date column from a string to a date if that was required?

You can ignore this message, I worked it out.

garthn555

First attempt at a challenge since passing the Core exam - I get an error Font Lato does not support style Regular

mbogusz

challenge_2_solution_mbogusz.yxmd

jamiebassett

Definitely could do it in fewer steps in the future when I get better.

challenge_2_JB.yxmd

Sntrada

Here is my solution. I used text to columns to parse, then the DateTime tool for the date conversion, and then a couple of select tools to do some housekeeping.