Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Input errors caused by excessive quotes in data

hasi82
6 - Meteoroid

Dear community

 

I am standing in front of a problem which I don't seem to able to fix with the existing tips I found in the forum.

I still feel like this should be a common problem though which should be fixable without editing all the data manually. Maybe someone here is able to help me.

 

My problem is being caused by extra quotes in text fields (or semicolons). These make Alteryx unable to determine the end of a field correctly. As result, the data is moving to other columns in some places.

I hope the following example is understandable.

 

My input in csv format looks like this:

"Product";"Number";"Description";"Amount";"Unit";"Date"
"Product";"1234";"NA";"250";"PCS";"20181212"
"Product";"1234";"Screen 24"";"250";"PCS";"20181212"
"Product";"1234";"Screen 28"";"250";"PCS";"20181212"
"Product";"1234";"NA";"250";"PCS";"20181212"
"Product";"1234";"NA";"250";"PCS";"20181212"
"Product";"1234";"Book "Alteryx for dummies"";"250";"PCS";"20181212"
"Product";"1234";"Game "Tetris"      .";"250";"PCS";"20181212"
"Product";"1234";"NA";"250";"PCS";"20181212"

 

When being imported with delimiter ";" the result looks like this:

 

RowProductNumberDescriptionAmountUnitDate
1Product1234NA250PCS20181212
2Product1234Screen 24;"250PCS20181213[NULL]
3Product1234Screen 28;"250PCS20181213[NULL]
4Product1234NA250PCS20181212
5Product1234NA250PCS20181212
6Product1234Book "Alteryx for dummies";"250PCS20181213[NULL]
7Product1234Game "Tetris"    .250PCS20181218
8Product1234NA250PCS20181212

 

As you can see in rows 2, 3 and 6 the data doesn't end up in the correct column.

For row 7 Alteryx still gives a warning but the data is in the correct column.

 

I have tried some workarounds I found here but sadly didn't get the desired result.

 

Reading in the data as one column with delimiter \0 and then replacing two double quotes ("") with one double quote doesn't work as the data may have trailing whitespaces separating the quotes. If I remove the whitespaces before, blank fields will be represented by two double quotes aswell and also be replaced with a single double quote.

 

I have also tried the different options in the "Ignore Delimiters in" setting. If I choose "None" I will get the error "too many fields in row XXX" as there are also text fields that contain a semicolon in the data.

 

I am kind of lost at moment and thankful for any advice.

Thank you!

11 REPLIES 11
BenMoss
ACE Emeritus
ACE Emeritus

Unfortunately the sample data you have provided seems to fail this test

 

I have also tried the different options in the "Ignore Delimiters in" setting. If I choose "None" I will get the error "too many fields in row XXX" as there are also text fields that contain a semicolon in the data.

 

As in, this error is not produced if I put none. Could you perhaps provide another line which would produce this kind of error and then we can look t work around it.

 

I have some experience with this type of issue before so hopefully I can help you resolve the issue!

 

Ben

Thableaus
17 - Castor
17 - Castor

Hi @hasi82 

 

I know this might not be the best solution, but it would work fine:

 

Delimiters1.PNGMulti-Field.PNG

 

You could just set to "None" that Ignore Delimiters in quotes and remove quotes using RegEX with Multi-Field formula (so it would apply to all fields).

 

I know the RegEX approach is never so friendly, but it's the first thing I thought. Hoping to see a different solution here too so we can share ideas.

 

Cheers,

Thableaus
17 - Castor
17 - Castor

@hasi82 

 

I found this cool function StripQuotes that can help you even better.

 

Cool thing I had never used it. That's an excellent use case of this function.

 

Stripquotes.PNG

 

Cheers,

estherb47
15 - Aurora
15 - Aurora

Hi @hasi82 

Here's a solution without any RegEx (although I do love RegEx!)

image.png

For the input file, I set no delimiters. Add a record ID so that you have unique identifiers for the rows. 

Next, parse into rows on the semicolon with a text to columns tool. That way if you're working with a file with even more columns, you don't have to fiddle with the number of columns in the text to columns tool

 

You can use a formula to strip out the beginning and ending quote marks: Substring([Field_1],2,length([Field_1])-2) will give you everything but the opening and closing quotes

Use a Tile tool to create a sequence of numbers within each Record ID. The Crosstab tool uses this as the column headers (group on Record ID, Tile Sequence as column headers, Field_1 as values)

finally, a Dynamic Rename pops the first row into the headers.

Please let me know if this solves your challenge. This was a fun one!

 

Cheers!

Esther

hasi82
6 - Meteoroid

Thank you @all for trying to help!

 

@BenMoss

I added another field to make my problem more clear:

 

"Product";"Number";"Description";"Amount";"Unit";"Date";"Notes"
"Product";"1234";"NA";"250";"PCS";"20181212";""
"Product";"1234";"Screen 24"";"250";"PCS";"20181212";""
"Product";"1234";"Screen 28"";"250";"PCS";"20181212";""
"Product";"1234";"NA";"250";"PCS";"20181212";""
"Product";"1234";"NA";"250";"PCS";"20181212";""
"Product";"1234";"Book "Alteryx for dummies"";"250";"PCS";"20181212";"received;13.01.2019"
"Product";"1234";"Game "Tetris"      .";"250";"PCS";"20181212";"received;13.01.2019"
"Product";"1234";"NA";"250";"PCS";"20181212";""


@Thableaus the problem here is I also have fields that contain semicolons. If I try to do your approach, the rows with semicolons in the data get completely replaced by [NULL] after the Input.

 

input screen1.PNG

 

@EstherB47 thank you for taking on this challenge!

The problem with your approach is that it also doesn't take semicolons in the data into account. I am sorry that I didn't make this clear in the beginning.

Your approach produces an extra field if there is a semicolon in the data. Weirdly it also deletes the characters surrounding the semicolon, see the screenshot

 

input screen 2.PNG

 

Maybe a combination of approaches will work but I don't seem to be able to find it.

Any help is greatly appreciated!

Thableaus
17 - Castor
17 - Castor

 @hasi82 

 

Not such an easy case to solve, but this workflow will help you certainly:

InputSolution.PNG

- Input tool with \0 delimiter - None quotes ignored - First Row does not contain field names

- Formula to Field_1 created to replace ";" by "|". This is just to avoid confusion when interpreting the delimiter

- Record ID to keep track of rows

- Text to Columns with | delimiter - but split to rows so we can dynamize number of columns

- Formula Tool with StripQuotes function to take off the unnecessary quotes

- Filter RecordID = 1 - this is just so we can split column names from rows

- Tile Tool and Record ID again to create a sequence for matching respective columns to each record

- Join Tool to do that matching - this is to append columns to respective values

- Cross-Tab Tool to rearrange columns and rows - grouping by Record ID, so the rows are organized.

- Select Tool to put columns in order and uncheck Record ID.

 

I know this flow got bigger due to this delimiter confusion, but it works.

 

Flow attached.

 

Cheers,

hasi82
6 - Meteoroid

@thableaus it seems to work indeed! thank you so much. replacing ";" with something else is actually such an easy solution. Now I just need to rearrange the columns after the cross tab, but I think I have seen some solutions to this already.

 

again, thanks to all of you!

ChrisTX
16 - Nebula
16 - Nebula

Great solution, @Thableaus 

I learned a lot by reviewing your approach.

 

Here's an alternative that produces the same result, using the sample data:

 

Import Data tool:
  use delimiter ;
  un-check option for "First Row Contains Field Names"
  set Ignore Delimiters In = None

 

Multi-field formula tool:

  select all Text fields
  un-check box for "Copy Output Fields Add"
  Expression = stripquotes([_CurrentField_])

 

Dynamic Rename tool: Take field Names from First Row of Data

 

Capture.PNG

artzee1556
8 - Asteroid

@Thableaus . Have the same situation - multiple fields with unmatched quotes that I need to strip. Unfortunately my file is delimited by a tab character which is not visible like a comma or semicolon. Any idea what I should specify as the character to replace? I tried \t but it wasn't detected. Thanks much in advance, Art.

Labels