community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Upgrade Alteryx Designer in 10 Steps

Debating whether or not to upgrade to the latest version of Alteryx Designer?

LEARN MORE

Correct the data spliting

Asteroid

Hi,

Is there a way to correct the data splitting caused due to an extra delimiter in the file?

For example: 

 

Name,Employee,Date,Day
Time,1001,27,Tuesday
Harry,1223,23,Friday
Susan,K,1211,02,Monday
Josh,1432,22,Thursday
Katrina,2342,12,Mon,day
Roy,3211,01,Tuesday

 

"," is extra in the Name - "Susan K" and Day - "Monday"

I can check the number of fields needed to be using the Text to Column tool. By using it I will know that 4 rows contains 4 fields however 2 row (3rd and 5th row) contains 5 fields. 

I can then use the Filter tool to filter out the row containing 5 fields.

But now is there a to correct it in alteryx?

 

 

Magnetar

Hi @Akash01 

 

If you import the data all in one string (\n delimiter), you can then use regex to parse out what you're looking for

 

WF.png

 

From here it's a matter of cleaning up the 1st and 4th fields

 

Dan

Alteryx Certified Partner

Hi @Akash01 

 

It really depends on how the pattern of your data is structured.


In this case, I found a workaround - (if I consider that extra commas will always be present on words, I can use RegEX to parse the columns how exactly they should be.

 

Ex:

Solution.PNG

In this case I know there will be digits in the second column. So I need everything until the comma before the first digit for the first column.

Then I can Parse based on the comma division.

 

Cheers,

Magnetar

@Thableaus 

 

Again with the same(close enough) answer

 

Dan

Asteroid

The extra delimiter "," can also be between digits or anywhere else and not just words.

 

For example:

 

Name,Employee,Date,Day,Description
Time,1001,27,Tuesday,Worked in Accounts
Harry,1223,23,Friday, Experience in sales
Susan,K,1211,02,Monday, Sales
Josh,1432,22,Thursday, Engineering
Katrina,2342,12,Mon,day, Team player
Roy,32,11,01,Tuesday, 15 year experience in Sales,Accounts and Audit

 

There is an extra comma in the last row in the fields "Employee" and Description.

Alteryx Certified Partner

@Akash01 Does the employee Number have always 4 digits?

 

Cheers,

Magnetar

@Akash01 

 

What I have been able to determine so far about your valid data

1. a string of characters that gets assigned to the name field

2. a sequence of 4 digits that is the employee number

3. a sequence of 2 digits that is the day

4. a string of characters that can be either one of 2 fields, day or description.   Day is standard so you can write a function to parse it out and stick the rest in the description.

 

Based on this I'd say

1. Load the data in as a single string 

2. use regex to split the string in 2 with everything up to the last digit in the 1st part 

3. Remove all the "," from the 1st part and split it into text, 4 digits and 2 digits

4. Take the second part from step 2 and remove all the commas until you get to the first "y"

5. split all to the 1st "y" to Day and all the rest to Description

 

Dan

  

 

 

 

Asteroid

No! Any column can have any number of characters. :/

Magnetar

HI @Akash01 

 

You can assume the day field will always have 2 digits, so use this as a starting point. Edit:  Possibly a rash assumption.  If not, then this coupled with varying number of digits in the Employee field mean that you can't parse it automatically. 

 

1. the last 2 digits go into the day

2. all the digits before that go into the Employee

3. all the starting text goes into name

 

Continue on with the remaining steps from my previous post

 

An other option is based on the idea that most of your data will be correct.  In this case put a recordID on all the rows and then pull out all that have exactly 4 "," and process these automatically.     Send all the others to an error output to be corrected manually and re entered.  Use the recordID to order the combined output

 

Dan

Labels