Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parse txt file by rows

pgensler
8 - Asteroid

I have a text file(.txt), which is delimited with the following format, with the records formatted as such:

 

Order/ID : IN-2013-77878

Order/Date : 2/5/13

Ship/Mode: Second Class

product/id : TEC-AC-10003033

 

Order/ID : IN-2013-77878

Order/Date : 2/8/13

Ship/Mode: Second Class

product/id : FUR-CH-10004050

 

My file has quite a bit more records( probably over 500,000), and I'm not sure how to begin reading this data in with Alteryx.

Is there an easy way to transpose this data into a tidy format (or loop over it?) What tools can I use in Alteryx to get this type of data into a tidy form? Thanks so much.

 

9 REPLIES 9
JohnJPS
15 - Aurora

Hi @pgensler

The "Text TO Columns" tool can separate on the colon in each line; then you can apply a RecordID and convert it using a Formula to count "every four rows" ... at that point you can apply a cross tab to produce the table I think you may be after.  See attached workflow for an example.

Capture.PNG

 

 

Federica_FF
11 - Bolide

Hi!

 

Just in case "every four rows" is not a fixed rule in your data (sometimes you have 4 rows of data, sometimes 5, 6, 10...) you can make the ID calculation more dynamic using the MultiRow formula instead of RecordID tool+Formula tool:

 

RecordID = If startswith(Field_1), "Order/ID :") then [Row-1:RecordID]+1 else [RecordID] endif

 

Record#1 starts with "Order/ID", Alteryx will assign the value of the previous row (which doesn't exist, its value is 0) and add +1 = 1

Record#2 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1

Record#3 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1

Record#4 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1

Record#5 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1

Record#6 starts with "Order/ID", Alteryx will assign the value of the previous row (1) and add +1 = 2

Record#7 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2

Record#8 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2

Record#9 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2

Record#10 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2

and so on....

Joe_Mako
12 - Quasar

using the file you attached, I created the following:

 

- RecordID tool to generate a Line Number field, this is so later we can see consecutive lines as being a single record

- Filter to keep only records that contain the ":" character (this logic may need to be more sophisticated or additional prep tools depending on actual data)

- Multi-Row Formula to create a RecordID based on consecutive Line Number values

- Regex Parse to split the raw data into a field name and value pair, you can use https://regex101.com/ to check for details on what

([^\s]*)\s*:\s*(.*)[\\}]

does. Basically it is going to return two fields, first field is the last set of consecutive non-space characters before the colon, and the second field is all character after the colon but before a \ or } character.

- Cross Tab to reshape the data, using RecordID as key

 

reshape data.png

 

 

adm510
11 - Bolide

@Joe_Mako great work! I copied your example and instead of using regex, I used a text to columns tool with the colon ":" as the delimiter.  I also threw in a data cleansing tool to remove any leading/trailing white space.

Joe_Mako
12 - Quasar

@adm510 just a text to columns tool is not enough. take a look at your output. Instead of a single "Order_ID" field you have "_cf0_Order_ID_"and "_f0_fs24__cf0_Order_ID_"

adm510
11 - Bolide

@Joe_Mako hmm, it worked for me, welp then I recommend using the regex version :)

data.png

 

 

 

 

Joe_Mako
12 - Quasar

Did you use the in-line text or the data.txt file that was attached?

 

{\rtf1\ansi\ansicpg1252\cocoartf1504\cocoasubrtf820
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs24 \cf0 Order/ID : IN-2013-77878\
Order/Date : 2/5/13\
Ship/Mode: Second Class\
product/id : TEC-AC-10003033\
\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\cf0 Order/ID : IN-2013-77878\
Order/Date : 2/8/13\
Ship/Mode: Second Class\
product/id : FUR-CH-10004050}

pgensler
8 - Asteroid

Is there an easy way to spread the data into columns based on key-value pairs, like the spread function from the tidyr package in Alteryx?

BenMoss
ACE Emeritus
ACE Emeritus

Hi Petergensler,

 

The cross-tab tool should be your way to go here.

 

In terms of setting it up you would use your key as the 'new column headers' field, your value as the 'value for new columns', and you can use the group by configuration at the top of the panel if you have multiple instances say...

ID | Key | Value

 

1 | First | Ben
1 | Second | Moss
2 | First | Dave
2 | Second | Smith

Would leave you with a table looking like (with ID being a group field)...

ID | First | Second

1 | Ben | Moss
2 | Dave | Sminth

Ben

Labels
Top Solution Authors