Alteryx Designer Desktop Discussions

pgensler · ‎04-13-2017

I have a text file(.txt), which is delimited with the following format, with the records formatted as such:

Order/ID : IN-2013-77878

Order/Date : 2/5/13

Ship/Mode: Second Class

product/id : TEC-AC-10003033

Order/ID : IN-2013-77878

Order/Date : 2/8/13

Ship/Mode: Second Class

product/id : FUR-CH-10004050

My file has quite a bit more records( probably over 500,000), and I'm not sure how to begin reading this data in with Alteryx.

Is there an easy way to transpose this data into a tidy format (or loop over it?) What tools can I use in Alteryx to get this type of data into a tidy form? Thanks so much.

JohnJPS · ‎04-13-2017

Hi @pgensler

The "Text TO Columns" tool can separate on the colon in each line; then you can apply a RecordID and convert it using a Formula to count "every four rows" ... at that point you can apply a cross tab to produce the table I think you may be after. See attached workflow for an example.

Federica_FF · ‎04-13-2017

Hi!

Just in case "every four rows" is not a fixed rule in your data (sometimes you have 4 rows of data, sometimes 5, 6, 10...) you can make the ID calculation more dynamic using the MultiRow formula instead of RecordID tool+Formula tool:

RecordID = If startswith(Field_1), "Order/ID :") then [Row-1:RecordID]+1 else [RecordID] endif

Record#1 starts with "Order/ID", Alteryx will assign the value of the previous row (which doesn't exist, its value is 0) and add +1 = 1

Record#2 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1

Record#3 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1

Record#4 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1

Record#5 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1

Record#6 starts with "Order/ID", Alteryx will assign the value of the previous row (1) and add +1 = 2

Record#7 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2

Record#8 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2

Record#9 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2

Record#10 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2

and so on....

Joe_Mako · ‎04-19-2017

using the file you attached, I created the following:

- RecordID tool to generate a Line Number field, this is so later we can see consecutive lines as being a single record

- Filter to keep only records that contain the ":" character (this logic may need to be more sophisticated or additional prep tools depending on actual data)

- Multi-Row Formula to create a RecordID based on consecutive Line Number values

- Regex Parse to split the raw data into a field name and value pair, you can use https://regex101.com/ to check for details on what

([^\s]*)\s*:\s*(.*)[\\}]

does. Basically it is going to return two fields, first field is the last set of consecutive non-space characters before the colon, and the second field is all character after the colon but before a \ or } character.

- Cross Tab to reshape the data, using RecordID as key

adm510 · ‎04-21-2017

@Joe_Mako great work! I copied your example and instead of using regex, I used a text to columns tool with the colon ":" as the delimiter. I also threw in a data cleansing tool to remove any leading/trailing white space.

Joe_Mako · ‎04-21-2017

@adm510 just a text to columns tool is not enough. take a look at your output. Instead of a single "Order_ID" field you have "_cf0_Order_ID_"and "_f0_fs24__cf0_Order_ID_"

adm510 · ‎04-21-2017

@Joe_Mako hmm, it worked for me, welp then I recommend using the regex version :)

Joe_Mako · ‎04-21-2017

Did you use the in-line text or the data.txt file that was attached?

{\rtf1\ansi\ansicpg1252\cocoartf1504\cocoasubrtf820
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs24 \cf0 Order/ID : IN-2013-77878\
Order/Date : 2/5/13\
Ship/Mode: Second Class\
product/id : TEC-AC-10003033\
\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\cf0 Order/ID : IN-2013-77878\
Order/Date : 2/8/13\
Ship/Mode: Second Class\
product/id : FUR-CH-10004050}

pgensler · ‎04-29-2017

Is there an easy way to spread the data into columns based on key-value pairs, like the spread function from the tidyr package in Alteryx?

BenMoss · ‎04-30-2017

Hi Petergensler,

The cross-tab tool should be your way to go here.

In terms of setting it up you would use your key as the 'new column headers' field, your value as the 'value for new columns', and you can use the group by configuration at the top of the panel if you have multiple instances say...

ID | Key | Value

Alteryx Designer Desktop Discussions

Parse txt file by rows