I have a text file(.txt), which is delimited with the following format, with the records formatted as such:
Order/ID : IN-2013-77878
Order/Date : 2/5/13
Ship/Mode: Second Class
product/id : TEC-AC-10003033
Order/ID : IN-2013-77878
Order/Date : 2/8/13
Ship/Mode: Second Class
product/id : FUR-CH-10004050
My file has quite a bit more records( probably over 500,000), and I'm not sure how to begin reading this data in with Alteryx.
Is there an easy way to transpose this data into a tidy format (or loop over it?) What tools can I use in Alteryx to get this type of data into a tidy form? Thanks so much.
Solved! Go to Solution.
Hi @pgensler
The "Text TO Columns" tool can separate on the colon in each line; then you can apply a RecordID and convert it using a Formula to count "every four rows" ... at that point you can apply a cross tab to produce the table I think you may be after. See attached workflow for an example.
Hi!
Just in case "every four rows" is not a fixed rule in your data (sometimes you have 4 rows of data, sometimes 5, 6, 10...) you can make the ID calculation more dynamic using the MultiRow formula instead of RecordID tool+Formula tool:
RecordID = If startswith(Field_1), "Order/ID :") then [Row-1:RecordID]+1 else [RecordID] endif
Record#1 starts with "Order/ID", Alteryx will assign the value of the previous row (which doesn't exist, its value is 0) and add +1 = 1
Record#2 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1
Record#3 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1
Record#4 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1
Record#5 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 1
Record#6 starts with "Order/ID", Alteryx will assign the value of the previous row (1) and add +1 = 2
Record#7 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2
Record#8 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2
Record#9 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2
Record#10 doesn't starts with "Order/ID", Alteryx will assign the value of the previous row = 2
and so on....
using the file you attached, I created the following:
- RecordID tool to generate a Line Number field, this is so later we can see consecutive lines as being a single record
- Filter to keep only records that contain the ":" character (this logic may need to be more sophisticated or additional prep tools depending on actual data)
- Multi-Row Formula to create a RecordID based on consecutive Line Number values
- Regex Parse to split the raw data into a field name and value pair, you can use https://regex101.com/ to check for details on what
([^\s]*)\s*:\s*(.*)[\\}]
does. Basically it is going to return two fields, first field is the last set of consecutive non-space characters before the colon, and the second field is all character after the colon but before a \ or } character.
- Cross Tab to reshape the data, using RecordID as key
@adm510 just a text to columns tool is not enough. take a look at your output. Instead of a single "Order_ID" field you have "_cf0_Order_ID_"and "_f0_fs24__cf0_Order_ID_"
@Joe_Mako hmm, it worked for me, welp then I recommend using the regex version :)
Did you use the in-line text or the data.txt file that was attached?
{\rtf1\ansi\ansicpg1252\cocoartf1504\cocoasubrtf820
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\f0\fs24 \cf0 Order/ID : IN-2013-77878\
Order/Date : 2/5/13\
Ship/Mode: Second Class\
product/id : TEC-AC-10003033\
\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\cf0 Order/ID : IN-2013-77878\
Order/Date : 2/8/13\
Ship/Mode: Second Class\
product/id : FUR-CH-10004050}
Is there an easy way to spread the data into columns based on key-value pairs, like the spread function from the tidyr package in Alteryx?
Hi Petergensler,
The cross-tab tool should be your way to go here.
In terms of setting it up you would use your key as the 'new column headers' field, your value as the 'value for new columns', and you can use the group by configuration at the top of the panel if you have multiple instances say...
ID | Key | Value
1 | First | Ben
1 | Second | Moss
2 | First | Dave
2 | Second | Smith
Would leave you with a table looking like (with ID being a group field)...
ID | First | Second
1 | Ben | Moss
2 | Dave | Sminth
Ben