Parse txt file by rows

pgensler

I have a text file (.txt) in which each record spans several lines, one "field : value" pair per line, in the following format:

Order/ID : IN-2013-77878

Order/Date : 2/5/13

Ship/Mode: Second Class

product/id : TEC-AC-10003033

Order/ID : IN-2013-77878

Order/Date : 2/8/13

Ship/Mode: Second Class

product/id : FUR-CH-10004050

My file has many more records (probably over 500,000), and I'm not sure how to begin reading this data into Alteryx.

Is there an easy way to transpose this data into a tidy format (or loop over it)? What tools can I use in Alteryx to get this type of data into a tidy form? Thanks so much.

data.txt

Transformation

Preparation

Iterative Macro

Accepted answers

JohnJPS

Hi @pgensler

The Text To Columns tool can split each line on the colon. You can then add a Record ID and use a Formula tool to turn it into a group number that changes every four rows. At that point a Cross Tab should produce the table I think you are after. See the attached workflow for an example.

ParseTextMultirowRecords.yxmd
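
For readers who want to see the logic outside Alteryx, here is a rough Python sketch of the same idea: split on the colon, start a new record every four lines, and pivot the field names into columns. It is not the attached workflow; the function name `parse_fixed_records` and the `lines_per_record` parameter are made up for illustration, and it assumes every data line contains a colon.

```python
from typing import Dict, List

# Sample lines copied from the question (blank lines omitted)
SAMPLE = """\
Order/ID : IN-2013-77878
Order/Date : 2/5/13
Ship/Mode: Second Class
product/id : TEC-AC-10003033
Order/ID : IN-2013-77878
Order/Date : 2/8/13
Ship/Mode: Second Class
product/id : FUR-CH-10004050
"""


def parse_fixed_records(text: str, lines_per_record: int = 4) -> List[Dict[str, str]]:
    """Hypothetical helper: group every `lines_per_record` lines into one record."""
    # Keep only lines with a colon, then split on the first colon only
    pairs = [line.split(":", 1) for line in text.splitlines() if ":" in line]
    records: List[Dict[str, str]] = []
    for i, (name, value) in enumerate(pairs):
        # Every fourth line (by default) starts a new record
        if i % lines_per_record == 0:
            records.append({})
        # Field name becomes the column, value becomes the cell
        records[-1][name.strip()] = value.strip()
    return records


rows = parse_fixed_records(SAMPLE)
for row in rows:
    print(row)
```

Each dictionary in `rows` corresponds to one row of the cross-tabbed table, keyed by field name.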

All comments

Federica_FF

Hi!

Just in case "every four rows" is not a fixed rule in your data (sometimes a record has 4 rows, sometimes 5, 6, 10...), you can make the ID calculation more dynamic by using the Multi-Row Formula tool instead of the Record ID tool plus a Formula tool:

RecordID = IF StartsWith([Field_1], "Order/ID :") THEN [Row-1:RecordID]+1 ELSE [Row-1:RecordID] ENDIF

For example, with five lines per record:

Record #1 starts with "Order/ID": Alteryx takes the value of the previous row (which doesn't exist, so its value is 0) and adds 1, giving 1

Record #2 doesn't start with "Order/ID": Alteryx assigns the value of the previous row = 1

Record #3 doesn't start with "Order/ID": Alteryx assigns the value of the previous row = 1

Record #4 doesn't start with "Order/ID": Alteryx assigns the value of the previous row = 1

Record #5 doesn't start with "Order/ID": Alteryx assigns the value of the previous row = 1

Record #6 starts with "Order/ID": Alteryx takes the value of the previous row (1) and adds 1, giving 2

Record #7 doesn't start with "Order/ID": Alteryx assigns the value of the previous row = 2

Record #8 doesn't start with "Order/ID": Alteryx assigns the value of the previous row = 2

Record #9 doesn't start with "Order/ID": Alteryx assigns the value of the previous row = 2

Record #10 doesn't start with "Order/ID": Alteryx assigns the value of the previous row = 2

and so on.
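
The running-counter logic above can be sketched in Python as follows. This is only an illustration of the idea, not Alteryx code: the helper names `assign_record_ids` and `group_records` are made up, and the extra "Note" line in the second record is a hypothetical field added to show a record with five lines instead of four.

```python
from typing import Dict, List

LINES = [
    "Order/ID : IN-2013-77878",
    "Order/Date : 2/5/13",
    "Ship/Mode: Second Class",
    "product/id : TEC-AC-10003033",
    "Order/ID : IN-2013-77878",
    "Order/Date : 2/8/13",
    "Ship/Mode: Second Class",
    "product/id : FUR-CH-10004050",
    "Note : hypothetical extra line",  # assumption: not in the original data
]


def assign_record_ids(lines: List[str], marker: str = "Order/ID") -> List[int]:
    """Increment the ID on each marker line, otherwise carry the previous ID forward."""
    record_id = 0  # plays the role of the non-existent "row before the first"
    ids = []
    for line in lines:
        if line.startswith(marker):
            record_id += 1
        ids.append(record_id)
    return ids


def group_records(lines: List[str], ids: List[int]) -> Dict[int, Dict[str, str]]:
    """Collect field/value pairs under their record ID (the cross-tab step)."""
    records: Dict[int, Dict[str, str]] = {}
    for line, rid in zip(lines, ids):
        name, _, value = line.partition(":")
        records.setdefault(rid, {})[name.strip()] = value.strip()
    return records


ids = assign_record_ids(LINES)
records = group_records(LINES, ids)
print(ids)
print(records)
```

Because the ID only changes when a line starts with the marker, records of different lengths still end up grouped correctly.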

Joe_Mako

Using the file you attached, I created the following:

- RecordID tool to generate a Line Number field, so that later we can treat consecutive lines as a single record

- Filter to keep only records that contain the ":" character (this logic may need to be more sophisticated, or need additional prep tools, depending on the actual data)

- Multi-Row Formula to create a RecordID based on consecutive Line Number values

- Regex Parse to split the raw data into a field name and value pair. You can use https://regex101.com/ to check the details of what

([^\s]*)\s*:\s*(.*)[\\}]

does. Basically it returns two fields: the first field is the last run of consecutive non-space characters before the colon, and the second field is all characters after the colon up to the last \ or } character on the line.

- Cross Tab to reshape the data, using RecordID as key

reshape data.png

reshape data.yxmd
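
As a rough illustration of the steps listed above, here is a Python sketch that numbers the lines, keeps only those with a colon, starts a new record whenever the line numbers stop being consecutive, and applies the same regular expression. It is not the attached workflow. The input lines are an assumption: they are the question's sample values with a trailing \ or } added (as the regex implies) and a colon-free separator line between records; the function name `reshape` is made up.

```python
import re
from typing import Dict, List

# Same pattern as in the post: field name, colon, value, then a trailing \ or }
PATTERN = re.compile(r"([^\s]*)\s*:\s*(.*)[\\}]")

# Assumed raw lines: each data line ends in "\" or "}", records are
# separated by a line without a colon
RAW_LINES = [
    "Order/ID : IN-2013-77878\\",
    "Order/Date : 2/5/13\\",
    "Ship/Mode: Second Class\\",
    "product/id : TEC-AC-10003033\\",
    "\\",
    "Order/ID : IN-2013-77878\\",
    "Order/Date : 2/8/13\\",
    "Ship/Mode: Second Class\\",
    "product/id : FUR-CH-10004050}",
]


def reshape(lines: List[str]) -> List[Dict[str, str]]:
    # RecordID step: number every line, starting at 1
    numbered = enumerate(lines, start=1)
    # Filter step: keep only lines that contain ":"
    kept = [(n, line) for n, line in numbered if ":" in line]
    records: List[Dict[str, str]] = []
    prev_n = None
    for n, line in kept:
        # Multi-Row step: a gap in the line numbers starts a new record
        if prev_n is None or n != prev_n + 1:
            records.append({})
        prev_n = n
        # Regex Parse step: split into field name and value
        match = PATTERN.search(line)
        if match:
            # Cross Tab step: field name becomes the column
            records[-1][match.group(1)] = match.group(2)
    return records


table = reshape(RAW_LINES)
for row in table:
    print(row)
```

If the real file has no separator lines between records, the consecutive-line-number test would put everything in one record, so a marker-based ID would be needed instead.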

adm510

@Joe_Mako great work! I copied your example and, instead of using regex, used a Text To Columns tool with the colon ":" as the delimiter. I also added a Data Cleansing tool to remove any leading/trailing whitespace.

reshape data.yxmd
