Alert: There is a planned Community maintenance outage October 16th from approximately 10 - 11 PM PST. During this time the Alteryx Community will be inaccessible. Thank you for your understanding!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Splitting a text document that has no delimiters into columns

icemanlindelof
5 - Atom

Hi All,

 

My data is all contained in a single column and I want to split it into columns. The structure looks like this -

 

2021 507982 GINKOZAC 30 CAPSULE P909 Cash RONZAC HEALTH & WELLNESS NUTRICEUTICALS FOOD SUPPLEMENT BLOOD CIRCULATION GINKOBILOBA Cash 0 90914429750198 2021-04-11 00:00:00.000 Yes 2.00000 300.00000 -150.00000 0.00000 150.00000

 

I know the order of columns contained but they all have varying lengths. For e.g., the first 4 characters are year 2021, the next 6 characters after the space is product ID. However, beyond that, the next column (product description) etc. will be varying length. I am unable to come up with a unique identifier to split this data into my desired set of columns.

 

Year | Item ID | Item description | Store ID | Customer type | Item type | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | Customer type breakdown | 0 | Transaction ID | Date | Loyal? | Qty | Gross sales | Promo | Discount | Net sales

 

Any support on how I can achieve the structure above will be much appreciated.

8 REPLIES 8
DataNath
17 - Castor

If it was just a single row (or not all in one field) then, it's not ideal, but it could've been bruteforced with RegEx or text to columns. However, as you say there are others with varying lengths/no consistent patterns, I'm struggling to think of a way you'll be able to make this dynamic. Where does this data come from? Is there any way you can make changes upstream to add these delimiters or something that can be used to identify or separate each entity within the string?

icemanlindelof
5 - Atom

I received this data from the client - not sure if their system can help output the data with a consistent delimiter.

binuacs
21 - Polaris

@icemanlindelof Can you manually split the given sample records based on the header and provide the input file?

DataNath
17 - Castor

If there aren't any consistent patterns or obvious delimiters then as far as I'm aware it'll be impossible to split these dynamically and so the best course of action would be to try and have something changed upstream that would allow this. If not then unless I'm mistaken you'll likely have to resort to manual intervention before adding to your flow.

Martyn
9 - Comet

There are parts of the string that are easy to breakdown, and others that are not so easy. Looks very much like a Regex type approach will be best.

 

Year | Item ID are simple enough

Item description | Store ID are a bit trickier as these may be multi-word and there may not be a way to logically imply the split between the two. 

Customer looks to be from a distinct set of possibilities which can be coded for in Regex

Item type hopefully this can be picked out as just being between Customer and Category 1

Category 1 | Category 2 | Category 3 | Category 4 | Category 5 if these have a distinct set of possibilities then this can be coded in Regex

Customer type breakdown looks to be from a distinct set of possibilities which can be coded for in Regex

0 is easy to pick out in Regex

Transaction ID | Date | Loyal? | Qty | Gross sales | Promo | Discount | Net sales all look to be space separated - so Text to Columns or Regex to parse.

 

If you're new to Regex then this will take some time.

Perl Regular Expression Syntax - 1.68.0 (boost.org) is a good resource for understanding Regex.

icemanlindelof
5 - Atom

Below is a sample split of two records.

 

YearItem IDItem DescriptipnStore IDCustomer typeItem typeCategory 1Category 2Category 3Category 4Category 5Customer type breakdown0Transaction IDDateBrandLoyalty?QtyGross salesPromoDiscountNet sales
2021101375MARVELON 1 X 21 TABP549CashLOCALPHARMA-RXRX-ENDOCRINECONTRACEPTIONORAL CONTRACEPTIVEETHINYLESTRADIOL,DESOGESTRELCash05.49E+1300:00.0MARVELONNo114.60014.6
2021507982GINKOZAC 30 CAPSULEP909CashRONZACHEALTH & WELLNESSNUTRICEUTICALSFOOD SUPPLEMENTBLOOD CIRCULATIONGINKOBILOBACash09.09E+1300:00.0 Yes2300-1500150
icemanlindelof
5 - Atom

Attached is the sample data. Please let me know if I can help any other way.

binuacs
21 - Polaris

@icemanlindelof one way of doing this with the help of regex tool. You need to mention the position of each fields. If you find some of the fields are not fitting adjust the size mentioned inside the regex

 

binuacs_0-1653655459667.png

 

Labels