Alteryx Designer Desktop Discussions

icemanlindelof · ‎05-27-2022

Hi All,

My data is all contained in a single column and I want to split it into columns. The structure looks like this -

2021 507982 GINKOZAC 30 CAPSULE P909 Cash RONZAC HEALTH & WELLNESS NUTRICEUTICALS FOOD SUPPLEMENT BLOOD CIRCULATION GINKOBILOBA Cash 0 90914429750198 2021-04-11 00:00:00.000 Yes 2.00000 300.00000 -150.00000 0.00000 150.00000

I know the order of columns contained but they all have varying lengths. For e.g., the first 4 characters are year 2021, the next 6 characters after the space is product ID. However, beyond that, the next column (product description) etc. will be varying length. I am unable to come up with a unique identifier to split this data into my desired set of columns.

Any support on how I can achieve the structure above will be much appreciated.

DataNath · ‎05-27-2022

If it was just a single row (or not all in one field) then, it's not ideal, but it could've been bruteforced with RegEx or text to columns. However, as you say there are others with varying lengths/no consistent patterns, I'm struggling to think of a way you'll be able to make this dynamic. Where does this data come from? Is there any way you can make changes upstream to add these delimiters or something that can be used to identify or separate each entity within the string?

icemanlindelof · ‎05-27-2022

I received this data from the client - not sure if their system can help output the data with a consistent delimiter.

binuacs · ‎05-27-2022

@icemanlindelof Can you manually split the given sample records based on the header and provide the input file?

DataNath · ‎05-27-2022

If there aren't any consistent patterns or obvious delimiters then as far as I'm aware it'll be impossible to split these dynamically and so the best course of action would be to try and have something changed upstream that would allow this. If not then unless I'm mistaken you'll likely have to resort to manual intervention before adding to your flow.

Martyn · ‎05-27-2022

There are parts of the string that are easy to breakdown, and others that are not so easy. Looks very much like a Regex type approach will be best.

Year | Item ID are simple enough

Item description | Store ID are a bit trickier as these may be multi-word and there may not be a way to logically imply the split between the two.

Customer looks to be from a distinct set of possibilities which can be coded for in Regex

Item type hopefully this can be picked out as just being between Customer and Category 1

Category 1 | Category 2 | Category 3 | Category 4 | Category 5 if these have a distinct set of possibilities then this can be coded in Regex

Customer type breakdown looks to be from a distinct set of possibilities which can be coded for in Regex

0 is easy to pick out in Regex

If you're new to Regex then this will take some time.

Perl Regular Expression Syntax - 1.68.0 (boost.org) is a good resource for understanding Regex.

icemanlindelof · ‎05-27-2022

Below is a sample split of two records.

Year

Item ID

Item Descriptipn

Store ID

Customer type

Item type

Category 1

Category 2

Category 3

Category 4

Category 5

Customer type breakdown

0

Transaction ID

Date

Brand

Loyalty?

Qty

Gross sales

Promo

Discount

Net sales

2021

101375

MARVELON 1 X 21 TAB

P549

Cash

LOCAL

PHARMA-RX

RX-ENDOCRINE

CONTRACEPTION

ORAL CONTRACEPTIVE

ETHINYLESTRADIOL,DESOGESTREL

Cash

0

5.49E+13

00:00.0

MARVELON

No

1

14.6

0

14.6

2021

507982

GINKOZAC 30 CAPSULE

P909

Cash

RONZAC

HEALTH & WELLNESS

NUTRICEUTICALS

FOOD SUPPLEMENT

BLOOD CIRCULATION

GINKOBILOBA

Cash

0

9.09E+13

00:00.0

Yes

2

300

-150

0

150

icemanlindelof · ‎05-27-2022

Attached is the sample data. Please let me know if I can help any other way.

binuacs · ‎05-27-2022

@icemanlindelof one way of doing this with the help of regex tool. You need to mention the position of each fields. If you find some of the fields are not fitting adjust the size mentioned inside the regex

Alteryx Designer Desktop Discussions

Splitting a text document that has no delimiters into columns