Alteryx Designer Desktop Discussions

StockMarket · ‎12-04-2020

Hello again.

There is a field named “Ticker” inside my csv data file, which has got multiple details inside it in the form of a “Large String Value”, which is actually a combination of 6 different parts combined together into one. My purpose is to break this field down into those 6 parts again.

The details of these 6 individual parts is as follows –
1. STRING – This is actually the NAME - WITHOUT ANY SPACES
2. DD – First two digits of the Date
3. MMM – First three characters representing the Month
4. YY- Year in two digits format.
5. STRING – This is a NUMBER which MIGHT HAVE DECIMAL VALUES as well in some cases.
6. CE or PE – This is the prefix, which will either have “CE” or “PE” Values at the very end

To give PRACTICAL EXAMPLE, the Original Ticker Value of ACC26NOV201600CE will get broken down into these 6 New Fields -

1. ACC

2. 26

3. NOV

4. 20

5. 1600

6. CE

BAJAJ-AUTO31DEC203000PE, will get broken down into these 6 New Fields -

1. BAJAJ-AUTO

2. 31

3. DEC

4. 20

5. 3000

6. PE

And so on, for all other values.

Can someone please suggest which method to use for breaking up the Original Field “Ticker” into 6 New Fields, as described above?

To make it easy for you, I have attached the sample data "20201123.csv" file on which this flow needs to be run. Although the actual data file is quite big, having millions of rows in it, but I have made a very small sample out of it for sharing on the forum. All this data has a constant schema throughout.

I have also attached the Alteryx Worflow Package .yxzp file, so that you can easily open it up on your machine. I have used the latest version of Alteryx for creating this workflow. You may please have a look.

0 Links

MarqueeCrew · ‎12-04-2020

I'd use a regex tool and set it to PARSE.

The expression would be:

(\D+)(\d{2})(\u{3})(\d{2})([0-9\.,]+)(\u+)

as long as the name doesn't have numbers, that expression should work. If it can have numbers, then change \D to . (Decimal dot).

cheers,

mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

Qiu · ‎12-04-2020

@StockMarket
Hope this is what you need.

StockMarket · ‎04-12-2021

Thank you @Qiu and @MarqueeCrew

The solution provided by Qiu works well for the data that I posted in the Original Example. But that formula fails if there is an additional DECIMAL VALUE at the end.

For example -

It works well for breaking down -

ACC26NOV201600CE

BAJAJ-AUTO31DEC203000CE

But it does not works for -

ACC26NOV201600.5CE

BAJAJ-AUTO26NOV203100.5CE

Please notice the additional "Decimal .5" just before CE

Can someone please suggest the modification in the formula suggested by Qiu -

(.+?)(\d{2})(\w{3})(\d{2})(\d+)(\w{2})

so that it works for those cases as well which might have Decimal before CE/PE

In some cases the decimal is present, while in other cases there is no decimal, so I need a solution which will work for both the cases.

I have attached additional data for testing this decimal issue.

Thanks a lot

StockMarket · ‎04-12-2021

And how do I possibly FILTER OUT those particular data rows, for which the parsing has not happened correctly? I want to separate the output into 2 parts.

One for which parsing happened successfully.

And Two for which parsing did not happen successfully.

How to add this additional step to the above workflow?

Thank you

Qiu · ‎04-12-2021

@StockMarket
Wow I almost forgor this one.

Just modified the RegEx to adapt your new data sample.

And for the checking purpose, I use a Formula tool to put the pieces back together then check the length again the length of original string.

Alteryx Designer Desktop Discussions

Breaking down a BIG String into 6 smaller parts

[Sharing] Custom Formula Function 'IsPrime' to Jud...

Re: What’s the best way to save each day’s working...

Re: Alteryx Load Time

Re: Accounting for Zero Records on Email Output

Re: Ideas on how to control sequence of gallery wo...