Bring your best ideas to the AI Use Case Contest! Enter to win 40 hours of expert engineering support and bring your vision to life using the powerful combination of Alteryx + AI. Learn more now, or go straight to the submission form.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Breaking down a BIG String into 6 smaller parts

StockMarket
8 - Asteroid

Hello again.

 

There is a field named “Ticker” inside my csv data file, which has got multiple details inside it in the form of a “Large String Value”, which is actually a combination of 6 different parts combined together into one. My purpose is to break this field down into those 6 parts again.

 

The details of these 6 individual parts is as follows –
1. STRING – This is actually the NAME - WITHOUT ANY SPACES
2. DD – First two digits of the Date
3. MMM – First three characters representing the Month
4. YY- Year in two digits format.
5. STRING – This is a NUMBER which MIGHT HAVE DECIMAL VALUES as well in some cases.
6. CE or PE – This is the prefix, which will either have “CE” or “PE” Values at the very end

 

 

To give PRACTICAL EXAMPLE, the Original Ticker Value of ACC26NOV201600CE will get broken down into these 6 New Fields -

 

1. ACC

2. 26

3. NOV

4. 20

5. 1600

6. CE

 

 

BAJAJ-AUTO31DEC203000PE, will get broken down into these 6 New Fields -

 

1. BAJAJ-AUTO

2. 31

3. DEC

4. 20

5. 3000

6. PE

 

 

And so on, for all other values.

 

Can someone please suggest which method to use for breaking up the Original Field “Ticker” into 6 New Fields, as described above?

 

To make it easy for you, I have attached the sample data "20201123.csv" file on which this flow needs to be run. Although the actual data file is quite big, having millions of rows in it, but I have made a very small sample out of it for sharing on the forum. All this data has a constant schema throughout.

 

I have also attached the Alteryx Worflow Package .yxzp file, so that you can easily open it up on your machine. I have used the latest version of Alteryx for creating this workflow. You may please have a look.

 

Workflow.png

 

 

 

Sample Data.png

5 REPLIES 5
MarqueeCrew
20 - Arcturus
20 - Arcturus

I'd use a regex tool and set it to PARSE.  

 

The expression would be:

 

(\D+)(\d{2})(\u{3})(\d{2})([0-9\.,]+)(\u+)

 

as long as the name doesn't have numbers, that expression should work. If it can have numbers, then change \D to .   (Decimal dot). 

 

 cheers,

 

 mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Qiu
21 - Polaris
21 - Polaris

@StockMarket 
Hope this is what you need.

1205-StockMarket.PNG

StockMarket
8 - Asteroid

Thank you @Qiu and @MarqueeCrew 

 

The solution provided by Qiu works well for the data that I posted in the Original Example. But that formula fails if there is an additional DECIMAL VALUE at the end.

 

For example -

It works well for breaking down -

ACC26NOV201600CE

BAJAJ-AUTO31DEC203000CE

 

But it does not works for -

ACC26NOV201600.5CE

BAJAJ-AUTO26NOV203100.5CE

Please notice the additional "Decimal .5" just before CE

 

Can someone please suggest the modification in the formula suggested by Qiu -

(.+?)(\d{2})(\w{3})(\d{2})(\d+)(\w{2})

so that it works for those cases as well which might have Decimal before CE/PE

In some cases the decimal is present, while in other cases there is no decimal, so I need a solution which will work for both the cases.

 

I have attached additional data for testing this decimal issue.

 

Thanks a lot

StockMarket
8 - Asteroid

And how do I possibly FILTER OUT those particular data rows, for which the parsing has not happened correctly? I want to separate the output into 2 parts.

One for which parsing happened successfully.

And Two for which parsing did not happen successfully.

 

How to add this additional step to the above workflow?

 

Thank you

Qiu
21 - Polaris
21 - Polaris

@StockMarket 
Wow I almost forgor this one.

Just modified the RegEx to adapt your new data sample.

And for the checking purpose, I use a Formula tool to put the pieces back together then check the length again the length of original string.

String Work-r2.PNG

Labels
Top Solution Authors