Hello again.
There is a field named “Ticker” inside my csv data file, which has got multiple details inside it in the form of a “Large String Value”, which is actually a combination of 6 different parts combined together into one. My purpose is to break this field down into those 6 parts again.
The details of these 6 individual parts is as follows –
1. STRING – This is actually the NAME - WITHOUT ANY SPACES
2. DD – First two digits of the Date
3. MMM – First three characters representing the Month
4. YY- Year in two digits format.
5. STRING – This is a NUMBER which MIGHT HAVE DECIMAL VALUES as well in some cases.
6. CE or PE – This is the prefix, which will either have “CE” or “PE” Values at the very end
To give PRACTICAL EXAMPLE, the Original Ticker Value of ACC26NOV201600CE will get broken down into these 6 New Fields -
1. ACC
2. 26
3. NOV
4. 20
5. 1600
6. CE
BAJAJ-AUTO31DEC203000PE, will get broken down into these 6 New Fields -
1. BAJAJ-AUTO
2. 31
3. DEC
4. 20
5. 3000
6. PE
And so on, for all other values.
Can someone please suggest which method to use for breaking up the Original Field “Ticker” into 6 New Fields, as described above?
To make it easy for you, I have attached the sample data "20201123.csv" file on which this flow needs to be run. Although the actual data file is quite big, having millions of rows in it, but I have made a very small sample out of it for sharing on the forum. All this data has a constant schema throughout.
I have also attached the Alteryx Worflow Package .yxzp file, so that you can easily open it up on your machine. I have used the latest version of Alteryx for creating this workflow. You may please have a look.
Solved! Go to Solution.
I'd use a regex tool and set it to PARSE.
The expression would be:
(\D+)(\d{2})(\u{3})(\d{2})([0-9\.,]+)(\u+)
as long as the name doesn't have numbers, that expression should work. If it can have numbers, then change \D to . (Decimal dot).
cheers,
mark
@StockMarket
Hope this is what you need.
Thank you @Qiu and @MarqueeCrew
The solution provided by Qiu works well for the data that I posted in the Original Example. But that formula fails if there is an additional DECIMAL VALUE at the end.
For example -
It works well for breaking down -
ACC26NOV201600CE
BAJAJ-AUTO31DEC203000CE
But it does not works for -
ACC26NOV201600.5CE
BAJAJ-AUTO26NOV203100.5CE
Please notice the additional "Decimal .5" just before CE
Can someone please suggest the modification in the formula suggested by Qiu -
(.+?)(\d{2})(\w{3})(\d{2})(\d+)(\w{2})
so that it works for those cases as well which might have Decimal before CE/PE
In some cases the decimal is present, while in other cases there is no decimal, so I need a solution which will work for both the cases.
I have attached additional data for testing this decimal issue.
Thanks a lot
And how do I possibly FILTER OUT those particular data rows, for which the parsing has not happened correctly? I want to separate the output into 2 parts.
One for which parsing happened successfully.
And Two for which parsing did not happen successfully.
How to add this additional step to the above workflow?
Thank you
@StockMarket
Wow I almost forgor this one.
Just modified the RegEx to adapt your new data sample.
And for the checking purpose, I use a Formula tool to put the pieces back together then check the length again the length of original string.
User | Count |
---|---|
107 | |
82 | |
72 | |
54 | |
40 |