Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Most Efficient formula for “Find and Replace” type of requirement

StockMarket
8 - Asteroid

Hello

 

I am new to Alteryx and I have to design a workflow, which perform a simple “Find and Replace” task on a single field within the csv file. But there are 2 main problems that I am facing, the first one is that there are “ almost 60 comparisons” that have to be made for this find and replace task and secondly, this flow needs to be run on csv files, which have Millions of Rows inside them.

 

As you can see in the attached sample data file, there is a field named “Ticker” which has got multiple details inside it in the form of a “Large String Value”, which also including the DATE in this format –
“YYMMM” where YY stands for Year in short form like 19 for the year 2019, and MMM stands for 3 digits of the Month, like FEB stands for February and so on for all other months.

 

Now, my requirement is that I have to actually REPLACE this 5 character “Date” from these large string values with the new 7 character date, which will also include the actual 2 digits of date inside them. Instead of just telling YYMMM, they will actually show the complete date in this format – DDMMMYY, where DD is the date, MMM is the month and YY is the year.

 

In short, “YYMMM” needs to be replaced with “DDMMMYY”, leaving all other information inside that string value as it is.


For example, the original Ticker value of

 

“ACC19JAN88260PE” will get converted to “ACC31JAN1988260PE”

 

means “19JAN got replaced by 31JAN19” ,

 

and the original Ticker value of “ACC19FEB88260PE”

 

will get converted to “ACC28FEB1988260PE”

 

means “19FEB got replaced by 28FEB19”

 

and so on for all different field values. I hope you understand the requirement.

 

 

 

Now, for adding this “DD” part, we have a table of values, which tells exactly what value there should be for a particular “YYMM” as given in the original data. Please have a look at the attached sample data file. We can’t just use any random date like the first date of a month or the last date of month etc. instead we have to replace the specific “YYMMM” portions with their corresponding “DDMMMYY” as given in the attached table.

 

Although this requirement is quite simple and it could be achieved by either using a formula for “Find and Replace” or by using “Refex” or any other method like that. However, I have to run this flow on csv data files, which are quite large, having around 4 million rows per csv file and I have to run this flow onto hundreds of such files.

 

So I am looking for an EFFICIENT SOLUTION for this problem, so that the PC does not remain in the HANG STATE for large duration of time because of this Extensive Load on the CPU and RAM etc.


Can someone please suggest the most efficient way of achieving this objective for large data sets? Any help or guidance is most welcome.

 

To make it easy for you, I have attached the sample data 20190130.csv file on which this flow needs to be run. Although the actual data file is quite big, having millions of rows in it, but I have made a very small sample out of it for sharing on the forum. All this data has a constant schema throughout.

 

I have attached the Excel File named "Date Table for FIND AND REPLACE REQUIREMENT .xlsx" which has to be used for creating the “Find and Replace Formula”.

 

I have also attached the Alteryx Worflow Package .yxzp file, so that you can easily open it up on your machine. I have used the latest version of Alteryx for creating this workflow. You may please have a look.

 

Thanks a lot

 

PS: Please note that although we are working with DATES throughout this workflow, but actually the original field “Ticker” is a STRING and it has to be kept as a string only, while doing all these steps, because it not only contains this date information, but a lot of additional information as well, which is all combined together within this string. Therefore, we cannot change this string to date format in order to make use of those formulas, which are specifically designed for working with dates. We have to treat this Ticker Field as a string only. You will understand this situation easily, when you look at the actual data.

 

Sample Data.png

 

Workflow.png

 

11 REPLIES 11
MarqueeCrew
20 - Arcturus
20 - Arcturus

I've got a cleanse macro brewing but no new cars. 

cheers,

 

 mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
StockMarket
8 - Asteroid

@MarqueeCrewSuch an Amazing Reply. I totally agree with @TuvyL and @RussJD You are at another level! I didn't expect someone would do such a detailed analysis for EFFICIENCY ASPECT. I am reading your post multiple times in order to grasp it. As I am new to Alteryx, it will take me some more time and effort to totally grasp the details and to start making use of this info practically into my workflows.

 

I deeply appreciate the time that you took to do this experiment and share these insights with the community.

 

If you or anyone else is aware of more such links, which tells how to make the workflow as efficient and generic as possible, then please share those links in this thread, or message me directly. That would be extremely helpful to me, as I am in my initial stage of learning this amazing tool, and such tricks will help me a lot in all my workflows, going ahead.

 

Thanks and Best Regards

Labels