Hello all,
First off I’m sorry about posting my second question on such a short time but I can’t seem to find a solution to my problem and if anyone can give me some hints I would be truly thankful.
I have a very large dataset with Client ID’s, date and Items they purchased. It looks like this:
Client Code | Date | Item Purchased |
7788ii | 201207 | A |
7788ii | 201208 | A |
7788ii | 201209 | A |
7788ii | 201210 | A |
7788ii | 201211 | |
7788ii | 201212 | |
7788ii | 201301 | |
7788ii | 201302 | |
7788ii | 201303 | |
21easad | 201207 | |
21easad | 201208 | |
21easad | 201209 | B |
21easad | 201210 | B |
21easad | 201211 | B |
21easad | 201212 | B |
21easad | 201301 | A |
21easad | 201302 | A |
21easad | 201303 | A |
upoo9 | 201207 | C |
upoo9 | 201208 | C |
upoo9 | 201209 | C |
upoo9 | 201210 | |
upoo9 | 201211 | |
upoo9 | 201212 | |
upoo9 | 201301 | |
upoo9 | 201302 | C |
upoo9 | 201303 | C |
4r32rrer | 201207 | |
4r32rrer | 201208 | |
4r32rrer | 201209 | B |
4r32rrer | 201210 | B |
4r32rrer | 201211 | B |
4r32rrer | 201212 | C |
4r32rrer | 201301 | |
4r32rrer | 201302 | |
4r32rrer | 201303 |
Now I would like to create a new column with a classification for each customer per date. There are five possible classifications:
An example of the output I’m trying to achieve:
Client Code | Date | Item Purchased | Classification |
7788ii | 201207 | A | Re-Buying |
7788ii | 201208 | A | Re-Buying |
7788ii | 201209 | A | Re-Buying |
7788ii | 201210 | A | Re-Buying |
7788ii | 201211 | Lost customer | |
7788ii | 201212 | ||
7788ii | 201301 | ||
7788ii | 201302 | ||
7788ii | 201303 | ||
21easad | 201207 | ||
21easad | 201208 | ||
21easad | 201209 | B | New Customer |
21easad | 201210 | B | Re-Buying |
21easad | 201211 | B | Re-Buying |
21easad | 201212 | B | Re-Buying |
21easad | 201301 | A | Switch |
21easad | 201302 | A | Re-Buying |
21easad | 201303 | A | Re-Buying |
upoo9 | 201207 | ||
upoo9 | 201208 | C | New Customer |
upoo9 | 201209 | C | Re-Buying |
upoo9 | 201210 | Lost Customer | |
upoo9 | 201211 | ||
upoo9 | 201212 | ||
upoo9 | 201301 | ||
upoo9 | 201302 | C | Regained Customer |
upoo9 | 201303 | C | Re-Buying |
4r32rrer | 201207 | ||
4r32rrer | 201208 | ||
4r32rrer | 201209 | B | New Customer |
4r32rrer | 201210 | B | Re-Buying |
4r32rrer | 201211 | B | Re-Buying |
4r32rrer | 201212 | C | Switch |
4r32rrer | 201301 | Lost Customer | |
4r32rrer | 201302 | ||
4r32rrer | 201303 |
Is it possible to create a function or use a tool to classify automatically in a large dataset? I’ve been trying to use both Multi Row and Multi Field formulas to write a nested IF function that can fit all of those conditions, but I can’t seem to specify the correct cells or even write the function correctly. If anyone knows how to do this, I would really appreciate some help!
Thank you.
Solved! Go to Solution.
The attached workflow seems to do the classification using a combination of the multi-formula tool and the formula tool. The majority of the classification is handled by the second multi formula tool using a semi complex if statement. There were a few classifications that I was struggling with, such as New Customer and Regained customer, so I ended up calculating the first purchase date and then assigning these values based on comparison to the first purchase date. This is handled in the final formula tool.
I don't think this will have a huge performance impact if used on a large dataset besides maybe the sort tool. If you are worried about the cleanliness of your workflow or need this classification in multiple places, you could turn the workflow into a macro so it would be one tool in your overall workflow as opposed to all of the steps included in the attached workflow. You would only have to change the final browse tool to a macro output tool and the text input tool to a macro input tool.
Hope that helps and let me know if you have any questions!
Thank you very much, it works perfectly!
I was just wondering, if in the future I would like to change the criteria for the "Lost Customer" formula (a customer would need to not purchase for a full 3 months before being labeled as a "Lost Customer" - if he only stopped for 1 or 2 months it should just show "Re-Buying"), what changes should I do in the if argument?
You should be able to handle these changes within the Multi-Row Formula tool. You would change the settings to allow you to look at the previous 3 rows (There is a up down arrow near the top of the configuration for controlling this) and then edit the formula accordingly. The if statement for Lost Customer would now look something like this:
IF !ISNULL([row-3: Item Purchased]) AND ISNULL([row-2: Item Purchased]) AND ISNULL([row-1: Item Purchased]) AND ISNULL([Item Purchased]) THEN "Lost Customer" ELSEIF..... ..... ENDIF
Once again, it works wonders. Thank you!