Conditional Median Calculation
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello,
So I'm working with a data set where every record represents the price of a product in a particular store. Each record also has other identifying information such as DMA, Area Director, Operator, Product Category, etc..
From here I am trying to create 2 fields calculating 2 things:
1. A simple median using the summarize tool grouping by Product/DMA/Operator/Etc.
2. A conditional "Rest of records median" where I want to know the median of the selection of records if I group by all of the same criteria as above with the exclusion of 1 criteria. See below for a simplified sample of data - Ex. In essence I want to know for each of the records with Operator =A what is the median price in Tampa for all Non-A stores, rinse repeat for all the other records in a similar fashion and output that in a new field for each record.
Appreciate the help and thank you in advance!
Solved! Go to Solution.
- Labels:
- Common Use Cases
- Expression
- Parse
- Preparation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Instead of group by in summarize, why don't you use a filter before summarize.
So in your example, use a filter with Operator = A, then take the median of the true output and the median of the false output.
Let me know if this solution will not work in your use case.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
How about the attached?
This treats Product and DMA Combination as the Bucket and Operator as the Individual:
- Summary to get aggregation value at bucket and individual level
- Join on the Bucket fields to bring all underlying data back in
- Filter to remove matching individual records
- Summarize to get aggregation of "Other" records in the bucket for the individuals
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Awesome, this is perfect.
Even after a couple months I'm still learning to forget the well trodden excel-centric way of thinking and miss these oh so simple yet elegant solutions. Thank you again!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I have some data based on Month, Activity and the Absolute difference between say 22 values
month activity Absolute Diff
201704 13477 1236
201705 14713 175
201706 14538 524
201707 15062 251
201708 14811 361
201709 14450 1149
201710 15599 70
201711 15669 1117
201712 14552 1428
201801 15980 1058
201802 14922 1010
201803 15932 802
201804 15130 1971
201805 17101 310
201806 17411 721
201807 18132 944
201808 17188 797
201809 16391 1592
201810 17983 226
201811 17757 730
201812 17027 1174
201901 18201 Null
This is really easy in Excel =Median(D8:D29) and is done in a few seconds
However i am struggling to do this in Alteryx. Can anyone suggest how i can do this.
In addition i also need to Calculate Sigma (Standard Deviation) along with Upper and Lower Control Limits of this dataset.
Please advise how i would construct a workflow to enable me to do this ?
