Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Creating buckets for large amount of values

HadiElmo
5 - Atom

Hi Everyone, I have a variable called SUM which has around 16 million entries I need to assign them to buckets for e.g.

 

SUM Bucket
9000050,001 - 100,000 
350000 - 50,000

 

I know that something like this can be quickly done with an ifelse statement however I have a large number of values and they need to go into several buckets. The bucket breakdown is as follows:

 
0 - 50000
50001 - 100000
100001 - 250000
250001 - 500000
500001 - 1000000
1000001 - 1500000
1500001 - 2000000
2000001 - 2500000
2500001 - 3000000
3000001 - 3500000
3500001 - 4000000
4000001 - 4500000
4500001 - 5000000
5000001 - 6000000
6000001 - 7000000
7000001 - 8000000
8000001 - 9000000
9000001 - 10000000
10000001 - 11000000
11000001 - 12000000
12000001 - 13000000
13000001 - 9999999999

 is there a more efficient way then using ifelse statements? I was thinking of using Tiles function but wasn't sure if that would work here. 

 

Thanks in advance!

5 REPLIES 5
cpet13
11 - Bolide

@HadiElmo you could use an Input Text tool, with the Group Start and Group End dates specified for all groups you need, then use a filter tool to bring back the row where the value falls in between the group start and end values. A formula tool could then be used to put the group start and group end columns into one Group column. See the attached example.

cpet13
11 - Bolide

If you are doing this for all 16 million rows, however, the Append Fields option would create 16 million rows x the number of groups, which would make the workflow take forever to run.

Yoshiro_Fujimori
15 - Aurora

Hi @HadiElmo ,

 

To @cpet13 's point, here is a sample workflow with Append Fields tool. Hope this helps.

 

Workflow

1317407_workflow.png

apathetichell
19 - Altair

the problem is - your buckets are arbitrary. Sometimes the range is 50000, sometimes its 100,000 sometimes it's 1,000,000. This makes sense to you - but putting it into logic is complicated - it is a pain. Your solution is exactly what you think it is - a massive if statement or a batch macro which matches a value. if you create a 50k threshold you can take the ceil of the number and divide by 50k. You can then adjust your values as needed and further parse it out based upon that value (if it it's value 1, take a ceil by 10k or whatever) and if it's value 13-19 you can turn it into value 12, and 20+ can become X - whatever your logic is.

 

you can use the tile tool in manual mode - but that will still require some work. you can do something nifty like bring in your range as a datasource, use summarize tool to turn it into xml - edit the xml of the manual part of the tile tool to automatically create your thresholds. It's not intuitive but it's possible.

 

and here's an alternative append fields way...   we make some modifications to the ranges to have them split by | - we then crosstab. we append a single field. we then use multi-field formula with regex to bin. We overwrite those values into true/false range grid - which would actually work if you were doing one hot encoding.... If that's the goal - hey! You/re done. But if that's not the goal - then I transpose and add a filter.

Yoshiro_Fujimori
15 - Aurora

@HadiElmo ,

 

Please have a look at another solution with Dynamic Replace Tool, which looks simple and "dynamic".😀

 

Workflow

1317407_workflow_2.png

Expression in Formula Tool

[New Field] = "SUM"

[Expression] = 

  "[_CurrentField_] >= " + GetWord([Bucket], 0) +
  " AND [_CurrentField_] <= " + GetWord([Bucket], 2)

 

Dynamic Replace configuration

  Field Name Field : New Field

  Boolean Expression Field : Expression

  Output Value Field : Bucket

 

Output

SUMBucket
00 - 50000
10 - 50000
500000 - 50000
5000150001 - 100000
5000250001 - 100000
10000050001 - 100000
100001100001 - 250000
Labels
Top Solution Authors