Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

GroupBy/Count on many fields, seperately, to create fill rate table

trevorwightman
8 - Asteroid

Hello,

 

I have 100+ columns of data where I need to create a fill rate table that gives me a breakdown of how many times each element within that field is present. For example, if I have a field called "Vehicle Make" I will want to know how many times "Honda", "Mazda", "Ford" and etc appears. If I just had to do this on one field it would be super simple and I would just need the one summarize tool. However, I need to do this on all 100+ fields and do not want to have 100+ different streams of data grouping the different fields. I was thinking a macro might be able to accomplish this where I tell the macro to group the Nth field each each iteration (Iteration 1 group/count the 1st field, iteration 2 group/count the 2nd field and so on).

12 REPLIES 12
Qiu
20 - Arcturus
20 - Arcturus

@trevorwightman 
It will better for us to understand the isue if you can provide simple data input and output 😁

 

trevorwightman
8 - Asteroid

Sure, absolutely, please see below:

 

In this example I have three columns of data. I want to group column number one and count how many of each element exists. I want to do the same for column 2 and 3 as well. Finally, I want to stack the results.

 

However, I do not want to use three seperate summarize tools (this would be very easy) because unlike this dummy dataset my real dataset has 100+ columns. So, I need to find a way to iterate through all of my columns get the appropriate grouping for each.

 

trevorwightman_0-1645600560340.png

 

atcodedog05
22 - Nova
22 - Nova

Hi @trevorwightman 

 

Here is how you can do it.

Workflow:

atcodedog05_0-1645600809502.png

 

Hope this helps : )

 

trevorwightman
8 - Asteroid

While this solution does work, I am not sure if it will work for my dataset (please let me know if I am wrong). My dataset is 250m records by 300 columns so if I did it this way I would transpose the data into 75 billion rows and then I would have to perform a group by on that. Are there any other options to use a macro to just perform 300 seperate iterations of grouping the data one column at a time?

SoccerTil
8 - Asteroid

not sure of it's efficiency, but the Frequency widget seems made for this.

 

SoccerTil_0-1645630776955.png

 

atcodedog05
22 - Nova
22 - Nova

Hi @trevorwightman 

 

As I know what i mentioned is the best possible way. Only way you can batch between columns is using select tool and selecting specific columns to process.

 

Hope this helps : )

SoccerTil
8 - Asteroid

I don't have a very large data set, but I had a file with 8,000 records and 15 columns that ran the workflow in 3 seconds.

Ben_H
11 - Bolide

Hi @trevorwightman,

 

As @atcodedog05 suggested the other way to do this is with a batch macro that runs on a column at a time.

 

I've knocked up an example but it could likely be made simpler. I've tested it on a file with 35 columns and 300,000 rows which took 30 seconds. vs. 9 seconds on the non-batched method. Obviously that file is a fraction the size of your input though.

 

Regards,

 

Ben

 

trevorwightman
8 - Asteroid

Hi @Ben_H ,

 

What you created is what I was thinking would solve my problem. Do you mind walking me through the logic of how this macro is working? I have some of your logic figured out but not all of it.

 

In the main workflow:

1. This is our parameter control where you are submitting just the Column Names

2. This is our original data. 

 

Since we are submitting these two streams into the macro does that mean that on the first iteration only the first parameter (or column) will be processed in the macro?

 

In the macro:

1. This is where our parameter comes in with the Column Name

2. This is where our data comes in.

3. How does this portion of the workflow work?

4. How does this portion of the workflow work?

 

As an aside, I was surprised that the macro version takes longer than just using transpose/summary on all of the data.

 

trevorwightman_0-1645635938269.png

 

Labels