Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Turn Null to 0 only if data exists in the column

LinhNguyen
8 - Asteroid

Hi everyone!

 

I'm creating a data cleaning workflow to populate a table similar to the one below.

 

When there is data in the brand A column and there're only a few nulls, I want to replace it with zeros to use in other calculation.

However, when the whole column is null, I have formula to calculate it from the other columns. The condition for these later formula is that one of these three columns is all null, so I don't want to just turn all null values to 0.

 

Is there a formula I can use for this situation? What do you think is the fastest, most simple way to do it?

I used this to turn Null to 0 before moving on with the other formula, but I would like to know if there's a better way:

 

IF Average([Brand A]) >0 and [Brand A] = Null() then 0 else [Brand A] endif 

 

ProductBrand A Other brand Total
1123 
2243 
3364 
4 5 

 

 

ProductBrand A Other brand Total
1 315
2 327
3 440
4 55

 

 

4 REPLIES 4
PeterGoldey
11 - Bolide

You could separately run a summarize tool with a count (or count distinct) on the columns you want to check and then append the counts back into you dataset so you can check the fill rate as part of your IF statement.  (if there are a lot of rows in your dataset, this might not be the best solution).

 

Alteryx has a built in tool rather than configuring a regularly summary tool.

 

Data Investigation -> Field Summary

 

It will give you null and not null counts as well as min, max, etc. for any fields you check and you can pipe the results into your workflow (running it on a lot of columns in a large dataset can take a few seconds).

 

I use this all the time during data investigation, but also to automate constructing data dictionaries for our team and evaluating data changes over time.

Sol_O
6 - Meteoroid

@LinhNguyen , you can do the following steps (may not be the most straight forward but might work):

1.) Use the data cleansing tool to change all nulls in brand a to 0.

2.) Use the summarize tool to sum column Brand A

      a.) If the column is all null, sum(brand a) = 0, otherwise it should be >0 --- this is assuming that there is no "zero" value in Brand A if it's not null

3.) Append to the original table

4.) Formula tool - if [brand a]>0 then your calculation (for non-null column) else your other calculation (all null column) endif

 

Again #2 above only works if the non-null values in Brand A will not equal to zero.

 

Hope that helps.

danilang
19 - Altair
19 - Altair

Hi @LinhNguyen 

 

Here's a dynamic solution that will handle any number of columns

 

WF.png

 

It transposes to get all the columnar data into a Name and a Value column. You can then apply summary functions to all the column data at once.

 

It transforms this input data

 

Input.png

into these results

 

Results.png

replacing nulls with zeros only if there is other data in that column

 

Also:  The Average() function that you have below

 


@LinhNguyen wrote:

...

IF Average([Brand A]) >0 and [Brand A] = Null() then 0 else [Brand A] endif 

...

 

only looks at the current row.  It doesn't perform an average of all the Brand A in all the rows, so you can't use it to to determine if there are nulls in the column

 

Dan

LinhNguyen
8 - Asteroid

This is exactly what I was looking for. Thank you @danilang !

Labels