Hi everyone!
I'm creating a data cleaning workflow to populate a table similar to the one below.
When there is data in the brand A column and there're only a few nulls, I want to replace it with zeros to use in other calculation.
However, when the whole column is null, I have formula to calculate it from the other columns. The condition for these later formula is that one of these three columns is all null, so I don't want to just turn all null values to 0.
Is there a formula I can use for this situation? What do you think is the fastest, most simple way to do it?
I used this to turn Null to 0 before moving on with the other formula, but I would like to know if there's a better way:
IF Average([Brand A]) >0 and [Brand A] = Null() then 0 else [Brand A] endif
Product | Brand A | Other brand | Total |
1 | 12 | 3 | |
2 | 24 | 3 | |
3 | 36 | 4 | |
4 | 5 |
Product | Brand A | Other brand | Total |
1 | 3 | 15 | |
2 | 3 | 27 | |
3 | 4 | 40 | |
4 | 5 | 5 |
Solved! Go to Solution.
You could separately run a summarize tool with a count (or count distinct) on the columns you want to check and then append the counts back into you dataset so you can check the fill rate as part of your IF statement. (if there are a lot of rows in your dataset, this might not be the best solution).
Alteryx has a built in tool rather than configuring a regularly summary tool.
Data Investigation -> Field Summary
It will give you null and not null counts as well as min, max, etc. for any fields you check and you can pipe the results into your workflow (running it on a lot of columns in a large dataset can take a few seconds).
I use this all the time during data investigation, but also to automate constructing data dictionaries for our team and evaluating data changes over time.
@LinhNguyen , you can do the following steps (may not be the most straight forward but might work):
1.) Use the data cleansing tool to change all nulls in brand a to 0.
2.) Use the summarize tool to sum column Brand A
a.) If the column is all null, sum(brand a) = 0, otherwise it should be >0 --- this is assuming that there is no "zero" value in Brand A if it's not null
3.) Append to the original table
4.) Formula tool - if [brand a]>0 then your calculation (for non-null column) else your other calculation (all null column) endif
Again #2 above only works if the non-null values in Brand A will not equal to zero.
Hope that helps.
Hi @LinhNguyen
Here's a dynamic solution that will handle any number of columns
It transposes to get all the columnar data into a Name and a Value column. You can then apply summary functions to all the column data at once.
It transforms this input data
into these results
replacing nulls with zeros only if there is other data in that column
Also: The Average() function that you have below
@LinhNguyen wrote:...
IF Average([Brand A]) >0 and [Brand A] = Null() then 0 else [Brand A] endif
...
only looks at the current row. It doesn't perform an average of all the Brand A in all the rows, so you can't use it to to determine if there are nulls in the column
Dan
This is exactly what I was looking for. Thank you @danilang !