
Alteryx Designer Desktop Discussions


How do you handle the nulls? Any better way to handle nulls? Please help :)

rively90
8 - Asteroid

Hello,

 

 

After I removed all of the nulls, my data became skewed toward female.

 

Female makes up 67% of the data while male makes up 33%. This is not good for analysis.

 

What should I do with the nulls? How do I make this look like a 50%/50% split of the data, or something close to that? Or is there a better solution?

 

The YXZP is around 100 MB, so I had to upload it to my personal Dropbox. Please download it from there!

https://tinyurl.com/eeocdataset2

 

Thank you!

 

8 REPLIES
caltang
17 - Castor

That is part of the process of feature engineering your data. Your statement that having more females than males skews the data seems off... what if the data is already random and that is simply the default % breakdown?

 

Maybe let's take a step back first and understand what kind of analysis you are trying to do.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
caltang
17 - Castor

In addition, can you export your workflow instead?


Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
rively90
8 - Asteroid

@caltang I think I may have made some mistakes. I don't know. I already tried my best, though. The data seems to have more females after cleaning the nulls.

I'm trying to compare male vs female group by sector. 

Are males dominating one of the industries (construction)? Are females dominating the education industry?

It does not make sense to have 67% women in the workforce?

 

 

rively90
8 - Asteroid

@caltang The YXZP is at the link below. I had to upload it to Dropbox because it's 100+ MB:
https://tinyurl.com/eeocdataset2

caltang
17 - Castor

You're right. After exploring the data set more, this is more an issue with the data than with your choice to remove nulls.

 

The level of granularity changes each year of the dataset, with the common dimension being that the data is centred on the USA. If you don't mind the nulls, you can work with just the country dimension, and drop the rest with a Select Tool. Then work on your analysis from there - it becomes a nationwide study rather than a targeted one.

 

If you insist on a targeted one, then you will have to accept that trade-off once you remove null values.
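
To make that trade-off concrete, here is a minimal sketch in plain Python. The records and the field names (`year`, `sex`, `state`) are made up for illustration and are not taken from the EEOC dataset. It only shows that dropping every row with a null in a granular column can shift the female/male split, while a nationwide view that ignores that column keeps every row.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
rows = [
    {"year": 2019, "sex": "F", "state": "CA"},
    {"year": 2019, "sex": "F", "state": "NY"},
    {"year": 2019, "sex": "M", "state": None},
    {"year": 2019, "sex": "M", "state": "TX"},
    {"year": 2020, "sex": "F", "state": "CA"},
    {"year": 2020, "sex": "M", "state": None},
    {"year": 2020, "sex": "F", "state": "WA"},
    {"year": 2020, "sex": "M", "state": None},
]


def sex_share(records):
    """Return each sex's share of the given records as a fraction of rows."""
    counts = Counter(r["sex"] for r in records)
    total = sum(counts.values())
    return {sex: n / total for sex, n in counts.items()}


# Option 1: nationwide study. Ignore the granular column and keep every row.
nationwide = sex_share(rows)

# Option 2: targeted study. Drop rows where the granular column is null.
complete = [r for r in rows if r["state"] is not None]
targeted = sex_share(complete)

print("nationwide:", nationwide)
print("targeted:  ", targeted)
```

In this toy data the nulls happen to sit only in male rows, so removing them inflates the female share. Whether something similar is happening in the real data is something to check rather than assume.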

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
rively90
8 - Asteroid

What should I do? Is there any way to know which years have a lot of nulls?

rively90
8 - Asteroid

@caltang have we found a solution yet?

caltang
17 - Castor

I appreciate the enthusiasm, but this is something that you have to decide as the Data Analyst. 

What you can do is add a filter for each year, and make it into an analytic app so that you can see your results on a cross-sectional basis.
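
Outside of Designer, the same per-year view can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the records and the field names (`year`, `sector`) are hypothetical, and the helper `per_year_summary` is not part of any Alteryx tool. It counts, for each year, how many rows there are and what share of them have a null in a chosen column, which is one way to see which years have a lot of nulls.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
rows = [
    {"year": 2018, "sector": "Construction"},
    {"year": 2018, "sector": "Education"},
    {"year": 2018, "sector": "Education"},
    {"year": 2018, "sector": "Construction"},
    {"year": 2019, "sector": "Construction"},
    {"year": 2019, "sector": None},
    {"year": 2019, "sector": "Education"},
    {"year": 2019, "sector": "Education"},
    {"year": 2020, "sector": None},
    {"year": 2020, "sector": None},
    {"year": 2020, "sector": "Education"},
    {"year": 2020, "sector": None},
]


def per_year_summary(records, column):
    """For each year, return (row count, share of rows where `column` is null)."""
    totals = defaultdict(int)
    nulls = defaultdict(int)
    for r in records:
        totals[r["year"]] += 1
        if r[column] is None:
            nulls[r["year"]] += 1
    return {
        year: (totals[year], nulls[year] / totals[year])
        for year in sorted(totals)
    }


summary = per_year_summary(rows, "sector")
for year, (n, null_share) in summary.items():
    print(f"{year}: {n} rows, {null_share:.0%} null in 'sector'")
```

Years with a high null share are the ones where removing nulls costs the most rows, so they are the first place to look when the split changes after cleaning.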

In addition, the previous workflow that I built for you is not used in your current model - I am not sure what you are trying to do or achieve.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/