Hello,
Because I removed all of the nulls, my data is now skewed toward females.
Females make up 67% of the data while males make up 33%. This is not good for analysis.
What should I do with the nulls? How do I get this closer to a 50%/50% split, or is there a better solution?
The YXZP is about 100 MB, so I had to upload it to my personal Dropbox. Please download it from there!
https://tinyurl.com/eeocdataset2
Thank you!
That is part of the process of feature engineering your data. Your statement that having more females than males skews the data seems off... what if the data is already a random sample and that is simply its natural % breakdown?
Let's take a step back first: what kind of analysis are you trying to do?
In addition, can you export your workflow instead?
@caltang I think I may have made some mistakes. I don't know. I already tried my best though. The data seems to have more females after cleaning the nulls.
I'm trying to compare males vs. females, grouped by sector.
Are males dominating one of the industries (construction)? Are females dominating the education industry?
It does not make sense to have 67% women in the workforce, does it?
@caltang The YXZP is at the link below. I had to upload it to Dropbox because it's 100+ MB.
https://tinyurl.com/eeocdataset2
You're right. After exploring the data set more, this is more an issue with the data than with your choice to remove nulls.
The level of granularity changes each year of the dataset, with the common dimension being that the data is centred on the USA. If you don't mind the nulls, you can work with just the country dimension, and drop the rest with a Select Tool. Then work on your analysis from there - it becomes a nationwide study rather than a targeted one.
If you insist on a targeted one, then you will have to accept that trade-off once you remove null values.
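To make that trade-off concrete, here is a minimal sketch in plain Python (outside Alteryx) of how dropping rows with a null in a finer-grained column can shift the male/female split. The rows and the column names `year`, `sex` and `sector` are made up for illustration; they are assumptions, not the actual fields or values in the shared dataset.

```python
from collections import Counter

# Hypothetical toy rows; the real dataset's columns and values may differ.
rows = [
    {"year": 2019, "sex": "F", "sector": "Education"},
    {"year": 2019, "sex": "M", "sector": None},
    {"year": 2019, "sex": "M", "sector": None},
    {"year": 2020, "sex": "F", "sector": "Education"},
    {"year": 2020, "sex": "M", "sector": "Construction"},
    {"year": 2020, "sex": "F", "sector": "Construction"},
]


def sex_share(rows):
    """Return the fraction of rows belonging to each sex."""
    counts = Counter(row["sex"] for row in rows)
    total = sum(counts.values())
    return {sex: counts[sex] / total for sex in sorted(counts)}


# Split using every row (the "nationwide" view that ignores sector).
share_before = sex_share(rows)

# Split after dropping rows where the sector is null.
share_after = sex_share([r for r in rows if r["sector"] is not None])

print("before dropping nulls:", share_before)
print("after dropping nulls: ", share_after)
```

In this toy data the nulls happen to sit on male rows only, so removing them moves the split from 50/50 to 75/25. Whether something similar is happening in the real data depends on where its nulls fall.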
What should I do? Is there any way to know which years have a lot of nulls?
@caltang have we found a solution yet?
I appreciate the enthusiasm, but this is something that you have to decide as the Data Analyst.
What you can do is add a filter for each year, and make it into an analytical app so that you can see your results on a cross-sectional basis.
In addition, the previous workflow that I built for you is not used in your current model - I am not sure what you are trying to do or achieve.
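For anyone who wants to prototype the per-year idea outside Alteryx first, the following is a rough sketch in plain Python. It computes the share of null sector values per year, then the male/female share by sector for one chosen year. The rows, the column names `year`, `sex` and `sector`, and both helper functions are hypothetical and only illustrate the logic; they are not taken from the shared workflow or dataset.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows; the real dataset's columns and values may differ.
rows = [
    {"year": 2019, "sex": "F", "sector": "Education"},
    {"year": 2019, "sex": "M", "sector": None},
    {"year": 2019, "sex": "M", "sector": None},
    {"year": 2020, "sex": "F", "sector": "Education"},
    {"year": 2020, "sex": "M", "sector": "Construction"},
    {"year": 2020, "sex": "F", "sector": "Construction"},
    {"year": 2020, "sex": "M", "sector": "Construction"},
]


def null_share_by_year(rows, column):
    """Fraction of rows per year where `column` is missing (None)."""
    total = Counter()
    missing = Counter()
    for row in rows:
        total[row["year"]] += 1
        if row[column] is None:
            missing[row["year"]] += 1
    return {year: missing[year] / total[year] for year in sorted(total)}


def sex_share_by_sector(rows, year):
    """Male/female share within each sector, for a single year only."""
    counts = defaultdict(Counter)
    for row in rows:
        # Per-year filter: skip other years and rows with no sector.
        if row["year"] == year and row["sector"] is not None:
            counts[row["sector"]][row["sex"]] += 1
    return {
        sector: {sex: n / sum(c.values()) for sex, n in sorted(c.items())}
        for sector, c in sorted(counts.items())
    }


null_by_year = null_share_by_year(rows, "sector")
shares_2020 = sex_share_by_sector(rows, 2020)

print("null share by year:", null_by_year)
print("2020 shares by sector:", shares_2020)
```

Years with a high null share are the ones where a sector-level comparison is least reliable, so checking that number first can help decide which years to keep in a targeted analysis.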