How do I clean this data? [newbie, please help :(]


Hi all,


I'm quite new to Alteryx and am wondering if someone can help by providing guidelines, examples, a yxmd, etc., on how to clean this dataset.




There are also some labels on pages 6-12.

My team has already tried its best, but I don't know whether our result is correct. I got a lot of nulls, and we could not load all of the data for the years stated. What should I do?

My goal is to produce data that can be loaded into Tableau under EEOC Explore.


Thank you for your time.



On opening the 2021 file, I see that there are indeed a lot of null/blank cells - you will have to decide whether those are valid blanks or whether you need to fill them in. To fill them in, your team will have to agree on a methodology - there are several imputation approaches to choose from, each with its own pros and cons depending on the nature of your final analysis. Once you have that decided, we can design the flow in Alteryx.
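
To make the trade-off concrete, here is a minimal Python sketch (outside Alteryx, standard library only) comparing two common imputation choices, mean fill and median fill. The column values are made up for illustration and are not taken from the EEOC file; `impute` is a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical column with missing values (None); NOT taken from the EEOC file.
values = [10, 12, None, 11, 200, None, 13]

def impute(column, fill):
    """Return a copy of column with every None replaced by fill."""
    return [fill if v is None else v for v in column]

# Compute the fill value from the observed (non-missing) cells only.
observed = [v for v in values if v is not None]

mean_filled = impute(values, mean(observed))      # pulled upward by the 200 outlier
median_filled = impute(values, median(observed))  # robust to the outlier

print("mean fill:  ", mean_filled)
print("median fill:", median_filled)
```

With a single large outlier in the column, the mean fill inserts values far from the typical cell, while the median fill stays close to it. Which behaviour is appropriate depends on the analysis you plan to run.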


I also see a lot of asterisks (*) in columns that I assume are meant to be numbers. The presence of those asterisks forces Alteryx to treat those columns as strings, not numbers. I'm not sure whether the asterisks have any real meaning in the context of the dataset, but if you want to remove them, you can designate those columns as numeric fields (e.g. Int64 or Double) using the Select Tool, or use the Multi-Field Formula Tool (with the REGEX_Replace function) to remove the asterisks across all of those fields in one go.
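
As a rough illustration of the asterisk clean-up logic, here is a Python sketch rather than an Alteryx workflow. In the Multi-Field Formula Tool, the equivalent expression would be something along the lines of REGEX_Replace([_CurrentField_], "\*", ""), but please check that against your own fields. The sample cells and the `clean_cell` helper are hypothetical. The sketch also keeps a flag recording which cells had an asterisk, in case the asterisk turns out to carry meaning.

```python
import re

def clean_cell(raw):
    """Strip asterisks from one cell.

    Returns (number or None, had_asterisk). Cells that are empty or
    non-numeric after stripping become None instead of raising an error.
    """
    text = "" if raw is None else str(raw)
    had_asterisk = "*" in text
    stripped = re.sub(r"\*", "", text).strip()
    try:
        number = float(stripped)
    except ValueError:
        number = None
    return number, had_asterisk

# Hypothetical sample cells; NOT taken from the EEOC file.
rows = ["125", "37*", "*", "", None]
cleaned = [clean_cell(cell) for cell in rows]

for raw, (number, flagged) in zip(rows, cleaned):
    print(repr(raw), "->", number, "(had asterisk)" if flagged else "")
```

A cell that contained only an asterisk ends up as a null, so this step can add to the nulls you then have to decide how to handle.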


Hi @Peachyco, what imputation method would you recommend? Would you mind sharing the yxmd so I can take a look at the cleaned data? Thanks.


Hi @Peachyco, could you please help with this? I have already tried my best to impute the nulls, but it is causing some skewness.
