Hi, I am working on survey responses that contain around 35,000 columns. The surveys come from different industries and different locations. To clean this data we need to remove specific columns identified by a few keywords, remove some rows, and add one more column holding the current date. This would be easy if the same columns had to be removed for every industry, but that is not the case: each industry type needs different columns removed, according to different keywords. Also, for some industry surveys we don't need to remove any rows at all.
So, can anyone suggest an approach to standardize this workflow across the different industries without manual intervention?
We are able to do this in Python. There, we created a reference file that lists each industry type with its cleaning values; for example, for column removal it holds the keywords that apply to each industry.
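The reference-file approach described above can be sketched in plain Python. This is a minimal, standard-library-only sketch, not the poster's actual code: the rule structure (`CLEANING_RULES`, `drop_column_keywords`, `drop_rows_where`), the industry names, the keywords, and the `clean_date` column name are all hypothetical placeholders.

```python
import datetime

# Hypothetical reference "file" held as a dict: one entry per industry.
# Keywords and row rules below are placeholders, not real survey rules.
CLEANING_RULES = {
    "retail": {
        "drop_column_keywords": ["internal", "test"],
        "drop_rows_where": {"status": {"incomplete"}},
    },
    "banking": {
        "drop_column_keywords": ["comment"],
        "drop_rows_where": {},  # this industry keeps every row
    },
}


def clean_survey(header, rows, industry, rules=CLEANING_RULES, today=None):
    """Drop keyword columns, drop flagged rows, and append a date column."""
    rule = rules[industry]
    keywords = [k.lower() for k in rule["drop_column_keywords"]]

    # Keep columns whose name contains none of the keywords (case-insensitive).
    keep_idx = [
        i for i, name in enumerate(header)
        if not any(k in name.lower() for k in keywords)
    ]

    # Resolve each row rule to a column position once, not once per row.
    row_checks = [
        (header.index(col), banned)
        for col, banned in rule["drop_rows_where"].items()
        if col in header
    ]

    stamp = (today or datetime.date.today()).isoformat()
    new_header = [header[i] for i in keep_idx] + ["clean_date"]
    new_rows = [
        [row[i] for i in keep_idx] + [stamp]
        for row in rows
        if not any(row[pos] in banned for pos, banned in row_checks)
    ]
    return new_header, new_rows
```

Adding a new industry then means adding an entry to the reference table rather than editing the workflow itself, which is what keeps the process free of manual intervention.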
Thanks in advance!
Hello @HHV ,
Did you check the Dynamic Select tool? It may meet your requirements, since it lets you choose which columns to keep through a formula.
Removing the rows is more complicated, but there are many ways to do it: use a simple Filter tool; use a Multi-Field Formula tool to create new flag columns and then filter on the results; or build a database with all the requirements and apply it afterwards with a Join tool to remove the unwanted rows.
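For the last option in the list above (a requirements table applied with a join), here is a rough Python equivalent, assuming the rules are stored one row per (industry, column, value) to remove. The table `REMOVAL_RULES`, the `industry` field on each record, and the sample values are hypothetical. Keeping only the records that find no match is an anti-join; a set of rule keys stands in for the join.

```python
# Hypothetical requirements table: each (industry, column, value) triple
# marks survey rows that should be removed for that industry only.
REMOVAL_RULES = [
    ("retail", "status", "incomplete"),
    ("retail", "country", "TEST"),
    ("banking", "status", "duplicate"),
]


def remove_rows(records, removal_rules=REMOVAL_RULES):
    """Anti-join: keep only records that match no rule for their industry."""
    rule_keys = set(removal_rules)                    # lookup index for the "join"
    rule_cols = {col for _, col, _ in removal_rules}  # columns any rule touches
    return [
        rec for rec in records
        if not any((rec["industry"], col, rec.get(col)) in rule_keys
                   for col in rule_cols)
    ]
```

Industries with no rules never match, so their rows pass through unchanged, which covers the surveys that need no row removal.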
Hope this helps you to get to your solution 🙂
Regards
Hi @HHV
You can replicate this process in Alteryx like this, assuming the files are in a single directory tree.
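For comparison with the poster's Python setup, the "files in a single directory tree" step can be sketched with `pathlib`. This assumes a hypothetical layout in which each survey file sits in a folder named after its industry (for example `root/retail/survey1.csv`); the function name and the `*.csv` pattern are placeholders.

```python
from pathlib import Path


def find_survey_files(root, pattern="*.csv"):
    """Group files under root by industry, taken from the parent folder name."""
    by_industry = {}
    # rglob searches the whole tree; sorting gives a stable processing order.
    for path in sorted(Path(root).rglob(pattern)):
        by_industry.setdefault(path.parent.name, []).append(path)
    return by_industry
```

Each group can then be cleaned with that industry's entry from the reference table.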
Dan