Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Automation of cleaning across diferent survey

HHV
5 - Atom

Hi, I am working on survey responses which contains around 35000 columns. These surveys are of different industries with different location. To clean this data we need to remove some specific columns with few key words and also need to remove few rows. Apart from removal of rows we need to add one more column for current date. This can easily be done if we have to remove same types columns across different industries but this is not the case. For every industry type we have to remove different columns according to different key words. Also in some of industry survey we don't have to remove rows also.

So, if anyone of you can suggest me some approach how we can regualarize  the  workflow across different industries without human interference. 

We are able to do this with python. In python we created a reference file in which we have put industry type and their corresponding cleaning values like for column removal we have written different key words according to different industry.

 

Thanks in Advance !

2 REPLIES 2
afv2688
16 - Nebula
16 - Nebula

Hello @HHV ,

 

Did you check the dynamic select tool? This may help you in your requirements. You can through a formula select which columns to select.

 

To remove the rows would be something more complicated but there are many ways (use simple filter tool, use multi field formula to create new columns and then filter if the results based on the results you get, create a database with all requirements and apply them after with a join tool to remove the desired rows, etc).

 

Hope this helps you to get to your solution 🙂

 

Regards

danilang
19 - Altair
19 - Altair

Hi @HHV 

 

You can replicate this process in Alteryx like this, assuming the files are in a single directory tree.  

 

  1. Use a Directory tool set to scan subdirectories to get all the file paths
  2. Pass the list of files as the control parameter to a batch macro.  The reason that you need to treat the files one by one is that they'll have different schema so there's no way to read them all in a single input tool.
  3. Within the macro, read the file, determine it's industry type and preform the manipulations based on your template file.  The Dynamic Select that @afv2688 mentioned will help to remove columns and the Dynamic Rename to rename existing ones. Filter tools can be used to remove rows
  4. The data read from all the files will be unioned into a single data set that will be passed back to your main workflow to be analyzed 

Dan

Labels