Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Dynamically update Workflow based on Input Schema

YULteryx
5 - Atom

Hi everyone!

 

We're currently looking at hundreds of files and trying to "figure out" what column is most probably the Primary Key.

I have created a simple workflow which will read/write .CSV files from/to HDFS. Step by step, I am:

 

  1. Summarizing all the columns from the input to count unique values ("Count distinct")
  2. Transposing to obtain 2 columns (column_name + count_Distinct)
  3. Appending the total number of rows for each and calculating the % of unique values
  4. Sorting by % descending, selecting relevant columns and outputting the result.

This works well for a single file, but it would be amazing if we could automate the process for our +- 300 files tables (with different schema/size):

 

  1. For the input, I'm thinking of a batch macro for all files and Auto configure by name to adapt for the schemas
  2. For the output, I'm thinking about taking the name of the .CSV from a field (1 output per table name)
  3. My issue is dynamically updating the "summarize" tool, which is crucial for the workflow. The tool would need to "Select All columns" and "Count Discount" for all, no matter how many columns are within the table. I thought about modifying the inner XML with an action tool, but I'm unsure if specifying "select all + count discount" is do-able.

 

Thanks in advance!

4 REPLIES 4
LukeM
Moderator
Moderator

Hi @YULteryx ,

 

Instead of using the Summarize tool to Count Distinct, could you:

 

  1. Transpose all your data to just Name and Value fields
  2. Use a Unique tool to get just distinct Name and Value pairs
  3. Add a new field which is just a 1 for every row
  4. Use the Summarize tool to 'Group By' Name and 'Sum' the new 1 field, creating a count.

 

This should replicate the Count Distinct but in a more dynamic way for your Batch Macro.

 

Hope this helps.

 

Luke

Claje
14 - Magnetar

Hi,

 

There may be a performance reason not to do this, but have you tried transposing the data first?

Then you can Group By the Name field, and take a Count of Value and a Count Distinct of value at the same time, which will let you perform your calculations from there.

You might need to filter out NULL values as well.

LukeM
Moderator
Moderator

Love the finding the Primary Key solution by the way!

YULteryx
5 - Atom

Thank you both for your answers! It clearly shows how Alteryx offers several ways to obtain the same output.

I will try to implement your approach and let you know how it goes.

 

Cheers.

 

[EDIT] - it works perfectly. Appreciate your support!

Labels