Alteryx Designer Desktop Discussions

YULteryx · 04-23-2019

Hi everyone!

We're currently looking at hundreds of files and trying to figure out which column in each is most likely the Primary Key.

I have created a simple workflow which will read/write .CSV files from/to HDFS. Step by step, I am:

Summarizing all the columns from the input to count unique values ("Count distinct")
Transposing to obtain 2 columns (column_name + count_Distinct)
Appending the total number of rows for each and calculating the % of unique values
Sorting by % descending, selecting relevant columns and outputting the result.
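
For readers following along outside Alteryx, the four steps above can be sketched in plain Python. This is only an illustration of the logic, not the workflow itself: the function name `unique_ratio_by_column` and the sample data are made up, and the CSV is read from an in-memory string rather than HDFS.

```python
import csv
import io


def unique_ratio_by_column(csv_text):
    """Return (column, distinct_count, pct_unique) tuples, highest pct first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # One set of seen values per column ("Count Distinct" for every field).
    distinct = {name: set() for name in (reader.fieldnames or [])}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in distinct:
            distinct[name].add(row[name])
    # Append the total row count and compute the % of unique values.
    results = [
        (name, len(values), 100.0 * len(values) / total_rows if total_rows else 0.0)
        for name, values in distinct.items()
    ]
    # Sort by % descending: the top column is the best Primary Key candidate.
    results.sort(key=lambda r: r[2], reverse=True)
    return results


# Hypothetical sample data, for illustration only.
sample = "id,city,flag\n1,Montreal,Y\n2,Quebec,Y\n3,Montreal,N\n"
for name, count, pct in unique_ratio_by_column(sample):
    print(f"{name}: {count} distinct, {pct:.1f}% unique")
```

Because the column names come from the file header, nothing here depends on how many columns the table has.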

This works well for a single file, but it would be amazing if we could automate the process for our roughly 300 files/tables (with different schemas and sizes):

For the input, I'm thinking of a batch macro for all files and Auto configure by name to adapt for the schemas
For the output, I'm thinking about taking the name of the .CSV from a field (1 output per table name)
My issue is dynamically updating the Summarize tool, which is crucial for the workflow. The tool would need to select all columns and apply "Count Distinct" to each of them, no matter how many columns the table has. I thought about modifying the tool's inner XML with an Action tool, but I'm unsure whether specifying "select all + Count Distinct" is doable.

Thanks in advance!

LukeM · 04-24-2019

Hi @YULteryx ,

Instead of using the Summarize tool to Count Distinct, could you:

Transpose all your data to just Name and Value fields
Use a Unique tool to get just distinct Name and Value pairs
Add a new field which is just a 1 for every row
Use the Summarize tool to 'Group By' Name and 'Sum' the new 1 field, creating a count.

This should replicate the Count Distinct but in a more dynamic way for your Batch Macro.
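
As a rough illustration of why this is schema-independent, here are the same four steps in Python. The function name `count_distinct_via_transpose` and the sample rows are hypothetical; rows are plain dicts standing in for records, and the third one deliberately has an extra column.

```python
from collections import Counter


def count_distinct_via_transpose(rows):
    """Count distinct values per column using Name/Value pairs."""
    # Step 1: transpose every record into (Name, Value) pairs.
    pairs = [(name, value) for row in rows for name, value in row.items()]
    # Step 2: keep only the distinct (Name, Value) pairs (the Unique tool).
    unique_pairs = set(pairs)
    # Steps 3 and 4: add a 1 for each pair, then group by Name and sum.
    counts = Counter()
    for name, _value in unique_pairs:
        counts[name] += 1
    return dict(counts)


# Hypothetical sample rows with slightly different schemas.
rows = [
    {"id": "1", "city": "Montreal"},
    {"id": "2", "city": "Quebec"},
    {"id": "3", "city": "Montreal", "extra": "x"},
]
print(count_distinct_via_transpose(rows))
```

No column name appears anywhere in the logic, which is what makes the approach usable inside a batch macro.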

Hope this helps.

Luke

Claje · 04-24-2019

Hi,

There may be a performance reason not to do this, but have you tried transposing the data first?

Then you can Group By the Name field and take both a Count and a Count Distinct of Value at the same time, which gives you everything you need for your calculations from there.

You might need to filter out NULL values as well.
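
A minimal Python sketch of that idea, assuming NULL maps to `None` (whether empty strings should also be dropped depends on your data). The function name `count_and_count_distinct` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def count_and_count_distinct(rows):
    """Return {name: (count, distinct_count, pct_unique)} from Name/Value pairs."""
    totals = defaultdict(int)
    values = defaultdict(set)
    for row in rows:
        # Transpose: each field becomes one (Name, Value) pair.
        for name, value in row.items():
            if value is None:
                continue  # assumption: None plays the role of NULL and is skipped
            totals[name] += 1        # Count of Value
            values[name].add(value)  # feeds Count Distinct of Value
    return {
        name: (totals[name], len(values[name]), 100.0 * len(values[name]) / totals[name])
        for name in totals
    }


# Hypothetical sample rows; "ref" contains a NULL.
rows = [
    {"id": "1", "ref": None},
    {"id": "2", "ref": "A"},
    {"id": "3", "ref": "A"},
]
print(count_and_count_distinct(rows))
```

Note that in this sketch a column containing only NULLs produces no output row at all, so it silently drops out of the ranking.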

LukeM · 04-24-2019

Love the "finding the Primary Key" solution, by the way!

YULteryx · 04-24-2019

Thank you both for your answers! It clearly shows how Alteryx offers several ways to obtain the same output.

I will try to implement your approach and let you know how it goes.

Cheers.

[EDIT] - it works perfectly. Appreciate your support!
