Just arrived in the Alteryx community - Happy to be here!
I am attempting to build a data profiling tool for an SQL database, that can read a variable amount of tables with an uneven number of columns in each table. The workflow should then be able to run a set of operations on each column (count blanks, count distinct, etc...) and output the results for each column into an Excel spreadsheet or similar.
I have done some testing with the input and dynamic input tools, and arrived at the workflow underneath. As you can see, there is an input node that retrieves the user-defined field and table-names in the database, and feeds them into a dynamic input node that loops through all fields and tablenames. However, when I try to link this up to anything else, I get trouble with several error messages, as well as the fact that I cannot find a way to dynamically feed the output from my dynamic input node into a formula node.
Is there anything I have overlooked in how to do this, or any alternatives that would be more sensible to make my dynamic profiling tool?
Hi,
Solved! Go to Solution.
Interesting idea - will play with at Lunch
Think you might need to be careful around different field types. I think you might need a couple of different paths for each of the type.
Attached my lunchtime playing.
You need to cast everything a fixed type then you are ok.
I cast both TableName and ColumnName in SQL to a nvarchar(256).
The value I cast to a 2048 block string. One optimsation would be to read max length needed from the input columns and set that.
The template query I used was:
Select CONVERT(NVARCHAR(256), '[GLOBAL_Matter]') TableName, CONVERT(NVARCHAR(256), '[CaseCategory_ID]') ColumnName, CONVERT(NVARCHAR(2048), [GLOBAL_Matter].[CaseCategory_ID],120 ) Value From [GLOBAL_Matter]
I adjusted the SQL table column query to:
select '[' + TABLE_SCHEMA + '].[' + TABLE_NAME +']' TableName, '[' + COLUMN_NAME + ']' ColumnName, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION from INFORMATION_SCHEMA.COLUMNS
By surrounding with [] I think safer for this one.
Hope this helps!
Thanks for your advice! This seems a good step closer to a solution. However, I keep getting an error message in relation to the [Global_Matter]. Any thoughts?
Was just the sample table I used.
Have amended to:
select CONVERT(NVARCHAR(256), 'INFORMATION_SCHEMA.COLUMNS') TableName,
CONVERT(NVARCHAR(256), 'COLUMN_NAME') ColumnName,
CONVERT(NVARCHAR(256), COLUMN_NAME, 120) Value
from INFORMATION_SCHEMA.COLUMNS
You will need to change connection strings in both input and dynamic input (as points to my laptops local SQL Server)
Otherwise should work out of the box now I hope