This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
It's the most wonderful time of the year - Santalytics 2020 is here! This year, Santa's workshop needs the help of the Alteryx Community to help get back on track, so head over to the Group Hub for all the info to get started!
assume I have a unstructured source database with over 1.000 tables and up to hundreds of columns per table. Some tables are completely empty, some columns within tables are also completely empty. Some columns contain the same value for every row, thus have likely no particular meaning.
I am trying to use Alteryx to better understand the source data. Some ideas:
1. Filter empty tables
2. Filter empty columns within a table
3. Filter columns with only 1 unique value
4. Correlate columns to show redundancy
Do you have any ideas, workflows, resources on this topic in general or one of the tasks above in particular?
E.g. is there a practical way to even handle a large amount of tables within Alteryx?
First and foremost, there probably isnt a single way to perform this exercise, but my recommendation would be the following;
1. Get a list of the tables in a DB, probably directly from the management studio interface
2. Feed this list into Alteryx through the Dynamic Input tool, which would connect to each table (though you'll have to break this into some kind of Batch macro because each table's structure is different)
3. Then use the Data Investigation tools, specifically the Basic Data Profile would work, to enumerate which columns have 0 distinct non-null values, which would allow you to filter these out of the data set.
4. From there other data investigation tools would be able to be used to give you high level descriptives on each table's set of columns
Exploring unstructured/undocumented data is always interesting, but Alteryx should definitely have tools to help with the process!
Do you need the data to stay in those tables? This makes a huge difference in how this is approached. If you can disambiguate the data from the original tables and just extract the data we can do some interesting things.