Hi everyone,
I'm seeing something odd with the Basic Data Profile tool: its output doesn't match my actual input source. For instance, for a certain field, the Basic Data Profile shows a Maximum date of 2018-12-31 and a Minimum date of 2018-06-01. However, if I use the Summarize tool to group by that date field, my dates range from 2018-01-31 to 2020-02-29. I see similar discrepancies in other statistics, such as the non-NULL count for a particular field. The Basic Data Profile tool shows a field as having 57,692 Non_Nulls, but the Summarize tool gives a non-NULL count of 62,457,983 for the same field. My input file has 62,457,983 records in total, so I'm wondering whether there's a limit on how many records the Basic Data Profile tool actually looks at. Is there a way to change that configuration so it looks at the entire file?
If not, what are some other quick/easy ways to perform similar checks across 20 different data sources?
Hi @hydrogurl01
You may want to explore the Field Summary building block, which is part of the "Data Investigation" suite.
Just locate the tool and click on Open Example to see how it could be used. Cheers!
Hi @christine_assaad,
I was taking a look at the Field Investigation tool. It's helpful, but it also takes much longer. I have ~20 sources I'm trying to do this for and union together into one report. Do you know of any quicker way to get the kind of basic data quality stats these tools provide for each of them? I'm assuming the last resort would be to run a Summarize tool on each and every field with the statistics I want and union the results together, but I would like to avoid that.
I've seen similar discrepancies. Did you ever get this figured out?
I've learned that the Basic Data Profile tool has a memory limit. That looks to be the issue I am having.
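For anyone who wants to cross-check the numbers outside of Designer, here is a rough sketch in plain Python of the kind of check discussed in this thread: it streams each source one row at a time (so the whole file never has to fit in memory) and collects the non-NULL count, minimum and maximum for every field, then stacks the results into one report. It makes several assumptions that are not from the posts above: the sources are exported as CSV files, empty strings count as NULL, and values are compared as text (which orders ISO dates like 2018-06-01 correctly but not numbers). The names `profile_rows` and `profile_sources` are made up for this illustration.

```python
import csv
from typing import Dict, Iterable, List, Optional


def profile_rows(rows: Iterable[Dict[str, Optional[str]]]) -> Dict[str, dict]:
    """Stream over rows once, tracking per-field non-null count, min and max."""
    stats: Dict[str, dict] = {}
    for row in rows:
        for field, value in row.items():
            if field is None:
                continue  # extra unnamed columns from a malformed row
            s = stats.setdefault(field, {"non_null": 0, "min": None, "max": None})
            if value is None or value == "":
                continue  # assumption: empty strings count as NULL
            s["non_null"] += 1
            # Values are compared as text; ISO dates sort correctly this way.
            if s["min"] is None or value < s["min"]:
                s["min"] = value
            if s["max"] is None or value > s["max"]:
                s["max"] = value
    return stats


def profile_sources(paths: List[str]) -> List[dict]:
    """Profile each CSV file and stack the results into one report."""
    report = []
    for path in paths:
        with open(path, newline="") as handle:
            for field, s in profile_rows(csv.DictReader(handle)).items():
                report.append({"source": path, "field": field, **s})
    return report
```

Calling `profile_sources` with a list of file paths returns one row per source and field, which can be written back out to CSV and compared against what the Summarize tool reports.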