So I'm brand new to Alteryx (2nd day with it), and I'm hoping to be able to streamline a process we have in place for consuming data from customers.
The customer sends us 5 files, each file contains a different set of data, but there are some common fields among the files. The task at hand is to analyze the 5 files and ensure that the data matches among the files. For example, the file files are:
Customer info
Manager info
Location info
Schedule info
Group info
Each file has a different number of fields and different amounts of data. For example, one of our customers might send us a Customer file with only the bare minimum of fields (name, dob, gender, address), which others send us much more info on each customer. They can send us this info in a multitude of formats, with or without headers, and with columns in different orders (regardless of the templates and requests we send for the data). Handling the fact that the data can come to us in a variety of formats I assume will be a much more complex issue, so for this I will focus on the customer that send us the data as we ask for it: tab delimited text file with headers.
In that case, I need to input the 5 files, then run some analysis to ensure that the Customer ID in the customer file is the same as the customer ID in the schedule and location files, as sometimes we get customers who send us the file and one will have a leading 0 and the other will not. I also need to be able to identify any occurrences of an ID in one file that do not appear in the other from both directions.
I've been reviewing posts on this forum for a while, and it seems that all the solutions for inputting multiple files require that the files contain the same data as they will be unioned into one data set. This doesn't match our current process, and while there most certainly could be a better way of doing what I need, I'm trying to recreate and streamline our current process which is:
Import all files into Access as 5 individual tables (which takes quite a few clicks to set the import up for each file for each customer), then create match queries looking to ensure that the IDs among the files are the same, that we have the data we need to have and that when we do the big match that joins all 5 files into one dataset, we come out with the expected number of records.
So, all that being said, I would welcome any thoughts on the process, but the bottom line is, I'm trying to create a workflow that can be used for all of our customers without having to redefine the data input tool for the 5 fields for each customer.
Hope that makes some semblance of sense, this is a new world to me, so I'm not even sure I have all of the vocabulary down yet.