Inputting multiple uniquely structured files at once

Question

So I'm brand new to Alteryx (2nd day with it), and I'm hoping to be able to streamline a process we have in place for consuming data from customers.

The customer sends us 5 files, each file contains a different set of data, but there are some common fields among the files. The task at hand is to analyze the 5 files and ensure that the data matches among the files. For example, the file files are:

Customer info

Manager info

Location info

Schedule info

Group info

Each file has a different number of fields and different amounts of data. For example, one of our customers might send us a Customer file with only the bare minimum of fields (name, dob, gender, address), which others send us much more info on each customer. They can send us this info in a multitude of formats, with or without headers, and with columns in different orders (regardless of the templates and requests we send for the data). Handling the fact that the data can come to us in a variety of formats I assume will be a much more complex issue, so for this I will focus on the customer that send us the data as we ask for it: tab delimited text file with headers.

In that case, I need to input the 5 files, then run some analysis to ensure that the Customer ID in the customer file is the same as the customer ID in the schedule and location files, as sometimes we get customers who send us the file and one will have a leading 0 and the other will not. I also need to be able to identify any occurrences of an ID in one file that do not appear in the other from both directions.

I've been reviewing posts on this forum for a while, and it seems that all the solutions for inputting multiple files require that the files contain the same data as they will be unioned into one data set. This doesn't match our current process, and while there most certainly could be a better way of doing what I need, I'm trying to recreate and streamline our current process which is:

Import all files into Access as 5 individual tables (which takes quite a few clicks to set the import up for each file for each customer), then create match queries looking to ensure that the IDs among the files are the same, that we have the data we need to have and that when we do the big match that joins all 5 files into one dataset, we come out with the expected number of records.

So,  all that being said, I would welcome any thoughts on the process, but the bottom line is, I'm trying to create a workflow that can be used for all of our customers without having to redefine the data input tool for the 5 fields for each customer.

Hope that makes some semblance of sense, this is a new world to me, so I'm not even sure I have all of the vocabulary down yet.

KaneG · Accepted Answer

Hey @serendipitytech,

Jus a quick response to try to keep you moving. There is probably a better response depending on just how different the data could be, howeverwhat you are looking for is automation of a process that is currently quite manual. I point that out as the best answer I have is a batch macro for validation (probs a little advanced for day 2 of your alteryx journey).

It's hard to give a simpler but comprehensive answer without seeing the data and then it becomes a consulting exercise... but you should be able to hook something up that identifies the file format (not filetype) using the FieldInfo tool, Summarise/CrossTab Tools and Formulas. If you need the fieldtype determined then an auto-field tool will get you there and if you need to see what files are in the directory then the Directory Input Tool is what your after. The below notes should assist as well:

* Dynamic Input will input a file from the directory tool
* FileGetExt() will get the Filetype
* Field Info Tool (with text input and Join) can be used to check all the field names/types

I hope this helps to move you to the next step

Kane