All,
I'm importing multiple excel files in a data set. I've found that when I import all files it ignores some saying that they have a different schema than the first one. I bring up the first and one that is rejected and the only difference I see is that the rejected one has several rows of empty data. I've include some sample files that fails with same message. Any idea of something I'm not seeing?
I've tried to include the files but not sure whether they made it or not.
Solved! Go to Solution.
Hey @jeffv
So, it's not just the column headers that need to match it's also the field types that also need to match.
If the first few lines are null then Alteryx will read the field type as "string" but on the next file the field might be populated with a date...so Alteryx reads this field type as "date" hence an data mismatch.
In the example you sent, the one file was completely empty....is that ever the case or was it just this example?
I'm thinking, using your new found Excel SQL knowledge...you could filter out all the null ID's on import?
Part time Tableau, Part Time Alteryx. Full Time Awesome
I assume you're using the asterisk "*" in the file name to import multiple files from one location? If that's the case, then what @LordNeilLord said is correct. Schema includes field names and data types, which must be identical for the wildcard trick to work on the Input tools.
If the field names are always the same and always exist, I usually create a small batch macro to handle situations like this. Inside the macro use the select node to set a fixed type for each of the fields, so that when they output from the macro they will all have the same type and be unioned together. Then you just use the folder tool to pass all of the file paths into the batch macro and it will go through, open each file, set the types of the fields, and output each file. It only takes a couple of tools, as below.
It appears that the rejection of file happens before the where clause takes affect, ie even if I say where `Serial Number` is not null the file is rejected. I'm a novice at batch macros but maybe what
@david_fetters is suggesting with a macro approach may take it a step forward. I thank you greatly for your feedback!
That's good to know!
Then batch macro it is :)
Thanks for idea of trying a batch macro. I'm still trying to understand batch macros (actually all macros). So with your suggested approach... I pass a list of files and the macro imports the data and changes the datatypes and passes back the data. Does the driver program does something in particular to merge the data from the various macro outputs?
@jeffv the outputs of each run (i.e. each row of the input into the control parameter) of a batch macro are unioned together, just like the result you would get if you use the asterisk on the input file. The trick here is using the select node to force them to be identical schemas and you'll get the exact same output you would have received had your original files all possessed the same schemas. Does that make sense?
Hmmm... I got it to work.... (which is surprising for me) but brings up fundamentals of Alteryx which I may have missed in starting with the app.
The first icon gets a listing of files in a directory and passes one at a time a file from the list to the macro. In my mind from how this works is the union waits for all output from macro, combines it then proceeds to next step. Slightly differently than I anticipated but great now that I understand it. Note that I have a select after union (post macro but inside macro I do have a select to make sure all fields have same datatype). Inside the macro the select does not give me an option to a field it to a date (likely because there were no rows in file) so internal to macro field is set to string and after macro I set it to date. I ran it and it almost works. The problem remaining is apparently format of date field must be in different formats. It complains about 3/19/2018 not being in valid format. I tracked down the excel file and the column format is listed as general. I suspect it would correct itself if I change the format to date. Is it best practice to run a preprocess to ensure all the column formats are correct before importing to Alteryx or is there a function that takes and tries with multiple format to convert to date format?
You can have a look at the Date Time functions in the help manual
Use a formula tool to create a new field to hold the properly formatted date, and set the format to Date or Datetime. For the formula use:
DateTimeParse([YOUR_FIELD],"%m/%d/%Y")
Instead of the field [YOUR_FIELD], put the column containing the date from your original data. This function will take data in a mm/dd/yyyy format and parse it into alteryx's datetime format. If you have multiple date time formats (e.g. some are mm/dd/yyyy, some are mm/dd/yy) it will make it much easier to get them into the same format before loading into Alteryx.
As a note, you do not need a union tool after your batch macro, as the batch macro itself will union the resulting data from all of its batches before outputting it into your data stream. Otherwise you're doing great! Building a batch macro and getting it to run isn't a 100 level course topic.