Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Alteryx skipping input files because they don't match the schema...but they do!

SergeantJack
8 - Asteroid

Hi.  I have a workflow set up and connected to a macro.  The macro makes the workflow cycle through every .xlsx file in a directory and pull out data from five different tabs in the Excel files.  These tabs have different schema.  Each tab's information is deposited into a separate workstream, which is then united later in the workflow.

 

My problem is this.  Currently I have four Excel files in the directory.  The first three Excel files have no data in Column X on one tab, and this is perfectly acceptable; however, the fourth file does have data in Column X.  When Alteryx reads in the data files, it determines that the fourth file has a different schema than the first three, and it skips that fourth file.  From what I've read in other threads, Alteryx is assigning a different data type to the fields with nulls than it is to the fields in the fourth that have actual data, which presents itself as different schema, but I can't figure out a way to verify this or correct it.

 

How do I get it to stop doing this?  I have fairly complicated (to me) setup for reading in multiple files and pulling data from multiple tabs, and I haven't been able to figure out a way to make Alteryx handle this problem correctly.

19 REPLIES 19
Felipe_Ribeir0
16 - Nebula

Hi @SergeantJack 

 

Yes, the case probably is that for some files some column is understood as string (because there are just numbers on some rows and strings on others) and for some files this same column is understood as number, because there are just numbers on all rows.

 

Just use the macro that is attached on this topic instead of using the dynamic input tool, this macro will accept the different schemas https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/The-Ultimate-Input-Data-Flowchart/t...

 

binuacs
20 - Arcturus
DataNath
17 - Castor

Hey @SergeantJack - I recently saw this exact scenario come up (where null fields default to double data types) and offered a solution here - part of the batch macro forces said fields into the same data type in order to handle this specific situation:

 

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Dynamic-Input-Multiple-Sheets-On-One-E...

SergeantJack
8 - Asteroid

@DataNath I'm not sure how to implement this.  I have a separate Analytic App that connects to the macro in my main workflow.  So for each data stream, here's how it's set up:

 

Analytic App --> Control Parameter --> (here's where it splits into different streams) Action --> Input Data tool

 

I don't seem to have the ability to add tools between the Action and Input Data tools; if I did, I would simply use a Filter to discard any records that had null values in Column X.  Any tools I add after the Input Data tool don't work, since at that point, file 4 has already been discarded for mismatched schema.

SergeantJack
8 - Asteroid

@Felipe_Ribeir0 and @binuacs This isn't quite my situation.  I have multiple files with what should be identical schema, but each file has five different tabs, and the tabs have different schema from each other.  I already have a batch macro set up to iterate through each data file in the directory, but each tab has to be brought in separately and processed separately downstream.  So, to explain:

 

The macro will look through all files in a directory, and for each file, it will pull in data from Tab A and deposit it in data stream A.  Then all data from each file's Tab B will be pulled in and deposited in data stream B, etc.

 

My problem is that, in this example, Tab D in files 1,2, and 3 have Column X that is all nulls, so Alteryx decides the data type for that column must be double.  Then file 4 is processed, and it actually has string data, so Alteryx discards that file since it doesn't match the schema that was already imputed when the first three files were processed.

 

I need a way for Alteryx to do one of two things:

1. Skip any row from Tab D that has null values in Column X (preferred), or

2. Specify that Column X is, was, and always will be data type V_WString (which is the case in this example).

DataNath
17 - Castor

@SergeantJack are you able to share what you've done so far, as well as the expected outcome? Screenshots or a copy of your workflow (with some mocked up data if your live data is too sensitive) would be great - makes it a lot easier to help diagnose the issue. I do think this will be based around a batch macro in order to avoid Alteryx discarding your files with null fields.

SergeantJack
8 - Asteroid

@DataNath Sure thing.  Here is how data is ingested:

 

001.PNG002.PNG

 

In the first data stream, marked '1 - Data Items' above, things are going just fine.  Files 1,2,and 3 all have legitimate data in them, and so Alteryx selects the correct data type and is able to bring in file 4.  In the example below, F6 is the data field in question.  The first three files end in ...101, ...102, and ...103, respectively, and the fourth file is named "Jack_Dummy_File".  Even though the example shows nulls for the ...103 file, it had non-null data previously.

003.PNG

 

In this data stream, marked 'Survey Questions' in the pic above, things are not correct:

004.PNG

In that stream, F7 is the target column, and it's all nulls for the first three files.  In that data stream, Alteryx decides that the data type must be double, and when it looks at Jack_Dummy_File, it sees actual data there with a V_WString format, and since it doesn't match what it already decided was a double data type, it doesn't include that file in the data import.  You can see in the pic above that it stops with file ...103.  Here is the error message for that particular tool:

 

005.PNG

 

DataNath
17 - Castor

Thanks for that @SergeantJack - really useful! Where are you getting this error though? I'd expect your workflow to error at the highest level i.e. Alteryx throws an error from whatever you've called your batch macro, rather than the Input Data tool. Are you using a wildcard or something in the Input Data tool and getting this error when running the workflow that makes up your batch macro?

SergeantJack
8 - Asteroid

@DataNath It's not actually an error, it's a warning, which means the workflow will run, but it will discard any data from my dummy file since its data doesn't match the schema of the first files, which had the null data.  You can see from the last pic I posted above that it's happening on the Input Data tool.  The wildcard you mentioned is specified in the Action tools that come before each Input Data tool.  In the Action tool configuration window, the action specified is "Replace a specific string:" and the wildcard I have in there is *.xlsx.

Labels