Alteryx Designer Desktop Discussions

Winston · ‎07-27-2021

I'm working on a project to eliminate some Access database files. I'm trying to read in 9 files for a total of approximately 4 million records, union the data together, and then export it back out in one of three ways, either an Oracle database, an Alteryx database file, or both.

I'm seeing 2 issues:

1. In the image below, and the attached file, the top workflow reads the database files in and makes a couple data changes, then based on a workflow constant determines where to export to. If I choose to have it write to an Alteryx database file it generates 14 output files, when I would expect 1 file. The bottom workflow just reads the data in and dumps it back out and generates 1 file. Obviously, I doing much more in the top workflow, but I'm not seeing what would create 14 files. I've added a stop and wait till last record is received and I have also set the max records per file to 20 million, but neither of these changes help.

2. When it goes through the process of checking the workflow constant to see where the data should be written to, it has 4 options, Oracle, Alteryx, Both, anything else. So only three words are used as switches, and anything else entered is ignored. What I'm seeing is when I choose Alteryx for the constant it still tries to write to oracle which generates an error. I would expected that if the data doesn't go down that path, it wouldn't get fired. But maybe I just don't know exactly how Alteryx would handle this.

Any suggestions on what is causing it to generate so many files or a better way of accomplishing this?

Thanks!

Treyson · ‎07-27-2021

I'll take a stab at this one.

1) I imagine what is happening is that when you use the datetimenow() formula, it's taking more than a second to run through this function and there are 14 different values for datetimenow() and then you get 14 different filenames. What I would suggest is putting a summarize tool after Formula (207) and see how many values you have for the 'File_Name_DateTime_Suffix" field. If there is more than one, and they each have a different value in seconds, that is your problem. The two solutions I would offer is if just the date is fine, use datetimetoday(), otherwise add the datetimenow tool (in/out toolset) and append that to the records before your formula and then use that datetime value.

2) I would love to see your workflow in action. Since I cannot replicate this with your workflow, I am unsure why the filter isn't working the way you would want it to. I am also not 100% sure how it's dedicated as Alteryx, EDH or Both. So maybe check that logic and something that should only go to Oracle is flagged as "both"?

Treyson Marks
Senior Analytics Engineer

apathetichell · ‎07-27-2021

Can you provide a sample of two different file names? ie I'm curious to see the time difference. Since datetimenow() should provide nothing smaller than second - I'm a bit surprised that you are getting that level of lag between executions. - I'd also like to see the settings of your [User.File_DateTime_Suffix_Format] field- in a pinch depending upon your level of detail perhaps you can omit the seconds parameter?

Winston · ‎07-28-2021

@Treyson

Bingo! That was the missing thought process I couldn't think of and then work around it.

So as suggested I added a summarize tool and confirmed the different time stamps were being created which corresponded to the files being generated.

So I tried moving the stop processing tool to before the formula to create the file name. This resulted in the same issue. Each time I ran it, I would get a different number of files generated. One run generated a file with just one record!

To work around this I changed some tools and the order of processing to what is below. Now it won't attempt to create the file name until after the data is completely read in. And it just creates the one file name and appends it to the whole dataset.

The issue with trying to write to the database even though I didn't want it to was an issue with the connection to the database. I forgot that Alteryx does an initial check of sources and outputs for validity. So the error was being thrown before the process really started. To simplify things for testing I changed it to write an Alteryx file instead of writing to a database. It now seems to work as intended with the workflow constant determining which path the data takes.

Thanks for the input and getting me thinking in the right direction.

Winston · ‎07-28-2021

@apathetichell

As you can see in the image below, the seconds were only off by 1.

My workflow constant list is below. You are right I could drop the seconds from the suffix, but the issue could still exist. With the example file names there is roughly a range of 10 seconds. So if I ran the workflow and the first time stamp was 162059 the next one would be 162100. This would bring the file count down to 2 in this particular scenario but if there was a bigger lag for some reason it could generate more time stamps.

Thanks for the suggestion.

Treyson · ‎07-29-2021

@Winston Glad this all worked out! I forget about the workflow constants a lot of the time, so I am glad you found a solution there.

Treyson Marks
Senior Analytics Engineer

Alteryx Designer Desktop Discussions

Data output generating to many files