Everything that I do in Alteryx starts from selecting data from SPSS, based on a list of fields.
So my SPSS file would contain "respondent, field1, field2, field3" and I would have a list containing "respondent, field1".
That way I know I can drop field2 and field3.
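The selection logic described above can be sketched outside Alteryx to make the intent concrete; the field names here are just the illustrative ones from the example:

```python
# Sketch of the selection step: keep only the fields named in the list,
# drop everything else. Field names are illustrative.
spss_fields = ["respondent", "field1", "field2", "field3"]
keep_fields = ["respondent", "field1"]

drop_fields = [f for f in spss_fields if f not in keep_fields]
print(drop_fields)  # ['field2', 'field3']
```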
My conventional selection method (dynamic rename + dynamic select) with some sample data is attached.
Even though this technically works fine, it makes the workflow really slow, since the original SPSS file can easily contain 50K fields and (I think) the entire file is loaded into memory.
At the moment, every time I add or change a tool, I have to wait for like half a minute.
I was wondering if I can query my SPSS file directly, rather than loading the entire thing and then dropping fields.
I think the dynamic input is what I need, but is it correct that I have to build the "modify sql query" dynamically based on the input fields (like below) and then add this to a batch macro where I update the XML of the dynamic input tool? Could someone help me set this up?
And is it correct that this is more efficient than doing the full input + dynamic rename + dynamic select?
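Whatever the batch-macro mechanics end up being, the string that the "modify SQL query" action would need can be assembled from the field list. A minimal sketch of that string-building step, with a made-up table name:

```python
# Build a SELECT statement that names only the wanted fields, so only
# those columns are read. Table and field names are illustrative.
keep_fields = ["respondent", "field1"]
table = "data"  # hypothetical table name

query = "SELECT {} FROM {}".format(", ".join(keep_fields), table)
print(query)  # SELECT respondent, field1 FROM data
```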
I'm not sure whether there's a direct way to create this sort of functionality, but I can offer advice on how I might go about it.
I would place an output data tool after this initial 'ingest and selection' process and write the data to a .yxdb file. I would then put the ingest-and-selection piece in a tool container and disable it, before bringing the .yxdb file onto the canvas to perform the development work on.
You can then easily swap the .yxdb back out for the true connection if the workflow needs to be scheduled or goes into a production state.
It's also worth noting that in Alteryx 2018.3 they introduced caching, where you can right-click on a tool and hit 'Cache' and it will cache the data to that point. Essentially it does what I suggested above, but it's all automated and without containers.
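The stage-to-disk pattern described above (do the slow ingest once, then develop against the cached copy) looks roughly like this in general-purpose code; the cache file name and the ingest function are made up for the sketch:

```python
import os
import pickle

CACHE = "ingest_cache.pkl"  # stand-in for the staged .yxdb file

def expensive_ingest():
    # Placeholder for the slow SPSS read-and-select step.
    return [{"respondent": 1, "field1": 10}]

def load_data():
    # Later runs read the staged copy instead of re-ingesting.
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as fh:
            return pickle.load(fh)
    # First run: do the slow ingest once and stage the result to disk.
    data = expensive_ingest()
    with open(CACHE, "wb") as fh:
        pickle.dump(data, fh)
    return data
```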
So basically you would perform the data selection outside of the normal process?
The only problem with this is that I actually need to perform this query a number of times.
I want to perform roughly 20 different analyses, and each analysis requires a certain data selection from my SPSS file.
My current flow looks like the picture below (the comment box is then a macro where I do my analysis).
The goal of this flow is basically to filter the right data for each analysis and then join all the analyses back together.
Is it really most efficient to do all those 20 selections, drop the original file, and then continue with the filtered datasets? It just feels a little sloppy to me (though it might very well be the best approach).
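The shape of that flow, i.e. one shared dataset, several analyses each taking their own field selection, and a join back on the respondent key, can be sketched like this. All names, field lists, and the two-analysis setup are illustrative:

```python
# One shared dataset; each analysis pulls its own subset of fields,
# and the results are joined back together on 'respondent'.
rows = [
    {"respondent": 1, "field1": 10, "field2": 20, "field3": 30},
    {"respondent": 2, "field1": 11, "field2": 21, "field3": 31},
]

# Hypothetical field selections, one per analysis.
analysis_fields = {
    "analysis_a": ["respondent", "field1"],
    "analysis_b": ["respondent", "field2", "field3"],
}

# Select each analysis's fields once from the shared dataset ...
selections = {
    name: [{f: row[f] for f in fields} for row in rows]
    for name, fields in analysis_fields.items()
}

# ... then join the per-analysis results back on 'respondent'.
joined = {}
for result in selections.values():
    for row in result:
        joined.setdefault(row["respondent"], {}).update(row)

print(joined[1])  # {'respondent': 1, 'field1': 10, 'field2': 20, 'field3': 30}
```

The point of the sketch is that the source is read once and the per-analysis work happens on the already-loaded data, which is what staging the SPSS file to a single .yxdb achieves in the workflow.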
Well, in that case the first thing I would do is write the SPSS file straight out to a .yxdb, and then see if that one change, connected to your 20 selections, gives you enough performance. I imagine that would be way more efficient than reading in 20 different yxdb files, as you point out.