While In-DB tools are very helpful and cut down the time needed to write complex SQL, some steps are faster to write directly in SQL, such as window functions: OVER (PARTITION BY ...). In Alteryx, we need to create multiple joins and summaries to perform a window function. It would be immensely helpful if there were a SQL editor tool for In-DB workflows where we could edit the SQL code at any point in the workflow or, even better, an "edit" function on every In-DB tool where we could customize the generated SQL code and then send it to the next tool.
This will cut down the time immensely and streamline the workflow to make Alteryx a true contender for the ETL solution space.
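For readers less familiar with window functions, here is a minimal sketch in plain Python of what SUM(...) OVER (PARTITION BY ...) computes: every row is kept and gains an aggregate calculated over its partition, which is the result the summarize-then-join pattern reproduces. The function name `sum_over_partition` and the sample fields are made up for illustration; this is not how Alteryx or any database implements it.

```python
from collections import defaultdict

def sum_over_partition(rows, partition_key, value_key, out_key="partition_sum"):
    """Emulate SUM(value_key) OVER (PARTITION BY partition_key)."""
    # Step 1 (the "summarize"): total per partition.
    totals = defaultdict(float)
    for row in rows:
        totals[row[partition_key]] += row[value_key]
    # Step 2 (the "join"): every input row is kept and gets its partition total.
    return [{**row, out_key: totals[row[partition_key]]} for row in rows]

# Hypothetical sample data.
sales = [
    {"region": "East", "amount": 10.0},
    {"region": "West", "amount": 5.0},
    {"region": "East", "amount": 2.5},
]
result = sum_over_partition(sales, "region", "amount")
```

In SQL the same result is a single expression, which is the time saving the post is asking for.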
I work with data where milliseconds are my saviour when I count distinct datetimes to get the number of events. Alteryx ignores the millisecond part (as do lots of other BI tool providers; I don't know where this idea that milliseconds are not needed comes from). Yes, I can convert it to a string, but it's not best practice to create duplicate fields just so that I have a date part for date-related calculations (plotting, time differences) and a string value for quick and easy counting.
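To illustrate why the sub-second part matters for a distinct count, here is a small sketch in Python. The timestamps are invented sample data and `count_distinct` is a hypothetical helper; the point is only that truncating to whole seconds merges events that are actually separate.

```python
from datetime import datetime

def count_distinct(timestamps, keep_milliseconds=True):
    """Count distinct timestamps, optionally truncating to whole seconds."""
    if keep_milliseconds:
        return len(set(timestamps))
    # Dropping the sub-second part collapses events within the same second.
    return len({ts.replace(microsecond=0) for ts in timestamps})

# Hypothetical events: two fall within the same second.
events = [
    datetime(2015, 1, 1, 12, 0, 0, 100000),  # 12:00:00.100
    datetime(2015, 1, 1, 12, 0, 0, 250000),  # 12:00:00.250
    datetime(2015, 1, 1, 12, 0, 1, 0),       # 12:00:01.000
]
with_ms = count_distinct(events)
without_ms = count_distinct(events, keep_milliseconds=False)
```

With milliseconds kept, the count is three events; truncated to seconds, it drops to two.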
This feature isn't a must - but would definitely be a nice to have.
This is similar to Excel having a tab with key figures like average, count and sum.
It would be a really good idea to do something similar within Alteryx, just to get a quick glance at key figures/functions (example attached; apologies for the bad paint job, but it would definitely look good with the Alteryx colour scheme).
It would be great if Alteryx developed an option to keep the data transformations and additions already run through the module. After new tools are added, the module would keep all of the data already transformed or added up to that point and would only spend time running the data through the tools added after that point.
It would save the analyst a lot of time when developing big and complex modules.
It would be nice if we could change the tool numbers and affect which one gets done first. If I'm working on a module and I later add some more analysis upstream, it won't get done or pulled first since it has a greater number. A simple ability to change the numbers would make things a lot easier and would let the work flow smoothly visually.
Can you add the flexibility to access fields based on their position or index, like a, a,... a[n], a being the first column? An option like max[a] could also give the last column and min[a] the first column. This way, we could easily subset the dataset. In most cases, we are handling survey data with thousands of columns, and when we need to select certain columns, we have to tick each column checkbox manually, which is painful for hundreds of columns. It would be nice to have an option to select based on ID or index. It would also be useful in a multi-field formula with a large number of fields, because currently there is no option to write formulas based on the position of a field rather than its name.
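As a rough illustration of the kind of positional selection being requested, here is a sketch in Python. It assumes zero-based positions (Python's convention, not necessarily what the post intends), and `select_by_position` and the survey-style column names are hypothetical.

```python
def select_by_position(header, rows, start, stop):
    """Keep columns whose zero-based position is in the range [start, stop)."""
    cols = slice(start, stop)
    return header[cols], [row[cols] for row in rows]

# Hypothetical survey-style data.
header = ["id", "q1", "q2", "q3", "q4"]
rows = [
    [1, "a", "b", "c", "d"],
    [2, "e", "f", "g", "h"],
]

# Select the 2nd through 4th columns without naming them.
sub_header, sub_rows = select_by_position(header, rows, 1, 4)

# First and last columns by position.
first_column, last_column = header[0], header[-1]
```

With thousands of columns, a single range like this replaces hundreds of individual checkbox clicks.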
The Tile tool, or the Multi-Field Binning tool, which completes the same task as the Tile tool on multiple fields, splits the variables by 5 methods:
Equal Records or Intervals or Sums
Unfortunately, "equal something" binnings are a bad idea, as the values are categorized "blindly", irrespective of the effect on the predictive power of the models.
What to do:
What's needed is to bin both numerical and categorical variables optimally, such that the Weights of Evidence (WoE) present a monotone increasing or decreasing pattern, or at most a V- or U-shaped "convex" structure.
Without constraining ourselves with monotonicity or convex cases, the easiest practice would be running a C4.5 or CHAID tree algorithm (produces multiple splits rather than binary splits in CART) for a single variable and select the target as the dependent variable and all the resulting nodes will be the bins we are looking for. Doing this for multiple variables at once is the key to the tool to be generated.
This capability is sought by risk management departments building robust, stable, Basel-compliant models in the financial industry, especially by banks.
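To make the monotonicity requirement above concrete, here is a minimal sketch in Python. It uses one common WoE convention, ln(share of goods in the bin / share of bads in the bin); some practitioners use the inverse ratio. The function names and the counts are invented for illustration, and the sketch assumes every bin has at least one good and one bad record.

```python
import math

def weight_of_evidence(bins):
    """bins: list of (good_count, bad_count) tuples, in bin order.

    Assumes no zero counts; a real tool would need smoothing or bin merging.
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    return [math.log((g / total_good) / (b / total_bad)) for g, b in bins]

def is_monotone(values):
    """True if the sequence is entirely non-decreasing or non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# Hypothetical (good, bad) counts for three bins of one variable.
bins = [(10, 40), (30, 30), (60, 30)]
woe = weight_of_evidence(bins)
acceptable = is_monotone(woe)
```

A binning tool of the kind requested would search for cut points (for example, via a single-variable tree) and could use a check like `is_monotone` to accept or reject the result.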
Currently, if I receive a file where fields are empty rather than Null(), the Summarize tool won't count them with the CountNull option. It's fairly easy to put a formula right before it and convert with IF IsEmpty() when there are just a couple of fields, but with a large number of fields and a large file, a multi-field tool eats up a lot of time converting. I'm not sure whether doing this in the Summarize tool would just be trading processing speed to do basically the same thing.
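The requested behaviour can be sketched in Python as a single pass that treats empty strings and nulls alike, so no conversion step is needed beforehand. `count_null_or_empty` and the sample records are hypothetical; `None` stands in for Null().

```python
def count_null_or_empty(rows, fields):
    """Count, per field, the values that are either null (None) or an empty string."""
    counts = {f: 0 for f in fields}
    for row in rows:
        for f in fields:
            value = row.get(f)
            if value is None or value == "":
                counts[f] += 1
    return counts

# Hypothetical records mixing nulls and empty strings.
rows = [
    {"name": "Ann", "city": ""},
    {"name": None, "city": "Oslo"},
    {"name": "", "city": None},
]
counts = count_null_or_empty(rows, ["name", "city"])
```

The check is done while counting, so the data is read once rather than being rewritten and then summarized.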
Can you add some additional options to the running sum tool? I know this can be done in the Multi-Row formula, but it just takes too long to program and Alteryx is designed for speed and I know it can be done. Can you add options to average, standard deviation, etc. also with running sum? And then add a rolling time window on it. That would be great!
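As a sketch of what the extra options might compute, here is a Python version of a running sum, running average and running sample standard deviation (using Welford's update), plus a rolling average. The function names are made up, and the rolling window here is a fixed number of rows; a true time-based window, as the post asks for, would need timestamps.

```python
import math
from collections import deque

def running_stats(values):
    """Return a list of (running_sum, running_mean, running_sample_std) per row."""
    n, total, mean, m2 = 0, 0.0, 0.0, 0.0
    out = []
    for x in values:
        n += 1
        total += x
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's running sum of squared deviations
        std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
        out.append((total, mean, std))
    return out

def rolling_mean(values, window):
    """Average over the last `window` rows (fewer at the start)."""
    buf = deque(maxlen=window)
    out = []
    for x in values:
        buf.append(x)
        out.append(sum(buf) / len(buf))
    return out

stats = running_stats([2, 4, 6])
rolled = rolling_mean([2, 4, 6, 8], window=2)
```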
Hi, today I got stuck because my current module gave an error when it couldn't find the field names. I define the field names in the Formula tool for validation and harmonization, so when my fields change, the formulas have to change too. But I do not want to make any changes to my module.
So I am thinking it would be better if we could define the formulas in a file (such as .xlsx or .csv) and take that as input to the Formula tool. Then we would not have to change the module again and again; we would just need to update the mapping file against the latest incoming file, checking the file and defining the formulas in the mapping file.
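One way the idea could work is sketched below in Python. The mapping file layout (columns `output_field`, `source_field`, `operation`), the small whitelist of operations and the sample field names are all assumptions made for illustration; a real implementation would need a proper expression language.

```python
import csv
import io

# Hypothetical whitelist of operations the mapping file is allowed to name.
OPERATIONS = {
    "copy": lambda v: v,
    "upper": lambda v: v.upper(),
    "strip": lambda v: v.strip(),
}

def load_mapping(text):
    """Parse the mapping file contents (CSV text) into a list of rules."""
    return list(csv.DictReader(io.StringIO(text)))

def apply_mapping(row, mapping):
    """Build a harmonized record by applying each rule to the incoming row."""
    return {
        m["output_field"]: OPERATIONS[m["operation"]](row[m["source_field"]])
        for m in mapping
    }

# Contents of a hypothetical mapping file; only this changes when the input layout changes.
mapping_csv = (
    "output_field,source_field,operation\n"
    "customer,Cust Name,strip\n"
    "country,Ctry,upper\n"
)
mapping = load_mapping(mapping_csv)
harmonized = apply_mapping({"Cust Name": " Ann ", "Ctry": "no"}, mapping)
```

When a new file arrives with different field names, only the `source_field` column of the mapping file needs updating; the workflow logic stays untouched.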
I have a process that joins 3 data sets to identify a specific group of data and apply certain ruling. From this created file, I need to extract the data (not the headings) from specific columns and insert into an already existing template. The template has formatting that needs to remain in order for it to function.
I was just thinking... they might not need to fully build out a Python IDE, but could still reach the same objective.
You should be able to keep a Python file on its own and call it in R. By doing this, you might be able to have the JSON/XML handling of Python with the visual/stats power of R, all nicely bundled in your workflow. This uses base functions in R and does a good job of turning a pandas dataset into an R data frame that you can move along your workflow.
You could always use this same idea to write a file somewhere, and once it's written, your workflow will continue. If you do, the code is literally 1 line in R... Anyway, let me know your thoughts!
The community could benefit from easier integration of splitting and applying functions to grouped data. The summarize tool is great for splitting your data and applying summary statistical functions. It would be super useful to take that block just one step further, and allow users to apply any other (aggregate) function to their grouped data instead of just the built-in functions in the summarize tool. I would envision that aggregate function either being a custom function that is a combination of existing user-specified functions within Alteryx (e.g. in the formula tool) and/or even an interface that allows you to use other Alteryx macros on the grouped data.
Applying user-defined functions or other powerful Alteryx macros to grouped data is a very common operation in a data analyst's daily workflow, and being able to apply them seamlessly, without reverting to batch/iterative macros, would naturally be helpful.
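The split-apply pattern described here can be sketched in a few lines of Python. `group_apply`, `price_range` and the sample records are hypothetical; the point is that the function applied to each group is supplied by the user rather than chosen from a fixed list.

```python
from collections import defaultdict

def group_apply(rows, key, func):
    """Split rows by `key`, then apply an arbitrary function to each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: func(members) for k, members in groups.items()}

def price_range(members):
    """A user-defined aggregate that is not a typical built-in summary."""
    prices = [m["price"] for m in members]
    return max(prices) - min(prices)

# Hypothetical records.
rows = [
    {"store": "A", "price": 3},
    {"store": "A", "price": 10},
    {"store": "B", "price": 4},
]
ranges = group_apply(rows, "store", price_range)
```

Any callable can be passed as `func`, which is the flexibility the post would like the Summarize tool to offer.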
I think that the sample tool should have a T or F port.
Let's say I want to keep the first N records but would like to stream the rest of the data (the part not sampled) somewhere else in my workflow. It's possible today, but it would be easier to have that in the sample tool itself.
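In code terms, the request amounts to a sampler with two outputs instead of one. A minimal Python sketch, with `split_first_n` as a hypothetical name:

```python
def split_first_n(records, n):
    """Return (sampled, remainder): the first n records and everything else."""
    records = list(records)
    return records[:n], records[n:]

# "T" output gets the first 3 records, "F" output gets the rest.
sampled, remainder = split_first_n(range(10), 3)
```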