This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
There are a handful of ways to search for a particular string within a data field. If you want to perform a query, identifying records with a particular string field within a data field:
Use the Filter tool: the result will be two streams - those records that meet your filter criteria and those that do not.
On the Functions Tab, expand the string tree and select FINDSTRING(String, Target)
Replace the `String` parameter with the field name
Replace the `Target` parameter with the value you are looking to identify
Finish the expression with !=-1 which will separate the true values from the false ones.
Example - If you are trying to identify all of the customers with Joe in a data field [Name] :
The records that meet this criteria will be output from the True anchor ([Name] contains the value "Joe"). All other records will be output from the False anchor.
This function can also be used in the Formula tool; if for example you want to populate a different data column based on the [Name] field, you can use the FINDSTRING in an IF statement. Example: You would like to classify your data in a new field based on the instance of customers with Joe in a data field [Name] :
IF (FINDSTRING([Name], "Joe") != -1) THEN "JOE Customer" ELSE "Other" ENDIF
This will populate a new data field with "JOE Customer" if the field "Name" contains "Joe" otherwise it will populate that field with the value "Other"
In the Formula tool, add a new field by selecting + Add Column, or choose an existing field to update.
Make sure the appropriate Field Type and Size is also specified
On the Functions Tab, expand the Conditional Tree and select IF c THEN t ELSE f ENDIF
On the Functions Tab, expand the String tree and select FINDSTRING(String, Target) to replace c
Replace `String` with the field name
Replace `Target` with the string you are looking to identify
Finish this part of the expression with !=-1 which will separate the true values from the false ones
Replace "t" with the desired value to populate the new field if the condition is met: "JOE Customer"
Replace "f" with the desired value to populate the new field if the condition is not met: "Other"
Far more than just a window to your data, the Browse Tool has a catalog of features to best view, investigate, and copy/save data at any checkpoint you place it. That introspection to your data anywhere in your blending gives valuable feedback that often speeds workflow development and makes it easier to learn tools by readily visualizing their transforms. Be equipped, and browse through the catalog of useful applications below!
Have you ever found yourself using the Smart Tile function within the Tile Tool, but don't completely understand how the tiles are calculated? While the function itself may seem a bit "black box", we actually have a pretty standard formula for calculating the tile values. Some clients have mentioned that it incorporates a z-score, but this is untrue. Our actual formula is (range * std dev) + mean .
Scientific notation , or E notation , is used to more simply represent values that are very large or very small. Rather than represent the vertical distance from the top of Mount Everest to the bottom of the Marianas Trench as 19795000 millimeters (why millimeters, you ask? Well, why not?), expressing this distance in scientific notation, 1.9795e+7 mm, provides a more accessible way to understand the magnitude and precision of that value. When databases and spreadsheets format data in scientific notation, that formatting may be carried over into Alteryx. For some users, data in scientific notation can be problematic, especially if the data type is read in Alteryx as a string. Some Alteryx users have posted their helpful ideas on dealing with converting data in scientific notation to the full numeric value, and the links to those discussion threads are provided below. This article summarizes and demonstrates their ideas.
Every so often we get questions about a .tde (Tableau Data Extract) file that is being output from Alteryx that has a file size of 30k when the original data is much larger. When the file is opened in Tableau this error sometimes comes up: An error occurred while communicating with data Source ‘yourfilename.tde’
You've gotten your long dataset and you want to combine it with another dataset for additional information. Your dataset is nice and clean. Everything is formatted the same, no null values... The whole package. You open up the data to join to and right away you see a ton of clean up that needs to happen: nulls to replace, strings to format appropriately, extra characters, white space, the list goes on. You launch the Designer, and while fast and accurate, you have to set up a new Multi-Field Formula Tool for each situation you need to fix. If only there was a single tool that did it all.
While the Join tool is easily one of the most used tools in Alteryx, it can also be one of the most misunderstood. This is even more likely true if a new user hasn’t previously used joins in any other data-manipulating platform or they are joining big tables where they might not be keeping track of the records inside the fields they are joining on.
The Multi-Field Formula Tool offers the same functionality as the Formula Tool, but offers the added benefit of applying a function across multiple fields of data all at once. Gone are the days of writing the same function for multiple fields.
Say there are four fields with dollar signs ($) that need to be removed. It could be done with a Formula Tool and a function written for each field:
A large component of data blending is applying mathematical or transformational processes to subsets of your data. Often, this requires isolating the data that complies with a certain criteria that you’ve set. The Conditional functions build expressions that allow you to apply processes to data that satisfy conditions you set.
Time series forecasting is using a model to predict future values based on previously observed values. In a time series forecast, the prediction is based on history and we are assuming the future will resemble the past. We project current trends using existing data.
The Multi-Row Formula Tool functions much like the normal Formula Tool but adds the ability to reference multiple rows of data within one expression . Say, for example, someone was on the ground floor of a house and had a Formula Tool. They would only be able to talk to the people also on the ground floor. If they had a Multi-Row Formula Tool, though, they would also be able to talk to the people upstairs, in the attic, and in the basement as well.
The Neural Network Tool in Alteryx implements functions from the nnet package in R to generate a type of neural networks called multilayer perceptrons. By definition, neural network models generated by this tool are feed-forward (meaning data only flows in one direction through the network) and include a single Hidden Layer. In this Tool Mastery, we will review the configuration of the tool, as well as what is included in the Object and Report outputs.
Sometimes you look at the steaming pile of data before you and wonder how you’ll ever get it in the form you need. Every option seems to require a great deal of manual labor, and as a lazy– er that is , as a data blending professional , that is simply something you will not abide.
The Sample Tool allows you selectively pass patterns, block excerpts, or samples of your records (or groups of records) in your dataset: the first N, last N, skipping the first N, 1 of every N, random 1 in N chance for each record to pass, and first N%. Using these options can come in the clutch pretty often in data preparation – that’s why you’ll find it in our Favorites Category, and for good reason. While a great tool to sample your data sets, you can also use it for:
Upon creating a BINGO game, I came across a technique that I thought could be useful in "real world" scenarios for users who are attempting to iterate a process and then replenishing the data after a certain amount of time.