This article is part of the CS Macro Development Series. The goal of this series is to communicate tips, tricks, and the thought process that goes into developing good, dynamic macros. In this part, we demonstrate how to read in multiple files with different schemas using a Batch Macro.
Sometimes, especially when interacting with tools or files outside Alteryx, you need to explicitly control the order of events. The classic example is a process whereby you 1) INSERT into a SQL table where an auto-incrementing ID is applied, 2) read the records back in order to get the auto-incremented ID, and 3) insert the auto-incremented ID into a different table.
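Outside Alteryx, the same three-step pattern can be sketched in a few lines of Python with SQLite. The table and column names below are hypothetical illustrations, not from the original article:

```python
import sqlite3

# Minimal sketch of the insert / read-back / re-insert pattern using SQLite.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)")
cur.execute("CREATE TABLE order_log (order_id INTEGER, note TEXT)")

# 1) INSERT a record; the database assigns the auto-incrementing ID.
cur.execute("INSERT INTO orders (item) VALUES (?)", ("widget",))

# 2) Read the record back to get the auto-incremented ID.
new_id = cur.execute(
    "SELECT id FROM orders WHERE item = ?", ("widget",)
).fetchone()[0]

# 3) Insert the retrieved ID into a different table.
cur.execute("INSERT INTO order_log (order_id, note) VALUES (?, ?)",
            (new_id, "created"))
conn.commit()
print(new_id)  # first AUTOINCREMENT id is 1
```

The point of wrapping an equivalent Alteryx process in a batch macro is precisely to guarantee that step 1 finishes before step 2 starts.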
In Alteryx workflow streams that terminate (as in an Output or Render), you can't reliably use Block Until Done to control the stream. One way to reliably control the order of events is with batch macros.
Batch macros guarantee that 1) every iteration is complete before the next iteration begins, and 2) all iterations are complete before the macro reports itself complete. So if you place a process inside a batch macro and create an output stream from that macro, all processing will be done, and the macro can still sit in the middle of a workflow stream. You don't even need a Block Until Done as part of the process.
Please see the attached example (authored in 10.0).
Some users have reported a problem when importing and exporting macros within Alteryx Designer when the machine's Regional Settings are not set to English (United Kingdom) or English (United States) (see screenshots below). You may also be unable to see personalized Macro folders within the Designer. Affected users encounter the following error: "Input string was not in a correct format."
The workaround is to switch the Regional Settings to one of the regions named above; you will then be able to import and export macros and see personalized macro folders within the Designer categories.
The screenshots below demonstrate what users will see when they try to import and export workflows with Regional Settings other than English (United Kingdom) or English (United States).
To find Region and Language settings in Windows 7, navigate to Control Panel > Clock, Language and Region > Region and Language. For more information on localization, please see this link: http://community.alteryx.com/t5/Analytics-Blog/Alteryx-Commences-Localization-Endeavors/ba-p/11765.
Client Service Representative.
Azure ML Text Analytics Macro
This API has now been deprecated by Microsoft and replaced by the Microsoft Cognitive Text Analytics Macro API.
The collaboration between Alteryx and Microsoft is growing, and the ability to take advantage of Azure Machine Learning is becoming increasingly popular! The Azure ML Text Analytics macro makes this process easy and seamless, without having to battle lines of code to perform Sentiment Analysis and Key Phrase Extraction.
To use this macro you will need to create a Microsoft Account and sign up for the Microsoft Azure Marketplace. Within the Azure Marketplace you can choose from various subscriptions, depending on the amount of text analytics you want to run. Once you have chosen your subscription, go to 'My Account' to find your 'Primary Account Key', which you will need to insert into the Azure ML Text Analytics macro to make it run.
Once you have inserted the Account Key, you can select the field you wish to run the Text Analysis on and then choose the type of Analysis.
To find out more about the types of Analysis and Azure Machine Learning, sign up for Cortana Analytics, which provides additional information on the machine learning API used by the Azure ML Text Analytics macro.
Again, if you have any issues with the tool, please do not hesitate to reach out to Support.
Client Services Support Engineer
The partnership between Alteryx and Tableau is becoming stronger and stronger, and seamless integration has been made easier through the Publish to Tableau Server tool. This article demonstrates the use of the Publish to Tableau Server tool, available on the Alteryx Analytics Gallery.
Did you know that instead of having to choose a save location in the Output Tool, you could leverage the directory that is used to save your temporary files?
This can be handy if you are running chained apps locally or have macros with a file output process, and you want to use temporary space instead of actually writing out a file to a specified location. All you have to do is add the following to your file name: %temp%..\Output.yxdb. Once you run your workflow, you can navigate to your temporary directory (which can be found in System Settings->Engine->General->Temporary Directory) to view your saved file.
The same path you used in the output tool can be used in an input tool to read in the file that is saved to the temporary directory.
The key component of any batch macro, the Control Parameter tool is the gear that keeps things moving. Through its input, the Control Parameter accepts a field of values that is used within the batch macro to reconfigure and rerun the macro for each of the standard input records - unless you use the Group By feature, which matches certain control parameter values to buckets of records to be batched through the macro together. Adding this Interface tool to any macro upgrades it to a batch macro and gives you the ability to loop through macro configurations for added customizability. While a batch macro is one of the more sophisticated solutions you can build into your workflows, there are few problems you can't solve with one:
In this article we’ll take a look at how to leverage a batch macro to read in multiple sheets from multiple Excel files using Designer version 10.0. The two Excel files I’ll be using in the example are called Fruits and Vegetables with sheets named Apples, Oranges, Broccoli and Spinach, respectively.
The first step will be to read in the list of sheet names from one of the Excel files using an Input tool. This feature is new to 10.0 and is a great addition when used in conjunction with the Dynamic Input tool.
We can then connect our Input tool from the previous step to a Dynamic Input tool. In its configuration, we use the same Excel file as our Input Data Source Template, use the Sheet Names field from our Input tool as our List of Data Sources, and set our action to Change File/Table Name.
Now we're ready to add our Interface tools to the canvas. Bring in a Control Parameter tool, two Action tools, and a Macro Output. Connect the Control Parameter to the two Action tools, then connect one Action tool to the Input tool and the other to the Dynamic Input tool. The Macro Output will be connected to the output of the Dynamic Input.
In our Action tool configuration, the action type will be Update Value. We want to update the File value that is being passed to the Input and Dynamic Input tools. We will, however, only need to update the portion of the string without the sheet name, which is why we'll also select Replace a specific string. This will allow us to batch our process across multiple files.
Lastly, if our Excel files have different schemas, we can set the macro in the Interface Designer to Auto Configure by Name or Position so that our workflow does not error out. Note: sheets within the same file must have the same schema.
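For intuition, the "Auto Configure by Name" behavior (stacking record sets whose schemas differ by matching columns on name) can be sketched outside Alteryx in Python. The field names below are hypothetical:

```python
# Combine record sets with different schemas by matching fields on name,
# filling missing fields with None - a sketch of "Auto Configure by Name".
def union_by_name(*record_sets):
    # Preserve first-seen column order across all inputs.
    columns = []
    for records in record_sets:
        for rec in records:
            for key in rec:
                if key not in columns:
                    columns.append(key)
    combined = []
    for records in record_sets:
        for rec in records:
            combined.append({col: rec.get(col) for col in columns})
    return combined

fruits = [{"Name": "Apple", "Color": "Red"}]
veggies = [{"Name": "Spinach", "Vitamin": "K"}]
print(union_by_name(fruits, veggies))
```

Each batch's records are widened to the union of all column names, which is why sheets with different schemas can be stacked without erroring out.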
We’re now ready to save the macro and put it to use!
Now, on a new canvas, we can place a Directory tool (if all your Excel files are in one directory), or we can use a Text Input tool to manually enter each individual file path. After the Directory or Text Input tool, insert your macro (right click on the canvas->Insert->Macro), then choose the full file path field for the macro's question in its configuration. Add a Browse tool, hit run, and look at the results!
We now are able to read in multiple Excel sheets from multiple Excel files!
The attached workflow package includes a workflow, batch macro, and Excel files created in Alteryx v10.0.
Client Service Representative
One of the best things about Alteryx is the ability to read in multiple files very easily and automatically combine them into a single dataset. This becomes a bit trickier when dealing with files that have different schemas or Excel files with multiple tabs. Combining multiple Excel files with multiple tabs, where the schema changes from tab to tab, takes it to another level.
When working with SPSS, values can have both a text label and a numeric representation of the categories (the equivalent of string factors in R). Columns can also have an encoded name, e.g. Q_1, and a longer descriptive name that maps Q_1 to the original question that was asked (the closest thing in R is the data frame attribute on the column).
Alteryx reads .sav files and loads either the numeric representation or the textual representation of values based on the user's selection. It also reads the variable labels into the Alteryx Field Description. When writing .sav output, Alteryx will write either the text or the numeric values (depending on what was used in the workflow), as well as the SPSS variable labels that were displayed in the description field. However, sometimes, to maintain the integrity of the whole SPSS file, clients will want the value labels, value levels, and variable labels to appear in the output file. For these cases, using the Alteryx tools and a few lines of R code (while leveraging the thousands of R packages on CRAN) wrapped in a macro gives us the needed functionality. Attached is a macro which will write the data, variable, and value labels back into SPSS.
In this section, we will explain the R code logic that is specific to this macro. You can get an introduction to writing custom R code in Alteryx here.
Before we can do anything, we will need to pass the data to the tools inside the macro (more details on macros here). The raw numeric data should be connected to the D input of the macro. In addition, because the data frames created in R don't contain the Field Description data, we need to pass the Field Description values to the M input (M for Metadata) of the macro. We're using the Field Info tool to extract the description into the rows and send it to the macro. With that done, we can now look inside the macro.
Inside the Macro
Inside the macro, we are using the R Tool to contain the main functionality. We connect it to the Interface tools (Macro Inputs, File Browse, Action Tool) to get the data, metadata, and output file path from the user. Finally, we’re using the API tool to pass a message back to the user in the Alteryx Execution Messages.
The R Tool contains the code to properly format the input data and write it out to the .sav file. The majority of the work is already done for us in the 'sjmisc' package from CRAN (R users know that most of the time they can find a package that does what they want). This package, among other features, implements reading and writing .sav files with both variable and value labels. We first check whether the package is already installed and, if not, attempt to install it.
This workflow installs the additional sjmisc package. To avoid package version and dependency issues, it is possible to use Microsoft R Client as the base R. More details here.
If you would like to install the package separately you can use the R install packages App from the Alteryx Gallery.
filePath <- "c:\\temp\\outputRAlteryx.sav"
Within the R code we also define a static 'filePath' pointing to where the output data should be written. The Action tool that we previously mentioned will update this filePath to the one chosen by the user, while at the same time correctly escaping the backslashes to work in R.
inputData <- read.Alteryx("#1", mode="data.frame")
ColumnLabels <- as.vector(read.Alteryx("#2", mode="data.frame")$Description)
We then read the data passed to the macro from input '#1' into an R data frame. In this case we are depending on R's default behavior of converting text to factors to get the value encodings for all columns, e.g. Male(1), Female(2). In addition, we read the metadata from input '#2' into R. The sjmisc function set_label, which applies the variable labels to the data frame, expects the labels to be passed as an object of type vector. To make it work, we have to convert the Description column of the data frame we're reading in into a vector with the base R as.vector() function. For more details about 'sjmisc', you can find the documentation here.
labeledData <- sjmisc::set_label(inputData, ColumnLabels)
sjmisc::write_spss(labeledData, filePath)
Finally, we label inputData with the labels we just created, store the result in the labeledData data frame, and then write it to the user's filePath using sjmisc's write_spss function.
MessageOut <- paste("file written to: ",filePath)
names(MessageOut) <- "Output File Path"
We also pass the filePath as a message to the R Tool output so it can be displayed to the user.
Edit: It was brought to our attention that the macro has an issue writing out text columns that are longer than 120 characters. Unfortunately this is a defect in the underlying R package. As a workaround for now, the macro was modified to trim all text fields to 120 characters. Please keep this in mind when writing out data.
Mandatory Note: This macro and sample code were developed by the authors as a proof of concept to show what's possible. This is not a production-ready macro and is not supported by Alteryx. Do ask questions on this thread - BUT use at your own risk!
WriteSPSSWithLabels_sjlabelled.yxzp has been updated to use the R package sjlabelled instead of sjmisc, because the set_label command has been deprecated in sjmisc and now lives in sjlabelled.
Amazon offers many different services that can be accessed from the AWS Command Line Interface (aka the AWS CLI). Alteryx workflows can take advantage of these services using the AWS CLI from the Run Command tool.
Some of the common uses of the AWS CLI within Alteryx include:
Using S3 with IAM roles
Uploading/Downloading file types not currently supported by the S3 Upload / Download tools
Interacting with Athena
As an example of using the AWS CLI, let’s build a process that can take a local file and copy to S3, using an AWS IAM profile. Here are the steps involved:
Review the documentation for the service. In this example, we're using the "S3" service, but Amazon has many different AWS services.
Install the AWS CLI on any machines that will need to execute the AWS functions
From Windows Command Line, create a working command
(Optional step to make calling the CLI easier in the future) – Add a reference to the AWS CLI executable to the Windows Environment Variables via My Computer > Right Click > Properties > Advanced System Settings > Environment Variables > System Variables > Select “Path” > Edit > (Add the path to the directory of your installed “aws.exe” file, separated by a semicolon. Do not delete/modify any pre-existing values in the Path variable.) > OK > OK
Per the AWS documentation, create a local profile using your IAM role. The local profile references your credentials so that you don't have to pass the credentials in every command.
Test the following command in your Windows command prompt, making sure to use a valid local file to copy, and a valid destination location on your S3, and a valid profile name:
aws s3 cp "C:\temp\mylocalfile.csv" "s3://mylocation/mydestinationfilename.csv" --profile MyIAMUser
Once you've validated your command in the command prompt, transfer it to an Alteryx workflow
In a new workflow, pull in a “Run Command” tool from the “Developer” category
Configure the Run Command tool using the working command:
Test the above
As an optional step, we can wrap this functionality into a macro to make it easy to update, and to support batch uploads
Add a Control Parameter for “Local File Path” and configure its action to update the specific string value “C:\temp\myfile.csv”
Add a Control Parameter for “Destination File Path” and configure its action to update the specific string value “s3://mybucket/myfiletest1.csv”
Add a Text Box and configure its action to update the command value “aws” in case the user hasn’t placed the CLI exe in their PATH variables
The example above is attached in the 11.3 workflow, SaveToS3_usage.yxzp.
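The same command the Run Command tool executes can be assembled programmatically, which is handy for scripting batch uploads outside Designer. The sketch below only builds the argument list using the placeholder paths and profile from this article; it does not call AWS (the commented-out subprocess call would require the AWS CLI to be installed and the IAM profile configured):

```python
# Build the AWS CLI argument list for an S3 copy with an IAM profile.
# This only constructs the command; executing it is left commented out.
def build_s3_copy_cmd(local_path, s3_dest, profile, aws_exe="aws"):
    return [aws_exe, "s3", "cp", local_path, s3_dest, "--profile", profile]

cmd = build_s3_copy_cmd(
    r"C:\temp\mylocalfile.csv",
    "s3://mylocation/mydestinationfilename.csv",
    "MyIAMUser",
)
print(" ".join(cmd))
# To actually run it:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list of arguments (rather than one shell string) avoids quoting problems with paths that contain spaces.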
As currently designed, the Amazon S3 Download tool only allows one file, or object, to be read in at a time. This article explains how to create a workflow and batch macro that will read in the list of objects in a bucket and allow you to filter for the file(s) you want using wildcards!
You probably already know that you can output results to multiple sheets of an Excel file. If not, you should check out our resource that explains how to do that very thing. But what if you run that workflow every day, and you want to keep the outputs from days past?
Sometimes you may need to remove records from a database table before adding new records. This can be done using the Pre SQL Statement in the Input tool, which will run a SQL query before, for instance, appending new records to the table. However, what if you do not want to run the same delete statement every time? For example, what if you need to delete all records that match a particular date on every run?
This article covers how to delete records from a database using a Text Input tool containing a parameter and a macro using the Post SQL Statement. The macro below is an example of how you can do this.
Create a macro that reads in a date from a Text Input tool from a workflow.
Use the first Action tool to update part of the Post SQL Statement with the date being fed into the macro through the Control Parameter from the workflow. We are updating the ‘x’ shown below, which is used as a placeholder in the Post SQL Statement.
In the Post SQL Statement of the Input tool within the macro, we created the Delete statement with a temporary placeholder for the date being read in (the ‘x’ shown below). The incoming date will replace the ‘x’ as a result of the Action tool in the previous step.
Use the second Action tool to update the filter in a similar fashion as the Input tool:
The Filter tool is needed because we are using a Post SQL Statement for the delete query. This means the tool first reads in the data and then runs the delete, so the records to be deleted will still be read in. The Filter tool stops the deleted records from being part of the macro output.
Important: The reason it is set up this way is that if the delete statement were part of the Pre SQL Statement, it would be executed as soon as the workflow is opened, which may lead to unintentional deletion of records. The Post SQL Statement won't execute until after the table has been read in.
You may also add a macro output for the deleted records to confirm that the correct records were deleted. This also gives you the opportunity to put records back if you didn't mean to delete them, as they can be exported from the Alteryx Browse window:
Create a workflow that feeds a date from a Text Input tool into the macro.
When the workflow is run, whatever date you have entered in the Text Input tool in the workflow will replace the ‘x’ in the macro, deleting the record(s) that match that date.
Important: If you change the SQL statement, make sure to make any necessary related changes to the Action tools as well.
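The delete-then-append pattern the macro implements can be sketched directly in Python with SQLite. The table name, column names, and dates below are hypothetical; a bound parameter plays the role of the ‘x’ placeholder that the Action tool replaces:

```python
import sqlite3

# Delete rows matching a run date before appending new rows for that date,
# mirroring the macro's placeholder substitution with a bound parameter.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (run_date TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("2017-01-01", 10.0), ("2017-01-02", 20.0)])

run_date = "2017-01-02"  # the value the Text Input tool would supply
cur.execute("DELETE FROM sales WHERE run_date = ?", (run_date,))  # remove old rows
cur.execute("INSERT INTO sales VALUES (?, ?)", (run_date, 25.0))  # append new row
conn.commit()

remaining = cur.execute(
    "SELECT run_date, amount FROM sales ORDER BY run_date").fetchall()
print(remaining)
```

Binding the date as a parameter, rather than splicing it into the SQL string, is the safer equivalent of the string replacement the Action tool performs.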
How can you calculate the standard deviation of the population (SDpop) rather than standard deviation on the sample data (SDsample)?
Please find the attached macro, special thanks to one of our Community users!
Standard deviation of the population (SDpop) and standard deviation of the sample (SDsample) are very similar - the only difference is that in SDpop the sum of the squared deviations is divided by n instead of n-1 as it is for SDsample (where n is the number of data points in the vector). This macro takes a vector of doubles, which needs to be named 'dataVals', and returns the standard deviation of the population. It is accurate to the 9th or 10th decimal place.
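For comparison outside the macro, Python's standard library exposes both divisors directly; the vector below is an arbitrary example (reusing the macro's 'dataVals' name):

```python
import statistics

dataVals = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population standard deviation: squared deviations divided by n.
sd_pop = statistics.pstdev(dataVals)
# Sample standard deviation: squared deviations divided by n - 1.
sd_sample = statistics.stdev(dataVals)

print(sd_pop)     # 2.0 for this vector
print(sd_sample)  # slightly larger, ~2.138
```

Because n-1 is a smaller divisor, SDsample is always a bit larger than SDpop for the same data.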
Filtering data is probably one of the simplest Alteryx functions, but building the expression can become time consuming, especially when filtering on a larger number of values. Wouldn't it be nice to be able to feed a list of values to filter your data? I'll show you how you can do this with a simple batch macro.

Batch Macro: Set up the Filter tool within your batch macro with a simple "In" statement, then use the Control Parameter to update the value within the expression. Connect the Control Parameter to the Filter tool and configure it to update the value within the expression as shown below; this will update that portion of the expression with the column the user defines in the macro.

Building the Expression: In a new workflow, input the list of values you would like to use to filter your data. Connect your data to a Summarize tool and configure it to concatenate your values into a single column using "," as the Separator and " as the Start and End. Drag out a Formula tool after the Summarize tool, then add the parentheses as shown below to complete the expression.

Filter: Add your batch macro into the workflow by right clicking on the canvas and selecting Macro; you will need to browse to the location where you saved it. Then connect the Formula tool to the "¿" input of the batch macro. Input the data file you would like to filter and connect it to the remaining input of the batch macro. Finally, click on the Control Parameter tab and select the column to be used as the Control Parameter. We are not using the Group By function for this particular example, so there is no need to configure it. (Workflow and macro developed in Alteryx 10.1)
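The Summarize-and-Formula steps amount to turning a list of values into an expression like [Field] IN ("a","b","c"). The same string assembly can be sketched in Python; the field and value names below are hypothetical:

```python
# Build a Filter-style IN expression from a list of values, mirroring the
# Summarize tool (concatenate with quoted separators) plus the Formula tool
# (wrap the concatenated values in parentheses).
def build_in_expression(field, values):
    quoted = ",".join('"{}"'.format(v) for v in values)
    return '[{}] IN ({})'.format(field, quoted)

expr = build_in_expression("Region", ["East", "West", "Central"])
print(expr)  # [Region] IN ("East","West","Central")
```

This is exactly the string the Control Parameter then splices into the Filter tool's expression for each batch.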