This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
This article is part of the CS Macro Development Series. The goal of this series is to communicate tips, tricks, and the thought process that goes into developing good, dynamic macros. In this part, we demonstrate how to read in multiple files with different schemas using a Batch Macro.
How to dynamically run the most recent file in a file folder
Sometimes you may have daily, weekly, monthly or yearly data dumps where you want to only run the most recent file. Within Alteryx you can make this process dynamic and seamless through the use of a few tools.
Step 1: Directory Tool
The Directory Tool will allow you to browse to a folder and return all the metadata related to the files which exist within that folder. The field of interest in the metadata is the 'Creation Time'.
Step 2: Sort Tool
Using the field called ‘Creation Time’ we can use the Sort Tool to sort the date and time values into descending order to get the most recent file at the top of the dataset.
Step 3: Sample Tool
After sorting the 'Creation Time' field I now have my most recent file in record 1. Yet, I still have rows of data for the other files within that folder that I need to remove. I can now use the Sample Tool to take the ‘First 1 Record’ and this will result in the latest file information being left.
Step 4: Dynamic Input
Currently the field I have in my dataset only shows metadata avaliable for that file such as Full path, Creation Time etc. I now need to read this file and pull in the data by using the Dynamic Input Tool. In the ‘edit’ section select a placeholder file.
Then in the Read a list of data sources ‘Field’ dropdown this will need to be the ‘Full Path’ field coming from the directory Tool. In the Action dropdown this will need to be set to ‘Change Entire File Path’.
Step 5: Run the workflow
You can now run the workflow and it will dynamically always pick the latest file from that folder and read the data into Alteryx.
Please find an example module attached to this article (Built in Alteryx Designer 10.5)
Can I read in an Excel file located in a zipped archive file from Amazon S3?
Unfortunately, this is not an option within the Amazon S3 Download Tool, as it only allows you to choose between CSV, DBF and YXDB files. However, this is possible within Alteryx with the use of a simple workflow utilizing a three line batch file, the Run Command Tool (master it here), and the AWS Command Line Interface (CLI).
In order to use the CLI, you must first download it and configure its settings. Please visit this page for information on how to do that. Once that is setup, you simply need to setup the batch file and configure the Run Command Tool.
In the first step, you will use a Text Input Tool to write the batch file code. This code will use the CLI to copy the ZIP file from the S3 bucket to a locally accessible drive. Configure the Text Input Tool as follows:
Make sure that line 2 points to where your CLI is installed.
In line 3, replace "alteryxtest" with the name of your bucket, "ExcelTest.zip" with the name of your ZIP file and enter in the correct location to copy the file to.
In the second step, you will use the Run Command Tool to do the following:
Write out the batch file ("Write Source")
Run the batch file created in the previous step ("Run External Program")
Read the file into the workflow ("Read Results")
When entering in the "Read Results" section, your ZIP file will not exist at this point so you cannot simply navigate to and select the file. So, you have two options:
Click on the "Input" button and enter in the full path of where you are copying the ZIP file (found on line 3 of the Text Input tool) along with the file name, a pipe character, and then in brackets, the sheet name. For Example:
Run the workflow once without the "Read Results" section completed in order to copy the ZIP file from the S3 bucket. Then, click on the "Read Results" button and navigate to the ZIP file and choose the Archive file to read it.
This same workflow can be used to read other archived files as well. However, you will have to make slight adjustments to the "Read Results" section of the Run Command tool. For example, if reading in a CSV file, you would simply include the archived file name. Since a CSV file does not have "sheets", the bracketed sheet name is not needed.
I plan to create a simple macro with a user interface that will do the same thing. Once complete, I will post it in the reply section.
Thanks for reading!
Here in Customer Support, we like to make sure our workflows run as smoothly as possible. Sometimes you may ask yourself “Why can’t my excel file just open right after I output it?”
Where there is a will, there’s a way.
Ladies and Gentleman I introduce to you the beautiful functionality of the Run Command Tool.
With this trick, you never have to go into the directory to manually open the saved file again.
Note: There are multiple ways of doing this with the run command. This is just a simple example I created in version 10.1.
For this trick you’ll only need 5 additional tools.
Before your output tool, you’ll need a Block Until Done Tool. This is for quality control as Excel doesn’t like to open when you are writing in to it at the same time. This prevents Alteryx from writing and opening at the same time.
Next, you’ll need a Formula Tool, which will just add another field with your command. When making this dynamic, you’ll want to make sure you update both the path of your output and formula tool.
Whether you do that in a macro or app, it’s entirely up to you. After the formula tool, you’ll want a sample tool so that way we only select tool to deselect all the fields except for your command. Now just make sure your run command is set up properly. In this example it’s set to write out the command as a bat file and be read back in as a command.
To learn more about how to configure the run command, I would read though this helpful article.
After you have this set, watch as your workflow has made your life even more automatic.
"I have a problem reading an Excel database!" This seems to be a quite common scenario that we in Customer Support hear. Our CTO, Ned, has always stuck to the same response: "Excel isn't a database." While this is true, many of our customers use it as their primary means of storing data, so it is imperative that we can work through any issues our clients have. First and foremost, Alteryx uses Microsoft drivers in order to read Microsoft products, such as Excel, Access, and MS SQL. Because of this, we are subject to any bugs that may come with those drivers. For example, if you have Microsoft Office, most likely it is a 32 bit version despite your operating system (Microsoft even recommends this). Alteryx has a 64 bit version for 64 bit Windows, but since the drivers are different some people report errors when trying to read in .xlsx, such as decimal rounding or the ability to view the files at all. The good news is that there are a few solutions to remedy this situation. The first is to make sure you have the latest and greatest Microsoft drivers on your local machine. If you do not see a .xlsx file in your Input Tool file type dropdown, as seen below, first try reinstalling the Microsoft Access Database Engine (link goes to 2010 install. If you need 2013, click here). This should remedy most issues. In the event that it does not, one more driver install will secure your .xlsx files are read and written properly. 2007 Office System Driver: Data Connectivity Components will bypass any of these additional issues you may have with Microsoft Office files. If this does not work, Ned has created an excellent blog post about creating a custom macro (with downloads!) to read .xlsx as xml. Check it out, try the macro, and let us know what you think! Until next time, Chad Follow me on Twitter! @AlteryxChad
Ok, I'll be the first to admit that this has a very specific title, but the use cases can be quite advantageous. Recently, a user posted on the Forum that the ampersand character (&) was causing an error when importing a series of KML files. The fix? Have a prior Alteryx process replace this character with the word 'and' within all KML files without ruining the rest of the file. To accomplish:
Pull in an Input Tool and set the file type to .csv with no delimiter (designation is \0).
In the Input Data Source portion, use the wildcard (*) to pull in all of your text files (using KML in the example).
For the option Output File Name as Field, select File Name Only.
Use a Formula Tool to update the text within the files.
Output each file individually with one Output Tool, making sure to set the output to dynamically update based on your file name field.
The process is fairly straightforward and should help resolve any text or character issues that may be an issue. For an example, please see the attached module. Until next time! Chad Follow me on Twitter! @AlteryxChad
Tool Purpose: The Directory Tool returns all the attributes of files in a specified directory. Along with file names, other pertinent information about each file is returned, including size, creation date, last modified date, and much more. When to use: You would like to locate a file or file type in a specified directory or sub directory. Configuration Properties:
Specify the directory to return the file contents of. Either type in the file location path or use the browse icon to navigate to the intended directory.
Specify the type of files to return.
*.* will return all file types in the specified directory.
*.csv will return all csv files in the specified directory.
temp*.* will return all files that begin with temp in the file name: (eg. temp1.txt and temp_file.yxdb)
Include Sub directories check box. When checked, it will search for the specified file types in directories that lie within the specified directory. When this box is left unchecked, only files at the root level of the specified directory will be returned.
Information provided by the Directory tool:
Suppose that your spreadsheet has multiple sheets with the same structure and you would like to read several sheets into your module at once. In this case, the preferred alternative is to use the Dynamic Input tool.
Some users have reported a problem when importing and exporting macros within the Alteryx Designer when the Regional Settings for the machine are not set to English (United Kingdom) or English (United States) (see screenshots below). You may also be unable to see personalized Macro folders within the designer. They encounter the following error: "Input string was not in a correct format.”
The workaround for this is to switch the Regional Settings to the regions named above and you will then be able to import, export macros and see personalized macro folders within the designer categories.
The screenshots below demonstrate what the users will see when they try to import and export workflows with other Regional Settings than English (United Kingdom) or English (United States).
To find Region and Language settings in Windows 7, please navigate to the Control Panel>>>Clock, Language and Region>>Region and Language. For more information on Localization please see this link: http://community.alteryx.com/t5/Analytics-Blog/Alteryx-Commences-Localization-Endeavors/ba-p/11765.
Client Service Representative.
As users of Alteryx become increasingly international, it is important to support the need to prepare, blend and analyze data in a large number of languages. However, many users with data in non-English languages encounter roadblocks at the first step of their data analysis: inputting the data into Alteryx.
This article will demonstrate two ways of bringing double-byte characters (DBCs) into Alteryx. These particular types of characters are associated with languages that have many unique characters or symbols, such as Chinese, Japanese and Korean (CJK). These languages’ graphic characters are stored in two bytes of data rather than just one, which is sufficient for languages like English, French and Spanish (among many others) that can be represented by 256 characters or less. Chinese, Japanese and Korean languages require a fixed width sequence of two bytes for every character, which allows for about 65,000 characters. This need to support such an extensive dataset requires an Alteryx user whose data may be represented by CJK characters to consider certain formatting and preparation steps both before and after inputting data into Alteryx.
Before inputting data into Alteryx, data with CJK characters requires certain encoding and file type considerations. Encoding is the process of converting data from one format into another specific equivalent code that uses letters, symbols and numbers for storage and processing. When using CJK characters, encoding your data in a format that supports double-byte characters is important to accommodate the range of characters in your dataset. Storing the data as Unicode ensures that you do not lose data as you begin your inputting process into Alteryx (Figure 1).
Figure 1: To encode your data from Excel, click “Tools> Web Options” in the bottom right-hand corner of the Save As Window. Then, select “Encoding” from the file tabs. In the Dropdown menu to Save the Document, select “Unicode”, then click “OK”. Make sure that your file name does contain any non-English characters, as that can cause issues the Input tool.
The second pre-input consideration for CJK data is the file extension, or type, with respect to how the data is saved. For example, saving CJK data directly to a Comma Separated Values (CSV) format, especially without encoding, is discouraged. This is because CSV file formats distort characters other than those supported by the American Standard Code for Information Exchange (ASCII) text. In the case of CJK characters, that is likely to be all of your data (Figure 2)! Rather than suffer through the heart-wrenching experience of seeing all your data turned into meaningless question marks, try saving your data as a Unicode text file (.txt) or Excel workbook (.xlsx), both of which are supported by the Alteryx Input Tool (Figure 3). I have found that using CJK characters in these file formats ease the process of inputting data into Alteryx and even reduce some of the steps you may need to perform in your post-input process.
Figure 2: Rats! CJK characters that are saved directly to CSV format without encoding turn into meaningless question marks. WHHHYYYYYY?????
Figure 3: Rather than lose your precious data, save as Unicode Text or an Excel Workbook (encoded as Unicode as a safeguard).
Post- Input Considerations
Because CJK characters are double-byte in size, it is important that their field type is set appropriately to display the data completely. Generally, importing data stored as Unicode text or a Unicode-encoded Excel workbook will be read in as a variable length wide string format (V_WString) to accommodate these wider types of characters (Figure 4). As you can see, forcing the field type to a narrow string format (V_String) leads to data loss. Throughout the data preparation and blending process in Alteryx, you should be sure that fields containing CJK characters have the necessary space requirements to store and transmit the data. The sudden conversion of CJK characters to question marks may indicate that your field type has changed; should this occur, field types can be easily changed using a Select Tool.
Figure 4: The larger size requirements of CJK characters creates the need for using wide string fields (V_WString).
Datasets that contain a mix of Western Latin character sets and CJK characters may require the conversion of text between Code Pages. This may be especially necessary if data has been transferred among many users who have different computer language settings or encoding systems. Figure 5 shows data that has been transferred from a Chinese colleague to an American Alteryx user. Despite bringing the data in as Unicode Text, the data is still unreadable. To convert the data to a useable and meaningful format, Alteryx tools with an Edit Formula Box (or Expression builder), such as the Formula or Multi-Field Formula Tools, contain ConvertFromCodePage and ConvertToCodePage functions. These functions facilitate the conversions between language codes and Unicode. In the below example, the Multi-Field Formula tool is used to overcome coding issues to convert data from the original Chinese code to Unicode using the ConvertFromCodePage function. As a result, the data is displayed in a readable and useable way.
Figure 5: Converting the data from Code Page 20936 (Simplified Chinese GB2312) to Unicode using the ConvertFromCodePage function renders the data readable and useable…Hooray!
And voila! Errr… 这里是! Now your CJK characters are brought into Alteryx beautifully!
Have you had experience with inputting non-English language data with Alteryx? Post any helpful tips or tricks that you’ve used to the Comments section to share with the Alteryx Community.
**Thanks not only to the Alteryx users whose data prep needs have inspired this article, but also to Alteryx’s RodL for his Community post that has provided me with the steps to even know where to begin when troubleshooting data in a variety of languages including Chinese, Japanese, Korean, Thai and Vietnamese.
Alteryx allows you to input a selected range of values from your Excel spreadsheet. This can be handy when working with large Excel files where you only need a small subset of data, thus providing you an opportunity to optimize the processing time of your module.
Multiple files can easily be combined into a single Excel file containing multiple worksheets (or tabs). The format of the files do not need to be the same; they can be completely different.
In the example attached to this article, two files, a Customer .csv file and a Store .xlsx file, are output to a single Excel spreadsheet (.xlsx format) with one tab for Customer and another for Store. These files contain different data elements and have different layouts.
The first step is to create a new field in each dataset (e.g.: 'TabName') and set the expression to the name you want each worksheet to have. In the attached example, 'TabName' is set to 'Customer' and 'Store,' respectively for each dataset.
When you get ready to output your data, use an output tool for each dataset and configure them in the same way. The file format is 'Microsoft Excel (*.xlsx)' and when prompted to select a worksheet for the output, select 'Sheet1'. Check the 'Take File/Table Name' checkbox located at the bottom of the configuration window. Select 'Change File/Table Name' in the dropdown and under 'Field Containing Name or Part of File Name' enter 'TabName'. You probably don't want TabName in your final output, so uncheck the 'Keep Field in Output' checkbox.
This error is usually caused by two things:
The file you selected is of a different format than what is selected in the File Format drop down in the Configuration window.
A .yxdb file you are using as an input has become corrupt.
Other causes can be found here: https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Error-FileID-does-not-match-in-the-FileHeader/ta-p/19870
Although extremely rare, if a .yxdb file becoming corrupt is the culprit (one cause is a network connection dropping in the middle of writing the file), you may have a quick and simple way to resolve this potentially stomach-dropping issue:
Open up Windows Explorer and navigate to the file.
Right-click on the file and choose Properties.
Click on the Previous Versions tab.
If you're lucky, there is at least one option of the file that you can revert back to you, hopefully allowing you to avert disaster. Obviously, this doesn't just work with Alteryx files, so perhaps this knowledge will come in handy with other files.
Thanks for reading!
Can I read Outlook emails and attachments with Alteryx?
As our users are looking at automating their processes within Alteryx, the type of files they want to be able to read varies from your regular data file types. I was asked a few times if we could read outlook emails and/or attachments with Alteryx.
One way to accomplish this is by linking the Outlook folder (s) to an Access database table (s) and then to read the Access database with Alteryx, since it is a link, the table(s) in the Access database will be updated with all the emails that are being moved into that Outlook folder.
This Link walks you step by step on how to link an Outlook folder to an Access database table http://www.fmsinc.com/microsoftaccess/email/linked/index.htm
Once the Access table(s) is linked to the Outlook folder(s) you are ready to read it into Alteryx. Go to Alteryx and bring an Input tool and browse to the Access database file. Most likely the table tab won’t be populated, go to the SQL Editor tab and write the query as shown below, where Training is the name of your table in Access, then click OK.
Upon workflow execution, you will retrieve all this information from your Outlook emails
Reading Email Attachments
A rule can be created in Outlook to automatically download attachments to a specified computer folder where Alteryx can pick them up. There is a good article that walks through the setting of this Outlook Rule.
Notice that the script is where the folder to download is specified, specific line is saveFolder = "c:\temp\"
As always, do a small test first with a controlled outlook folder. Once this rule is set, you would simply point to the file(s) with an Alteryx Input Data tool.
By combining Alteryx and Microsoft Power BI, organizations can streamline and accelerate the process of preparing and analyzing data. This provides a faster way to deliver an end-to-end experience for data access, preparation, analysis, visualization and consumption — delivering deeper business insight faster with a more complete set of data.
When importing a .xls file using the Input tool, Alteryx produces NULL values for specific cells, why does this happen?
By design, Alteryx uses either the Jet or ACE drivers when importing .xls files. In short, it’s not really an Alteryx issue but rather a limitation on the Microsoft drivers, but there’s always a workaround available, and we’ll dig into it later.
One of the common reasons this happens is due to the logic both the JET and ACE drivers follow. By design, they look at the first eight rows in your document when identifying the data types and their lengths, when it comes to handling Legacy Excel files. Translation, please? If at least one cell within the first eight rows of your file contains more than 255 characters, then the drivers will set the data type as “memo” (i.e., a length that can hold up to 32,768 characters). Now if the data type doesn’t exceed 255 characters within the first eight rows the driver sets the data type as a string with a 255-character restriction. How does this relate back to you? Well, if any of your cell values that are after the first eight rows do surpass 255 characters, then the driver will assign a NULL value to those cells. Confusing, right? It’s a good thing all is not lost, at least not yet.
The user has the choice to work around this issue or fix it in their registry. The workaround includes the following:
From the Input Data Tool configuration, check option 6: "First row contains data." This will force all fields as a V_String.
Configure the Dynamic Rename Tool to "Take Field Names from First Row of Data."
Use the Auto Field Tool to assign the best field type for each data column correctly.
The fix for this involves changing a setting in your registry. These instructions are being provided as well as documented from Microsoft, as Alteryx cannot be sure that changing this setting will not cause problems with other applications. For more information on this issue: http://support.microsoft.com/kb/189897.
To change the Registry setting***
Go to Start > Run and type "regedit"
In the registry go to HKey Local Machine > Software > Microsoft > Jet > 4.0 > Engines > Excel
Double Click: TypeGuessRows
Change the value to 0 (zero)
Another workaround is to save the .xls file as a .xlsx. It’s well known that Microsoft hasn’t updated the legacy file extension since 1997. If this is an option you desire to exercise, fear not, there is a Macro for that, especially if you need to convert files by the masses.
Quick Shout out to DanJones for creating a fantastic app that takes a directory path and converts all xls files to csv or .xlsx respectively.
The post is located all the way in the bottom.
Disclaimer, when it comes to converting a .xls as .xlsx, please review if your workflow has a lot of formatting enabled, if so, please remember that sometimes this will potentially create inconsistencies with your data. It’s also possible to lose some data in the process.
*** For performance reasons, setting the TypeGuessRows value to zero (0) is not recommended if your Excel table is enormous. When the value is set to zero, Microsoft Excel will scan all records in your table to determine the type of data in each column.
Thanks for reading! If you have any questions, please message away!