Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

How To Input Excel Files That Have Encoding Issues

lepome
Alteryx Alumni (Retired)
Created
Some Excel files, particularly those created by third-party programs, are encoded differently than Alteryx expects.  In many cases, such files can be opened in Excel and then saved, resulting in files that Alteryx can open.  A workflow example is attached to this article, and the article explains how it was constructed so that you can customize it to meet your needs.
Please refer to Designer Knowledge Base article entitled "How to check for encoding or formatting issues with Excel worksheets" for diagnosis tips.

Prerequisites

  • Product - Alteryx Designer
    • 11.0 +
  • Product - Microsoft Excel
    • Most Versions 

Procedure

  1. Create a Macro that will open the desired Excel file and then save it.
    1. This article shows how to do this with a batch macro, a versatile type of Alteryx macro.  It will allow you to dynamically specify more than one file to convert and then convert them all in a single workflow.  This is intended to be a guide.  The workflow example provided is not warranted in any way.
    2. Alteryx does not itself carry out the desired conversion.  Instead it allows you to dynamically call a separate command within your workflow.  To do so, you must use a Run Command tool.  The tool has three main sections and whether you use them or not, you must enter a file location for either the Write Source or the Read Results section, although you can use both if you wish.  For the purposes of this macro, it does not matter what file you specify in either section, just that there is one.
      idea Skyscrapersidea Skyscrapers
    3. The Run External Program section is where you configure everything needed to .  Because your workflow may be stored in whatever location you choose, you will need to enter the full path to the command you intend to run.  In this example I assume that you will run the workflow on a local machine that also has Excel installed.  (If you expect to eventually run the workflow from a server (worker) node that does not have Excel installed, you will want to use a UNC path to the Excel executable.  UNC paths are the most versatile way to specify a location for Alteryx purposes.)  The path shown refers to Office16, and may not be the same as the path on your machine, so you will want to customize it as needed.
      idea Skyscrapersidea Skyscrapers
      After specifying the path and command (executable) you want to run, you'll need to add any desired Command Arguments.  In this case you need to specify information about the source file and the output file.  Because this Run Command tool is part of a macro, you can use placeholders for the path and filename.  The configuration shown is hard-coded to convert an .xlsb file to an .xlsx file in the same location.  Congratulations, you've configured the Run Command tool.
      idea Skyscrapersidea Skyscrapers
    4. To build the batch macro around the Run Command tool, add a Control Parameter tool and an Action tool.  Connect the tools as shown.  Enter a label for the Control Parameter or leave the default.  Because of the way we've set up the Run Command tool, we're going to want to feed in the full path and filename, but without the file extension.  (You can customize this as shown later.)
      idea Skyscrapersidea Skyscrapers
    5. Configure the Action tool.  The Action tool is among the most powerful tools, and it does different things in different contexts.  Here, we're going to use it to dynamically set the full path and filename (without the file extension).  There are a couple of ways to do this, and one is to Update Value with Formula.
      idea Skyscrapersidea Skyscrapers
      Select the part you want to update, here the Command Arguments, shown simply as Arguments.  When you click on the ellipsis, a window opens to construct the formula expression.  The Alteryx function used here is a string function called Replace.  Because we want to replace both placeholders, we use it twice by nesting the first time inside the second time.  Note that the available variables are listed on one of the tabs.  We check in the Destination variable and replace the 'Input' placeholder with the full path and filename that comes through the Control Parameter (without the file extension).  That variable is the name of the connector going between the Control Parameter and the Action tool, in this case, #1.  In the formula we designate the variables by putting them in brackets, just as with all other tools that use expressions.  The outer Replace checks the resulting string and replaces the 'Output' placeholder with the same full path and filename. 
      idea Skyscrapersidea Skyscrapers
      Click OK, and the macro is complete, so it's time to save it.  I called mine Batch Convert xls.yxmc.
  2. Once you have the macro, you can use it in a workflow.  Continuing with this example, the workflow has to generate the full path and filename without the file extension because that is how we set up the Run Command tool.  (Using the target file extension in the Directory tool makes the Filter tool redundant so I removed it in the attached workflow.)  As you can see below, I've pulled the information from the target directory, trimmed the last five characters from the Full Path field to get the full path and filename without the file type extension.  I add the macro to the workflow.
    idea Skyscrapersidea Skyscrapers
    And then I configure the macro.
    idea Skyscrapersidea Skyscrapers
  3. You can customize this to your needs. 
    1. For example, if you want to save the output file to a different location you can do that.  All you have to do is decide on the save location and use the Action tool slightly differently.  Where the original example has a path and filename that's the same for both the input and the output,
      idea Skyscrapersidea Skyscrapers
      you can add another Control Parameter to provide a second path for the output.
      idea Skyscrapersidea Skyscrapers
    2. If you'd prefer to input the file so that you can use the data immediately in a larger workflow, you can do that by modifying the original workflow to dynamically modify the Run Command tool in a second Action.  Where we previously had an unused filename in the Read Results section,
      idea Skyscrapersidea Skyscrapers
      you can now have a second Action tool modify that to match the file and path of the newly-saved .xlsx file.  Here I've done that with a formula that replaces a dummy filename (except for the file extension) with the desired full path and filename (without the file extension).  I added a Macro Output so that the data can come out of the macro and you can add more tools after it.
      idea Skyscrapersidea Skyscrapers
  4. You can customize this to your needs.  Throughout this article the path and filename were used without the file extension and the file extensions were set in the Run Command tool, but you could just as easily feed in the unmodified Full Path as the 'Input'.xlsb command argument and feed in a separate path as the 'Output'.xlsx or you could use a more involved Formula to trim the file extension and replace it all in the Action tool.  The possibilities are endless.  And that's the magic of Alteryx!

Additional Resources

Many thanks to MarqueeCrew for creating this macro nearly three years ago!
Interactive Lesson on Writing Expressions using String Functions
Video Introduction to Building Macros
Video: Build your first Macro
Video on Advanced Macro Development
 

 
Attachments
Comments
lepome
Alteryx Alumni (Retired)

This same kind of approach can work to remedy Access database encoding issues using the msaccess.exe /compact command:
Note also this path for 64-bit office.Note also this path for 64-bit office.Rather than overwriting the db, this creates a new one.Rather than overwriting the db, this creates a new one.Then update the Input Configuration.Then update the Input Configuration.

annelson19
5 - Atom

I am running this macro on fairly small files (~15kb - 30kb) and it is taking about a minute per file to run. Is this run time expected? I am trying to set this up to be run weekly and the source has 114 files so the time is quite burdensome. I may be able to setup a scheduled run but wanted to see if this was expected or if you had any advice.

Thanks!