Introduction
I had a problem where I corrupted a table that one of my Tableau dashboards was driven by. As a result, the workbook would no longer open, and I was stuck facing the sad probability that I had to redo all my work from scratch, creating all my calculated fields again. A colleague reminded me that you could open a .twb workbook in Notepad and look at the xml code. This was great, but if you've ever looked at one of these files, there is a lot going on in there, and it's not overly obvious what to change.
This was where the idea of using Alteryx came from. I decided to automate the process, and I built a workflow that would allow you to input a .txt file of your Tableau workbook. It would then do all the clever things Alteryx does and output all your calculated fields in an easy-to-use format, all written in one field. This can then be copied to wherever you want to use it.
How It Works
-
Takes the input from your .txt dashboard file.
-
Alongside that, it uses a set list of often-used HTML entities. This list can be added to if you use other things.
-
Both files are fed into the macro for data manipulation and restructuring
-
The macro outputs one field with all the calculations, ready to be copied and pasted where needed.
Under The Hood
Phase 1: Bring in the data and reduce the rows we are looking at
The .txt file comes in and is assigned a record ID for each line of data. We then use a filter and a Summarize tool to find the first row of data that equals </metadata-records>
. This allows us to refine the data we are going to reformat. There can be multiple instances of each calculation in the file, so this lets us just focus on the first set of them.
This record ID is appended back to the main data, and then we filter off all the data before it, getting rid of things we don’t need. The next thing we do is filter for any records containing one of the following: <column caption=
, <calculation class=
, </column>
. These are the records we want to transform. Finally, we sort them ascending so the related rows are next to each other.
Phase 2: Get the correct data together in a row
The first thing we do here is filter on <column caption=
. From the True branch this is the data that contains all the calculations metadata. The False branch has all the calculations for them. For the True branch, we parse the data using this regular expression:
caption='(.*?)'.+datatype='(.*?)'.+name='(.*?)'.+role='(.*?)'.+type='(.*?)'
This is splitting off the five pieces of information we may want to know.
For the False branch, we parse the data using this regular expression:
formula='(.*?)'
This gives us the formulas. By unioning this back together and sorting on the RecordId field, we are then left with something that looks like this:
As you can see, we have all the right data but split over multiple rows. To sort this, we use the Multi-Row Formula tool to move the formula up one cell and bring all related data together in one line.
With a little bit of data cleansing and sorting, we are then left with the following:
Phase 3: Reformat the data to be useable
The first thing we need to do here is create some formulas for the manipulation.
Calculated Field 2- takes the Calculated field and adds square brackets around it so that it is in the right format for Tableau.
Calculated Field- We then change this field to add to slashes before it. This is so that it reads as a comment when pasted in Tableau.
The next tool then replaces all
instances in the relevant fields and changes them to a line break, spreading the data across multiple lines. Again, this is so it reads nicely in Tableau. The next three FindReplace tools use the Entity look up list to convert all the HTML formats into ones that we would read naturally: commas, speech marks, etc. The final FindReplace tool is there so that we can replace the names of other calculations used within each other.
For example, converting this:
To this:
Finally, in this stage, we keep only the fields we need for reformatting:
Phase 4: Summarize and output the data
The first thing we do here is transpose the data so we have the calculated field and formula in one column as the name and the data as values. If we then group by record ID and concatenate the value data, it adds all related info for each calculated field to one cell. The final step is to use one more summarize tool to concatenate everything together so we have all the info in one cell. We use\n\n as the separator in this tool so that we have gaps between each calculation.
There we have it—all our calculations in one field ready to be used elsewhere, all written out in the format with the Calculation name as a comment and the calculation underneath with all the correct references and layout.
Practical Use Cases
- Documentation- This list of fields can be copied and pasted straight into your Tableau documentation file. This means you have a record of all calculations being used, saving you lots of time from manually typing them all out.
- Fixing Problems- If your dashboard completely fails for any reason (like mine did), you can easily retrieve and recreate all your calculations. Using the highlight, drag, and drop method, you can rebuild them all quickly.
Conclusion
This helped me fix my workbook quickly, but I also then realized I could use this to speed up the documentation of my Tableau workbook. I only focused on getting the calculated field data out of the .txt file, but there is a wealth of other information in there that the macro could be manipulated to pull out. You can download the macro on the Gallery here to try it out.