
The public loves following mergers and acquisitions (M&A) stories. So much suspense, excitement and… lots of work behind the scenes!


So today we’re stepping into the shoes of an imaginary M&A analyst.


Your task is simple: gather as much information as possible about company A. The domain covered: manufacturing. The challenge: you were invited to a senior management meeting a tad too late, and you have less than an hour to create a shiny summary report! Your tools: a fresh-off-the-press public 10-K annual SEC filing and your best friend, Alteryx Designer + Intelligence Suite.


Time is ticking… Let’s get to it!




Data Ingestion


Well well well, what do we have here: A 200+ page PDF doc! With the <1hr crunch time, we’re definitely crossing off the option of reading through the entire 10-K form, aren’t we?!


Our next best move? A miracle (aka the Intelligence Suite Computer Vision Image to Text tool). We point the Directory tool to the PDF file, then connect it to the Image Input tool. Now our 10-K form is ready to be processed within Alteryx Designer.




Now that everything is just a click away, we start to dream big… but not for too long. The clock is ticking!


Wouldn’t it be nice to perhaps extract:


  1. Executive Officers: Management will likely need these names to start scheduling intro meetings
  2. Subsidiaries: A visual map of where the company operates would speak for itself at the meeting
  3. Markets, Macroeconomic Risks and Strategy: We must take a look at these! This is the cream-of-the-crop content that everyone always asks about!
  4. Sales Figures / KPIs: We have got to dig into some numbers. How about… checking the past years' wholesale unit volumes?


Let us begin the quick and efficient insights extraction!


(1) Extracting the names of Executive Officers


Connecting the Image Template tool to the previously added Image Input tool (see above) and feeding both into the Image to Text tool gives us automatic table extraction.




The output is the table’s text, which we can pass to the Named Entity Recognition (NER) tool after some text processing. And ta-da!




Note: NER is even more effective on fully formed sentences, and that is in fact the generally recommended way to use it.
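To make the note above concrete, here is a minimal Python sketch of one possible kind of "text processing" before NER: cleaning up table cells and wrapping each row in a short sentence. The workflow's actual processing steps are not spelled out in this post, so treat the helper names (`clean_cell`, `rows_to_sentences`) and the sample rows as hypothetical.

```python
import re


def clean_cell(cell: str) -> str:
    """Collapse repeated whitespace and strip stray edge punctuation from a cell."""
    cell = re.sub(r"\s+", " ", cell)
    return cell.strip(" |.,;:")


def rows_to_sentences(rows):
    """Turn (name, title) table rows into simple sentences for an NER model."""
    sentences = []
    for name, title in rows:
        name, title = clean_cell(name), clean_cell(title)
        # Skip rows where either column is empty (e.g. stray header or footer text).
        if name and title:
            sentences.append(f"{name} is the {title} of the company.")
    return sentences


# Hypothetical OCR output; the names are made up for illustration.
rows = [
    ("Jane  Doe |", "Chief Executive Officer"),
    ("John Smith", " Chief Financial Officer."),
    ("", "Page 12"),
]

for sentence in rows_to_sentences(rows):
    print(sentence)
```

Giving the model a subject, a verb and some context is a cheap way to move bare table cells closer to the sentence-style input that NER handles best.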


(2) Company Subsidiaries


We want to visualize the company’s global geographical presence, so let us dig into the appendix of the 10-K form, specifically Exhibit 21. As shown below, the subsidiaries' locations are laid out as one long list keyed by organization name, so the same country names repeat many times. To handle this format with little effort, we will reuse the PDF-to-text extraction process, with a slight twist this time around!


Geographical locations of the company’s subsidiaries


Box annotation-based Image to Text processing


Notice that the input anchor of the Image Template tool is not connected back to the Image Input tool. This setup allows us to draw our own annotation (the “template”, see the red box below). Connecting that output to the template (“T”) input anchor of the Image to Text tool extracts only that specific area of interest!
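Conceptually, a box annotation acts as a filter on where text sits on the page. The sketch below illustrates that idea in plain Python; it is not how the Image to Text tool is implemented, and the `Box` class, the pixel coordinates and the sample words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle in page pixels, with the origin at the top-left."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        """True when `other` lies fully inside this box."""
        return (
            self.left <= other.left
            and self.top <= other.top
            and self.right >= other.right
            and self.bottom >= other.bottom
        )


def words_in_template(words, template):
    """Keep only the words whose bounding box falls inside the template box."""
    return [text for text, box in words if template.contains(box)]


# Hypothetical template drawn around the jurisdiction column of Exhibit 21.
template = Box(left=300, top=100, right=600, bottom=800)

# Hypothetical OCR words with their bounding boxes.
words = [
    ("Subsidiary", Box(50, 120, 200, 140)),   # organization-name column
    ("Delaware", Box(320, 120, 420, 140)),    # inside the template
    ("Germany", Box(320, 160, 410, 180)),     # inside the template
    ("Page", Box(320, 850, 360, 870)),        # footer, below the template
]

print(words_in_template(words, template))
```

Restricting extraction to one region means the organization names and page furniture never reach the downstream tools.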


After some data processing and latitude / longitude matching, we can use Alteryx Designer's Spatial tools to create a visual map of the company's subsidiary locations.
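For readers who think in code, the "data processing and latitude / longitude matching" step could look roughly like the Python sketch below: collapse the repeated country names into counts, then join them to a coordinate lookup. The `CENTROIDS` table (with approximate values), the function names and the sample list are assumptions for illustration, not part of the attached workflow.

```python
from collections import Counter

# Hypothetical lookup of approximate country centroids: (latitude, longitude).
CENTROIDS = {
    "Germany": (51.2, 10.4),
    "Brazil": (-14.2, -51.9),
    "Japan": (36.2, 138.3),
}


def subsidiaries_by_country(jurisdictions):
    """Normalize extracted country names and count how often each one appears."""
    # Naive normalization: collapse whitespace and title-case the name.
    normalized = (" ".join(j.split()).title() for j in jurisdictions if j.strip())
    return Counter(normalized)


def map_points(counts, centroids):
    """Attach coordinates to each country, skipping names with no match."""
    points = []
    for country, n in counts.most_common():
        if country in centroids:
            lat, lon = centroids[country]
            points.append(
                {"country": country, "subsidiaries": n, "lat": lat, "lon": lon}
            )
    return points


# Hypothetical extraction output with repeats, stray whitespace and an unknown name.
jurisdictions = ["Germany", "germany ", "Brazil", "Japan", "Germany", "", "Atlantis"]

for point in map_points(subsidiaries_by_country(jurisdictions), CENTROIDS):
    print(point)
```

Each resulting point carries a count, which a map can use to size or color the marker for that country.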


Visualization of the company’s subsidiaries globally


(3) Macroeconomic Risk, Markets, and Strategy Insights Extraction


The Text Pre-processing tool removes common English stop words and also lets us add our own words to omit from the workflow. For instance, we could add generic manufacturing terms that we want excluded from our text mining vocabulary: “vehicle”, “new”, “product”, etc.
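The idea behind custom stop words can be sketched in a few lines of Python. This is only an illustration of the concept, not the Text Pre-processing tool itself: the tiny `ENGLISH_STOP_WORDS` set, the plural forms added to `CUSTOM_STOP_WORDS`, the `term_frequencies` helper and the sample text are all assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real lists are much longer.
ENGLISH_STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "our", "we", "are", "is", "for", "on",
}

# Domain terms from the example above, plus plurals (the plurals are an assumption).
CUSTOM_STOP_WORDS = {"vehicle", "vehicles", "new", "product", "products"}


def term_frequencies(text, extra_stop_words=frozenset()):
    """Lowercase the text, split it into words, and count the non-stop words."""
    stop_words = ENGLISH_STOP_WORDS | set(extra_stop_words)
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


# Hypothetical snippet standing in for a section of the filing.
text = (
    "We are investing in electric vehicles and autonomous vehicles. "
    "Our new product strategy is digital and electric."
)

freqs = term_frequencies(text, CUSTOM_STOP_WORDS)
print(freqs.most_common(3))
```

Without the custom list, “vehicles” would be counted as often as “electric” in this snippet, which is exactly the kind of generic term that would otherwise crowd a frequency-based visualization.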




Word Cloud is a helpful text mining tool for getting a quick glimpse of potential overarching trends in a large block of text. It is most effective after Text Pre-processing, which keeps frequent but irrelevant words from taking over the visualization.


In the Word Cloud output, one theme that seems to be surfacing is “autonomous, digital, electric,…” vehicles. Interesting!




(4) Sales Analysis


Lastly, we can combine all of the above techniques to analyze a specific year’s wholesale unit numbers. We do so by applying the PDF table extraction pattern described earlier, then leveraging NER’s ability to detect Geopolitical Entities (GPEs) to rank sales volumes by country.
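As a rough Python sketch of that last step: once each extracted table row has a label and a volume, we keep only the rows whose label is a geopolitical entity and sort by volume. Here a hard-coded set (`KNOWN_GPES`) stands in for the NER model's GPE detection, and the rows and numbers are made up for illustration.

```python
# Stand-in for NER: a hypothetical set of labels treated as geopolitical entities.
KNOWN_GPES = {"United States", "China", "Germany", "Canada"}


def parse_volume(raw):
    """Parse an extracted number such as '2,400'; return None if it is not numeric."""
    digits = raw.replace(",", "").replace(" ", "")
    return int(digits) if digits.isdigit() else None


def rank_by_country(rows, gpes):
    """Keep rows labeled with a known GPE and sort them by volume, largest first."""
    ranked = []
    for label, raw_volume in rows:
        label = label.strip()
        volume = parse_volume(raw_volume)
        if label in gpes and volume is not None:
            ranked.append((label, volume))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical extracted rows: (label, unit volume as text).
rows = [
    ("Germany", "310"),
    ("United States", "2,400"),
    ("Total North America", "2,900"),  # regional subtotal, not a country
    ("China", "1,150"),
    ("Canada", "n/a"),                 # missing value, dropped
]

for country, volume in rank_by_country(rows, KNOWN_GPES):
    print(f"{country}: {volume}")
```

Filtering on entity type is what keeps subtotals and regional roll-ups out of the country ranking.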






And there we have it! An efficient yet in-depth company analysis, ideal for crunch-time meeting prep 🎉


Try it yourself:


If you would like to try out this workflow tutorial:

  1. If you don't already have it, download a trial version of Alteryx Intelligence Suite here
  2. Download the attached
  3. Unzip the folder
  4. Open 10-K_AIS_Designer_analysis.yxmd in Alteryx Designer
  5. In the Word Cloud container: select the Image Template tool and point it to the pdf named company-full-10-K.pdf. Next, import the annotation named market-section-annotations.json. See below:
  6. In the 10k Exhibit 21 - Geo exploration container, select the Image Template tool and point it to the pdf named company-subsidiaries.pdf. Next, import the annotation named jurisdiction-annotation.json
  7. Run the workflow. Expect the full workflow to take a few minutes to finish running.


To find additional resources on the AIS tools, click here:

  1. Alteryx Intelligence Suite Learning Path
  2. Alteryx Intelligence Suite Tools Help Main Page