Parse Outline from Raw Text and Concat Aligning Text

hellyars
13 - Pulsar

Hi,

I am having some difficulty parsing text (from a PDF).  And my AIS PDF Tools are missing (ugh) after updating to 2021.2.  It's just a Monday.

 

Workflow attached.  Sample page image below.

 

A few things...

 

  1. I need to remove all the header and footer information (see Record IDs 145-149 and 189-192). The first 3 records of each group are the header. The last two of each group are the footer (date and page number). These need to be removed from the entire document. With the exception of the page number, the other 4 values are constant.
  2. I need to pull out the outline. Each bit of the outline will be its own line matching (^C\..*?$). [Reference RecordID 162] Subsections will be delineated by \([a-z]\). [Reference RecordID 163]
  3. I then need to group (concat) all the lines of text that correspond to an outline section. [Reference RecordIDs 162-167]
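
For illustration only, the three steps above could be sketched in Python (for example inside a Python tool) roughly as follows. This is a minimal sketch, not the attached workflow: the header/footer strings in HEADER_FOOTER, the page-number pattern, and the function names are all hypothetical placeholders, and it assumes every outline or subsection marker starts a new group. Only the two outline patterns come from the list above.

```python
import re

# Hypothetical constant header/footer values; replace with the real
# four constant strings from the PDF (Record IDs 145-149 in the sample).
HEADER_FOOTER = {"ACME Manual", "Chapter C", "Draft", "24 May 2021"}

# Assumed page-number format: a line holding only digits, optionally
# prefixed with "Page". Body lines that are bare numbers would also be dropped.
PAGE_NUMBER = re.compile(r"^\s*(?:Page\s+)?\d+\s*$")

SECTION = re.compile(r"^C\..*$")        # outline line, pattern from the post
SUBSECTION = re.compile(r"^\([a-z]\)")  # subsection marker, pattern from the post


def strip_headers_footers(lines):
    """Step 1: drop the constant header/footer lines and page numbers."""
    return [ln for ln in lines
            if ln.strip() not in HEADER_FOOTER and not PAGE_NUMBER.match(ln)]


def group_by_outline(lines):
    """Steps 2 and 3: start a group at each outline marker, then concat."""
    groups = []
    for ln in lines:
        text = ln.strip()
        if not text:
            continue
        if SECTION.match(text) or SUBSECTION.match(text):
            groups.append([text])       # a new outline entry begins here
        elif groups:
            groups[-1].append(text)     # continuation of the current entry
    return [" ".join(g) for g in groups]
```

Calling group_by_outline(strip_headers_footers(lines)) on the raw text lines returns one concatenated string per outline entry. Removing the headers and footers first is what lets an entry that spans a page break be joined back together.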

Thanks

 

 

 

Screen Shot 2021-05-24 at 2.04.37 PM.png

mnorberg
Alteryx Alumni (Retired)

Hi @hellyars,

 

Were you able to successfully install the Alteryx Intelligence Suite on your device and activate your Intelligence Suite license key?

 

Both will need to happen in order for you to use the PDF Input tool. The Intelligence Suite has its own installer and needs to be re-installed after an upgrade. If your company already has access to an Intelligence Suite license, the Intelligence Suite installer can be downloaded from the downloads and licensing portal at licenses.alteryx.com.
