Parse Outline from Raw Text and Concat Aligning Text
hellyars
13 - Pulsar
05-24-2021
11:16 AM
Hi,
I am having some difficulty parsing text from a PDF, and my AIS PDF tools are missing (ugh) after updating to 2021.2. It's just a Monday.
Workflow attached. Sample page image below.
A few things...
- I need to remove all the header and footer information. See Record IDs 145-149 and 189-192. The first three of each group are the header. The last two of each group are the footer (date and page number). These need to be removed from the entire document. With the exception of the page number, the other four values are constant.
- I need to pull out the outline. Each outline item will be on its own line, matching ^C\..*?$ [Reference RecordID 162]. Subsections will be delineated by \([a-z]\) [Reference RecordID 163].
- I then need to group (concat) all the lines of text that correspond to an outline section. [Reference RecordIDs 162-167]
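For reference, the three steps above can be sketched outside Alteryx in Python. This is only a sketch under assumptions, not the attached workflow: the function names, the sample lines, the header and footer constants, and the page-number pattern are all hypothetical. Only the two outline patterns (a line starting with C. and a subsection starting with a letter in parentheses) come from the description above. By default subsections stay inside their parent section, which seems to match the RecordIDs 162-167 grouping; that reading is also an assumption.

```python
import re

SECTION_RE = re.compile(r"^C\.")            # top-level outline line (from the post)
SUBSECTION_RE = re.compile(r"^\([a-z]\)")   # subsection line (from the post)
# Assumed page-number shape: a bare number, optionally prefixed by "Page".
# Note this would also drop a body line that is only a number.
PAGE_NUMBER_RE = re.compile(r"^(Page\s+)?\d+$")


def strip_headers_footers(lines, constants):
    """Drop lines that equal a known header/footer constant or look like a page number."""
    known = {c.strip() for c in constants}
    kept = []
    for line in lines:
        text = line.strip()
        if text in known or PAGE_NUMBER_RE.match(text):
            continue
        kept.append(text)
    return kept


def group_by_outline(lines, split_subsections=False):
    """Concatenate each outline line with the text lines that follow it.

    With split_subsections=True, a subsection line also starts a new group.
    Text that appears before the first outline line is ignored.
    """
    groups = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        starts_group = SECTION_RE.match(text) or (
            split_subsections and SUBSECTION_RE.match(text)
        )
        if starts_group:
            groups.append([text])
        elif groups:
            groups[-1].append(text)
    return [" ".join(group) for group in groups]


# Hypothetical page content, one list item per record.
sample = [
    "ACME HANDBOOK",                 # hypothetical header line 1
    "Section C",                     # hypothetical header line 2
    "DRAFT",                         # hypothetical header line 3
    "C.1 Scope of work",
    "The contractor shall provide",
    "all labor and materials.",
    "(a) General",
    "Applies to all tasks.",
    "05/24/2021",                    # hypothetical footer date
    "Page 12",                       # hypothetical footer page number
]
constants = ["ACME HANDBOOK", "Section C", "DRAFT", "05/24/2021"]

cleaned = strip_headers_footers(sample, constants)
for row in group_by_outline(cleaned, split_subsections=True):
    print(row)
```

In Alteryx terms, this roughly corresponds to a Filter on the constant values, a RegEx or Formula tool to flag outline lines, a Multi-Row Formula to carry the section key down, and a Summarize tool to concatenate by that key. That mapping is a suggestion, not something tested against the attached workflow.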
Thanks
Labels: Parse
1 Reply
mnorberg
Alteryx Alumni (Retired)
05-28-2021
12:39 PM
Hi @hellyars,
Were you able to successfully install the Alteryx Intelligence Suite on your device and activate your Intelligence Suite license key?
Both will need to happen in order for you to use the PDF Input tool. The Intelligence Suite has its own installer and needs to be re-installed after an upgrade. If your company already has access to an Intelligence Suite license, the Intelligence Suite installer can be downloaded from the downloads and licensing portal at licenses.alteryx.com.
