Alteryx Designer Desktop Discussions

PDF input to text includes random letters

qishao2
5 - Atom

Hi,

 

I'm trying to extract data from PDF forms that have small separators, as shown in the screenshot. Alteryx sometimes recognizes them as a lowercase "i" or an uppercase "I", and in some cases I get a "b" or an "8" from empty spaces for no apparent reason. Is there a way to get rid of them?

Edit: I think the thresholding tools should be able to do what I need, since I only need the words written in black and nothing in light blue, but I couldn't figure out how to set the threshold above the blue color in the background. Can somebody help, please?
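
To make the idea concrete, here is a rough sketch in plain Python of the kind of thresholding I have in mind. It is not an Alteryx tool configuration, only an illustration of the logic. The light blue RGB value, the cutoff of 128, and the function names are all assumptions; the real form colors would need to be sampled from the PDF.

```python
# Illustrative sketch only: binarize an image so that dark (black) text
# survives and lighter colors such as a light blue background turn white.
# All names and values here are hypothetical, not taken from Alteryx.

BLACK = (0, 0, 0)
WHITE = (255, 255, 255)


def luminance(rgb):
    """Approximate perceived brightness (0-255) using the ITU-R BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def keep_dark_pixels(image, cutoff=128):
    """Return a copy of `image` (rows of RGB tuples) with every pixel darker
    than `cutoff` set to black and every other pixel set to white."""
    return [
        [BLACK if luminance(px) < cutoff else WHITE for px in row]
        for row in image
    ]


# Assumed sample colors: near-black ink, a light blue separator, white paper.
LIGHT_BLUE = (173, 216, 230)
sample = [[(20, 20, 20), LIGHT_BLUE, (255, 255, 255)]]

cleaned = keep_dark_pixels(sample)
print(cleaned)
```

With these assumed values the light blue pixel has a brightness of roughly 205, well above the cutoff, so it is turned white while the dark ink is kept. What I can't work out is where the equivalent cutoff is set in the Alteryx thresholding options.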
