There are a couple of ways to OCR and bring PDF data into Alteryx. The method I would suggest depends on the type of PDF you're reading.
If the PDF mainly contains data, you may be able to use the free PDF input tools available on the Gallery. These use R or Python to OCR and read the data in. They read everything from the PDF, and you can then use tools such as Text To Columns, RegEx, Filter, and Multi-Row Formula to clean the data and parse out the pieces you need.
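To illustrate the kind of cleanup that usually follows a "read everything" import, here is a minimal Python sketch of the same steps outside Alteryx. It assumes the PDF reader returns one string per text line. The sample lines, the `INV-` row format, the regex patterns and the `parse_lines` helper are all hypothetical. In a workflow you would normally do this with the Filter, RegEx, Text To Columns and Multi-Row Formula tools rather than code.

```python
import re

# Hypothetical raw output from a PDF-reading tool: one string per text line.
RAW_LINES = [
    "ACME Corp Invoice Report",
    "Region: North",
    "INV-1001   2023-01-05   1,250.00",
    "INV-1002   2023-01-09     310.50",
    "Page 1 of 2",
    "Region: South",
    "INV-1003   2023-02-11   2,000.00",
]

# Hypothetical data-row layout: invoice id, ISO date, amount with thousands separators.
ROW_PATTERN = re.compile(r"^(INV-\d+)\s+(\d{4}-\d{2}-\d{2})\s+([\d,]+\.\d{2})$")
# Hypothetical section header whose value applies to the rows below it.
REGION_PATTERN = re.compile(r"^Region:\s*(\w+)$")


def parse_lines(lines):
    """Turn raw PDF text lines into structured records."""
    records = []
    current_region = None  # carried down to later rows, like a Multi-Row Formula fill
    for line in lines:
        line = line.strip()
        region_match = REGION_PATTERN.match(line)
        if region_match:
            current_region = region_match.group(1)
            continue
        row_match = ROW_PATTERN.match(line)
        if not row_match:
            # Filter step: drop titles, page footers and anything that is not a data row.
            continue
        # RegEx / Text To Columns step: split the matched row into separate fields.
        invoice, date, amount = row_match.groups()
        records.append({
            "region": current_region,
            "invoice": invoice,
            "date": date,
            "amount": float(amount.replace(",", "")),
        })
    return records


if __name__ == "__main__":
    for record in parse_lines(RAW_LINES):
        print(record)
```

Anchoring the row pattern with `^` and `$` is what drops the title and footer lines. Real PDF output tends to be messier, so expect to adjust the patterns to the actual layout.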
If you need very specific portions of the PDF, the Intelligence Suite that Rishi mentioned could work well for your purposes. Its tools essentially let you highlight and mask the sections of your PDF that you'd like to read in, and they work especially well if your PDFs follow a template. The Intelligence Suite is an add-on to Designer and requires a separate license. That license also gives you access to additional text mining tools (sentiment analysis, topic modeling) as well as machine learning. Information on the Intelligence Suite add-on can be found here: