Alteryx Designer Desktop Discussions

umairah · ‎08-19-2020

Hello, I am a student and still learning to use Alteryx Designer. I have a PDF file that I want to output as an Excel file, and I have already converted the PDF to a yxdb file. My problem now is that I can't figure out how to separate the country names from the rest of the data. I have already filtered out all the unnecessary sentences, so I am stuck at this part.

Any help is really appreciated. I will attach my workflow, the yxdb file, and the PDF file.

Emil_Kos · ‎08-19-2020

Hi,

I believe you need to use the RegEx functionality.

Maybe the article below will be helpful for you:

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/How-to-parse-words-from-numbers/td-p/3...

PhilippK · ‎08-19-2020

Hi @umairah ,

Ideally, you would use the Text Mining tools of the Intelligence Suite (an add-on for Alteryx Designer) to read in PDFs:

https://www.alteryx.com/products/alteryx-platform/intelligence-suite

However, this comes at a price. You could reach out to Alteryx for Good to check whether it is possible to get a free (trial) license for the Intelligence Suite as a student:

alteryxforgood@alteryx.com

https://www.alteryx.com/why-alteryx/alteryx-for-good/students

Best regards

Phil

markcurry · ‎08-19-2020

Hi @umairah

Two things that may help you. Firstly, as @Emil_Kos mentioned, you could use RegEx: add the RegEx tool to identify two or more consecutive spaces and replace them with a |, then use Text to Columns to split on the | (see attached). Or you could use a more complicated RegEx statement to split each line properly.
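For anyone who wants to see the same idea outside Alteryx, here is a minimal Python sketch of the "replace two or more spaces with a |, then split" step. The sample line and the function name split_on_wide_gaps are made up for illustration and are not taken from the attached workflow or the PDF.

```python
import re


def split_on_wide_gaps(line: str) -> list[str]:
    """Replace every run of 2+ spaces with '|' and split on that delimiter."""
    # Single spaces inside a country name are left untouched.
    piped = re.sub(r" {2,}", "|", line.strip())
    return piped.split("|")


# Hypothetical line, shaped like text extracted from a PDF table.
sample = "United Arab Emirates   1,234    56.7   890"
print(split_on_wide_gaps(sample))
```

This only works when the columns are separated by at least two spaces and the country names never contain a double space.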

Alternatively, if you look at the data you've extracted from the PDF in Notepad with a font like Courier or Consolas, where each character has the same width, you'll see that the data is fixed width, so you could use the Substring function to extract each section.
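The fixed-width idea can be sketched in Python as well. The column widths below (24, 10, 8) are assumptions for illustration only; the real widths would have to be read off the extracted text in a monospaced font.

```python
def slice_fixed_width(line: str, widths: list[int]) -> list[str]:
    """Cut a line into consecutive fields of the given character widths."""
    fields = []
    start = 0
    for width in widths:
        # Take `width` characters starting at `start`, then trim the padding.
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# Assumed layout: 24 chars for the country, then 10 and 8 for two numbers.
WIDTHS = [24, 10, 8]
line = "Malaysia".ljust(24) + "1,234".rjust(10) + "56.7".rjust(8)
print(slice_fixed_width(line, WIDTHS))
```

In Alteryx the same slicing would be one Substring call per column inside a Formula tool, using the start position and length of each field.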

I hope that helps.

marcusblackhill · ‎08-19-2020

Hi @umairah !

You can use any of the answers given above, but if you don't want to use RegEx and you just need to separate the country name from the numbers, you can use two parallel Data Cleansing tools: one removing numbers, punctuation, and duplicate spaces to get just the country names, and the other removing letters to get just the numbers. Then you join the two streams with a Join tool, joining by record position.
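As a rough picture of what the two parallel branches do, here is a small Python sketch. It uses regular expressions only to imitate the Data Cleansing options (the Alteryx tool itself needs no RegEx), and the sample lines and function names are hypothetical.

```python
import re


def country_part(line: str) -> str:
    """Branch 1: drop digits and punctuation, then collapse duplicate spaces."""
    no_digits = re.sub(r"[0-9]", "", line)
    no_punct = re.sub(r"[^\w\s]", "", no_digits)
    return re.sub(r"\s+", " ", no_punct).strip()


def number_part(line: str) -> str:
    """Branch 2: drop ASCII letters, keeping digits and punctuation."""
    no_letters = re.sub(r"[A-Za-z]", "", line)
    return re.sub(r"\s+", " ", no_letters).strip()


# Hypothetical lines; both branches read the same records in the same order.
lines = ["Malaysia   1,234   56.7", "New Zealand   987   1.2"]

# "Join by record position": pair the n-th country with the n-th number string.
rows = list(zip(map(country_part, lines), map(number_part, lines)))
print(rows)
```

Note that this approach also strips punctuation that belongs to a country name (an apostrophe or hyphen, for example), so such names would need checking afterwards.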

Take a look at the attached workflow.

Hope that helps.

umairah · ‎08-20-2020

Using RegEx really helped me separate the country from the numbers, and your solution actually simplifies the first part of my workflow, so thank you for that. But the second part, pages 138 to 141 of the PDF file, is where I am still stuck. I want to align the numbers with their respective countries, and using RegEx only solves one of the issues. I am sorry for asking so much, but I really don't know how to solve this, so any suggestion would be really helpful.

umairah · ‎08-20-2020

Actually, I applied for Alteryx for Good, but unfortunately it didn't come with the Text Mining tools.

shreyanshrathod · ‎04-13-2021

Hi @umairah ,

Could you tell me how you extracted the data from your PDF file and converted it to yxdb?

Thanks in advance.

Shreyansh

JakobJ · ‎02-09-2022

How did you convert the PDF to a yxdb file?

Thank you in advance.
