Hello, I am a student and still learning to use Alteryx Designer. I have a PDF file that I want to output as an Excel file, and I have already converted the PDF to a yxdb file. My problem now is that I can't figure out how to separate the country from the rest of the data. I have already filtered out all the unnecessary sentences, so I am stuck at this part.
Any help is really appreciated. I will attach my workflow, the yxdb and the PDF file.
Hi,
I believe you need to use the RegEx functionality.
Maybe the article below will be helpful for you:
Hi @umairah ,
Ideally, you would use the Text Mining tools of the Intelligence Suite (an add-on for Alteryx Designer) to read in PDFs:
https://www.alteryx.com/products/alteryx-platform/intelligence-suite
However, this comes at a price. You could reach out to Alteryx for Good to check whether it is possible to get a free (trial) license for the Intelligence Suite as a student:
alteryxforgood@alteryx.com
https://www.alteryx.com/why-alteryx/alteryx-for-good/students
Best regards
Phil
Hi @umairah
Two things that may help you. Firstly, as @Emil_Kos mentioned, you could use RegEx: add the RegEx tool to identify runs of 2 or more spaces and replace them with a |, and then use Text to Columns to separate on the | (see attached). Or you could use a more complicated RegEx statement to split each line properly.
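To illustrate the idea outside of Alteryx, here is a minimal Python sketch of the same two steps: replace every run of two or more spaces with a |, then split on the |. The sample line and the function name `split_on_wide_gaps` are made up for illustration and are not taken from the attached workflow or the PDF.

```python
import re


def split_on_wide_gaps(line: str) -> list[str]:
    """Split a line into fields wherever two or more spaces occur in a row."""
    # Step 1: replace each run of 2+ spaces with a single pipe delimiter.
    delimited = re.sub(r" {2,}", "|", line.strip())
    # Step 2: split on the pipe, like Text to Columns with "|" as the delimiter.
    return delimited.split("|")


# Hypothetical sample line; the real data comes from the extracted PDF text.
sample = "United Kingdom      1,234     56.7"
fields = split_on_wide_gaps(sample)
print(fields)
```

A single space inside a country name (such as "United Kingdom") is left alone, because only runs of two or more spaces are treated as column gaps. This assumes the source text never contains a | of its own.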
Or, if you look at the data that you've extracted from the PDF in Notepad with a font like Courier or Consolas, where each character has the same width, you'll see that the data is fixed width, so you could use the Substring function to extract each section.
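The fixed-width approach can be sketched in Python as well. The column widths below (20, 10 and 8 characters) and the sample line are assumptions made up for this example; the real widths would have to be read off the extracted text in a monospaced font.

```python
def slice_fixed_width(line: str, widths: list[int]) -> list[str]:
    """Cut a fixed-width line into fields using a list of column widths."""
    fields = []
    start = 0
    for width in widths:
        # Take `width` characters from the current offset, then trim padding.
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# Hypothetical layout: country in 20 characters, then two right-aligned numbers.
sample = "Malaysia".ljust(20) + "1,234".rjust(10) + "56.7".rjust(8)
columns = slice_fixed_width(sample, [20, 10, 8])
print(columns)
```

Each slice plays the role of one Substring call with a start position and a length. This only works if every line in the section uses the same column positions.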
I hope that helps.
Hi @umairah !
You can use any of the answers given above, but if you don't want to use RegEx and you just need to separate the country name from the numbers, you can use two parallel Data Cleansing tools: one removing numbers, punctuation and duplicate spaces to leave just the country names, and the other removing letters to leave just the numbers. Then you join them by record position with a Join tool.
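As a rough Python analogue of the two parallel cleansing branches (it uses regular expressions internally only to imitate what the Data Cleansing tool options do), the sketch below strips digits and punctuation on one branch, strips letters on the other, and pairs the results by position. The sample lines and the function names are hypothetical.

```python
import re


def country_only(line: str) -> str:
    """Branch 1: drop digits, dots and commas, then collapse duplicate spaces."""
    without_numbers = re.sub(r"[\d.,]", "", line)
    return re.sub(r"\s+", " ", without_numbers).strip()


def numbers_only(line: str) -> str:
    """Branch 2: drop ASCII letters, then collapse duplicate spaces."""
    without_letters = re.sub(r"[A-Za-z]", "", line)
    return re.sub(r"\s+", " ", without_letters).strip()


# Hypothetical sample lines standing in for the extracted PDF rows.
lines = [
    "United Kingdom   1,234   56.7",
    "Malaysia   987   12.3",
]

names = [country_only(line) for line in lines]
numbers = [numbers_only(line) for line in lines]

# Pair the two branches row by row, like a Join tool set to record position.
joined = list(zip(names, numbers))
print(joined)
```

Note that this simple version assumes country names contain only ASCII letters and spaces; names with accents, apostrophes or hyphens would need the character classes adjusted.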
Take a look at the attached workflow.
Hope that helps.
Using RegEx really helped me separate the country from the numbers, and your solution actually simplified the first part of my workflow, so thank you for that. But the second part, pages 138 to 141 of the PDF file, is where I am still stuck. I want to align the numbers with their respective countries, and RegEx only solves one of the issues. I am sorry for asking so much, but I really don't know how to solve this, so any suggestion would be really helpful.
Actually, I applied for Alteryx for Good, but unfortunately it didn't come with the Text Mining tools.
Hi @umairah ,
Could you tell me how you extracted your PDF file and converted it to yxdb?
Thanks in advance.
Shreyansh
How did you convert the PDF to a yxdb file?
Thank you in advance.