Hi Alteryx Champions, I am trying to extract data from a PDF file. I have been able to extract all the other pages using R code; however, the attached page is not coming through properly. Any help with a RegEx or Formula tool to get this information would be great. To be more specific, I want all the locations in one column and the workdays in a second column. I am not too worried about the other columns. I need location, workdays, and COVID workdays.
Go ahead and share your workflow too so we can see how the data is coming through! The best way is to use Export Workflow under the Options menu.
Do you have Alteryx Intelligence Suite?
I think I missed it.
Thanks for bringing it to my attention.
This is the code that I have. I am unable to extract the table information properly:
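# pdftools reads the PDF; tibble and dplyr reshape the result
library('pdftools')
library('tibble')
library('dplyr')
# Read the incoming records; the FullPath field holds the path to the PDF
data <- read.Alteryx("#1", mode="data.frame")
pdf_file <- file.path(data$FullPath)
# pdf_data() returns one data frame per page, with one row per word
txtdata <- pdf_data(pdf_file)
# Tag each page's words with the page number, then stack all pages
output <- txtdata[[1]] %>% add_column(page = 1, .before = "width")
if(length(txtdata)>1){
  for(i in 2:length(txtdata)){
    # Separate name so the input record set in `data` is not overwritten
    page_data <- txtdata[[i]] %>% add_column(page = i, .before = "width")
    output <- bind_rows(output, page_data)
  }
}
# Send the combined word-level table to output anchor 3
write.Alteryx(data.frame(output), 3)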
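A note on what this code returns: pdf_data() gives one row per word with its x and y position, not one row per table line, which may be why the table looks scrambled downstream. Below is a rough sketch of stitching the words back into lines in R. The `words` data frame is made-up sample data standing in for `output`, and the rule "same page and same y means same line" is an assumption that may need a rounding tolerance on a real PDF.

```r
library(dplyr)

# Hypothetical stand-in for `output`: one row per word, in the shape
# pdf_data() returns, plus the page column. Values are invented.
words <- data.frame(
  page   = 1,
  width  = c(30, 25, 10, 10, 40, 10, 10),
  height = 10,
  x      = c(50, 85, 300, 400, 50, 300, 400),
  y      = c(100, 100, 100, 100, 120, 120, 120),
  space  = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  text   = c("New", "York", "20", "5", "Chicago", "18", "3"),
  stringsAsFactors = FALSE
)

# Assumption: words sharing a page and a y value belong to the same line.
# Sort left to right within each line, then paste the words together.
text_lines <- words %>%
  arrange(page, y, x) %>%
  group_by(page, y) %>%
  summarise(line = paste(text, collapse = " "), .groups = "drop")

print(text_lines$line)
```

Each element of `text_lines$line` is then one reconstructed row of the page, which is an easier starting point for a RegEx or Formula tool than word-level records.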
@Pranab_C as mentioned, it would be best to have your workflow so we can see what that code is producing. That way, we can suggest the right formula or regex for the data.
I am sorry, but I am not an R expert, so I can't directly fix your code, and I am not sure what your output is supposed to look like. It does run, but the result is different from what I'm used to. I use this tool, which is also built on R code, and perhaps you could use it in this case: PDF Input - Alteryx Community. I tried it on your file and it read the data in better than the approach in the .yxzp.
Then you can use Alteryx afterwards to parse out the parts you need. I can try to help with this, but only if it’s the path you want to go down. Good luck!
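For anyone who would rather do the parsing step in R, here is a minimal sketch of splitting a line into location, workdays, and COVID workdays with a regular expression. The sample lines are invented, and the pattern assumes the two numbers you need are the last two whole numbers on each line. The actual page layout is not shown in this thread, so the pattern will likely need adjusting.

```r
# Hypothetical lines of text; replace with the lines read from the PDF.
text_lines <- c("Location Workdays COVID", "New York 20 5", "Chicago 18 3")

# Assumed layout: location text, then two whole numbers at the end of the line.
pattern <- "^(.*\\S)\\s+(\\d+)\\s+(\\d+)\\s*$"

# Keep only lines that fit the pattern; the header line drops out here.
matched <- text_lines[grepl(pattern, text_lines, perl = TRUE)]

# regexec() returns the full match plus each capture group per line.
parts <- regmatches(matched, regexec(pattern, matched, perl = TRUE))

result <- data.frame(
  location       = sapply(parts, `[`, 2),
  workdays       = as.integer(sapply(parts, `[`, 3)),
  covid_workdays = as.integer(sapply(parts, `[`, 4)),
  stringsAsFactors = FALSE
)

print(result)
```

The same pattern, written with single backslashes, should also work in the RegEx tool with the output method set to Parse, which gives one output column per capture group.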