Alteryx Designer Desktop Discussions

SOLVED

Massive temp files from R

JReid
9 - Comet

I'm running the R tool with the following code, which reads in a PDF, converts it to a PNG, and runs OCR four times, rotating the image 90 degrees each time.

 

 

# read in the PDF file location which must
# be in a field called FullPath
File <- read.Alteryx("#1", mode="data.frame")
 
# Use pdf_convert() to render the PDF to PNG at 600 dpi,
# then read the resulting PNG file(s) into a magick image object
pngfile <- pdftools::pdf_convert(file.path(File$FullPath), dpi = 600)
pngfile <- magick::image_read(pngfile)
 
# run OCR on the image, print the text, and write it to Alteryx output 1
text <- magick::image_ocr(pngfile)
cat(text)
write.Alteryx(text, 1)

# rotate the image 90 degrees each time and write the OCR text to separate Alteryx outputs
pngfile <- magick::image_rotate(pngfile, 90)
text <- magick::image_ocr(pngfile)
cat(text)
write.Alteryx(text, 2)

pngfile <- magick::image_rotate(pngfile, 90)
text <- magick::image_ocr(pngfile)
cat(text)
write.Alteryx(text, 3)

pngfile <- magick::image_rotate(pngfile, 90)
text <- magick::image_ocr(pngfile)
cat(text)
write.Alteryx(text, 4)

 

 

I'm using this with a batch macro set to read in a folder of scanned PDFs and parse out the significant data from them. Each time an iteration runs, it leaves behind some massive temp files from the magick functions (3+ GB per iteration) that R should automatically delete. This ends up filling my /temp folder and causes later iterations to error out due to lack of temp space. Each iteration creates a unique tmp folder, and most of the files automatically delete upon completion of the iteration, just not the magick ones.

 

Is there any way to automatically clear these between iterations since R is not doing it itself?

 

I'm using Alteryx 2018.2, so I'm limited to R 3.4, and I'm using magick version 2.0.

3 REPLIES
PeterA
Alteryx Alumni (Retired)

@JReid This has been an often-discussed issue within the magick community for a number of years. It's likely the image size exceeds the available memory on your machine. See https://imagemagick.org/script/resources.php and look for MAGICK_SYNCHRONIZE; set this environment variable to "true" to ensure all image data is fully flushed and synchronized to disk.

JReid
9 - Comet

@PeterA I've tried setting this to true by adding Sys.setenv(MAGICK_SYNCHRONIZE = TRUE) at the start of my code, right after loading the libraries, and I added a gc() after each Alteryx write for good measure, but large temp files still remain.

 

Is Sys.setenv() the correct function for setting environment variables in R, or do I need to set it in one of the startup files described in ?Startup?
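
For what it's worth, Sys.setenv() does set variables for the running R process, and Sys.getenv() can confirm the result. The lines below are a minimal sketch, not a confirmed fix. Passing the logical TRUE stores the string "TRUE", while the value quoted from the ImageMagick page is lowercase "true"; whether ImageMagick treats the two the same is not established in this thread, so the sketch passes the string explicitly. Setting it before magick is loaded is also an assumption.

```r
# Set the variable as a lowercase string (assumption: before magick is loaded).
Sys.setenv(MAGICK_SYNCHRONIZE = "true")

# Check the value that the R process, and any library it loads, will see.
Sys.getenv("MAGICK_SYNCHRONIZE")

# library(magick)  # load magick only after the variable has been set
```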

JReid
9 - Comet

The solution I've managed to get to work is to include a Run Command tool as part of my macro to run a batch command with the following code:

 

FOR /D %%X IN (C:\Temp\Rtmp*) DO RD /S /Q "%%X"
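
Roughly the same cleanup can be sketched in base R, for example in a separate R tool placed after the OCR step. This has not been tested inside Alteryx. clean_rtmp_dirs is a hypothetical helper name, and C:/Temp is assumed to be the temp root, as in the batch command above. The sketch skips the current session's own tempdir() so that R does not delete the folder it is still using.

```r
# Hypothetical helper: remove leftover Rtmp* folders under a temp root.
clean_rtmp_dirs <- function(temp_root = "C:/Temp") {
  # List immediate subdirectories whose names start with "Rtmp".
  dirs <- list.dirs(temp_root, full.names = TRUE, recursive = FALSE)
  dirs <- dirs[grepl("^Rtmp", basename(dirs))]

  # Skip the temp folder of the R session that is running this code.
  current <- normalizePath(tempdir(), winslash = "/", mustWork = FALSE)
  dirs <- dirs[normalizePath(dirs, winslash = "/", mustWork = FALSE) != current]

  # Delete each folder together with its contents.
  unlink(dirs, recursive = TRUE, force = TRUE)

  # Report, per folder, whether it is actually gone.
  invisible(data.frame(dir = dirs, removed = !dir.exists(dirs),
                       stringsAsFactors = FALSE))
}
```

Calling clean_rtmp_dirs() with no arguments targets C:/Temp. Files that another process still holds open may not be removable from R, in which case the Run Command approach above may remain the more reliable option.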

 

 
