

Help with Alteryx Designer Python - PDFPlumber

hurleyk
5 - Atom

Hello! I tried to install pdfplumber into my Alteryx Python environment, but the Python tool does not recognize that the package is installed. I am trying to convert a PDF to a TXT file. Below are the path I installed to and the error message:

 

c:\users\[my name]\appdata\local\alteryx\bin\miniconda3\envs\

Error: Python (63): Traceback (most recent call last):
File "C:\Users\[my name]\AppData\Local\Temp\Engine_6092_5beb9433a9e6474aaddaa4d715ce51db_\5c8fa4040b4685539b2a94b94d2f62e2\workbook.py", line 12, in <module>
import pdfplumber
ModuleNotFoundError: No module named 'pdfplumber'

 

3 REPLIES
abacon
12 - Quasar

@hurleyk Did you use the install command within the Python tool to install the module?

#Package.installPackages(['pdfplumber'])

 

That error means the module cannot be found: the location where the Python tool within Alteryx looks for packages does not contain pdfplumber.
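
One way to see where the Python tool is actually looking is to ask the running interpreter directly. The sketch below is only an illustration using the standard library; the helper name `describe_module` is made up for this example and is not part of Alteryx or pdfplumber. Run it in a cell of the Python tool and compare the reported interpreter path with the folder you installed into.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether it can locate `name`."""
    # find_spec returns None when a top-level module cannot be found
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,  # the Python executable running this cell
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),  # file the module would load from
    }


# Hypothetical check for the package from this thread
print(describe_module("pdfplumber"))
```

If `found` is False, or the interpreter path points at a different environment than the one you installed into, the package went somewhere the Python tool does not search.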

 

Bacon

hurleyk
5 - Atom

Yes -- below is the full script/command I am using within the python tool. It was previously set up by a different dev:

#################################
from ayx import Alteryx, Package
import os
import pandas as pd
import pdfplumber

# ─── Ensure required package is installed ────────────────────
Package.installPackages(['pdfplumber'])

# ─── Read incoming PDF paths ─────────────────────────────────
df_in = Alteryx.read("#1")
out_rows = []

for _, row in df_in.iterrows():
    pdf_path = row['PDF_Path']
    base, _ = os.path.splitext(pdf_path)
    txt_out = base + '.txt'

    # If .txt already exists, skip
    if os.path.exists(txt_out):
        status = f"Skipped: TXT already exists → {txt_out}"
    else:
        try:
            with pdfplumber.open(pdf_path) as pdf:
                full_text = ""
                for page in pdf.pages:
                    text = page.extract_text()
                    if text:
                        full_text += text + "\n\n"

            # Write to TXT file
            with open(txt_out, 'w', encoding='utf-8') as f:
                f.write(full_text.strip() or f"[No text extracted from {os.path.basename(pdf_path)}]")

            status = f"Success: Created {txt_out}"
        except Exception as e:
            txt_out = ""
            status = f"Error: {e}"

    out_rows.append({
        "PDF_Path": pdf_path,
        "Text_Path": txt_out,
        "Status": status
    })

# ─── Output results to Alteryx ───────────────────────────────
df_out = pd.DataFrame(out_rows)
Alteryx.write(df_out, 1)
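
Note that in the script above, `import pdfplumber` runs before `Package.installPackages(['pdfplumber'])`, so on a machine where the package is missing the import fails before the install line is ever reached. A minimal sketch of one way to order this is below. The helper `import_or_install` is hypothetical (not part of `ayx`); it takes the installer as an argument, and in the Python tool that would presumably be `Package.installPackages`, as used in the script.

```python
import importlib


def import_or_install(name, installer):
    """Import `name`; if it is missing, call `installer([name])` and retry once."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        # Package not importable yet: install it, then refresh the import caches
        installer([name])
        importlib.invalidate_caches()
        return importlib.import_module(name)


# Assumed usage inside the Alteryx Python tool (not runnable outside Alteryx):
# from ayx import Package
# pdfplumber = import_or_install("pdfplumber", Package.installPackages)
```

If the install itself fails, for example for lack of write permission to the Alteryx environment, the second import still raises `ModuleNotFoundError`, which is the case the run-as-admin suggestion is aimed at.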

 

apathetichell
20 - Arcturus

Right-click Alteryx and choose Run as administrator. Put a Python tool on the canvas. Uncomment the package-installation line in the first cell and add your package (pdfplumber).

Hit Run.

Check for errors.

Close the admin instance of Alteryx.

Re-run the other workflow.

 
