SOLVED

Import Python Library

Atom

Does anyone know how to import Python libraries (like pandas) in Alteryx?

 

Thanks. 

Sr. Community Content Manager

The Python SDK is a framework to develop new Alteryx tools with Python. 3rd party library installation instructions are available here. And examples of how to create tools with the Python SDK are here and here.

Atom

In fact, I'm trying to run a Python script (test.py) with the following content:


# import libraries (pandas also needs an Excel engine such as openpyxl to read .xlsx files)
import pandas as pd

# open the Excel workbook
file = "input.xlsx"
xl = pd.ExcelFile(file)

# df1: column metadata
df1 = xl.parse('sheet_1')
# df2: line metadata (both parse 'sheet_1' here; change the sheet name if the line metadata is on another sheet)
df2 = xl.parse('sheet_1')

# save the column and line metadata as CSV files
# (to_csv returns None when given a path, so there is nothing to assign)
df1.to_csv("output1.csv", index=False)
df2.to_csv("output2.csv", index=False)

But I get errors. I think either Alteryx cannot find the libraries or I'm not doing it right.

 

Thanks for your help.

 

Shaikle

 

 

Alteryx

Follow the links that @NeilR shared. Python can't find the libraries because they are not installed on your machine. They need to be installed in a virtual environment (venv) before they can be used.
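A quick way to confirm this is to ask the interpreter itself which packages it can import. The sketch below assumes you can run a script with the same Python interpreter (and venv) that Alteryx uses; `report_packages` is a hypothetical helper, not part of any Alteryx API. The printed `sys.executable` path shows which environment a `pip install` has to target.

```python
import importlib
import sys


def report_packages(names):
    """Report which of the given packages the *current* interpreter can import.

    Returns a dict mapping each name to its __version__ string,
    "unknown" if it imports but has no __version__, or None if it is missing.
    """
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    # The interpreter path identifies the environment that pip must install into.
    print("Python executable:", sys.executable)
    for name, version in report_packages(["pandas", "numpy"]).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If a package shows as missing, running `<that executable> -m pip install pandas` installs it into exactly that interpreter's environment, which avoids installing into a different Python by mistake.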

Regards,

Stephen Ruhl
Customer Support Engineer
Atom

The given solution only works when the library needs to be imported in the "Python SDK" tool from the "SDK Example" palette. How do I install packages when running Python code from the "Apache Spark Code" tool in the Developer palette?

Alteryx

To use packages in the Apache Spark Code tool, they will need to be installed on the Apache Spark cluster you are connecting to with your workflow. The process of installing those packages depends on two things: whether you want them installed permanently or only for the job / workflow you are running, and the type of Apache Spark cluster you are using (on-premises, Databricks, or Microsoft Azure HDInsight).
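Before installing anything, it can help to check which workers already have a package. This is a minimal sketch, assuming your Python code runs in a PySpark session exposed as a SparkSession named `spark` (that variable name is an assumption); `check_packages` and `check_on_cluster` are hypothetical helpers. Spreading the check over many partitions makes it likely, but not guaranteed, that every executor runs at least one task.

```python
import importlib
import socket


def check_packages(names):
    """Return (hostname, ((name, version_or_None), ...)) for this Python process.

    The result is a hashable tuple so results from many tasks can be de-duplicated.
    """
    results = []
    for name in names:
        try:
            module = importlib.import_module(name)
            results.append((name, getattr(module, "__version__", "unknown")))
        except ImportError:
            results.append((name, None))
    return socket.gethostname(), tuple(results)


def check_on_cluster(spark, names, partitions=16):
    """Run check_packages inside Spark tasks and collect one row per distinct result.

    Assumes an active SparkSession `spark`; results are sorted by hostname only,
    so mixed None/str versions are never compared.
    """
    rdd = spark.sparkContext.parallelize(range(partitions), partitions)
    rows = set(rdd.map(lambda _: check_packages(names)).collect())
    return sorted(rows, key=lambda row: row[0])
```

For example, `print(check_on_cluster(spark, ["pandas"]))` would list each worker hostname that ran a task together with the pandas version it found there, or `None` where the package is missing.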

 

If you want to install the packages on the cluster permanently, then the instructions depend heavily on the type of Apache Spark cluster you are using.

  • On-premises (i.e., Livy cluster): The packages will need to be installed using pip (or whichever Python package manager is used on your servers), preferably on each server in the cluster. They can be installed on just one or a few servers rather than all of them, but each time a job that uses the package runs, the package will be copied to each worker that doesn't already have it. There are scripts and tools available to make this easier if you have a large cluster.
  • Microsoft Azure HDInsight: Essentially the same as above. However, you can do this through the Azure web interface and the process of getting the package on each worker in the cluster is easier.
  • Databricks: Simplest of all. Databricks refers to this in their documentation as "installing a library", and it is the same process for Python, Java, Scala, and R libraries. Their documentation for the process can be found at https://docs.databricks.com/user-guide/libraries.html
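For the on-premises case, installing on every server can be scripted. The sketch below is one simple way to do it, not an Alteryx or Livy feature. It assumes passwordless SSH from your machine to each node and that `python3` is the interpreter the Spark workers use (check your cluster's `PYSPARK_PYTHON` setting). The hostnames are hypothetical. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_install_commands(hosts, packages, python="python3"):
    """Build one ssh command per host that pip-installs the packages there."""
    remote = f"{shlex.quote(python)} -m pip install " + " ".join(
        shlex.quote(p) for p in packages
    )
    return [["ssh", host, remote] for host in hosts]


def install_on_hosts(hosts, packages, python="python3", dry_run=True):
    """Print (dry run) or execute the install command on each host.

    Returns the list of hosts where the command failed (always empty in a dry run).
    """
    failed = []
    for cmd in build_install_commands(hosts, packages, python):
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[1])
    return failed


if __name__ == "__main__":
    # Hypothetical worker hostnames; replace them with your cluster's nodes.
    install_on_hosts(["worker-01", "worker-02"], ["pandas"], dry_run=True)
```

Running the commands one host at a time keeps the sketch simple. For a large cluster, a parallel SSH or configuration-management tool does the same job faster.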

If you only want to use the packages for this job / workflow, then the instructions are simpler and nearly identical for each type of connection. This is done in the connection configuration dialog where you set up the Apache Spark connection. Each connection type, whether on-premises, Databricks, or Microsoft Azure HDInsight, gives you the option to add libraries to your connection string; you simply add the library in that part of the dialog. The exact instructions can be found in the Alteryx help, and since they may change after this reply was written, I'll simply provide a link to that documentation here:

David Wilcox
Senior Software Engineer
Alteryx