Import Python Library
Does anyone know how to import Python libraries (like pandas) in Alteryx?
Thanks.
In fact, I'm trying to run a Python script (test.py) whose content is:

# import libraries
import csv
import pandas as pd
import numpy as np

# read the Excel file and store it in the file variable
file = "input.xlsx"
xl = pd.ExcelFile(file)

# define the DataFrame df1: contains column metadata
df1 = xl.parse('sheet_1')

# define the DataFrame df2: contains line metadata
df2 = xl.parse('sheet_1')

# store the Excel data as CSV for column & line metadata
df_columns = df1.to_csv("output1.csv", index=False)
df_lines = df2.to_csv("output2.csv", index=False)
But I get errors. I think Alteryx is not able to find the libraries, or I'm not doing it right.
Thanks for your help.
Shaikle
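A common cause of errors like this is that pandas and numpy are not installed in the Python environment Alteryx actually invokes, which is often a bundled interpreter rather than the system one. Below is a minimal sketch of one way to guard against that from within the script itself; the package names come from the script above, and everything else is an assumption rather than a confirmed Alteryx API:

import subprocess
import sys

# Install any missing packages into the same interpreter that is
# running this script. sys.executable points at that interpreter,
# so the pip installs land where the imports will be resolved.
for pkg in ("pandas", "numpy"):
    try:
        __import__(pkg)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])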
The given solution works only when the library needs to be imported in the "Python SDK" tool from the "SDK Example" palette. How do you install packages when running Python code from the "Apache Spark Code" tool in the Developer palette?
To use packages in the Apache Spark Code tool, they need to be installed on the Apache Spark cluster your workflow connects to. How to install them depends on two things: whether you want them installed permanently or only for the job/workflow you are running, and the type of Apache Spark cluster you are using (on-premises, Databricks, or Microsoft Azure HDInsight).
If you want to install the packages on the cluster permanently, the instructions depend heavily on the type of Apache Spark cluster you are using:
- On-premises (i.e., a Livy cluster): The packages need to be installed using pip (or whatever Python package manager your servers use), preferably on each server in the cluster. You can install on just one or a few servers rather than all of them, but each time a job that uses the package runs, the package will be copied to every worker that doesn't already have it. There are scripts and tools available to make this easier on a large cluster; see the sketch after this list.
- Microsoft Azure HDInsight: Essentially the same as above, except you can do it through the Azure web interface, and getting the package onto each worker in the cluster is easier.
- Databricks: Simplest of all. Databricks refers to this in its documentation as "installing a library," and the process is the same for Python, Java, Scala, and R libraries. Their documentation is at https://docs.databricks.com/user-guide/libraries.html
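For the on-premises case above, here is a minimal sketch of scripting the per-node install over SSH. The host names, the bare ssh invocation, and python3 as the cluster's interpreter are all assumptions; substitute your own inventory and automation tooling (Ansible, parallel-ssh, and so on):

import subprocess

# Hypothetical worker hosts; replace with your cluster's inventory.
nodes = ["spark-worker-01", "spark-worker-02", "spark-worker-03"]

for node in nodes:
    # Run pip under the cluster's Python so the package becomes
    # visible to the Spark executors on that node.
    subprocess.check_call(
        ["ssh", node, "python3", "-m", "pip", "install", "pandas"]
    )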
If you only want the packages for a single job/workflow, the instructions are simpler and nearly identical for each connection type. You do this in the connection configuration dialog where you set up the Apache Spark connection. Whether the connection is on-premises, Databricks, or Microsoft Azure HDInsight, you have the option to add libraries to your connection string; simply add the library in that part of the dialog. The exact instructions are in the Alteryx help, and since they may change after this reply is written, I'll just link to the documentation here (a quick verification sketch follows the links):
- On-premises (i.e., Livy) or Microsoft Azure HDInsight: https://help.alteryx.com/current/DataSources/SparkDirect.htm, under Advanced Options
- Databricks: https://help.alteryx.com/current/DataSources/SparkDatabricks.htm
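Once the library is added by either route, a quick way to confirm it is importable on both the driver and the executors is to run a check like the following inside the Apache Spark Code tool. This is a sketch that assumes a SparkContext is available as sc, as is typical in a PySpark session:

import pandas as pd

# Driver-side check.
print("driver pandas:", pd.__version__)

def executor_check(_):
    # Importing inside the function makes the import happen on the executor.
    import pandas
    return pandas.__version__

# sc is assumed to be the SparkContext provided by the tool / session.
versions = sc.parallelize(range(2), 2).map(executor_check).distinct().collect()
print("executor pandas:", versions)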
Senior Software Engineer
Alteryx
