community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Promote Knowledge Base

Definitive answers from Promote experts.

Querying a Database Through a Promote Model

Sr. Data Science Content Engineer
Sr. Data Science Content Engineer
Created on

Did you know that you can use Promote to query a database (or include a database query in your Promote model)? The only additional step from deploying a simple model is making a helper function that creates a connection and queries your database.

 

In the following example, we will be connecting to a PostgreSQL database using the Python module psycopg2. This article is based on the DB Lookup example posted on Promote-Python Github page.

 

The first step is to create the helper function that connects to your database. In your working directory, create a new folder called helpers, and in that folder create a python file named dbconn.py.

 

In the dbconn.py script, import the Python module psucopg2

 

import psycopg2

 

Now, we are going to write our helper function. For this deployment, we want to connect to a table called TheMuppetShow. This table includes the Name, Species, and Occupation of a handful of stars from The Muppet Show. We are going to define a function that returns the species of a given character by querying their name. 

 

muppets.jpg

 

Before we get too far into writing the function, let's develop the code that connects to our database. We are going to use the psycopg2 connect function. 

 

conn = psycopg2.connect(
host="DBHostName", # Name of the machine your database is hosted on, provided by DBA or IT
dbname="postgres", # Name of the database
user="postgres", # Username for the account that will be used to make queries from Promote
password="postgres", # Super sneaky and secret password
port=5432) # Port number DB is listening on, default is 5432

# close the database connection
conn.close()

 

If this code executes without error, it is connecting to the database successfully, and we can move on to the next step. If it is not, leverage the error messages to troubleshoot the connection.

 

As a next step, we are going to write our SQL statement. For developing the SQL statement, I chose to work in pgAdmin, which is a GUI Database Admin program for PostgreSQL. 

 

2018-12-10_14-50-41.png

This statement filters by the character's name and returns the character's species. To put this query into Python, I added quotes around the query to make it a string, and added backslashes to escape the interior quotes around the table and column names (which are case sensitive and therefore require the quotes). 

 

q = "select \"TheMuppetShow\".\"Species\" from \"TheMuppetShow\" where \"TheMuppetShow\".\"Name\"='Kermit';"

 

To make the query dynamic, I replaced the static value Kermit with {get_name}, and added .format() at the end of the string with get_name = name as the argument. When we convert this code into a function, name will be the input value. 

 

q = "select \"TheMuppetShow\".\"Species\" from \"TheMuppetShow\" where \"TheMuppetShow\".\"Name\"='{get_name}';".format(get_name = name)
 
To test our query in Python, we can run the following code, 
 
# open db connection
conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres", password="postgres", port=5432)

# provide name to search for
name = "Kermit"

# define query
q = "select \"TheMuppetShow\".\"Species\" from \"TheMuppetShow\" where \"TheMuppetShow\".\"Name\"='{get_name}';".format(get_name = name)


cur = conn.cursor()

# execute query
cur.execute(q)

# fetch query data
data = cur.fetchall()

# Print out query return
print(data)

# Close connection
conn.close()
 
With this code all working as expected, we can convert it into a full-blown function. To help manage the error messages that the end user will see, we will wrap our connection and query executions in try: statements and corresponding except: clauses. This will allow the user to know which part of the function is failing if the Promote model fails to make a reasonable return. 
 
import psycopg2

# create a DB function
def get_db_data(name):
    """
    This function opens a connection to a Postgres DB
    and returns the user information with the matching `user_id`
    """
    data = None
    try:
        conn = psycopg2.connect(
            host="DEN-DODT-PSQ-02.extendthereach.com",
            dbname="postgis",
            user="csuser",
            password="!IamtheWalrus!",
            port=5432)
    except:
        print("Unable to connect to the database")

    q = "select \"TheMuppetShow\".\"Species\" from \"TheMuppetShow\" where \"TheMuppetShow\".\"Name\"='{get_user_id}';".format(get_user_id = name)
    cur = conn.cursor()
    try:
        # as an example of the timeout, the "pg_sleep" statement will fail
        #cur.execute("select pg_sleep(2000)")
        cur.execute(q)
        data = cur.fetchall()
    except:
        print("failed to execute query")
    # close our connection
    conn.close()
    return data

 

If you are having trouble getting your connection or query to run as expected, a helpful place to start troubleshooting is the psycopg2 documentation

 

Now that we have our helper function written, we need to add a __init__.py file to the helpers directory. This will cause Python to treat the folder it is inside of as a directory containing packages. In this use case, we can leave our __init__.py file empty.

 

With our helper function configured to be accessible by another script, can write the script we will be deploying to Promote. We will create another Python script, called main.py

 

At this point, our working directory looks like this:

 

db-muppetshow/
├── main.py ├── helpers │ ├── __init__.py │ └── dbconn.py

 

In the main.py script, we need to load the packages promote, re (for regex parsing), schema, and our helper script dbconn form our helpers folder.

 

import promote
import re
from schema import Schema  # https://pypi.python.org/pypi/schema

from helpers import dbconn

 

Next, as with any Promote deployment script, we need to provide our promote API information, and instantiate a Promote class.

 

# instanciate the Promote class with our API information
USERNAME = "USERNAME"
API_KEY = "YOUR_API_KEY"
PROMOTE_URL = "http://promote.internal.alteryx.com/"

p = promote.Promote(USERNAME, API_KEY, PROMOTE_URL)

 

We can add a line of code that validates the JSON input, and then a function that queries our database using our helper function, and returns a cleaned-up string (using the mighty powers of regex).

 

# validate JSON
@promote.validate_json(Schema({'id': str}))

# query database and return species def promoteModel(data): user_info = dbconn.get_db_data(data.get('id')) # strip extra punctuation strip = re.sub(r'[\s\[\](),\'\']', "", str(user_info)) return {"Species": strip}

 

As the last step in this script, we need to add some test data, and then deploy the model.

 

# some test data
TESTDATA = {"id": "Kermit"}
print(promoteModel(TESTDATA))

# name and deploy our model
p.deploy("TheMuppetsDBLookup", promoteModel, TESTDATA, confirm=True, verbose=0)

 

Before running our script, we need to add a requirements.txt file so that the Promote deployment will include the necessary packages. The contents of the requirement.txt file are as follows:

 

promote
schema==0.6.5
psycopg2==2.7.6.1

 

The final structure of our directory looks like this:

 

db-muppetshow/
├── main.py
├── helpers
     ├── __init__.py
     └── dbconn.py
└── requirements.txt

 

And we are ready to deploy the model to promote by running our script. Once the model successfully builds on Promote, we can run a query from the Execute tab and see that it is able to successfully access the database.

 

promoteDB.gif

 

This simple example of connecting a Promote model to a database can be leveraged in many different and exciting ways. I hope this has been a helpful jumping off point for your next Promote adventure.