This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
on
12-17-2018
02:13 PM
- edited on
02-27-2020
01:23 PM
by
KylieF
Did you know that you can use Promote to query a database (or include a database query in your Promote model)? The only additional step from deploying a simple model is making a helper function that creates a connection and queries your database.
In the following example, we will be connecting to a PostgreSQL database using the Python module psycopg2. This article is based on the DB Lookup example posted on Promote-Python Github page.
The first step is to create the helper function that connects to your database. In your working directory, create a new folder called helpers, and in that folder create a python file named dbconn.py.
In the dbconn.py script, import the Python module psucopg2
import psycopg2
Now, we are going to write our helper function. For this deployment, we want to connect to a table called TheMuppetShow. This table includes the Name, Species, and Occupation of a handful of stars from The Muppet Show. We are going to define a function that returns the species of a given character by querying their name.
Before we get too far into writing the function, let's develop the code that connects to our database. We are going to use the psycopg2 connect function.
conn = psycopg2.connect(
host="DBHostName", # Name of the machine your database is hosted on, provided by DBA or IT
dbname="postgres", # Name of the database
user="postgres", # Username for the account that will be used to make queries from Promote
password="postgres", # Super sneaky and secret password
port=5432) # Port number DB is listening on, default is 5432
# close the database connection
conn.close()
If this code executes without error, it is connecting to the database successfully, and we can move on to the next step. If it is not, leverage the error messages to troubleshoot the connection.
As a next step, we are going to write our SQL statement. For developing the SQL statement, I chose to work in pgAdmin, which is a GUI Database Admin program for PostgreSQL.
This statement filters by the character's name and returns the character's species. To put this query into Python, I added quotes around the query to make it a string, and added backslashes to escape the interior quotes around the table and column names (which are case sensitive and therefore require the quotes).
q = "select \"TheMuppetShow\".\"Species\" from \"TheMuppetShow\" where \"TheMuppetShow\".\"Name\"='Kermit';"
To make the query dynamic, I replaced the static value Kermit with {get_name}, and added .format() at the end of the string with get_name = name as the argument. When we convert this code into a function, name will be the input value.
q = "select \"TheMuppetShow\".\"Species\" from \"TheMuppetShow\" where \"TheMuppetShow\".\"Name\"='{get_name}';".format(get_name = name)
# open db connection
conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres", password="postgres", port=5432)
# provide name to search for
name = "Kermit"
# define query
q = "select \"TheMuppetShow\".\"Species\" from \"TheMuppetShow\" where \"TheMuppetShow\".\"Name\"='{get_name}';".format(get_name = name)
cur = conn.cursor()
# execute query
cur.execute(q)
# fetch query data
data = cur.fetchall()
# Print out query return
print(data)
# Close connection
conn.close()
import psycopg2 # create a DB function def get_db_data(name): """ This function opens a connection to a Postgres DB and returns the user information with the matching `user_id` """ data = None try: conn = psycopg2.connect( host="DEN-DODT-PSQ-02.extendthereach.com", dbname="postgis", user="csuser", password="!IamtheWalrus!", port=5432) except: print("Unable to connect to the database") q = "select \"TheMuppetShow\".\"Species\" from \"TheMuppetShow\" where \"TheMuppetShow\".\"Name\"='{get_user_id}';".format(get_user_id = name) cur = conn.cursor() try: # as an example of the timeout, the "pg_sleep" statement will fail #cur.execute("select pg_sleep(2000)") cur.execute(q) data = cur.fetchall() except: print("failed to execute query") # close our connection conn.close() return data
If you are having trouble getting your connection or query to run as expected, a helpful place to start troubleshooting is the psycopg2 documentation.
Now that we have our helper function written, we need to add a __init__.py file to the helpers directory. This will cause Python to treat the folder it is inside of as a directory containing packages. In this use case, we can leave our __init__.py file empty.
With our helper function configured to be accessible by another script, can write the script we will be deploying to Promote. We will create another Python script, called main.py
At this point, our working directory looks like this:
db-muppetshow/
├── main.py
├── helpers
│ ├── __init__.py
│ └── dbconn.py
In the main.py script, we need to load the packages promote, re (for regex parsing), schema, and our helper script dbconn form our helpers folder.
import promote import re from schema import Schema # https://pypi.python.org/pypi/schema from helpers import dbconn
Next, as with any Promote deployment script, we need to provide our promote API information, and instantiate a Promote class.
# instanciate the Promote class with our API information USERNAME = "USERNAME" API_KEY = "YOUR_API_KEY" PROMOTE_URL = "http://promote.internal.alteryx.com/" p = promote.Promote(USERNAME, API_KEY, PROMOTE_URL)
We can add a line of code that validates the JSON input, and then a function that queries our database using our helper function, and returns a cleaned-up string (using the mighty powers of regex).
# validate JSON @promote.validate_json(Schema({'id': str}))
# query database and return species def promoteModel(data): user_info = dbconn.get_db_data(data.get('id')) # strip extra punctuation strip = re.sub(r'[\s\[\](),\'\']', "", str(user_info)) return {"Species": strip}
As the last step in this script, we need to add some test data, and then deploy the model.
# some test data TESTDATA = {"id": "Kermit"} print(promoteModel(TESTDATA)) # name and deploy our model p.deploy("TheMuppetsDBLookup", promoteModel, TESTDATA, confirm=True, verbose=0)
Before running our script, we need to add a requirements.txt file so that the Promote deployment will include the necessary packages. The contents of the requirement.txt file are as follows:
promote schema==0.6.5 psycopg2==2.7.6.1
The final structure of our directory looks like this:
db-muppetshow/ ├── main.py ├── helpers ├── __init__.py └── dbconn.py └── requirements.txt
And we are ready to deploy the model to promote by running our script. Once the model successfully builds on Promote, we can run a query from the Execute tab and see that it is able to successfully access the database.
This simple example of connecting a Promote model to a database can be leveraged in many different and exciting ways. I hope this has been a helpful jumping off point for your next Promote adventure.