Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

Bulk Loader Capability for PostgreSQL

Hi,

 

Currently, loading large files (over 100 MB) to PostgreSQL takes an extremely long time. For example, writing a 1 GB file to PostgreSQL takes 27 minutes! This is seriously impacting our ability to use Alteryx as an ETL tool for loading our target PostgreSQL data warehouse. We would really like to see bulk load capability for PostgreSQL supported in Alteryx to help alleviate these performance issues.

 

Thanks,

Vijaya

39 Comments
Atom

Hi,

 

Until Alteryx adds this, we can use Python to improve load times:

import time

import psycopg2

# Connection settings (replace with your own)
pg_user = "user"
pg_pass = "pass"
pg_host = "pg_address"
pg_port = "pg_port"
pg_dbname = "database"
pg_schema = "schema"
pg_table = '"table"'

file_csv = 'path_to_csv_file'

pg_connect = "user='%s' password='%s' host='%s' port='%s' dbname='%s'" % (
    pg_user, pg_pass, pg_host, pg_port, pg_dbname)

connection = None
try:
    # Log start
    start = time.perf_counter()

    # Database connection
    connection = psycopg2.connect(pg_connect)
    cursor = connection.cursor()

    # Print PostgreSQL version
    cursor.execute("SELECT version();")
    record = cursor.fetchone()
    print("You are connected to:", record, "\n")

    # Set schema
    cursor.execute("SET search_path TO %s" % pg_schema)

    # Truncate table
    pg_SQL = 'TRUNCATE %s;' % pg_table
    cursor.execute(pg_SQL)
    connection.commit()
    print("Exec OK: %s" % pg_SQL)

    # Open CSV and skip the first line (header row, if present)
    print("Opening CSV file: %s" % file_csv)
    with open(file_csv, 'r') as csvfile:
        next(csvfile)

        # Bulk import via COPY, then commit
        cursor.copy_from(csvfile, pg_table, sep=';')
        connection.commit()

    # Log end
    print("Job done in {:.3f} sec.".format(time.perf_counter() - start))

except (Exception, psycopg2.Error) as error:
    print("Error while connecting to PostgreSQL", error)
finally:
    # Closing database connection
    if connection:
        cursor.close()
        connection.close()
        print("PostgreSQL connection is closed")
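
One caveat with `copy_from()`: newer psycopg2 releases quote the table argument themselves, and the header line has to be skipped by hand. A hedged alternative sketch (the schema/table names and the `build_copy_sql` helper below are placeholders for illustration, not from this thread) is to build the `COPY` statement yourself and run it with `cursor.copy_expert()`, letting PostgreSQL skip the header row:

```python
def build_copy_sql(schema, table, delimiter=';', header=True):
    """Build a COPY ... FROM STDIN statement for cursor.copy_expert().

    Quoting the identifiers ourselves avoids surprises with how
    copy_from() handles table names, and FORMAT csv + HEADER lets
    PostgreSQL skip the header line instead of us calling next().
    """
    target = '"%s"."%s"' % (schema, table)
    options = "FORMAT csv, DELIMITER '%s', HEADER %s" % (
        delimiter, 'true' if header else 'false')
    return 'COPY %s FROM STDIN WITH (%s)' % (target, options)


# Example (schema/table names are placeholders):
sql = build_copy_sql("schema", "table")
print(sql)
# COPY "schema"."table" FROM STDIN WITH (FORMAT csv, DELIMITER ';', HEADER true)

# Then, inside the try block of the script above:
#   with open(file_csv, 'r') as csvfile:
#       cursor.copy_expert(sql, csvfile)
#       connection.commit()
```

`copy_expert()` streams the file through the same server-side `COPY` path, so the performance should be comparable to `copy_from()`.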
Asteroid

Can we please get an update where this bulk loader is on the roadmap?

 

Regards,

Alexis

Alteryx
Status changed to: Accepted
 
Alteryx

@alexisjensen we cannot commit to timelines, but I hope we can start the development on this (PG bulk loader) early 2020; it is literally the next item on the roadmap. Please stay tuned, we should have some good news relatively soon.

Good news!

Waiting for it, the Python workaround given by @Yuges81 works very well.

@bora_perusic Great to hear, Bora, thank you!

Excellent news!

 

The Python workaround is good, but having this bulk load capability will be so much better for our maintenance.

Atom

+1