Fast insertion of pandas DataFrame into Postgres DB using psycopg2
    <p>I am trying to insert a <a href="http://pandas.pydata.org" rel="nofollow noreferrer">pandas</a> DataFrame into a PostgreSQL DB (9.1) in the most efficient way (using Python 2.7).<br>
    Using "cursor.executemany" is really slow, and so is "DataFrame.to_csv(buffer,...)" together with "copy_from".<br>
    I found a much faster solution on the web (<a href="http://eatthedots.blogspot.de/2008/08/faking-read-support-for-psycopgs.html" rel="nofollow noreferrer">http://eatthedots.blogspot.de/2008/08/faking-read-support-for-psycopgs.html</a>), which I adapted to work with pandas.<br>
    My code can be found below.<br>
    My question is whether the method from this related question (using "copy from stdin with binary") can easily be adapted to work with DataFrames, and whether it would be much faster.<br>
    <a href="https://stackoverflow.com/questions/8144002/use-binary-copy-table-from-with-psycopg2">Use binary COPY table FROM with psycopg2</a><br>
    Unfortunately my Python skills aren't sufficient to understand the implementation of that approach.<br>
    This is my approach:</p>
<pre><code>import psycopg2
import connectDB  # this is simply a module that returns a connection to the db
from datetime import datetime


class ReadFaker:
    """
    This could be extended to include the index column optionally.
    Right now the index is not inserted.
    """
    def __init__(self, data):
        self.iter = data.itertuples()

    def readline(self, size=None):
        try:
            line = self.iter.next()[1:]  # element 0 is the index
            # in my case all strings in line are unicode objects.
            row = '\t'.join(x.encode('utf8') if isinstance(x, unicode) else str(x)
                            for x in line) + '\n'
        except StopIteration:
            return ''
        else:
            return row

    read = readline


def insert(df, table, con=None, columns=None):
    time1 = datetime.now()
    close_con = False
    if not con:
        try:
            con = connectDB.getCon()  # dbLoader returns a connection with my settings
            close_con = True
        except psycopg2.Error, e:
            print e.pgerror
            print e.pgcode
            return "failed"
    inserted_rows = df.shape[0]
    data = ReadFaker(df)

    try:
        curs = con.cursor()
        print 'inserting %s entries into %s ...' % (inserted_rows, table)
        if columns is not None:
            curs.copy_from(data, table, null='nan', columns=[col for col in columns])
        else:
            curs.copy_from(data, table, null='nan')
        con.commit()
        curs.close()
        if close_con:
            con.close()
    except psycopg2.Error, e:
        print e.pgerror
        print e.pgcode
        con.rollback()
        if close_con:
            con.close()
        return "failed"

    time2 = datetime.now()
    print time2 - time1
    return inserted_rows
</code></pre>
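For readers on Python 3, the file-like trick used by ReadFaker can be sketched without pandas or a database. This is only an illustration under stated assumptions: `RowStream` and its `null` parameter are hypothetical names, the rows are plain tuples, and values are not escaped, so tabs, newlines, or backslashes inside a value would need extra handling before being fed to COPY.

```python
class RowStream:
    """File-like wrapper that lazily renders rows as tab-separated lines.

    Hypothetical helper, not part of psycopg2 or pandas.
    """

    def __init__(self, rows, null="\\N"):
        self._rows = iter(rows)
        self._null = null  # marker written in place of None

    def _format(self, value):
        # No escaping is done here; values are assumed to be free of
        # tabs, newlines and backslashes.
        return self._null if value is None else str(value)

    def readline(self, size=-1):
        # Like the original ReadFaker, `size` is ignored and exactly one
        # row is returned per call; '' signals end of data.
        try:
            row = next(self._rows)
        except StopIteration:
            return ""
        return "\t".join(self._format(v) for v in row) + "\n"

    # As in ReadFaker, read() is simply an alias for readline().
    read = readline


stream = RowStream([(1, "a"), (2, None)])
lines = list(iter(stream.readline, ""))
```

With a DataFrame, the rows could come from `df.itertuples(index=False)`, which skips the index just as the `[1:]` slice does in the code above. The `null` marker passed to `copy_from` would then have to match the one used by the stream.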