Note that there are some explanatory texts on larger screens.

plurals
  1. POPython peewee insert multiple records into multiple tables at once
    text
    copied!<p><strong>Overview</strong><br> I have a relational MySQL InnoDB database. I currently have code in place to add data to my database, using the peewee ORM in Python 2.6.6. The problem is that because of all of my foreign keys, I end up doing five <code>SELECT</code> and (worst case) five <code>INSERT</code> statements per data point. As you can imagine, when I try to add 5,000,000 or so data points (40 MiB or so of data) it takes an incredibly long time, due to all of the transactions.</p> <p><strong>Example</strong><br> Here is a dummy example, to demonstrate what I am trying to do:</p> <pre><code>import csv import peewee as pw db = pw.MySQLDatabase(example, **{'passwd': 'example', 'host': 'example', 'port': 3306, 'user': 'example'}) class BaseModel(pw.Model): class Meta: database = db class Users(BaseModel): User = pw.PrimaryKeyField(db_column = 'User_ID') User_Name = pw.CharField(db_column = 'User_Name', max_length = 50) class Meta: db_table = 'users' class Pets(BaseModel): Pets = pw.PrimaryKeyField(db_column = 'Pet_ID') Pet_Name = pw.CharField(db_column = 'Pet_Name', max_length = 50) User = pw.ForeignKeyField(db_column = 'User_ID', rel_model = Users) class Meta: db_table = 'pets' def add_measurement(user_name, pet_name): # Add user try: self.dbo_users = Users.get(Users.User_Name == user_name) except Users.DoesNotExist: self.dbo_users = Users.create(User_Name = user_name) # Add pet and link to user try: self.dbo_pets = Pets.get(Pets.User == self.dbo_users.User, Pets.Pet_Name == pet_name) except Pets.DoesNotExist: self.dbo_pets = Pets.create(User = self.dbo_users.User, Pet_Name = pet_name) db.connect() example_data_file = r'C:\users_pets.csv' # Add all data in CSV file to database, assume first row == header, all other # rows are data, with the format of: user_name, pet_name with open(example_data_file, 'rb') as f: reader = csv.reader(f) reader.next() # Skip header for row in reader: add_measurement(row[0], row[1]) </code></pre> <p><strong>Problem</strong><br> In that basic example, each row in the input file uses two <code>SELECT</code> statements and up to two <code>INSERT</code> statements. For very large files, it will take a significant amount of time to put all of this data into the database.</p> <p>I would like some way to read in chunks of the input file and then perform one <code>INSERT</code> statement to mass dump all of the data into the database. As I will not know all of the FK relationships I will still have to do all of those <code>SELECT</code> statements; however, because all of my FKs are just auto-incremented, I could manually track them.</p> <p>I'd ideally like to do this with peewee; however, I'm also open to a pure SQL solution. I was thinking I could build up all of the data I'm trying to insert in memory and then dump it all at once. Using this approach, I would also need to check the memory contents in addition to the database for any FK relationships.</p> <p>Basically, I'm looking for the "fastest" way to take a large amount of data and dump it into the database. Any input is greatly appreciated, I'm pretty much stuck, at the moment.</p> <p><strong>Solution</strong><br> I was able to come up with a solution that appears to be 2,000X+ faster, than the above implementation, to the client. I ended up building a CSV file with all of the data, copying that to the server, and then using <code>LOAD DATA INFILE</code> to store that data into a temporary table. While this technically does not have the data in the database, correctly, it allows the user to quickly dump data in, without having to wait for it to finish. I then have a stored procedure to take care of the actual data insertion. This solution is a bit convoluted, but it does work well.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload