"Aborted_clients" on Rails 3.2, mysql2 gem, Amazon RDS (mysql)
<p>Two weeks ago we migrated our database from a Postgres instance hosted on the web server to a MySQL instance hosted on Amazon RDS.</p>
<p>After the migration, we started experiencing "Lost connection during query" errors on about 0.5% of requests, causing the web request to fail. Before the migration (when we were on Postgres), we had never seen this issue.</p>
<p>There is a lot of documentation on the "Lost connection during query" error from the MySQL community as well as other communities online. The issue is not reproducible in dev or staging, only in production. It occurs relatively rarely in production, but seems to occur much more frequently when hitting a fairly complex ActiveAdmin page (an admin dashboard) that runs a number of queries.</p>
<p>Here is what I've tried and observed so far, to no avail:</p>
<p>1) 'wait_timeout' variable - this doesn't seem applicable, as the problem isn't related to pooled connections dying after 8 hours. I can restart the database and the Rails app and reproduce the issue within 50 requests by hitting the ActiveAdmin dashboard page. That said, I've increased wait_timeout on MySQL and rebooted the Amazon RDS instance, to no avail.</p>
<p>2) 'reconnect = true' in database.yml - I suspect this helps mask the issue, as Aborted_clients sometimes grows without corresponding failures on the Rails frontend. I assume this is because the connection pool retries the connection on failure in some cases and recovers gracefully.</p>
<p>3) Disabled delayed_job - thinking that delayed_job might be corrupting entries, I disabled it and reproduced the issue again.</p>
<p>4) Upgraded the Amazon RDS instance from 'small' to 'medium' - the DB is under very light load, but I thought instance size might be a factor. No luck.</p>
<p>5) Simplifying the queries on the dashboard page has made that page less likely to reproduce the error, but hasn't eliminated the error entirely.</p>
<p>6) When the error occurs, it occurs immediately - not after a period of 3-5 seconds - leading me to believe it isn't a read/write/connection timeout.</p>
<p>7) Rebooting the EC2 web server doesn't help.</p>
<p>8) Rebooting the Amazon RDS database instance doesn't help.</p>
<p>9) Neither the web server nor the Amazon RDS instance is under any significant load.</p>
<p>10) Both the web server and the RDS DB instance are in the same availability zone.</p>
<p>I feel like the connection pool entries in Rails may be getting left in a bad state by a prior query, causing the next attempt on that connection to fail immediately, but I have no way to prove that.</p>
<p>I also see that Rails 4.0 has a new Reaper concept that seems to check the pool entries on a regular basis for dead connections, which makes me think this may be a more widespread problem that is now being addressed in Rails 4.0.</p>
<p>Because this is only reproducible in our production environment at this point, I cannot move the MySQL instance off RDS and onto the web server to determine whether RDS (a remote DB instance) is the cause.</p>
<p>Thoughts from the Stack Overflow community on what to try next?</p>
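<p>For reference, one way the "bad pool entry" theory above could be probed is a retry wrapper along the lines of the sketch below. This is not what the app currently does, and the names are hypothetical: <code>with_reconnect</code> and <code>LostConnectionError</code> are made up for illustration. In the real app the error would presumably be an <code>ActiveRecord::StatementInvalid</code> wrapping a <code>Mysql2::Error</code>, and the reconnect hook would presumably call something like <code>ActiveRecord::Base.connection.reconnect!</code>; both of those are assumptions here.</p>

```ruby
# Hypothetical stand-in for the error the app sees. In the real app this is
# assumed to be ActiveRecord::StatementInvalid wrapping a Mysql2::Error.
class LostConnectionError < StandardError; end

# Matches the message text quoted in the question.
LOST_CONNECTION_PATTERN = /Lost connection/i

# Runs the block. If it fails with a lost-connection message, calls the
# reconnect hook and runs the block again, up to max_retries extra attempts.
# Any other error, or a failure after the last retry, is re-raised unchanged.
def with_reconnect(reconnect, max_retries = 1)
  attempts = 0
  begin
    yield
  rescue StandardError => e
    raise unless e.message =~ LOST_CONNECTION_PATTERN
    raise if attempts >= max_retries
    attempts += 1
    reconnect.call # assumed to swap in a fresh connection
    retry
  end
end

# Simulated usage: the first attempt fails, the retry succeeds.
calls = 0
reconnects = 0
result = with_reconnect(lambda { reconnects += 1 }) do
  calls += 1
  raise LostConnectionError, "Lost connection to MySQL server during query" if calls == 1
  :ok
end
puts "result=#{result} calls=#{calls} reconnects=#{reconnects}"
```

<p>If a single retry on a fresh connection always succeeds in production, that would support the idea that individual pool entries are going bad rather than the database itself being unreachable. It would only be a diagnostic, though: blindly retrying is not safe for writes that may have partially executed.</p>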