Distributed and replicated data storage for small amounts of data under Windows
<p>We're looking for a good solution to a caching problem. We'd like to distribute a relatively small amount of data (perhaps tens of GB) among a cluster of web servers such that:</p>
<ol>
<li>The data is replicated to all nodes</li>
<li>The data is persistent</li>
<li>The data can be accessed locally</li>
</ol>
<p>Our motivation for a caching solution is that we currently have a single point of failure: a SQL Server database. Unfortunately, we're unable to set up a fail-over cluster for this database. We already use Memcached heavily, but we want to avoid the situation where a Memcached node goes down, we suddenly get a large number of cache misses, and a massive number of requests hits one endpoint.</p>
<p>We'd prefer instead to have a local persistent cache on each web server node so that the resulting load is distributed. A retrieval would pass through the following steps:</p>
<ol>
<li>Check for data in Memcached. If it's not there...</li>
<li>Check for data in local persistent storage. If it's not there...</li>
<li>Retrieve data from the database.</li>
</ol>
<p>When data changes, the cache key is invalidated at both caching layers.</p>
<p>We've been looking at a number of potential solutions, but none of them seems to match exactly what we need:</p>
<h2>CouchDB</h2>
<p>This is pretty close; the data model we'd like to cache is very document-oriented. However, its replication model isn't exactly what we're looking for. It seems to me that replication is an <em>action</em> you have to perform rather than a permanent <em>relationship among nodes</em>. You can set up continuous replication, but this doesn't persist between restarts.</p>
<h2>Cassandra</h2>
<p>This solution seems to be mostly geared toward those with large storage requirements. We have a large number of users, but small amounts of data. Cassandra appears to support any number <em>n</em> of <em>fail-over nodes</em>, but 100% replication among nodes doesn't seem to be what it's intended for; instead, it seems more geared toward distribution only.</p>
<h2>SAN</h2>
<p>One attractive idea is to store a set of files on a SAN or a similar type of appliance. I haven't worked with these before, but it seems like this would still be a single point of failure; if the SAN goes down, we'd suddenly be going to the database for all cache misses.</p>
<h2>DFS Replication</h2>
<p>A simple Google search revealed this. It seems to do what we want; it synchronizes files across all nodes in a replication cluster. But the marketing text makes it look like it's more of a system for ensuring documents are copied to different office locations. It also has limits, such as a maximum file count, that wouldn't work well for us.</p>
<p>Have any of you had similar requirements and found a good solution that meets your needs?</p>
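<p>To make the lookup order above concrete, here is a minimal sketch of what we have in mind. It is only an illustration: <code>TieredCache</code>, <code>shared</code>, <code>local</code> and <code>load_from_db</code> are hypothetical names, plain dict-like objects stand in for Memcached and the local persistent store, and refilling the upper layers after a miss is our assumption rather than a settled design.</p>

```python
from typing import Callable, MutableMapping, Optional


class TieredCache:
    """Look up a key in a shared cache, then a local store, then the database."""

    def __init__(
        self,
        shared: MutableMapping[str, str],   # stand-in for Memcached (assumption)
        local: MutableMapping[str, str],    # stand-in for the local persistent store (assumption)
        load_from_db: Callable[[str], Optional[str]],  # hypothetical database loader
    ) -> None:
        self.shared = shared
        self.local = local
        self.load_from_db = load_from_db

    def get(self, key: str) -> Optional[str]:
        # 1. Check the shared cache.
        value = self.shared.get(key)
        if value is not None:
            return value

        # 2. Check local persistent storage.
        value = self.local.get(key)
        if value is None:
            # 3. Fall back to the database.
            value = self.load_from_db(key)
            if value is None:
                return None
            # Assumption: refill the local layer so the next miss stays local.
            self.local[key] = value

        # Assumption: refill the shared layer as well.
        self.shared[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # When data changes, drop the key from both caching layers.
        self.shared.pop(key, None)
        self.local.pop(key, None)
```

<p>With this shape, losing the shared cache only pushes reads down to each node's local store, so the database sees a request only when a key is missing from both layers.</p>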