<p>Sharding files, as you're doing, is a great way to avoid the performance problems that come with having many files in one directory: this method ensures that only a few entries (directories or files) are in any given directory. It's also easy to split across multiple volumes if you need to: you simply mount some of the high-level directories in different places.</p>
<p>You should consider a couple of things, though.</p>
<h2>Identity</h2>
<p>If you're going for basically permanent storage of these images, you may want to shard based on ID. This is a bit easier to deal with from the DB side (for the same reasons we use an arbitrary primary key in DB design).</p>
<p>As @Veger suggests, image ID <code>123456</code> becomes <code>/12/1234/123456.jpg</code>.</p>
<h3>Security</h3>
<p>Using dates, user IDs or an auto-increment number may pose a security risk, though: they are relatively easy to guess, so it's pretty easy for someone to harvest all images.</p>
<p>Additionally, having the date in the URL potentially leaks information if there is no reason for a user to know the upload date.</p>
<p>A very hard-to-guess key provides some level of security against both harvesting and information leakage. For example, you could use a GUID: image ID <code>6f33395e-eda8-4486-8b8e-51ea0f91751b</code> gets stored as <code>/6/6f33/6f33395e/6f33395e-eda8-4486-8b8e-51ea0f91751b.jpg</code>.</p>
<p>There is an enormous number of possible GUIDs (each is 128 bits), so it would likely take millions of years for someone to harvest everything, even if you don't take any extra steps such as limiting connections per IP per hour.</p>
<h3>Volatile images</h3>
<p>If your images are volatile (that is, they expire after some amount of time), then it may actually be best to shard based on a date structure, e.g. <code>/2012/12/14/2012-12-14-hhmmss-userid.jpg</code>, or you can combine this with a GUID and get <code>/2012/12/14/6f/6f33395e-eda8-4486-8b8e-51ea0f91751b.jpg</code>.</p>
<p>If you want to delete all of 2011's files, you just <code>rm -rf 2011</code>. Log files are a great example of when you'd use this.</p>
<p>Keep in mind that this only really makes sense for a <em>very</em> high number of images, because otherwise you can simply query your database for outdated images by date and delete them one by one.</p>
<h2>Granularity of shards</h2>
<p>The more images you plan to store eventually, the finer your shards should be, but keep in mind that if you go too granular, you will lose a lot of disk space to directory entries.</p>
<p>The goal is to keep the number of entries per directory to something the filesystem can handle; a good rule of thumb seems to be about 10,000 at most. You have to predict the traffic your site will get for the next while. Don't go crazy, though, by assuming that at some point you might have millions of users a day. It's not impossible to re-shard, but it's a pain. Predict your growth for the next couple of years and handle that. If you grow faster and have to re-shard as a result, well, it's a nice problem to be solving. If you run out of disk space because your directory entries take up more room than your images, well, that's a stupid problem to deal with.</p>
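<p>The three path layouts described above (numeric ID, GUID, and date plus GUID) could be sketched in Python roughly as follows. The function names, the <code>ext</code> parameter and the choice to simply slice prefixes off the string form of the ID are assumptions made for illustration; only the resulting example paths come from the text above.</p>

```python
import uuid
from datetime import date


def shard_by_id(image_id: int, ext: str = "jpg") -> str:
    """Numeric ID layout: 123456 -> /12/1234/123456.jpg.

    Assumption: prefixes are the first 2 and first 4 characters of the
    decimal ID. IDs shorter than that reuse the whole ID as the prefix.
    """
    name = str(image_id)
    return f"/{name[:2]}/{name[:4]}/{name}.{ext}"


def shard_by_guid(image_id: uuid.UUID, ext: str = "jpg") -> str:
    """GUID layout: prefixes of 1, 4 and 8 hex characters, then the file."""
    name = str(image_id)  # lowercase, hyphenated form
    return f"/{name[:1]}/{name[:4]}/{name[:8]}/{name}.{ext}"


def shard_by_date_and_guid(day: date, image_id: uuid.UUID, ext: str = "jpg") -> str:
    """Date layout for volatile images: /YYYY/MM/DD/<2 hex chars>/<guid>.ext."""
    name = str(image_id)
    return f"/{day:%Y/%m/%d}/{name[:2]}/{name}.{ext}"


if __name__ == "__main__":
    guid = uuid.UUID("6f33395e-eda8-4486-8b8e-51ea0f91751b")
    print(shard_by_id(123456))
    print(shard_by_guid(guid))
    print(shard_by_date_and_guid(date(2012, 12, 14), guid))
```

<p>For new images you would typically generate the key with <code>uuid.uuid4()</code> and store it in the database alongside the image record, so the path can always be recomputed from the key alone.</p>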