StackOverflow2013


plurals

PO
primarykey
Id
13885153
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2012-12-14T19:33:25.900
FavoriteCount
0
LastActivityDate
2014-01-19T20:04:50.513
LastEditDate
2014-01-19T20:04:50.513
LastEditorUserId
7913
OwnerUserId
7913
ParentId
13884883
PostTypeId
2
Score
5
ViewCount
0
LastEditorDisplayName
text
Body
<p>Sharding files, like you're doing, is a great way to avoid the performance problems of having many files in one directory: with this method, you ensure that only a few entries (directories or files) are in any given directory. It's also easy to split across multiple volumes if you need to; you simply mount some of the high-level directories in different places.</p>
<p>You should consider a couple of things, though.</p>
<h2>Identity</h2>
<p>If you're storing these images more or less permanently, you may want to shard based on their ID. This is a bit easier to deal with from the DB side (for the same reasons we use an arbitrary primary key in DB design).</p>
<p>Like @Veger suggests: image ID <code>123456</code> becomes <code>/12/1234/123456.jpg</code>.</p>
<h3>Security</h3>
<p>Using dates, user IDs or an auto-increment number may pose a security risk, though: they are relatively easy to guess, so it's pretty easy for someone to harvest all your images.</p>
<p>Additionally, having the date in the URL potentially leaks information if there is no reason for a user to know the upload date.</p>
<p>A very hard-to-guess key provides some protection against both harvesting and information leakage. For example, you could use a GUID: image ID <code>6f33395e-eda8-4486-8b8e-51ea0f91751b</code> gets stored as <code>/6/6f33/6f33395e/6f33395e-eda8-4486-8b8e-51ea0f91751b.jpg</code>.</p>
<p>There is an enormous number of possible GUIDs (a GUID is 128 bits, and a random, version 4 GUID carries 122 random bits). It would therefore likely take millions of years for someone to harvest everything by guessing, even if you don't take extra steps such as limiting connections per IP per hour.</p>
<h3>Volatile images</h3>
<p>If your images are volatile (that is, they expire after some amount of time), it may actually be best to shard based on a date structure, e.g. <code>/2012/12/14/2012-12-14-hhmmss-userid.jpg</code>. You can also combine this with a GUID and get <code>/2012/12/14/6f/6f33395e-eda8-4486-8b8e-51ea0f91751b.jpg</code>.</p>
<p>If you want to delete all of 2011's files, you just <code>rm -rf 2011</code>. Log files are a great example of when you'd use this.</p>
<p>Keep in mind that this only really pays off for a <em>very</em> high number of images. Otherwise you can simply query your database for outdated images by date and delete them one by one.</p>
<h2>Granularity of shards</h2>
<p>The more images you plan to store eventually, the more granular your shards should be. If you go too granular, though, you will lose a lot of disk space to directory-entry overhead.</p>
<p>The goal is to keep the number of entries per directory to something the filesystem can handle; a good rule of thumb seems to be about 10,000 at most. You have to predict the traffic your site will get in the medium term. Don't go overboard, though, by planning for the millions of users a day you might have someday. It's not impossible to re-shard, but it's a pain. Predict your growth for the next couple of years and handle that. If you grow faster and have to re-shard as a result, well, that's a nice problem to be solving. If you run out of disk space because your directory entries take up more room than your images, well, that's a stupid problem to deal with.</p>
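<p>The path layouts described above can each be built in a line or two. What follows is a minimal Python (3.9+) sketch, not the answer's own code. The function names, the <code>.jpg</code> extension and the zero-padding of numeric IDs to 6 digits are all assumptions made for illustration. <code>fanout</code> is a hypothetical helper that counts the worst-case number of entries per directory level, so you can check a layout against the roughly 10,000-entries rule of thumb.</p>

```python
import uuid
from datetime import datetime
from typing import List, Optional

# Hypothetical helpers illustrating the directory layouts discussed above.
# Names, the ".jpg" extension and the padding width are illustrative assumptions.


def id_shard_path(image_id: int, width: int = 6, ext: str = "jpg") -> str:
    """Numeric ID -> /<first 2 digits>/<first 4 digits>/<id>.<ext>.

    IDs are zero-padded to `width` digits (an assumption) so that small
    IDs still yield prefixes of the expected length.
    """
    key = str(image_id).zfill(width)
    return f"/{key[:2]}/{key[:4]}/{key}.{ext}"


def guid_shard_path(guid: str, ext: str = "jpg") -> str:
    """GUID -> /<1 char>/<4 chars>/<8 chars>/<guid>.<ext>."""
    key = str(uuid.UUID(guid))  # validates and normalizes to lowercase canonical form
    return f"/{key[:1]}/{key[:4]}/{key[:8]}/{key}.{ext}"


def dated_guid_shard_path(uploaded: datetime, guid: str, ext: str = "jpg") -> str:
    """Upload date plus GUID -> /YYYY/MM/DD/<2 chars>/<guid>.<ext>.

    Expired images can then be deleted a whole year, month or day at a time.
    """
    key = str(uuid.UUID(guid))
    return f"/{uploaded:%Y/%m/%d}/{key[:2]}/{key}.{ext}"


def new_image_guid() -> str:
    """Generate a random (version 4) GUID as the hard-to-guess image key."""
    return str(uuid.uuid4())


def fanout(prefix_lengths: List[int], alphabet_size: int,
           key_length: Optional[int] = None) -> List[int]:
    """Worst-case entry count for the root and each nested prefix directory.

    Each level can hold alphabet_size ** (new prefix chars) subdirectories.
    If key_length is given, also append the worst-case number of files in a
    leaf directory (keys sharing the deepest prefix). For GUIDs, prefixes must
    stay within the first 8 characters (before the first hyphen).
    """
    counts = []
    previous = 0
    for length in prefix_lengths:
        counts.append(alphabet_size ** (length - previous))
        previous = length
    if key_length is not None:
        counts.append(alphabet_size ** (key_length - previous))
    return counts


if __name__ == "__main__":
    print(id_shard_path(123456))  # /12/1234/123456.jpg
    print(guid_shard_path("6f33395e-eda8-4486-8b8e-51ea0f91751b"))
    # /6/6f33/6f33395e/6f33395e-eda8-4486-8b8e-51ea0f91751b.jpg
    print(dated_guid_shard_path(datetime(2012, 12, 14), new_image_guid()))  # random GUID each run
    print(fanout([1, 4, 8], 16))  # [16, 4096, 65536]
    print(fanout([2, 4], 10, key_length=6))  # [100, 100, 100]
```

<p>With IDs zero-padded to 6 digits, <code>fanout([2, 4], 10, key_length=6)</code> shows at most 100 entries at every level. Once IDs grow past <code>width</code> digits, the leaf directories fill up faster, so the width would need revisiting. For the GUID layout, <code>fanout([1, 4, 8], 16)</code> shows that a second-level directory (e.g. <code>/6/6f33/</code>) can hold up to 65,536 subdirectories, which is above the rule of thumb. With random GUIDs, that bound is only approached for a very large collection.</p>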
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow to store many images on a webserver?
  singulars
  PostTypePostTypeId
  PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USgregmac
UserOwnerUserId
1. USgregmac
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POHow to store many images on a webserver?
  singulars
  PostTypePostTypeId
  PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTAcceptedByOriginator
2. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTUpMod
3. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTUpMod
CommentsPostId
1. This table or related slice is empty.
