StackOverflow2013


plurals

PO
primarykey
Id
17941042
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2013-07-30T07:40:55.613
FavoriteCount
0
LastActivityDate
2013-07-30T07:40:55.613
LastEditDate
LastEditorUserId
0
OwnerUserId
1973299
ParentId
17935978
PostTypeId
2
Score
3
ViewCount
0
LastEditorDisplayName
text
Body
Short Answer: I'm not sure there's a hard-and-fast way to answer this. You mentioned files ranging from 1KB to 1GB. I wouldn't store binary data in a DB if it's going to be anywhere near 1KB, let alone 1GB. I may store a few bytes of binary data in a DB if it's incidental, but any large amount of data, especially data that doesn't need to be searched, should be stored in the filesystem.
When you store data in a DB, you're storing it on a filesystem anyway; you've just added another layer (the DB) to the mix. There's a cost to this layer, so there ought to be a benefit to make up the difference. If you're storing the data so that you can search on it or join it to other data, then this makes sense. But file data, binary or not, is typically not used in that way.
Example Implementation: There are better ways to distribute file data than putting it into a DB, such as distributed filesystems (look into GlusterFS and MooseFS, both of which will scale by simply adding additional hard drives, whereas MySQL will not). Typically, I'll store file data in the filesystem using an SHA1 hash of the data as the name of the file. If the hash is 98a75af529f07b1ef7be7400f51344b9f07b1ef7, then I'll store it in this directory structure: <pre><code>./98/a7/98a75af529f07b1ef7be7400f51344b9f07b1ef7 </code></pre> That is, a top-level directory named for the first two characters, a second-level directory named for the next two characters, and finally the file itself, named with the full hash. In this way, I can literally have billions of files without having so many in a single directory that the system is too slow to function.
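To make the directory layout concrete, here is a minimal Python sketch of the hash-named storage scheme described above. The function names `sharded_path` and `store_blob` and the use of a temporary directory as the storage root are assumptions made for illustration; they are not part of the original answer.

```python
import hashlib
import tempfile
from pathlib import Path


def sharded_path(root: Path, digest: str) -> Path:
    """Return root/<first 2 chars>/<next 2 chars>/<full digest>."""
    return root / digest[:2] / digest[2:4] / digest


def store_blob(root: Path, data: bytes) -> Path:
    """Write data under its SHA1 hex digest and return the file's path.

    Identical content always maps to the same path, so an existing
    file is left untouched rather than rewritten.
    """
    digest = hashlib.sha1(data).hexdigest()
    target = sharded_path(root, digest)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_bytes(data)
    return target


if __name__ == "__main__":
    # Hypothetical storage root; a real deployment would use a fixed directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = store_blob(Path(tmp), b"hello world")
        print(path.relative_to(tmp))
```

Because the file name is derived from the content, storing the same bytes twice yields a single file on disk, which is a side effect of this naming scheme rather than something the sketch adds.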
Then I create a DB table with these columns to hold the metadata: <ul> <li>file_id, an auto_increment field</li> <li>created, a field with a default value of current_timestamp</li> <li>prev_id, more on this below</li> <li>hash, the SHA1 hash on the filesystem</li> <li>name, a textual name for the file (such as the original name that the file would have had on disk)</li> </ul> When I need a hierarchical directory structure, I also create a directory table and add a dir_id to the list of columns above. If I edit the file represented by <code>./98/a7/98a75af529f07b1ef7be7400f51344b9f07b1ef7</code>, I don't actually change that file on disk. I create a new one (because the new file contents would have a new SHA1 hash) and create a new entry in the files table whose prev_id equals the file_id of the file I edited. In other words, I now have versioning. If I need this to be available in a distributed fashion, I set up MySQL replication and then use GlusterFS to replicate the filesystem across multiple servers.
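The metadata table and the prev_id versioning chain can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, the table name `files` follows the answer's wording, and the helpers `add_version` and `history` as well as the file name `report.txt` are hypothetical.

```python
import hashlib
import sqlite3

# SQLite stand-in for the MySQL table described above (an assumption);
# the column names follow the list in the answer.
SCHEMA = """
CREATE TABLE files (
    file_id INTEGER PRIMARY KEY AUTOINCREMENT,
    created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    prev_id INTEGER REFERENCES files(file_id),
    hash    TEXT NOT NULL,
    name    TEXT NOT NULL
)
"""


def add_version(conn, name, sha1_hex, prev_id=None):
    """Insert a metadata row; prev_id points at the version being replaced."""
    cur = conn.execute(
        "INSERT INTO files (prev_id, hash, name) VALUES (?, ?, ?)",
        (prev_id, sha1_hex, name),
    )
    return cur.lastrowid


def history(conn, file_id):
    """Follow prev_id links and return the hashes, newest first."""
    hashes = []
    current = file_id
    while current is not None:
        row = conn.execute(
            "SELECT hash, prev_id FROM files WHERE file_id = ?", (current,)
        ).fetchone()
        if row is None:
            break
        hashes.append(row[0])
        current = row[1]
    return hashes


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    v1 = add_version(conn, "report.txt", hashlib.sha1(b"first draft").hexdigest())
    v2 = add_version(
        conn, "report.txt", hashlib.sha1(b"second draft").hexdigest(), prev_id=v1
    )
    print(history(conn, v2))
```

Each hash returned by `history` names a file that still exists on disk under the sharded layout, so any earlier version can be read back without extra bookkeeping.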
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POStoring large files / binary data in a mysql database: when is it ok?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USNick Coons
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COMany thanks for this, it's a very good answer. I particularly like your suggested file storage method. It would also be nice to use a distributed filesystem as you mentioned, but unfortunately we are not working with dedicated servers (for the time being at least). I need to try and make this as compatible as possible with typical shared web hosting setups, which of course makes it difficult to use anything which needs installing on the server. But again, thank you for the elegant file system method :)
 singulars
 PostPostId
 PO
 UserUserId
 USAlfie
