StackOverflow2013


plurals

POWhich DB would you use? MongoDB/Neo4j/SQL... all of them?
primarykey
Id
14096884
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
2012-12-31T03:08:08.627
CommentCount
3
CommunityOwnedDate
CreationDate
2012-12-31T02:46:30.753
FavoriteCount
5
LastActivityDate
2012-12-31T15:50:32.153
LastEditDate
2017-05-23T11:56:42.883
LastEditorUserId
-1
OwnerUserId
82609
ParentId
0
PostTypeId
1
Score
2
ViewCount
737
LastEditorDisplayName
text
Body
I'd like to know which choices you would make for my use case. It's about building a social web app where each user has their own personal filesystem. <hr> Specification <ul> <li>Each user has their own filesystem</li> <li>File metadata looks like unstructured documents</li> <li>File content is sent to Amazon S3</li> <li>Users can create directories and files in this filesystem</li> <li>Users can share a single directory with other users (like Unix)</li> <li>Some directories can be set as public (shared with all users)</li> <li>Users can search for content (their own content, public content, and shared content)</li> <li>Users can bookmark directories or files</li> <li>Performance and scalability should be acceptable</li> </ul> <hr> For now, we have chosen MongoDB for several reasons: <ul> <li>The unstructured nature of the file metadata</li> <li>Advice from someone who has already used it</li> <li>I agreed to contribute to this project to discover new technologies with a real use case</li> <li>The ability to index JSON documents in ElasticSearch for scalable text search</li> </ul> <hr> MongoDB needs denormalization (and ElasticSearch too). The pain comes directly from the relational part between directories: each directory refers to its parent directory with a parentId attribute. This means that when a directory is bookmarked and accessed, its breadcrumb should be available. Without denormalizing the breadcrumb, this requires an expensive recursive lookup. It is the same when running a search query for content: I'd like the breadcrumb of the directory to be available directly in the document (actually, I use the same parser to read my objects back from ElasticSearch and MongoDB, since both use JSON/BSON). So denormalization works fine until a user moves one of their root directories, under which there are thousands of subcategories: the breadcrumbs of all those subcategories must be updated. MongoDB doesn't really help with consistency here, and it is hard to keep this denormalized breadcrumb up to date.
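To make the trade-off above concrete, here is a minimal in-memory sketch, not the actual application code. It assumes each directory document stores its ancestors' ids in a `breadcrumb` array next to `parentId`; the field name `breadcrumb`, the helper names and the sample directories are all hypothetical, and a plain Python dict stands in for the MongoDB collection.

```python
from typing import Dict, List, Optional


def make_dir(dirs: Dict[str, dict], dir_id: str, name: str,
             parent_id: Optional[str] = None) -> None:
    """Insert a directory document with a denormalized breadcrumb (ancestor ids, root first)."""
    crumb = dirs[parent_id]["breadcrumb"] + [parent_id] if parent_id else []
    dirs[dir_id] = {"_id": dir_id, "name": name,
                    "parentId": parent_id, "breadcrumb": crumb}


def breadcrumb_by_recursion(dirs: Dict[str, dict], dir_id: str) -> List[str]:
    """Rebuild the breadcrumb from parentId only: one lookup per level of depth."""
    path = []
    current = dirs[dir_id]["parentId"]
    while current is not None:
        path.append(current)
        current = dirs[current]["parentId"]
    return list(reversed(path))


def move_directory(dirs: Dict[str, dict], dir_id: str, new_parent_id: str) -> int:
    """Move a directory and rewrite the breadcrumb of every descendant.

    Returns the number of documents that had to be rewritten.
    """
    new_crumb = dirs[new_parent_id]["breadcrumb"] + [new_parent_id]
    if dir_id in new_crumb:
        raise ValueError("cannot move a directory under itself or a descendant")

    moved = dirs[dir_id]
    old_prefix = moved["breadcrumb"] + [dir_id]
    new_prefix = new_crumb + [dir_id]
    moved["parentId"] = new_parent_id
    moved["breadcrumb"] = new_crumb
    rewritten = 1

    # Every descendant's breadcrumb starts with the old prefix and must be patched.
    for doc in dirs.values():
        crumb = doc["breadcrumb"]
        if crumb[:len(old_prefix)] == old_prefix:
            doc["breadcrumb"] = new_prefix + crumb[len(old_prefix):]
            rewritten += 1
    return rewritten


# Hypothetical sample tree: home/photos/2012/december and home/archive
dirs: Dict[str, dict] = {}
make_dir(dirs, "home", "home")
make_dir(dirs, "photos", "photos", "home")
make_dir(dirs, "2012", "2012", "photos")
make_dir(dirs, "dec", "december", "2012")
make_dir(dirs, "archive", "archive", "home")

rewritten = move_directory(dirs, "photos", "archive")
print(rewritten, dirs["dec"]["breadcrumb"])
```

Reading a breadcrumb is a single document fetch, but the number of rewrites on a move grows with the size of the moved subtree, and without multi-document atomicity a failure halfway through leaves some breadcrumbs stale.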
<hr> Graph databases seem appropriate for building a filesystem structure, but what about scalability? I don't know much about graph databases like Neo4j or Titan, but would they help to build the filesystem structure? As far as I know, graphs are not well suited to distribution, and having a user's directories distributed doesn't seem good for breadcrumb computation. But users have their own filesystem, which is a single, isolated graph. Perhaps this means I could create, and shard, one graph database per user? But then what about permissions for shared directories? Where should I store them? In any case, my search engine still needs a denormalized breadcrumb for file metadata (at least if I keep using ElasticSearch). And it is hard to denormalize all the shared-directory permissions so that a user can search a subset of another user's content. It seems hard to index a graph for search anyway: <a href="https://stackoverflow.com/questions/9970193/how-to-store-tree-data-in-a-lucene-solr-elasticsearch-index-or-a-nosql-db">How to store tree data in a Lucene/Solr/Elasticsearch index or a NoSQL db?</a> <hr> MongoDB is perhaps not a good choice for storing structured and nearly static content like users. Another thing that matters is consistency. When creating a new user, I need to create 8 root directories. These root directories are not subdocuments of the user document. So how should I create these directories during user creation? MongoDB doesn't have transactions, so how can I be sure that the 9 inserts (user + 8 directories) are done atomically? It wouldn't be good to end up with a user created with only half of their directories. Nor would it be very nice to need an async job and a flag on the user document to check that the directories have been created. So a traditional (free) SQL database seems a good fit for consistency when storing user-related data. Scalability can be achieved by partitioning at the application level, as Facebook and Tumblr do. 
User-related data can be colocated on the same instance so that some joins can be performed: for example, on the user's filesystem. And I know SQL and multitenancy strategies. <hr> So in the end, I'm totally lost in this NoSQL/SQL world. I just wonder if you could help me make a choice for this use case? I'm not trying to over-optimize, just to see what we may need to do in the future. Does anyone know of a company that is doing something similar? One thing I'm considering is a hybrid solution where, for example, we store structured data in MySQL/PostgreSQL, the file metadata in MongoDB, directories in (? don't know), and when a user connects, we cache their whole filesystem graph in an embedded Neo4j database (assuming the graph is big but of acceptable size). Does this seem like a good idea?
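As an illustration of the consistency argument made above for a SQL store, here is a rough sketch of creating a user and its 8 root directories in one transaction. It uses Python's built-in `sqlite3` module purely as a stand-in for MySQL/PostgreSQL; the table layout, the `create_user` helper and the `root-1` to `root-8` directory names are assumptions made for the example.

```python
import sqlite3

# Hypothetical names for the 8 root directories.
ROOT_DIRECTORY_NAMES = [f"root-{i}" for i in range(1, 9)]


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE directories (
            id        INTEGER PRIMARY KEY,
            owner_id  INTEGER NOT NULL REFERENCES users(id),
            parent_id INTEGER REFERENCES directories(id),
            name      TEXT NOT NULL
        );
    """)


def create_user(conn: sqlite3.Connection, name: str,
                root_names=ROOT_DIRECTORY_NAMES) -> int:
    """Insert the user and all of its root directories, or nothing at all."""
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if an exception escapes it.
    with conn:
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        user_id = cur.lastrowid
        for dir_name in root_names:
            conn.execute(
                "INSERT INTO directories (owner_id, parent_id, name) "
                "VALUES (?, NULL, ?)",
                (user_id, dir_name),
            )
    return user_id


conn = sqlite3.connect(":memory:")
create_schema(conn)

alice_id = create_user(conn, "alice")

# Simulate a failure halfway through: the third directory violates NOT NULL,
# so the user row and the first two directories are rolled back as well.
try:
    create_user(conn, "bob", ["ok-1", "ok-2", None])
except sqlite3.IntegrityError:
    pass

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
dir_count = conn.execute("SELECT COUNT(*) FROM directories").fetchone()[0]
print(user_count, dir_count)
```

The failed second call leaves no half-created user behind, which is the property the question is looking for; the same pattern carries over to any SQL database with transactional inserts.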
Tags
<mysql><scala><mongodb><neo4j>
Title
Which DB would you use? MongoDB/Neo4j/SQL... all of them?
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USSebastien Lorber
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POWhich DB would you use? MongoDB/Neo4j/SQL... all of them?
 UserUserId
 UShafichuk
 VoteTypeVoteTypeId
 VTFavorite
2. VO
 singulars
 PostPostId
 POWhich DB would you use? MongoDB/Neo4j/SQL... all of them?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POWhich DB would you use? MongoDB/Neo4j/SQL... all of them?
 UserUserId
 USSantosh Gokak
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
