Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p><strong>First, the user interface:</strong> as user I <strong>hate</strong> to search a product in a catalog organized in a <em>strictly hierarchical</em> way. I never remember in what sub-sub-sub-sub...-category an "exotic" product is in and this force me to waste time exploring "promising" categories just to discover it is categorized in a (for me, at least) strange way.</p> <p>What <a href="https://stackoverflow.com/questions/1622528/database-structure-advice-needed/1695290#1695290">Kevin Peno</a> suggests is a good advice and is known as <strong><a href="http://en.wikipedia.org/wiki/Faceted_search" rel="nofollow noreferrer">faceted browsing</a></strong>. As <a href="http://www.gseis.ucla.edu/faculty/bates/" rel="nofollow noreferrer">Marcia Bates</a> wrote in <a href="http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/971/892" rel="nofollow noreferrer">After the Dot-Bomb: Getting Web Information Retrieval Right This Time</a>, " <em>.. faceted classification is to hierarchical classification as relational databases are to hierarchical databases. ..</em> ".</p> <p>In essence, faceted search allows users to search your catalog starting from whatever "facet" they prefer and let them filter information choosing other facets along the search. Note that, contrary to how tag systems are usually conceived, nothing prevents you to organize some of these facets hierarchically.</p> <p>To quickly understand what faceted search is all about, there are <strong><a href="http://flamenco.berkeley.edu/demos.html" rel="nofollow noreferrer">some demos</a></strong> to explore at <a href="http://flamenco.berkeley.edu/" rel="nofollow noreferrer">The Flamenco Search Interface Project - Search Interfaces that Flow</a>.</p> <p><strong>Second, the application logic:</strong> what <a href="https://stackoverflow.com/questions/1622528/database-structure-advice-needed/1687560#1687560">Manitra</a> proposes is also a good advice (as I understand it), i.e. separating <code>nodes</code> and <code>links</code> of a tree/graph in different relations. What he calls "ancestor table" (which is a much better intuitive name, however) is known as <a href="http://en.wikipedia.org/wiki/Transitive_closure#Uses" rel="nofollow noreferrer">transitive closure of a directed acyclic graph (DAG)</a> (reachability relation). Beyond performance, it simplify queries greatly, as Manitra said.</p> <p><strong>But</strong> I suggest a <a href="http://dev.mysql.com/doc/refman/5.4/en/create-view.html" rel="nofollow noreferrer">view</a> for such "ancestor table" (transitive closure), so that updates are in real-time and incremental, not periodical by a batch job. There is SQL code (but I think it needs to be adapted a little to specific DBMSes) in papers I mentioned in my answer to <a href="https://stackoverflow.com/questions/1692915/query-language-for-graph-sets-data-modeling-question/1693483#1693483">query language for graph sets: data modeling question</a>. In particular, look at <a href="http://www.comp.nus.edu.sg/~wongls/psZ/dlsw-ijit97-16.ps" rel="nofollow noreferrer">Maintaining Transitive Closure of Graphs in SQL</a> (.ps - postscript).</p> <p><strong>Products-Categories relationship</strong></p> <p>The first point of Manitra is worth of emphasis, also.</p> <p>What he is saying is that between products and categories there is a many-to-many relationship. I.e.: each product can be in one or more categories and in each category there can be zero or more products.</p> <p>Given relation variables (relvars) Products and Categories such relationship can be represented, for example, as a relvar PC with at least attributes P# and C#, i.e. product and category numbers (identifiers) in a foreign-key relationships with corresponding Products and Categories numbers.</p> <p>This is complementary to management of categories' hierarchies. Of course, this is only a design sketch.</p> <p><strong>On faceted browsing in SQL</strong></p> <p>A useful concept to implement "faceted browsing" is <a href="http://en.wikipedia.org/wiki/Relational_algebra#Division" rel="nofollow noreferrer">relational division</a>, or, even, <a href="http://www.dbdebunk.com/page/page/772076.htm" rel="nofollow noreferrer">relational comparisons</a> (see bottom of linked page). I.e. dividing PC (Products-Categories) by a (growing) list of categories chosen from a user (facet navigation) one obtains only products in such categories (of course, categories are presumed <strong>not</strong> all mutually exclusive, otherwise choosing two categories one will obtain zero products).</p> <p>SQL-based DBMS usually lack this operators (division and comparisons), so I give below some interesting papers that implement/discuss them:</p> <ul> <li><a href="http://fie-conference.org/fie2003/papers/1057.pdf" rel="nofollow noreferrer">ON MAKING RELATIONAL DIVISION COMPREHENSIBLE</a> (.pdf from <a href="http://fie-conference.org/fie2003/sessions/F2C.htm" rel="nofollow noreferrer">FIE 2003 Session Index</a>);</li> <li><a href="http://www.jise.appstate.edu/Issues/13/085.pdf" rel="nofollow noreferrer">A simpler (and better) SQL approach to relational division</a> (.pdf from Journal of Information Systems Education - <a href="http://www.jise.appstate.edu/Contents/Contents-13-2.htm" rel="nofollow noreferrer">Contents Volume 13, Number 2 (2002)</a>);</li> <li><a href="http://citeseer.ist.psu.edu/old/691302.html" rel="nofollow noreferrer">Processing frequent itemset discovery queries by division and set containment join operators</a>;</li> <li><a href="http://www.informatik.uni-stuttgart.de/cgi-bin/NCSTRL/NCSTRL_view.pl?id=TR-2005-08&amp;inst=AS&amp;mod=&amp;engl=1" rel="nofollow noreferrer">Laws for Rewriting Queries Containing Division Operators</a>;</li> <li><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.7.9785" rel="nofollow noreferrer">Algorithms and Applications for Universal Quantification in Relational Databases</a>;</li> <li><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.3013" rel="nofollow noreferrer">Optimizing Queries with Universal Quantification in Object-Oriented and Object-Relational Databases</a>;</li> <li>(ACM access required) <a href="http://portal.acm.org/citation.cfm?id=1065178" rel="nofollow noreferrer">On the complexity of division and set joins in the relational algebra</a>;</li> <li>(ACM access required) <a href="http://portal.acm.org/citation.cfm?id=210202" rel="nofollow noreferrer">Fast algorithms for universal quantification in large databases</a>;</li> </ul> <p>and so on...</p> <p>I will not go into details here but interaction between categories hierarchies and facet browsing needs special care.</p> <p><strong>A digression on "flatness"</strong></p> <p>I briefly looked at the article linked by <a href="https://stackoverflow.com/questions/1622528/database-structure-advice-needed/1622692#1622692">Pras</a>, <a href="http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/" rel="nofollow noreferrer">Managing Hierarchical Data in MySQL</a>, but I stopped reading after these few lines in the introduction:</p> <blockquote> <p><strong>Introduction</strong></p> <p>Most users at one time or another have dealt with hierarchical data in a SQL database and no doubt learned that the management of hierarchical data is not what a relational database is intended for. The tables of a relational database are not hierarchical (like XML), but are simply a <strong>flat list</strong>. Hierarchical data has a parent-child relationship that is not naturally represented in a relational database table. ...</p> </blockquote> <p>To understand why this insistence on flatness of relations is <strong>just nonsense</strong>, imagine a cube in a <a href="http://en.wikipedia.org/wiki/Cartesian_coordinate_system#Cartesian_coordinates_in_three_dimensions" rel="nofollow noreferrer">three dimensional Cartesian coordinate system</a>: it will be identified by 8 coordinates (triplets), say P1(x1,y1,z1), P2(x2,y2,z2), ..., P8(x8, y8, z8) [here we are not concerned with constraints on these coordinates so that they represent really a cube].</p> <p>Now, we will put these set of coordinates (points) into a relation variable and we will name this variable <code>Points</code>. We will <strong>represent</strong> the relation value of <code>Points</code> as a table below:</p> <pre> <b>Points</b>| <b>x</b> | <b>y</b> | <b>z</b> | =======+====+====+====+ | x1 | y1 | z1 | +----+----+----+ | x2 | y2 | z2 | +----+----+----+ | .. | .. | .. | | .. | .. | .. | +----+----+----+ | x8 | y8 | z8 | +----+----+----+ </pre> <p>Does this cube is being "flattened" by the mere act of representing it in a tabular way? Is a relation (value) the same thing as its tabular representation?</p> <p>A relation variable assumes as values sets of points in a n-dimensional discrete space, where n is the number of relation attributes ("columns"). What does it mean, for a n-dimensional discrete space, to be "flat"? Just nonsense, as I wrote above.</p> <p>Don't get me wrong, It is certainly true that SQL is a badly designed language and that SQL-based DBMSes are full of idiosyncrasies and shortcomings (NULLs, redundancy, ...), especially the bad ones, the DBMS-as-dumb-store type (no referential constraints, no integrity constrains, ...). But that has nothing to do with relational data model fantasized limitations, on the contrary: more they turn away from it and worse is the outcome.</p> <p>In particular, the relational data model, once you understand it, poses no problem in representing whatever structure, even hierarchies and graphs, as I detailed with references to published papers mentioned above. Even SQL can, if you gloss over its deficiencies, missing something better.</p> <p><strong>On the "The Nested Set Model"</strong></p> <p>I skimmed the rest of <a href="http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/" rel="nofollow noreferrer">that article</a> and I'm not particularly impressed by such logical design: it suggests to muddle two different entities, <strong>nodes</strong> and <strong>links</strong>, into one relation and this will probably cause awkwardness. But I'm not inclined to analyze that design more thoroughly, sorry.</p> <hr> <p><strong>EDIT:</strong> Stephan Eggermont objected, in comments below, that " <em>The flat list model is a problem. It is an abstraction of the implementation that makes performance difficult to achieve. ...</em> ".</p> <p>Now, my point is, precisely, that:</p> <ol> <li>this "flat list model" is a <strong>fantasy</strong>: just because one lay out (represents) relations as tables ("flat lists") does not mean that relations are "flat lists" (an "object" and its representations are not the same thing);</li> <li>a logical representation (relation) and physical storage details (horizontal or vertical decompositions, compression, indexes (hashes, b+tree, r-tree, ...), clustering, partitioning, etc.) are distinct; one of the points of relational data model (<strong>RDM</strong>) is to decouple logical from "physical" model (with advantages to both users and implementors of DBMSes);</li> <li>performance is a direct consequence of physical storage details (implementation) and <strong>not</strong> of logical representation (Eggermont's comment is a classic example of <a href="http://www.inconcept.com/JCM/May2002/pascal.html" rel="nofollow noreferrer">logical-physical confusion</a>).</li> </ol> <p>RDM model does not constraint implementations in any way; one is free to implement tuples and relations as one see fit. Relations are <strong>not necessarily</strong> files and tuples are <strong>not necessarily</strong> records of a file. Such correspondence is a dumb <strong>direct-image implementation</strong>.</p> <p>Unfortunately SQL-based DBMS implementations <strong>are</strong>, too often, dumb direct-image implementations and they suffer poor performance in a variety of scenarios - <a href="http://en.wikipedia.org/wiki/Online_analytical_processing" rel="nofollow noreferrer">OLAP</a>/<a href="http://en.wikipedia.org/wiki/Extract,_transform,_load" rel="nofollow noreferrer">ETL</a> products exist to cover these shortcomings.</p> <p>This is slowly changing. There are commercial and free software/open source implementations that finally avoid this fundamental pitfall:</p> <ul> <li><a href="http://www.vertica.com/" rel="nofollow noreferrer">Vertica</a>, which is a commercial successor of..</li> <li><a href="http://db.csail.mit.edu/projects/cstore/" rel="nofollow noreferrer">C-Store: A Column-Oriented DBMS</a>;</li> <li><a href="http://monetdb.cwi.nl/" rel="nofollow noreferrer">MonetDB</a>;</li> <li><a href="http://www.luciddb.org/" rel="nofollow noreferrer">LucidDB</a>;</li> <li><a href="http://www.kx.com/" rel="nofollow noreferrer">Kdb</a> in a way;</li> <li>an so on...</li> </ul> <p>Of course, the point is <strong>not</strong> that there must exist an "optimal" physical storage design, but that whatever physical storage design can be abstracted away by a nice <em>declarative language</em> based on relational algebra/calculi (and SQL is a <strong>bad</strong> example) or more directly on a logic programming language (like Prolog, for example - see my answer to "<a href="https://stackoverflow.com/questions/724377/prolog-to-sql-converter/1682486#1682486">prolog to SQL converter</a>" question). A good DBMS should be change physical storage design on-the-fly, based on data access statistics (and/or user hints).</p> <p>Finally, in Eggermont's comment the statement " <em>The relational model is getting squeeezed between the cloud and prevayler.</em> " is another nonsense but I cannot give a rebuttal here, this comment is already too long.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload