Note that there are some explanatory texts on larger screens.

plurals
  1. POAggregate functions, Lack of Types, Entropy in Query problems on NoSQL DBs
    primarykey
    data
    text
    <p>While we believe that NoSQL Databases have come to fill a number of gaps which are challenging on the side of RDBMS, i have had several challenges over time with NoSQL DBs in the area of their query eco-system. <br><br> <a href="http://www.couchbase.com" rel="nofollow noreferrer">Couchbase</a> for example, like its mother <a href="http://couchdb.apache.org/" rel="nofollow noreferrer">CouchDB</a> have had major improvements in reading data using views, lists, Key lookups, map reduce, e.t.c. Couchbase has even <a href="http://www.couchbase.com/press-releases/unql-query-language" rel="nofollow noreferrer">moved to create</a> an <a href="http://www.couchbase.com/communities/n1ql" rel="nofollow noreferrer">SQL-like query engine</a> for their huge 2.X verson. MongoDB has also made serious improvements and complex queries are possible on it and many other NoSQL DB developments going on out there.<br><br> Most <a href="https://stackoverflow.com/q/2351040/431620">NoSQL DBs can perform Complex queries</a> based on <code>LOGICAL and COMPARISON OPERATORS</code> e.g. <code>AND</code>, <code>OR</code>,<code>==</code> e.t.c However, aggergation and performing complex relations on data are a problem on my part. For example, in CouchDB and/or Couchbase, Views span only a single DB. It is not possible to write a view which will aggregate data from two or more databases.<br><br> Let me now get to the problem. Functions (whether aggregate or not): <code>AVG</code>, <code>SUM</code>, <code>ROUND</code>,<code>TRUNC</code>,<code>MAX</code>, <code>MIN</code>, e.t.c The lack of data types makes it impossible to efficiently work with Date and Times hence the lack of Date and time functions e.g. <code>TO_DATE</code>,<code>SYSDATE</code> (for system date/time), <code>ADD_MONTHSs</code>, <code>DATE BETWEEN</code>, <code>DATE/TIME format Conversion</code> e.t.c. It is true, that many will say that , they lack Schemas, types and stuff, but, i have found myself not running away from the need for atleast any one of the functions listed up there. For example because NoSQL DBs have no Date/Time data type, it is hard to perform queries based on those, because you might want to analyse trends based on time. Also, others have tried to use UNIX/EPOC Time stamps and stuff to solve this but it aint a <code>single size fits all</code> solution.<br><br> Map Reduce can be used to attain aggregation to a certain (small) degree, but the overhead has been realised to be great. However, the lack of <code>GROUP BY</code> functionality makes it a straineous solution to filter through what ou want. Look at the query below: <br></p> <pre> SELECT doc.field1, doc.field3, SUM(doc.field2 + doc.field4) FROM couchdb.my_database GROUP BY doc.field1, doc.field3 HAVING SUM(doc.field2 + doc.field4) > 20000; </pre> <p>This is not very easy to attain on CouchDB or Couchbase. i am not sure if its possible on MongoDB. I wish it were possible out of the box. This has made it difficult to use NoSQL as a Data warehouse or OLTP/OLAP solution. I found that, each time a complex analysis needs to be made, one needs to do it in the middle ware by paging through different datasets. Now, most experienced Guys (e.g. <a href="https://cloudant.com/" rel="nofollow noreferrer"><code>CLOUDANT</code></a>) have tweaked <a href="http://lucene.apache.org/" rel="nofollow noreferrer"><code>LUCENE</code></a> to perform complex queries, but because it was initially meant for indexing and text search, it has not solved the lack of <code>FUNCTIONS</code> and <code>DATA AGGREGATION</code> on most NoSQL DBs.<br><br> Because of lack of <code>FUNCTIONS</code>, most NoSQL DBs have the <code>NULL</code> data type but lack the option of converting <code>NULL</code> Objects to something else, like it is in some RDBMS. For example in Oracle, i could: <code>NVL(COLUMN,0)</code> in order to include all the rows while performing say an <code>AVG</code> calculation on a given column (since say, by default the null columns will not be counted/included in the query processing). <br><br> To fully understand the problem, CouchDB views for example operate within the scope of a doc like this below: <pre> function(doc){ // if statements, logical operators, comparison operators // e.t.c here. until you do am emit of that doc // if it satisfies the conditions set // emit(null, doc) OR emit(doc.x,[doc.y, doc.z]) e.t.c. // you can only emit javascript data types anyways emit(doc.field1,doc) } </pre> The docs which satisfy the filters, are let through and go onto the next stage or to a reduce function. Imagine a doc structure like this below: </p> <pre> { x: '', y: '', z: { p: '', n: N // integer or number data type }, date: 'DD/MON/YYYY' // date format } </pre> <p>Now, lets imagine the possibility of this kind of query:</p> <pre> function(){ var average = select AVG(doc.z.n) from couchdb.my_database; var Result = select doc.x,doc.y from couchdb.my_database where doc.z.n > average and doc.y = 'some string' and doc.date between '01-JUN-2012' and '03-AUG-2012'; emit(Result); } </pre> <p>OR if this query were possible: <pre> function(){ var latest = select MAX(doc.date) from couchdb.my_database; var Result = select doc.x,doc.z.p,MONTHS_BETWEEN(doc.date,latest) as "Months_interval" from couchdb.my_database where doc.y like '%john%' order by doc.z.p; emit(Result); } </pre> <b>Qn 1: </b> Which NoSQL Database solution has attained to a great degree, the query capability being talked about in the details above ? what key features make it stand out ? <br><br> <b>Qn 2: </b> Is the lack of a Schema, or the characteristic of being <code>Key-Value</code> a reason for the lack of <code>FUNCTIONS</code> in Querying these Databases ? What is the reason for the lack of Aggregate functionality in most NoSQL DBs ? <br><br> <b>Qn 3: </b> If the query ability above is possible in any of the NoSQL DBs, show how the last two (2) query problems above can be attained using the existing NoSQL infrastracture (consider any NoSQL technology of your choice)</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload