StackOverflow2013

<p>To expand slightly on Arnon's answer, Hadoop has been recognized as a force that is not going away and is gaining traction in organizations, often through grassroots efforts by developers. MPP databases are good at answering questions that we know about at design time, such as "How many transactions do we get per hour by country?".</p>
<p>Hadoop started as a platform for a new type of developer that lives somewhere between analysts and developers, one who can write code but also understands data analysis and machine learning. MPP databases (columnar or not) serve this type of developer very poorly: they often analyze unstructured data, use algorithms that require too much CPU power to run in a database, or work with datasets that are too large. The sheer amount of CPU power required to build some models makes running these algorithms in any sort of traditional sharded DB impossible.</p>
<p>My personal pipeline using Hadoop typically looks like:</p>
<ol>
<li>Run a number of very large global queries in Hadoop to get a basic feel for the data and the distribution of variables.</li>
<li>Use Hadoop to build a smaller dataset with just the data I am interested in.</li>
<li>Export the smaller dataset into a relational DB.</li>
<li>Run lots of small queries on the relational DB, build Excel sheets, sometimes do a little R.</li>
</ol>
<p>Bear in mind that this workflow only works for the "analyst developer" or "data scientist". Others' mileage will vary.</p>
<p>Coming back to your question: because people like me are abandoning their tools, these companies are looking for ways to remain relevant in an age where Hadoop is synonymous with big data, the coolest startups and cutting-edge technology (whether this is earned or not you may discuss amongst yourselves). Also, many Hadoop installations are an order of magnitude or more larger than an organization's MPP deployments, meaning more data is being retained for longer in Hadoop.</p>
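<p>Steps 3 and 4 of the pipeline above could look roughly like the sketch below. It is only an illustration under assumptions: the tab-separated text stands in for whatever reduced dataset the Hadoop job in step 2 writes out, the <code>transactions</code> table and its columns are hypothetical, and SQLite is used as the relational DB purely because it ships with Python.</p>

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the reduced dataset produced in step 2.
# Assumed columns: country, hour of day, transaction amount.
REDUCED_TSV = (
    "US\t9\t12.50\n"
    "US\t9\t7.25\n"
    "US\t10\t3.00\n"
    "DE\t9\t20.00\n"
)


def load_reduced_dataset(conn, tsv_text):
    """Step 3: copy the reduced dataset into a relational table."""
    conn.execute(
        "CREATE TABLE transactions (country TEXT, hour INTEGER, amount REAL)"
    )
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        ((country, int(hour), float(amount)) for country, hour, amount in rows),
    )
    conn.commit()


def transactions_per_hour_by_country(conn):
    """Step 4: one of the many small queries run against the relational copy."""
    cursor = conn.execute(
        "SELECT country, hour, COUNT(*) FROM transactions "
        "GROUP BY country, hour ORDER BY country, hour"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
load_reduced_dataset(conn, REDUCED_TSV)
result = transactions_per_hour_by_country(conn)
print(result)
```

<p>The point of the split is that the expensive, full-scan work happens once in Hadoop, while the follow-up questions run against a copy small enough for an ordinary relational DB to answer interactively.</p>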
