<p>[Apologies in advance to the group for using part of the response space for meta-like matters.]</p>
<p>From the OP, Lars D:<br> <em>I do not consider [this] answer to be an answer to the question, because it does not bring me closer to a solution. I know what cloud computing is, and I know that the algorithm can be perfectly split into more than 300,000 servers if needed, although the extra costs wouldn't give much extra performance because of network latency.</em></p>
<p>Lars,<br> I sincerely apologize for reading and responding to your question at a naive and generic level. I hope you can see how both the lack of specificity in the question itself, particularly in its original form, and the somewhat unusual nature of the problem (1) would prompt me to respond in like fashion. This, and the fact that such questions on SO typically emanate from hypotheticals by folks who have put little thought and research into the process, are my excuses for believing that I, a non-practitioner [of <em>massively</em> distributed systems], could help your quest. The many similar responses (some of which had the benefit of the extra insight you provided) and the many remarks and additional questions addressed to you show that I was not alone in this mindset.</p>
<p>(1) Unusual problem: an [apparently] mostly computational process (no mention of distributed/replicated storage structures), very highly parallelizable (1,500 servers), split into tasks of roughly fifty milliseconds each which collectively provide a sub-second response (for human consumption?). And yet, a process that would only be required a few times [daily?].</p>
<p>Enough looking back!<br> In <strong>practical terms</strong>, you may consider some of the following <strong>to help improve this SO question</strong> (or move it to other/alternate questions), and hence foster help from <strong>experts in the domain</strong>.</p> <ul> <li>Re-post as a distinct (more specific) question. 
In fact, probably several questions: e.g. on the [likely] poor latency and/or overhead of MapReduce processes, on the current prices (for <em>specific</em> TOS and volume details), on the rack-awareness of distributed processes at various vendors, etc.</li>
<li>Change the title.</li>
<li>Add details about the process you have at hand (see the many questions in the notes on both the question and many of the responses).</li>
<li>In some of the questions, add tags specific to a given vendor or technique (EC2, Azure...), as this may bring in commentary from agents at these companies, possibly not quite unbiased but helpful all the same.</li>
<li>Show that you understand that your quest is somewhat of a tall order.</li>
<li>Explicitly state that you wish responses from effective practitioners of the underlying technologies (maybe also include folks who are "getting their feet wet" with these technologies, since, with the exception of the physics/high-energy folks and such, who BTW traditionally worked with clusters rather than clouds, many of the technologies and practices are relatively new).</li> </ul> <p>Also, I'll be pleased to take the hint from you (with the implicit non-veto from other folks on this page) to delete my response, if you find that doing so will help foster better responses. 
</p> <p>-- original response --</p>
<p>Warning: <strong>Not all processes or mathematical calculations can readily be split into individual pieces that can then be run in parallel...</strong></p>
<p>Maybe you can check Wikipedia's entry on <a href="http://en.wikipedia.org/wiki/Cloud_computing" rel="nofollow noreferrer"><strong>Cloud Computing</strong></a>, understanding however that cloud computing is not the only architecture which allows parallel computing.</p>
<p>If your process/calculation can effectively be chunked into parallelizable pieces, maybe you can look into <a href="http://hadoop.apache.org/" rel="nofollow noreferrer"><strong>Hadoop</strong></a>, or other implementations of <a href="http://en.wikipedia.org/wiki/MapReduce" rel="nofollow noreferrer"><strong>MapReduce</strong></a>, for a general understanding of these parallel processes. Also (and, I believe, utilizing the same or similar algorithms), there exist commercially available frameworks such as <strong>EC2</strong> from <a href="http://aws.amazon.com/" rel="nofollow noreferrer">Amazon</a>.</p>
<p>Beware however that the above systems are not particularly well suited for very quick response times. They fare better with hour-long (and then some) data/number-crunching and similar jobs, rather than minute-long calculations such as the one you wish to parallelize so that it provides results in 1/10 second.</p>
<p>The above frameworks are generic, in the sense that they could run processes of almost any nature (again, the ones that can at least in part be chunked), but there also exist various offerings for specific applications such as searching or DNA matching, etc. The search applications in particular can have very short response times (cf. Google, for example), and BTW this is in part tied to the fact that such jobs can very easily and quickly be chunked for parallel processing.</p>
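<p>To make the "chunk, process in parallel, then combine" idea a bit more concrete, here is a minimal single-machine sketch in Python. It is only an illustration of the map/reduce pattern, not Hadoop's API and not the OP's algorithm: the function names (<code>split_into_chunks</code>, <code>map_chunk</code>, <code>combine</code>, <code>parallel_sum_of_squares</code>) and the sum-of-squares workload are made up for the example, and a thread pool merely stands in for a set of worker machines.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal pieces."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks


def map_chunk(chunk):
    """'Map' step: work on one chunk, independent of all other chunks."""
    return sum(x * x for x in chunk)


def combine(a, b):
    """'Reduce' step: merge two partial results into one."""
    return a + b


def parallel_sum_of_squares(items, n_workers=4):
    """Chunk the input, map each chunk on a worker, then reduce the partials."""
    chunks = split_into_chunks(items, n_workers)
    # Assumption: threads stand in for remote workers. A real cluster adds
    # scheduling and network overhead that this sketch does not model.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(combine, partials, 0)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

<p>The sketch only works because each chunk can be processed without knowing anything about the others, which is exactly the property the warning above is about. On a real cluster, the per-task dispatch and network cost would be added on top of each <code>map_chunk</code> call, which is why very short tasks tend to benefit less from this kind of framework.</p>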