StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
1665625
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
21
CommunityOwnedDate
CreationDate
2009-11-03T06:42:17.550
FavoriteCount
0
LastActivityDate
2014-02-04T11:34:51.013
LastEditDate
2014-02-04T11:34:51.013
LastEditorUserId
759866
OwnerUserId
166686
ParentId
1665613
PostTypeId
2
Score
8
ViewCount
0
LastEditorDisplayName
text
Body
[in advance, apologies to the group for using part of the response space for meta-like matters] From the OP, Lars D: I do not consider [this] answer to be an answer to the question, because it does not bring me closer to a solution. I know what cloud computing is, and I know that the algorithm can be perfectly split into more than 300,000 servers if needed, although the extra costs wouldn't give much extra performance because of network latency. Lars, I sincerely apologize for reading and responding to your question at a naive and generic level. I hope you can see how both the lack of specifity in the question itself, particularly in its original form, and also the somewhat unusual nature of the problem (1) would prompt me respond to the question in like fashion. This, and the fact that such questions on SO typically emanate from hypotheticals by folks who have put but little thought and research into the process, are my excuses for believing that I, a non-practionner [of massively distributed systems], could help your quest. The many similar responses (some of which had the benefits of the extra insight you provided) and also the many remarks and additional questions addressed to you show that I was not alone with this mindset. (1) Unsual problem: An [apparently] mostly computational process (no mention of distributed/replicated storage structures), very highly paralellizable (1,500 servers), into fifty-millisecondish-sized tasks which collectively provide a sub-second response (? for human consumption?). And yet, a process that would only be required a few times [daily..?]. Enough looking back! In practical terms, you may consider some of the following to help improve this SO question (or move it to other/alternate questions), and hence foster the help from experts in the domain. <ul> <li>re-posting as a distinct (more specific) question. In fact, probably several questions: eg. on the [likely] poor latency and/or overhead of mapreduce processes, on the current prices (for specific TOS and volume details), on the rack-awareness of distributed processes at various vendors etc.</li> <li>Change the title</li> <li>Add details about the process you have at hand (see many questions in the notes of both the question and of many of the responses)</li> <li>in some of the questions, add tags specific to a give vendor or technique (EC2, Azure...) as this my bring in the possibly not quite unbuyist but helpful all the same, commentary from agents at these companies</li> <li>Show that you understand that your quest is somewhat of a tall order</li> <li>Explicitly state that you wish responses from effective practionners of the underlying technologies (maybe also include folks that are "getting their feet wet" with these technologies as well, since with the exception of the physics/high-energy folks and such, who BTW traditionnaly worked with clusters rather than clouds, many of the technologies and practices are relatively new)</li> </ul> Also, I'll be pleased to take the hint from you (with the implicit non-veto from other folks on this page), to delete my response, if you find that doing so will help foster better responses. -- original response-- Warning: Not all processes or mathematical calculations can readily be split in individual pieces that can then be run in parallel... Maybe you can check Wikipedia's entry from <a href="http://en.wikipedia.org/wiki/Cloud_computing" rel="nofollow noreferrer">Cloud Computing</a>, understanding that cloud computing is however not the only architecture which allows parallel computing. If your process/calculation can efficitively be chunked in parallelizable pieces, maybe you can look into <a href="http://hadoop.apache.org/" rel="nofollow noreferrer">Hadoop</a>, or other implementations of <a href="http://en.wikipedia.org/wiki/MapReduce" rel="nofollow noreferrer">MapReduce</a>, for an general understanding about these parallel processes. Also, (and I believe utilizing the same or similar algorithms), there also exist commercially available frameworks such as EC2 from <a href="http://aws.amazon.com/" rel="nofollow noreferrer">amazon</a>. Beware however that the above systems are not particularly well suited for very quick response time. They fare better with hour long (and then some) data/number crunching and similar jobs, rather than minute long calculations such as the one you wish to parallelize so it provides results in 1/10 second. The above frameworks are generic, in a sense that they could run processes of most any nature (again, the ones that can at least in part be chunked), but there also exist various offerings for specific applications such as searching or DNA matching etc. The search applications in particular can have very short response times (cf Google for example) and BTW this is in part tied to fact that such jobs can very easily and quickly be chunked for parallel processing.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow to create a system with 1500 servers that deliver results instantaneously?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USBenjamin
UserOwnerUserId
1. USmjv
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.