StackOverflow2013


plurals

PO
primarykey
Id
7308218
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
7
CommunityOwnedDate
CreationDate
2011-09-05T12:38:52.787
FavoriteCount
0
LastActivityDate
2017-10-12T09:24:43.730
LastEditDate
2017-10-12T09:24:43.730
LastEditorUserId
2457
OwnerUserId
2457
ParentId
7307634
PostTypeId
2
Score
101
ViewCount
0
LastEditorDisplayName
text
Body
<h2>Should I just start with a few gen_servers with a supervisor and incrementally build on that?</h2> <p>You're missing one key component in Erlang architectures here: applications! (That is, the concept of OTP applications, not software applications).</p> <p>Think of applications as components. A component in your system solves a particular problem, is responsible for a coherent set of resources, or abstracts something important or complex away from the system.</p> <p>The first step when designing an Erlang system is to decide which applications are needed. Some can be pulled from the web as they are; these we can refer to as libraries. Others you'll need to write yourself (otherwise you wouldn't need this particular system). These applications we usually refer to as the business logic (often you need to write some libraries yourself as well, but it is useful to keep the distinction between the libraries and the core business applications that tie everything together).</p> <h2>How many supervisors should I have?</h2> <p>You should have one supervisor for each kind of process you want to monitor.</p> <p>A bunch of identical temporary workers? One supervisor to rule them all.</p> <p>Different processes with different responsibilities and restart strategies? A supervisor for each different type of process, in a correct hierarchy (depending on when things should restart and which other processes need to go down with them).</p> <p>Sometimes it is okay to put a bunch of different process types under the same supervisor. This is usually the case when you have a few singleton processes (e.g. one HTTP server supervisor, one ETS table owner process, one statistics collector) that will always run. In that case, it might be too much cruft to have one supervisor for each, so it is common to add them under one supervisor.
Just be aware of the implications of using a particular restart strategy when doing this, so you don't, for example, take down your statistics process when your web server crashes (<code>one_for_one</code> is the most common strategy to use in cases like this). Be careful not to have any dependencies between processes in a <code>one_for_one</code> supervisor. If a process depends on another process that has crashed, it can crash as well, hitting the supervisor's restart intensity limit too often and crashing the supervisor itself too soon. This can be avoided by having two different supervisors, which would completely control the restarts through the configured intensity and period (<a href="https://github.com/ThomasArts/tricky-supervisor" rel="noreferrer">longer explanation</a>).</p> <h2>How do I decide which parts of the system should be process-based?</h2> <p>Every concurrent activity in your system should be in its own process. Having the wrong abstraction of concurrency is the most common mistake by Erlang system designers in the beginning.</p> <p>Some people are not used to dealing with concurrency; their systems tend to have too little of it. One process, or a few gigantic ones, that run everything in sequence. These systems are usually full of code smells and the code is very rigid and hard to refactor. It also makes them slower, because they may not use all the cores available to Erlang.</p> <p>Other people immediately grasp the concurrency concepts but fail to apply them optimally; their systems tend to overuse the process concept, making many processes stay idle waiting for others that are doing work.
These systems tend to be unnecessarily complex and hard to debug.</p> <p>In essence, in both variants you get the same problem: you don't use all the concurrency available to you and you don't get the maximum performance out of the system.</p> <p>If you stick to the <a href="http://www.codinghorror.com/blog/2007/03/curlys-law-do-one-thing.html" rel="noreferrer">single responsibility principle</a> and abide by the rule to have a process for every <em>truly</em> concurrent activity in your system, you should be okay.</p> <p>There are valid reasons to have idle processes. Sometimes they keep important state, sometimes you want to keep some data temporarily and later discard the process, sometimes they wait on external events. The bigger pitfall is to pass important messages through a long chain of largely inactive processes, as it will slow down your system with lots of copying and use more memory.</p> <h2>How should I avoid bottlenecks?</h2> <p>Hard to say; it depends very much on your system and what it's doing. Generally though, if you have a good division of responsibility between applications, you should be able to scale the application that appears to be the bottleneck separately from the rest of the system.</p> <p>The golden rule here is to <em>measure, measure, measure</em>! Don't think you have something to improve until you've measured.</p> <p>Erlang is great in that it allows you to hide concurrency behind interfaces (known as implicit concurrency). For example, you use a functional module API, a normal <code>module:function(Arguments)</code> interface, that could in turn spawn thousands of processes without the caller having to know that. If you get your abstractions and your API right, you can always parallelize or optimize a library after you've started using it.</p> <p>That being said, here are some general guidelines:</p> <ul> <li>Try to send messages to the recipient directly; avoid channeling or routing messages through intermediary processes.
Otherwise the system just spends time moving messages (data) around without really working.</li> <li>Don't overuse the OTP design patterns, such as gen_servers. In many cases, you only need to start a process, run some piece of code, and then exit. For this, a gen_server is overkill.</li> </ul> <p>And one bonus piece of advice: don't reuse processes. Spawning a process in Erlang is so cheap and quick that it doesn't make sense to reuse a process once its lifetime is over. In some cases it might make sense to reuse state (e.g. complex parsing of a file) but that is better canonically stored somewhere else (in an ETS table, database etc.).</p> <h2>Should I add logging later?</h2> <p>There's some basic logging functionality in Erlang/OTP already, the <a href="http://erlang.org/doc/man/error_logger.html" rel="noreferrer">error logger</a>. Together with <a href="http://erlang.org/doc/man/sasl_app.html" rel="noreferrer">SASL</a> (System Architecture Support Libraries) you can get up and running with logging in no time.</p> <p>When the time comes (and if you've abstracted the logging API from the beginning) you could exchange this for something that better fits your needs. The de facto third-party logging library today is <a href="https://github.com/basho/lager" rel="noreferrer">Basho's Lager</a>.</p> <h2>What is the general approach to Erlang/OTP distributed fault-tolerant multiprocessors systems architecture?</h2> <p>To summarize what's been said above:</p> <ul> <li>Divide your system into applications</li> <li>Put your processes in the correct supervision hierarchy, depending on their needs and dependencies</li> <li>Have a process for every truly concurrent activity in your system</li> <li>Maintain a functional API towards the other components in the system. This lets you: <ul> <li>Refactor your code without changing the code that's using it</li> <li>Optimize code afterwards</li> <li>Distribute your system when needed (just make a call to another node behind the API!
The caller won't notice!)</li> <li>Test the code more easily (less work setting up test harnesses, easier to understand how to use it)</li> </ul></li> <li>Start using the libraries available to you in OTP until you need something different (you'll know, when the time comes)</li> </ul> <p>Common pitfalls:</p> <ul> <li>Too many processes</li> <li>Too few processes</li> <li>Too much routing (forwarded messages, chained processes)</li> <li>Too few applications (I've never seen the opposite case, actually)</li> <li>Not enough abstraction (makes it hard to refactor and reason about. It also makes it hard to test!)</li> </ul>
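<p>As a rough illustration of the "functional API that hides concurrency" point, here is a minimal sketch. The module name <code>par_lists</code> and its <code>map/2</code> function are hypothetical names made up for this example, not part of OTP. The caller sees an ordinary <code>module:function(Arguments)</code> call, while the implementation spawns one short-lived process per element and never reuses any of them.</p>

```erlang
%% Hypothetical example module; the name par_lists is an assumption
%% made for illustration and does not exist in OTP.
-module(par_lists).
-export([map/2]).

%% map(Fun, List) -> [Result]
%% Looks like a plain function call to the caller, but applies Fun to
%% every element in its own process.
map(Fun, List) ->
    Parent = self(),
    %% Spawn one linked process per element. Each process computes its
    %% result, sends it straight to the caller, and then exits.
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, Fun(Item)} end),
                Ref
            end || Item <- List],
    %% Collect the results in the original order by matching on the
    %% unique reference that was created for each element.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

<p>With this in place, <code>par_lists:map(fun(X) -> X * 2 end, [1, 2, 3])</code> evaluates to <code>[2,4,6]</code>. Because the workers are linked, a crash in <code>Fun</code> also takes down the caller (unless it traps exits), and there is no timeout on the receive. Whether that is the right failure behaviour depends on your system, so treat this as a starting point rather than a finished library.</p>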
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow do you design the architecture of an Erlang/OTP-based distributed fault-tolerant multicore system?
  singulars
  PostTypePostTypeId
  PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USAdam Lindberg
UserOwnerUserId
1. USAdam Lindberg
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POHow do you design the architecture of an Erlang/OTP-based distributed fault-tolerant multicore system?
  singulars
  PostTypePostTypeId
  PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTUpMod
2. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTUpMod
3. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTUpMod
CommentsPostId
