<p>The real problem with <em>estimation</em> of the time taken by a process is the <strong>quantification of the workload</strong>. Once you can quantify that, you can make a <em>better</em> estimate.</p>

<h2>Examples of good estimates</h2>
<ul>
<li><p>File system I/O or network transfer. Regardless of how well or badly the file system performs, you can <strong>quantify</strong> in advance the total number of bytes to be processed <strong>and</strong> you can <strong>measure</strong> the speed. Once you have these, and once you can monitor how many bytes you have transferred, you get a good estimate. Random factors may affect your estimate (e.g. another application starts in the meantime), but you still get a meaningful value.</p></li>
<li><p>Encryption of large streams, for the reasons above. Even if you are computing an MD5 hash, you always know how many blocks have been processed, how many are left, and the total.</p></li>
<li><p>Item synchronization. This is a little trickier. <strong>If you can assume</strong> that the per-unit workload is constant, or you can make a good estimate of the time required to process an item when its variance is low or insignificant, then you can make another good estimate of the process. Take email synchronization: if you don't know the byte size of the messages (otherwise you fall into case 1), but common practice tells you that most emails are roughly the same size, then you can use the <em>mean</em> time taken to download/upload the emails processed so far to estimate the time taken to process a single email. This won't work in 100% of cases and <strong>is</strong> subject to error, but you still see the progress bar <em>progressing</em> on a large account.</p></li>
</ul>

<p>In general, the rule is that you can make a good estimate of ETC/ETA (ETA is actually the date and time the operation is expected to complete) if you have a homogeneous process whose <strong>numbers</strong> you know. Homogeneity guarantees that the time to process one work item is comparable to the others, i.e. the time taken to process a previous item <strong>can</strong> be used to estimate future ones. The numbers are what make correct calculations possible.</p>

<h2>Examples of bad estimates</h2>
<ul>
<li><p>Operations on a number of files of unknown size. This time you know only how many files you want to process (e.g. to download), but you don't know their size in advance. When file sizes have a high <em>variance</em>, you are in trouble. If you have downloaded half of the files, but those were the smallest and sum up to 10% of the total bytes, can you say you are halfway? No! You just see the progress bar grow quickly to 50% and then advance much more slowly.</p></li>
<li><p>Heterogeneous processes, e.g. Windows installations. As pointed out by @HansPassant, Windows installations provide a worse-than-bad estimate. Installing Windows software involves several processes, including file copy (this can be estimated), registry modifications (usually never estimated), and execution of transactional code. The real problem is the last one. Transactional processes involving execution of custom installer code are discussed below.</p></li>
<li><p>Execution of <strong>generic</strong> code. This <strong>can never be estimated</strong>. A code fragment involves <strong>conditional</strong> statements, whose execution takes different paths depending on conditions external to the code. This means, for example, that a program behaves differently depending on whether you have a printer installed or not, whether you have a local or a domain account, etc.</p></li>
</ul>

<h2>Conclusions</h2>
<p>Estimating the duration of a software process is neither an <strong>impossible</strong> task nor an <strong>exact</strong>/<em>deterministic</em> one.</p>
<ul>
<li><p>It's not impossible because, even in the case of code fragments, you can sometimes find a model for your code (take LU factorization as an example: its operation count is known in advance, so it may be estimated). Or you might redesign your code, splitting it into an estimation phase, where you first determine the branch conditions, and an execution phase, where all the pre-determined branches are taken. I said <em>might</em> because this task is in practice impossible: most code determines branches as effects of previous conditions, meaning that estimating a branch actually involves running the code. A chicken-and-egg problem.</p></li>
<li><p>It's not a deterministic process. Computer systems, especially <strong>multitasking</strong> ones, are affected by a number of random factors that may impact your estimated process. You will never get a correct estimate before running your process. At most, you can detect external factors and re-estimate. The gap between your estimate and the real duration of the process converges to <strong>zero</strong> as you get closer to the end of the process (lim [x→N] |est(x) − real(x)| = 0, where N is the process duration).</p></li>
</ul>
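The mean-based estimate described above for homogeneous processes (bytes transferred, blocks hashed, emails synchronized) can be sketched in a few lines of Python. This is an illustrative sketch, not code from the answer; the class and method names are hypothetical, and a testable `clock` parameter is assumed so the estimator can be driven without waiting on real time:

```python
import time


class ProgressEstimator:
    """Estimate remaining time for a homogeneous process using the
    mean per-item duration observed so far (hypothetical helper)."""

    def __init__(self, total_items, clock=time.monotonic):
        self.total_items = total_items
        self.clock = clock          # injectable for testing; defaults to a monotonic clock
        self.done = 0
        self.start = clock()

    def item_finished(self):
        """Record that one more work item has completed."""
        self.done += 1

    def eta_seconds(self):
        """Estimated remaining seconds, or None before any item completes."""
        if self.done == 0:
            return None             # no data yet: no meaningful estimate
        elapsed = self.clock() - self.start
        mean_per_item = elapsed / self.done
        return mean_per_item * (self.total_items - self.done)
```

As the answer notes, this only works when the process is homogeneous: the mean of past items is a usable predictor of future ones, and the estimate self-corrects toward the true remaining time as more items complete.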