I'm the author of the [scikit-learn](http://scikit-learn.org) [gradient boosting module](http://scikit-learn.org/dev/modules/ensemble.html#gradient-boosting), a Gradient Boosted Regression Trees implementation in Python. I put some effort into optimizing prediction time, since the method was targeted at low-latency environments (in particular ranking problems); the prediction routine is written in C, but there is still some overhead due to Python function calls. That said, prediction time for single data points with ~50 features and about 250 trees should be << 1 ms.

In my use cases, prediction time is often dominated by the cost of feature extraction. I strongly recommend profiling to pinpoint the source of the overhead (if you use Python, I can recommend [line_profiler](http://packages.python.org/line_profiler/)).

If the source of the overhead is prediction rather than feature extraction, check whether it's possible to do batch predictions instead of predicting single data points, thus limiting the overhead of the Python function call (e.g. in ranking you often need to score the top-K documents, so you can do the feature extraction first and then run predict on the K x n_features matrix); see the timing sketch after the references below.

If this doesn't help either, you should try to limit the number of trees, because the runtime cost of prediction is basically linear in the number of trees. There are a number of ways to limit the number of trees without affecting model accuracy:

1. Proper tuning of the learning rate: the smaller the learning rate, the more trees are needed and thus the slower the prediction.

2. Post-process the GBM with L1 regularization (Lasso); see [Elements of Statistical Learning](http://www-stat.stanford.edu/~tibs/ElemStatLearn/), section 16.3.1 - use the predictions of each tree as new features, run this representation through an L1-regularized linear model, and remove the trees that don't get any weight (a sketch of this follows the references below).

3. Fully corrective weight updates: instead of doing the line search/weight update only for the most recent tree, update all trees (see [Warmuth2006] and [Johnson2012]). Better convergence - fewer trees.

If none of the above does the trick, you could investigate cascades or early-exit strategies (see [Chen2012]).

References:

[Warmuth2006] M. Warmuth, J. Liao, and G. Ratsch. Totally corrective boosting algorithms that maximize the margin. In Proceedings of the 23rd International Conference on Machine Learning, 2006.

[Johnson2012] Rie Johnson and Tong Zhang. Learning Nonlinear Functions Using Regularized Greedy Forest. arXiv, 2012.

[Chen2012] Minmin Chen, Zhixiang Xu, Kilian Weinberger, Olivier Chapelle, and Dor Kedem. Classifier Cascade for Minimizing Feature Evaluation Cost. JMLR W&CP 22: 218-226, 2012.
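
The timing sketch mentioned above: a minimal, self-contained comparison of per-row `predict` calls versus one batched call on a (K, n_features) matrix. The dataset, ensemble size, and K are arbitrary assumptions for illustration; absolute timings will vary by machine, but the batched call pays the Python call overhead only once.

```python
import time
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative data: ~50 features, 250 trees, as discussed above.
X, y = make_regression(n_samples=5000, n_features=50, random_state=0)
est = GradientBoostingRegressor(n_estimators=250, max_depth=3, random_state=0)
est.fit(X, y)

X_top_k = X[:100]  # e.g. the top-K documents to re-rank

# One predict() call per row: the Python call overhead is paid K times.
t0 = time.perf_counter()
single = np.array([est.predict(x.reshape(1, -1))[0] for x in X_top_k])
t_single = time.perf_counter() - t0

# One batched call on the (K, n_features) matrix: overhead is paid once.
t0 = time.perf_counter()
batched = est.predict(X_top_k)
t_batch = time.perf_counter() - t0

assert np.allclose(single, batched)
print(f"per-row: {t_single * 1e3:.2f} ms, batched: {t_batch * 1e3:.2f} ms")
```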
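
And a rough sketch of the Lasso post-processing idea from point 2: treat each tree's prediction as a new feature, fit an L1-regularized linear model on those features, and drop the trees whose coefficients are shrunk to zero. It assumes the scikit-learn attribute layout where `est.estimators_` has shape (n_estimators, 1) for regression; the `alpha` value is an arbitrary assumption and should be tuned on held-out data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=2000, n_features=50, noise=10.0, random_state=0)
est = GradientBoostingRegressor(n_estimators=250, learning_rate=0.1,
                                max_depth=3, random_state=0)
est.fit(X, y)

# Per-tree predictions as new features: shape (n_samples, n_estimators).
# For regression, est.estimators_ is an (n_estimators, 1) array of trees.
tree_features = np.column_stack(
    [tree.predict(X) for tree in est.estimators_[:, 0]])

# L1-regularized refit; alpha is illustrative, not a recommendation.
lasso = Lasso(alpha=0.05).fit(tree_features, y)

# Trees with a zero coefficient get no weight and can be removed,
# shrinking the ensemble (and prediction time) without refitting it.
kept = np.flatnonzero(lasso.coef_)
print(f"trees kept: {len(kept)} of {len(est.estimators_)}")
```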