I'm the author of the [scikit-learn](http://scikit-learn.org) [gradient boosting module](http://scikit-learn.org/dev/modules/ensemble.html#gradient-boosting), a Gradient Boosted Regression Trees implementation in Python. I put some effort into optimizing prediction time, since the method was targeted at low-latency environments (in particular ranking problems); the prediction routine is written in C, but there is still some overhead due to Python function calls. That said, prediction time for single data points with ~50 features and about 250 trees should be << 1 ms.

In my use cases, prediction time is often dominated by the cost of feature extraction. I strongly recommend profiling to pinpoint the source of the overhead (if you use Python, I can recommend [line_profiler](http://packages.python.org/line_profiler/)).

If the source of the overhead is prediction rather than feature extraction, check whether it's possible to do batch predictions instead of predicting single data points, thus limiting the overhead of the Python function call (e.g. in ranking you often need to score the top-K documents, so you can do the feature extraction first and then run predict on the K x n_features matrix); see the timing sketch after the references below.

If this doesn't help either, you should try to limit the number of trees, because the runtime cost of prediction is basically linear in the number of trees. There are a number of ways to limit the number of trees without affecting model accuracy:

1. Proper tuning of the learning rate: the smaller the learning rate, the more trees are needed and thus the slower the prediction.

2. Post-process the GBM with L1 regularization (Lasso); see [Elements of Statistical Learning](http://www-stat.stanford.edu/~tibs/ElemStatLearn/), section 16.3.1 - use the predictions of each tree as new features, run this representation through an L1-regularized linear model, and remove the trees that don't get any weight (a sketch of this follows the references below).

3. Fully corrective weight updates: instead of doing the line search/weight update only for the most recent tree, update all trees (see [Warmuth2006] and [Johnson2012]). Better convergence - fewer trees.

If none of the above does the trick, you could investigate cascades or early-exit strategies (see [Chen2012]).

References:

[Warmuth2006] M. Warmuth, J. Liao, and G. Ratsch. Totally corrective boosting algorithms that maximize the margin. In Proceedings of the 23rd International Conference on Machine Learning, 2006.

[Johnson2012] Rie Johnson and Tong Zhang. Learning Nonlinear Functions Using Regularized Greedy Forest. arXiv, 2012.

[Chen2012] Minmin Chen, Zhixiang Xu, Kilian Weinberger, Olivier Chapelle, and Dor Kedem. Classifier Cascade for Minimizing Feature Evaluation Cost. JMLR W&CP 22: 218-226, 2012.
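
The timing sketch mentioned above: a minimal, self-contained comparison of per-row `predict` calls versus one batched call on a (K, n_features) matrix. The dataset, ensemble size, and K are arbitrary assumptions for illustration; absolute timings will vary by machine, but the batched call pays the Python call overhead only once.

```python
import time
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative data: ~50 features, 250 trees, as discussed above.
X, y = make_regression(n_samples=5000, n_features=50, random_state=0)
est = GradientBoostingRegressor(n_estimators=250, max_depth=3, random_state=0)
est.fit(X, y)

X_top_k = X[:100]  # e.g. the top-K documents to re-rank

# One predict() call per row: the Python call overhead is paid K times.
t0 = time.perf_counter()
single = np.array([est.predict(x.reshape(1, -1))[0] for x in X_top_k])
t_single = time.perf_counter() - t0

# One batched call on the (K, n_features) matrix: overhead is paid once.
t0 = time.perf_counter()
batched = est.predict(X_top_k)
t_batch = time.perf_counter() - t0

assert np.allclose(single, batched)
print(f"per-row: {t_single * 1e3:.2f} ms, batched: {t_batch * 1e3:.2f} ms")
```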
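
And a rough sketch of the Lasso post-processing idea from point 2: treat each tree's prediction as a new feature, fit an L1-regularized linear model on those features, and drop the trees whose coefficients are shrunk to zero. It assumes the scikit-learn attribute layout where `est.estimators_` has shape (n_estimators, 1) for regression; the `alpha` value is an arbitrary assumption and should be tuned on held-out data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=2000, n_features=50, noise=10.0, random_state=0)
est = GradientBoostingRegressor(n_estimators=250, learning_rate=0.1,
                                max_depth=3, random_state=0)
est.fit(X, y)

# Per-tree predictions as new features: shape (n_samples, n_estimators).
# For regression, est.estimators_ is an (n_estimators, 1) array of trees.
tree_features = np.column_stack(
    [tree.predict(X) for tree in est.estimators_[:, 0]])

# L1-regularized refit; alpha is illustrative, not a recommendation.
lasso = Lasso(alpha=0.05).fit(tree_features, y)

# Trees with a zero coefficient get no weight and can be removed,
# shrinking the ensemble (and prediction time) without refitting it.
kept = np.flatnonzero(lasso.coef_)
print(f"trees kept: {len(kept)} of {len(est.estimators_)}")
```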