Note that there are some explanatory texts on larger screens.

plurals
  1. POModel formula for glm regression with user-specified family
    primarykey
    data
    text
    <p><strong>Background</strong></p> <p>I am trying to predict the sales for a product line (y_test in the sample at the end). Its sales for a timeperiod are based on all previous sales of another product (x_test) and how many of those previous sales are still being used. It is not possible to directly measure the number of those previously-sold products that are still in use though, so a survival curve needs to be inferred.</p> <p>For instance, if you make accessories for a particular smartphone model, the accessory sales are at least partly based on the number of those smartphones still being used. (This is not homework, BTW.)</p> <p><strong>Details</strong></p> <p>I have some time series data and would like to fit a regression model using <code>glm</code> or something similar. The relationship between the dependent and independent variable is this: <img src="https://i.stack.imgur.com/SVNZg.gif" alt="regression formula"></p> <p>Where p is the time period, y<sub>p</sub> is the dependent variable, x<sub>p</sub> is the independent variable, c<sub>0</sub> and c<sub>1</sub> are the regression coefficients, F<sub>t</sub> is a cumulative distribution function (such as <code>pgamma</code>), and e<sub>p</sub> are the residuals.</p> <p>Through the first three time periods, the function would expand to something like this:</p> <pre><code>#y[1] = c0 + c1*(x[1]*(1-integrate(function(q) {pgamma(q, c2, c2/c3)}, 0, 1)$value)) #y[2] = c0 + c1*(x[1]*(1-integrate(function(q) {pgamma(q, c2, c2/c3)}, 1, 2)$value) + x[2]*(1-integrate(function(q) {pgamma(q, c2, c2/c3)}, 0, 1)$value)) #y[3] = c0 + c1*(x[1]*(1-integrate(function(q) {pgamma(q, c2, c2/c3)}, 2, 3)$value) + x[2]*(1-integrate(function(q) {pgamma(q, c2, c2/c3)}, 1, 2)$value) + x[3]*(1-integrate(function(q) {pgamma(q, c2, c2/c3)}, 0, 1)$value)) </code></pre> <p><strong>So, I have historical data for x<sub>p</sub> and y<sub>p</sub>, and I want to get the values for the coefficients/parameters c<sub>0</sub>, c<sub>1</sub>, c<sub>2</sub>, and c<sub>3</sub> that minimize the residuals.</strong></p> <p><strong>I think the solution is to use <code>glm</code> and create a custom family, but I'm not sure how to do that.</strong> I looked at the code for the <code>Gamma</code> family but didn't get very far. I have been able to do the optimization "manually" using <code>nlminb</code>, but I would much prefer the simplicity and usefulness (i.e. <code>predict</code> and others) offered by <code>glm</code> or similar functions.</p> <p>Here is some example data:</p> <pre><code># Survival function (the integral part): fsurv&lt;-function(q, par) { l&lt;-length(q) out&lt;-vapply(1:l, function(i) {1-integrate(function(x) {pgamma(x, par[1], par[1]/par[2])}, q[i]-1, q[i])$value}, FUN.VALUE=0) return(out)} # Sum up the products: frevsumprod &lt;- function(x,y) { l &lt;- length(y) out &lt;- vapply(1:l, function(i) sum(x[1:i]*rev(y[1:i])), FUN.VALUE=0) return(out)} # Sample data: p&lt;-1:24 # Number of periods x_test&lt;-c(1188, 2742, 4132) # Sample data y_test&lt;-c(82520, 308910, 749395, 801905, 852310, 713935, 624170, 603960, 640660, 553600, 497775, 444140) # Sample data c&lt;-c(-50.161147,128.787437,0.817085,13.845487) # Coefficients and parameters, from another method that fit the data # Pad the data to the correct length: pad&lt;-function(p,v,padval=0) { l&lt;-length(p) padv&lt;-l-length(v) if(padv&gt;0) (v&lt;-c(v,rep(padval,padv))) return(v) } x_test&lt;-pad(p,x_test) y_test&lt;-pad(p,y_test,NA) y_fitted&lt;-c[0+1]+c[1+1]*frevsumprod(x_test,fsurv(p,c[(2:3)+1])) # Fitted values from regression library(ggplot2) ggplot(data.frame(p,y_test,y_fitted))+geom_point(aes(p,y_test))+geom_line(aes(p,y_fitted)) # Plot actual and fit </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload