| java.lang.Object org.apache.commons.math.stat.regression.SimpleRegression
SimpleRegression | public class SimpleRegression implements Serializable(Code) | | Estimates an ordinary least squares regression model
with one independent variable.
y = intercept + slope * x
Standard errors for intercept and slope are
available as well as ANOVA, r-square and Pearson's r statistics.
Observations (x,y pairs) can be added to the model one at a time or they
can be provided in a 2-dimensional array. The observations are not stored
in memory, so there is no limit to the number of observations that can be
added to the model.
Usage Notes:
- When there are fewer than two observations in the model, or when
there is no variation in the x values (i.e. all x values are the same),
all statistics return NaN. At least two observations with
different x coordinates are required to estimate a bivariate regression
model.
- Getters for the statistics always compute values based on the current
set of observations -- i.e., you can get statistics, then add more data
and get updated statistics without using a new instance. There is no
"compute" method that updates all statistics. Each of the getters performs
the necessary computations to return the requested statistic.
version: $Revision: 348519 $ $Date: 2005-11-23 12:12:18 -0700 (Wed, 23 Nov 2005) $ |
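As an illustration of the usage notes above, here is a minimal, self-contained sketch of a streaming least squares fit. TinyLine and its members are hypothetical names, and the raw-sums formula is a textbook choice made for brevity, not the library's implementation. The sketch shows the two behaviors described: the slope is NaN until two observations with different x values are present, and each call recomputes the statistic from the current sums.

```java
/** Hypothetical sketch; not part of Commons Math. */
final class TinyLine {
    private long n;
    private double sumX, sumY, sumXX, sumXY;

    /** Folds one observation into the running sums; the pair itself is not stored. */
    void add(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumXY += x * y;
    }

    /** Recomputed on every call from the current sums. */
    double slope() {
        double denom = n * sumXX - sumX * sumX;
        // Exact comparison with zero is a simplification for this sketch.
        if (n < 2 || denom == 0.0) {
            return Double.NaN;
        }
        return (n * sumXY - sumX * sumY) / denom;
    }
}
```

Adding (1,1), (1,5) and (3,7) in that order leaves slope() at NaN after the first two calls, because both x values are identical, and gives a slope of 2 once the third point is added.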
Method Summary | |
public void | addData(double x, double y) Adds the observation (x,y) to the regression data set.
Uses updating formulas for means and sums of squares defined in
"Algorithms for Computing the Sample Variance: Analysis and
Recommendations", Chan, T.F., Golub, G.H., and LeVeque, R.J. | public void | addData(double[][] data) Adds the observations represented by the elements in
data .
(data[0][0],data[0][1]) will be the first observation, then
(data[1][0],data[1][1]) , etc. | public void | clear() Clears all data from the model. | public double | getIntercept() Returns the intercept of the estimated regression line.
The least squares estimate of the intercept is computed using the
normal equations.
The intercept is sometimes denoted b0. | public double | getInterceptStdErr() Returns the
standard error of the intercept estimate,
usually denoted s(b0). | public double | getMeanSquareError() Returns the sum of squared errors divided by the degrees of freedom,
usually abbreviated MSE. | public long | getN() Returns the number of observations that have been added to the model. | public double | getR() Returns
Pearson's product moment correlation coefficient,
usually denoted r. | public double | getRSquare() Returns the
coefficient of determination,
usually denoted r-square. | public double | getRegressionSumSquares() Returns the sum of squared deviations of the predicted y values about
their mean (which equals the mean of y).
This is usually abbreviated SSR or SSM. | public double | getSignificance() Returns the significance level of the slope (equivalently, of the correlation). | public double | getSlope() Returns the slope of the estimated regression line. | public double | getSlopeConfidenceInterval() Returns the half-width of a 95% confidence interval for the slope
estimate. | public double | getSlopeConfidenceInterval(double alpha) Returns the half-width of a (100-100*alpha)% confidence interval for
the slope estimate.
The (100-100*alpha)% confidence interval is
(getSlope() - getSlopeConfidenceInterval(alpha),
getSlope() + getSlopeConfidenceInterval(alpha))
To request, for example, a 99% confidence interval, use
alpha = .01
Usage Note:
The validity of this statistic depends on the assumption that the
observations included in the model are drawn from a
Bivariate Normal Distribution.
Preconditions:
- If there are fewer than three observations in the
model, or if there is no variation in x, this returns
Double.NaN . | public double | getSlopeStdErr() Returns the standard
error of the slope estimate,
usually denoted s(b1). | public double | getSumSquaredErrors() Returns the
sum of squared errors (SSE) associated with the regression
model.
Preconditions:
- At least two observations (with at least two different x values)
must have been added before invoking this method.
| public double | getTotalSumSquares() Returns the sum of squared deviations of the y values about their mean. | public double | predict(double x) Returns the "predicted" y value associated with the
supplied x value, based on the data that has been
added to the model when this method is invoked.
predict(x) = intercept + slope * x
Preconditions:
- At least two observations (with at least two different x values)
must have been added before invoking this method.
|
SimpleRegression | public SimpleRegression()(Code) | | Creates an empty SimpleRegression instance.
|
addData | public void addData(double x, double y)(Code) | | Adds the observation (x,y) to the regression data set.
Uses updating formulas for means and sums of squares defined in
"Algorithms for Computing the Sample Variance: Analysis and
Recommendations", Chan, T.F., Golub, G.H., and LeVeque, R.J.
1983, American Statistician, vol. 37, pp. 242-247, referenced in
Weisberg, S. "Applied Linear Regression". 2nd Ed. 1985
Parameters: x - independent variable value Parameters: y - dependent variable value |
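A sketch of what such updating formulas can look like, assuming the common one-pass form in which each new point adjusts the running means and the sums of squared deviations. RunningSums and its fields are hypothetical names; this is an illustration under that assumption, not the library's source.

```java
/** Hypothetical sketch of one-pass updates; not part of Commons Math. */
final class RunningSums {
    long n;        // number of observations seen so far
    double xbar;   // running mean of x
    double ybar;   // running mean of y
    double sumXX;  // sum of squared deviations of x about xbar
    double sumYY;  // sum of squared deviations of y about ybar
    double sumXY;  // sum of cross products of the deviations

    void add(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            double dx = x - xbar;
            double dy = y - ybar;
            double w = (double) n / (n + 1);
            // Update the deviation sums first, using the old means.
            sumXX += dx * dx * w;
            sumYY += dy * dy * w;
            sumXY += dx * dy * w;
            // Then move the means toward the new point.
            xbar += dx / (n + 1);
            ybar += dy / (n + 1);
        }
        n++;
    }
}
```

Because only six numbers are kept, memory use does not grow with the number of observations, and working with deviations from the running means avoids the cancellation that raw sums of squares can suffer from.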
addData | public void addData(double[][] data)(Code) | | Adds the observations represented by the elements in
data .
(data[0][0],data[0][1]) will be the first observation, then
(data[1][0],data[1][1]) , etc.
This method does not replace data that has already been added. The
observations represented by data are added to the existing
dataset.
To replace all data, use clear() before adding the new
data.
Parameters: data - array of observations to be added |
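The append-versus-replace behavior described above can be sketched as follows. AppendingModel is a hypothetical stand-in that only tracks a count and two sums; it assumes the array overload simply forwards each row to the single-observation method, which is an assumption about structure rather than a statement about the library's code.

```java
/** Hypothetical sketch; not part of Commons Math. */
final class AppendingModel {
    long n;
    double sumX, sumY;

    void addData(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
    }

    /** Each row is read as {x, y}; rows are appended to whatever is already there. */
    void addData(double[][] data) {
        for (double[] row : data) {
            addData(row[0], row[1]);
        }
    }

    /** Resets the model so that later data starts from an empty state. */
    void clear() {
        n = 0;
        sumX = 0;
        sumY = 0;
    }
}
```

Two successive addData calls accumulate; calling clear() between them is what makes the second array replace the first.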
clear | public void clear()(Code) | | Clears all data from the model.
|
getIntercept | public double getIntercept()(Code) | | Returns the intercept of the estimated regression line.
The least squares estimate of the intercept is computed using the
normal equations.
The intercept is sometimes denoted b0.
Preconditions:
- At least two observations (with at least two different x values)
must have been added before invoking this method. If this method is
invoked before a model can be estimated, Double.NaN is returned.
Returns: the intercept of the regression line |
getInterceptStdErr | public double getInterceptStdErr()(Code) | | Returns the
standard error of the intercept estimate,
usually denoted s(b0).
If there are fewer than three observations in the
model, or if there is no variation in x, this returns
Double.NaN.
Returns: standard error associated with the intercept estimate |
getMeanSquareError | public double getMeanSquareError()(Code) | | Returns the sum of squared errors divided by the degrees of freedom,
usually abbreviated MSE.
If there are fewer than three data pairs in the model,
or if there is no variation in x, this returns
Double.NaN.
Returns: sum of squared errors divided by the degrees of freedom |
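For orientation, and assuming the usual n - 2 residual degrees of freedom for a model with an intercept and one slope, the mean square error is:

```latex
\mathrm{MSE} = \frac{\mathrm{SSE}}{n - 2}
```

With n = 2 the denominator is zero, which is consistent with the NaN result for fewer than three data pairs.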
getN | public long getN()(Code) | | Returns the number of observations that have been added to the model.
Returns: n, the number of observations that have been added |
getR | public double getR()(Code) | | Returns
Pearson's product moment correlation coefficient,
usually denoted r.
Preconditions:
- At least two observations (with at least two different x values)
must have been added before invoking this method. If this method is
invoked before a model can be estimated, Double.NaN is returned.
Returns: Pearson's r |
getRSquare | public double getRSquare()(Code) | | Returns the
coefficient of determination,
usually denoted r-square.
Preconditions:
- At least two observations (with at least two different x values)
must have been added before invoking this method. If this method is
invoked before a model can be estimated, Double.NaN is returned.
Returns: r-square |
getRegressionSumSquares | public double getRegressionSumSquares()(Code) | | Returns the sum of squared deviations of the predicted y values about
their mean (which equals the mean of y).
This is usually abbreviated SSR or SSM.
Preconditions:
- At least two observations (with at least two different x values)
must have been added before invoking this method. If this method is
invoked before a model can be estimated, Double.NaN is returned.
Returns: sum of squared deviations of predicted y values |
getSignificance | public double getSignificance() throws MathException(Code) | | Returns the significance level of the slope (equivalently, of the correlation).
Specifically, the returned value is the smallest alpha
such that the slope confidence interval with significance level
equal to alpha does not include 0 .
On regression output, this is often denoted Prob(|t| > 0)
Usage Note:
The validity of this statistic depends on the assumption that the
observations included in the model are drawn from a
Bivariate Normal Distribution.
If there are fewer than three observations in the
model, or if there is no variation in x, this returns
Double.NaN.
Returns: significance level for slope/correlation throws: MathException - if the significance level cannot be computed. |
getSlope | public double getSlope()(Code) | | Returns the slope of the estimated regression line.
The least squares estimate of the slope is computed using the
normal equations.
The slope is sometimes denoted b1.
Preconditions:
- At least two observations (with at least two different x values)
must have been added before invoking this method. If this method is
invoked before a model can be estimated,
Double.NaN is
returned.
Returns: the slope of the regression line |
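For reference, with a single regressor the normal equations have the familiar closed-form solution below. The symbols for the means and for the sums Sxx and Sxy are notation introduced here, not names from the API.

```latex
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
b_1 = \frac{S_{xy}}{S_{xx}}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The division by Sxx is why the slope is undefined when all x values coincide.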
getSlopeConfidenceInterval | public double getSlopeConfidenceInterval() throws MathException(Code) | | Returns the half-width of a 95% confidence interval for the slope
estimate.
The 95% confidence interval is
(getSlope() - getSlopeConfidenceInterval(),
getSlope() + getSlopeConfidenceInterval())
If there are fewer than three observations in the
model, or if there is no variation in x, this returns
Double.NaN.
Usage Note:
The validity of this statistic depends on the assumption that the
observations included in the model are drawn from a
Bivariate Normal Distribution.
Returns: half-width of 95% confidence interval for the slope estimate throws: MathException - if the confidence interval cannot be computed. |
getSlopeConfidenceInterval | public double getSlopeConfidenceInterval(double alpha) throws MathException(Code) | | Returns the half-width of a (100-100*alpha)% confidence interval for
the slope estimate.
The (100-100*alpha)% confidence interval is
(getSlope() - getSlopeConfidenceInterval(alpha),
getSlope() + getSlopeConfidenceInterval(alpha))
To request, for example, a 99% confidence interval, use
alpha = .01
Usage Note:
The validity of this statistic depends on the assumption that the
observations included in the model are drawn from a
Bivariate Normal Distribution.
Preconditions:
- If there are fewer than three observations in the
model, or if there is no variation in x, this returns
Double.NaN.
- (0 < alpha < 1); otherwise an
IllegalArgumentException is thrown.
Parameters: alpha - the desired significance level Returns: half-width of the (100-100*alpha)% confidence interval for the slope estimate throws: MathException - if the confidence interval cannot be computed. |
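The half-width can be read against the standard textbook construction of a two-sided interval for a regression slope, shown below. This is offered as the conventional formula, on the assumption that the method follows it; the t quantile notation is introduced here for illustration.

```latex
\text{half-width} = t_{1 - \alpha/2,\; n - 2} \cdot s(b_1)
```

Here the first factor is the 1 - alpha/2 quantile of Student's t distribution with n - 2 degrees of freedom, and s(b1) is the standard error of the slope estimate. For alpha = .01 this gives a 99% interval.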
getSlopeStdErr | public double getSlopeStdErr()(Code) | | Returns the standard
error of the slope estimate,
usually denoted s(b1).
If there are fewer than three data pairs in the model,
or if there is no variation in x, this returns Double.NaN.
Returns: standard error associated with the slope estimate |
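In textbook notation, and assuming MSE denotes the sum of squared errors divided by n - 2, the standard error of the slope is:

```latex
s(b_1) = \sqrt{\frac{\mathrm{MSE}}{S_{xx}}}, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
```

Both conditions that lead to NaN are visible in the formula: MSE is undefined when n is below three, and Sxx is zero when there is no variation in x.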
getSumSquaredErrors | public double getSumSquaredErrors()(Code) | | Returns the
sum of squared errors (SSE) associated with the regression
model.
Preconditions:
- At least two observations (with at least two different x values)
must have been added before invoking this method. If this method is
invoked before a model can be estimated, Double.NaN is returned.
Returns: sum of squared errors associated with the regression model |
getTotalSumSquares | public double getTotalSumSquares()(Code) | | Returns the sum of squared deviations of the y values about their mean.
This quantity is usually denoted SSTO.
If n < 2, this returns Double.NaN.
Returns: sum of squared deviations of y values |
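The three sums of squares are linked by the standard ANOVA decomposition for least squares with an intercept, which also gives the coefficient of determination:

```latex
\mathrm{SSTO} = \mathrm{SSR} + \mathrm{SSE}, \qquad
r^2 = \frac{\mathrm{SSR}}{\mathrm{SSTO}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SSTO}}
```

SSTO corresponds to getTotalSumSquares(), SSR to getRegressionSumSquares(), and SSE to getSumSquaredErrors().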
predict | public double predict(double x)(Code) | | Returns the "predicted" y value associated with the
supplied x value, based on the data that has been
added to the model when this method is invoked.
predict(x) = intercept + slope * x
Preconditions:
- At least two observations (with at least two different x values)
must have been added before invoking this method. If this method is
invoked before a model can be estimated, Double.NaN is returned.
Parameters: x - input x value Returns: predicted y value |
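To make the prediction formula concrete, here is a small self-contained sketch that fits a line to arrays of x and y values and then evaluates intercept + slope * x. LineFit is a hypothetical class written for this illustration; it stores nothing beyond the two coefficients and is not the library's implementation.

```java
/** Hypothetical sketch; not part of Commons Math. */
final class LineFit {
    final double intercept;
    final double slope;

    /** Fits y = intercept + slope * x by least squares; x and y must have equal length. */
    LineFit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double xbar = sumX / n;
        double ybar = sumY / n;
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - xbar) * (x[i] - xbar);
            sxy += (x[i] - xbar) * (y[i] - ybar);
        }
        // With fewer than two points or no variation in x, no line can be estimated.
        slope = (n < 2 || sxx == 0.0) ? Double.NaN : sxy / sxx;
        intercept = ybar - slope * xbar;
    }

    /** Evaluates the fitted line; NaN propagates when no model could be estimated. */
    double predict(double x) {
        return intercept + slope * x;
    }
}
```

Fitting the points (1,3), (2,5), (3,7) gives a slope of 2 and an intercept of 1, so predict(4) evaluates to 9 up to rounding. A fit built from a single point returns NaN from predict, mirroring the precondition above.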
|
|