[[Category:Generalized Regression library functions]]
The [[Logistic_Regression]] function is obsolete and has been replaced by the [[LogisticRegression]] function.
The old [[Logistic_Regression]] function (with the underscore) is implemented as a [[User-Defined Function]] in the [[media:Generalized Regression.ana|Generalized Regression library]]. It requires the Analytica [[Optimizer]] edition and still exists to support legacy models.
The newer [[LogisticRegression]] function is available in all editions of Analytica, from [[Analytica 4.5]] onward.
To convert a legacy model to use the newer function, simply remove the underscore from the function name; the parameter order is the same.
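For example, assuming a legacy variable definition like the one below (the names ''Y'', ''B'', ''I'' and ''K'' are placeholders for your own dependent data, basis, and indexes, matching the old function's parameter order), the conversion is just a rename. Curly braces denote Analytica comments:

 { Legacy definition: requires the Optimizer edition }
 Variable c := Logistic_Regression(Y, B, I, K)
 
 { Converted definition: works in all editions, Analytica 4.5 and up }
 Variable c := LogisticRegression(Y, B, I, K)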
| | | |
==History==
In [[Analytica 4.5]], this library function was superseded by the built-in [[LogisticRegression]] function, which does not require the Optimizer edition.
== See Also ==
* [[LogisticRegression]]