Logistic Regression
Revision as of 21:32, 1 February 2008
Logistic regression is a technique for predicting a Bernoulli (i.e., 0,1-valued) random variable from a set of continuous independent variables. See the Wikipedia article on Logistic regression for a simple description. Another generalized linear model that can be used for this purpose is the Probit_Regression model. The two differ in functional form: logistic regression links the linear predictor to the predicted probability through the logit function, while the probit model uses the cumulative normal distribution instead.
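To make the difference in functional form concrete, here is a minimal Python sketch (not Analytica code) of the two ways of turning a linear predictor eta into a probability: the inverse logit used by logistic regression and the standard normal CDF used by probit regression. The function names are hypothetical and chosen only for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)); defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Inverse logit: maps a linear predictor eta to a probability (logistic model)."""
    return 1.0 / (1.0 + math.exp(-eta))


def normal_cdf(eta: float) -> float:
    """Standard normal CDF: maps a linear predictor eta to a probability (probit model)."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Compare the two curves at a few values of the linear predictor.
for eta in (-2.0, 0.0, 2.0):
    print(f"eta={eta:+.1f}  logistic={inv_logit(eta):.3f}  probit={normal_cdf(eta):.3f}")
```

Both curves pass through 0.5 at eta = 0, but the normal CDF approaches 0 and 1 faster, so coefficients fitted by the two models are on different scales and are not directly interchangeable.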
Logistic_Regression( Y,B,I,K )
(Requires Analytica Optimizer)
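The Logistic_Regression function returns the best-fit coefficients, c, for a model of the form

<math>\mathrm{logit}(p_i) = \ln\left( \frac{p_i}{1-p_i} \right) = \sum_k c_k B_{i,k}</math>

given a data set basis B, where p_i is the predicted probability that sample i is classified as 1 and each sample's observed classification y_i is 0 or 1.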
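"Best-fit" for a logistic model is conventionally defined by maximum likelihood. This page does not say which criterion or algorithm Logistic_Regression uses internally, so the following is the standard textbook formulation, not a description of the Optimizer's implementation. With p_i tied to c through the model equation, the log-likelihood and the conditions at its maximum are:

```latex
\ell(c) = \sum_i \Big[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \Big],
\qquad \ln\frac{p_i}{1 - p_i} = \sum_k c_k B_{i,k}

\frac{\partial \ell}{\partial c_k} = \sum_i \big( y_i - p_i \big)\, B_{i,k} = 0
\qquad \text{for every } k
```

The gradient takes this simple form because the derivative of y_i ln p_i + (1 - y_i) ln(1 - p_i) with respect to the linear predictor is y_i(1 - p_i) - (1 - y_i)p_i = y_i - p_i.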
The syntax is the same as for the Regression function. The basis may be of a generalized linear form: each term in the basis may be an arbitrary non-linear function of your data, but the logit of the prediction is always a linear combination of those terms.
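As an illustration of a generalized basis, this Python sketch (not Analytica syntax) expands a single hypothetical measurement x into the terms 1, x, x squared and ln x. The terms are non-linear in x, yet the logit is still a plain weighted sum of them, which keeps the model linear in the coefficients c. The particular choice of terms and coefficient values is an assumption made for the example.

```python
import math


def basis_row(x: float) -> list[float]:
    """Hypothetical generalized basis for one sample: constant, x, x squared, ln x (needs x > 0)."""
    return [1.0, x, x * x, math.log(x)]


def logit_of_prediction(c: list[float], row: list[float]) -> float:
    """Linear combination sum_k c_k * B_k; all non-linearity lives inside the basis terms."""
    if len(c) != len(row):
        raise ValueError("need exactly one coefficient per basis term")
    return sum(ck * bk for ck, bk in zip(c, row))


c = [0.5, -1.0, 0.25, 2.0]  # hypothetical coefficients
eta = logit_of_prediction(c, basis_row(2.0))
```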
Once you have used the Logistic_Regression function to compute the coefficients for your model, the predictive model that results returns the probability that a given data point is classified as 1.
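Solving the model equation for the probability gives the explicit prediction formula; this inverse of the logit is what the InvLogit function computes. Writing eta for the linear combination of basis terms:

```latex
\ln\frac{p}{1 - p} = \eta, \qquad \eta = \sum_k c_k B_k

\frac{p}{1 - p} = e^{\eta}
\quad\Longrightarrow\quad
p = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}} = \mathrm{InvLogit}(\eta)
```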
Library
Generalized Regression.ana
Example
Suppose you want to predict the probability that a particular treatment for diabetes is effective, given several lab test results. Data are collected for patients who have undergone the treatment: the variable Test_results contains the lab test data, indexed by Patient_ID and Lab_Test, and Treatment_effective is set to 0 or 1 depending on whether the treatment was effective for that patient.
Using the data directly as the regression basis, the logistic regression coefficients are computed using:
Variable c := Logistic_regression( Treatment_effective, Test_results, Patient_ID, Lab_test )
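The page does not describe how Logistic_Regression computes its result. As a rough, hypothetical stand-in, here is a Python sketch of the textbook Newton-Raphson (iteratively reweighted least squares) fit, which maximizes the log-likelihood. `y` plays the role of Treatment_effective, and each row of `basis` plays the role of one Patient_ID's slice of Test_results along Lab_test. The names `logistic_regression` and `solve` and the toy data (two complementary 0/1 test columns) are invented so that the answer has a closed form.

```python
import math


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def logistic_regression(y, basis, iterations=25, tol=1e-10):
    """Maximum-likelihood c for logit(p_i) = sum_k c_k * basis[i][k], via Newton-Raphson."""
    n_terms = len(basis[0])
    c = [0.0] * n_terms
    for _ in range(iterations):
        p = [inv_logit(sum(ck * bk for ck, bk in zip(c, row))) for row in basis]
        # Gradient of the log-likelihood: sum_i (y_i - p_i) * B_ik
        grad = [sum((yi - pi) * row[k] for yi, pi, row in zip(y, p, basis))
                for k in range(n_terms)]
        # Negative Hessian: sum_i p_i (1 - p_i) * B_ij * B_ik
        info = [[sum(pi * (1 - pi) * row[j] * row[k] for pi, row in zip(p, basis))
                 for k in range(n_terms)] for j in range(n_terms)]
        step = solve(info, grad)
        c = [ck + sk for ck, sk in zip(c, step)]
        if max(abs(s) for s in step) < tol:
            break
    return c


# Toy data: 7 hypothetical patients, columns are (Test_A, Test_B), no constant column.
basis = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0],
         [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
y = [0, 1, 0, 1, 1, 0, 1]
c = logistic_regression(y, basis)
```

With no constant column, each coefficient here is simply the log-odds of effectiveness among the patients whose test equals 1 (ln 3 for Test_A, ln 0.5 for Test_B). If the basis perfectly separates the 0 and 1 outcomes, the likelihood has no finite maximum and the coefficients grow without bound.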
The predicted probability for each patient in the data set used for the fit is then:
Variable Prob_Effective := InvLogit( Sum( c*Test_results, Lab_Test ))
If we have lab tests for a new patient, say New_Patient_Tests, in the form of a vector indexed by Lab_Test, we can predict the probability that treatment will be effective using:
InvLogit( Sum( c*New_patient_tests, Lab_test ) )
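Both predictions are a weighted sum over Lab_Test followed by the inverse logit. A Python sketch with dictionaries keyed by lab-test name mirrors `Sum( c*Test_results, Lab_Test )`; the test names, coefficient values and patient data are all hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def prob_effective(c: dict[str, float], tests: dict[str, float]) -> float:
    """InvLogit of the coefficient-weighted sum of one patient's lab tests."""
    return inv_logit(sum(c[name] * tests[name] for name in c))


# Hypothetical fitted coefficients, indexed by lab-test name.
c = {"Test_A": 0.02, "Test_B": -0.05}

# Predicted probability for every patient in the fitted data set.
test_results = {
    "P1": {"Test_A": 100.0, "Test_B": 40.0},
    "P2": {"Test_A": 150.0, "Test_B": 30.0},
}
probs = {pid: prob_effective(c, tests) for pid, tests in test_results.items()}

# Predicted probability for a new patient with the same lab tests.
new_patient_tests = {"Test_A": 120.0, "Test_B": 20.0}
p_new = prob_effective(c, new_patient_tests)
```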
It is often possible to improve the predictions dramatically by including a y-offset (intercept) term in the linear basis. Using the test data directly as the regression basis forces the linear combination to pass through the origin: a patient whose test results are all 0 always gets a logit of 0, that is, a predicted probability of 0.5. To incorporate the y-offset term, add a column to the basis that has the constant value 1 for every Patient_ID:
Index K := Concat([1],Lab_Test)
Variable B := if K=1 then 1 else Test_results[Lab_test=K]
Variable C2 := Logistic_Regression( Treatment_effective, B, Patient_ID, K )
Variable Prob_Effective2 := InvLogit( Sum( C2*B, K ) )
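The effect of the constant column can be checked directly. In this Python sketch (hypothetical names and coefficients, not Analytica), a patient whose test results are all 0 gets a logit of exactly 0, and hence a probability of 0.5, from any coefficients on the raw basis. Once a constant 1 is prepended, playing the role of the K=1 element of Concat([1],Lab_Test), its coefficient sets that baseline freely.

```python
import math


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def with_offset(row: list[float]) -> list[float]:
    """Prepend the constant term, like the K=1 column of the augmented basis."""
    return [1.0] + row


def linear_predictor(c: list[float], row: list[float]) -> float:
    return sum(ck * bk for ck, bk in zip(c, row))


all_zero_tests = [0.0, 0.0]

# Raw basis: the logit is 0 whatever the coefficients, so the probability is 0.5.
p_raw = inv_logit(linear_predictor([3.0, -1.0], all_zero_tests))

# Augmented basis: the first coefficient (here -2.0, hypothetical) acts as the offset.
p_offset = inv_logit(linear_predictor([-2.0, 3.0, -1.0], with_offset(all_zero_tests)))
```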
See Also
- Probit_Regression
- Regression, RegressionDist : When Y is continuous, with normally-distributed error
- Poisson_Regression : When Y models a count (number of events that occur)