[[Category:Doc Status D]] <!-- For Lumina use, do not change -->
[[Category: Generalized Regression library functions]]
[[Category:Data Analysis Functions]]
 
  
Logistic regression is a technique for predicting a Bernoulli (i.e., 0,1-valued) random variable from a set of continuous independent variables. See the [http://en.wikipedia.org/wiki/Logistic_regression Wikipedia article on Logistic regression] for a simple description. Another generalized linear model that can be used for this purpose is the [[Probit_Regression]] model. These differ in functional form: logistic regression uses the logit function to link the linear predictor to the predicted probability, while the probit model uses the cumulative normal distribution for the same purpose.
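
For reference, if ''z_i'' denotes the linear predictor for sample ''i'' (the linear combination of basis terms), the two models differ only in how ''z_i'' is mapped to a probability:

<math>
\textrm{logistic: } p_i = \frac{1}{1+e^{-z_i}}, \qquad \textrm{probit: } p_i = \Phi(z_i)
</math>

where <math>\Phi</math> denotes the cumulative distribution function of the standard normal distribution.
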
The [[Logistic_Regression]] function is obsolete, and has been replaced by the [[LogisticRegression]] function. Please see [[LogisticRegression]].
  
= Logistic_Regression( Y,B,I,K ) =
(''Requires Analytica Optimizer'')

The old [[Logistic_Regression]] function (with the underscore) is implemented as a [[User-Defined Function]] in the [[media:Generalized Regression.ana|Generalized Regression library]]. It requires the Analytica [[Optimizer]] edition to use, and still exists to support legacy models.

The newer [[LogisticRegression]] function is available in all editions of Analytica, in [[Analytica 4.5]] and up. To convert a legacy model to use the newer version, simply remove the underscore -- the parameter order is the same.
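
For example, a legacy definition such as the one below (using the variable names from the example later on this page) is converted just by renaming the function; the arguments stay in the same order:

  {Legacy library function -- requires the Generalized Regression library and the Optimizer edition}
  Variable c := Logistic_Regression( Treatment_effective, Test_results, Patient_ID, Lab_test )
  {Equivalent built-in function in Analytica 4.5 and later}
  Variable c := LogisticRegression( Treatment_effective, Test_results, Patient_ID, Lab_test )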
  
The Logistic_regression function returns the best-fit coefficients, c, for a model of the form
  
[[Image:LogisticEq.jpg]]
<!--<math>
\mathrm{logit}(p_i) = \ln\left( {{p_i}\over{1-p_i}} \right) = \sum_k c_k B_{i,k}
</math>-->

given a data set basis ''B'', where each sample ''i'' has a classification ''y_i'' of either 0 or 1.
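
For reference, the usual criterion defining the ''best-fit'' coefficients in logistic regression is maximum likelihood over the observed 0/1 classifications (the details of how this library function performs that optimization are not covered here):

<math>
\hat c = \arg\max_c \sum_i \left[ y_i \ln p_i + (1-y_i) \ln (1-p_i) \right],
\qquad p_i = \frac{1}{1+e^{-\sum_k c_k B_{i,k}}}
</math>
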
The syntax is the same as for the [[Regression]] function.  The basis may be of a generalized linear form; that is, each term in the basis may be an arbitrary non-linear function of your data, but the logit of the prediction is a linear combination of these basis terms.
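
As a purely illustrative sketch of such a generalized basis (the names ''X'', ''Y'', ''I'', ''Basis_term'' and ''Basis'' below are hypothetical, not part of the library), a basis containing a measurement and its square could be set up as:

  {Illustrative only: a two-term basis made of a measurement X (indexed by I) and its square}
  Index Basis_term := ['x', 'x_squared']
  Variable Basis := if Basis_term = 'x' then X else X^2
  Variable c := Logistic_Regression( Y, Basis, I, Basis_term )

The fitted logit is then a linear combination of ''X'' and ''X^2'', even though its relationship to the raw measurement is non-linear.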
 
 
 
Once you have used the Logistic_Regression function to compute the coefficients for your model, the resulting predictive model returns the probability that a given data point is classified as 1.
 
 
 
= Library =
 
 
 
Generalized Regression.ana
 
 
 
= Example =
 
 
 
Suppose you want to predict the probability that a particular treatment for diabetes is effective given several lab test results.  Data is collected for patients who have undergone the treatment, as follows, where the variable ''Test_results'' contains the lab test data and ''Treatment_effective'' is set to 1 if the treatment was effective for that patient and to 0 otherwise:
 
 
 
[[image:DiabetesData.jpg]]
 
[[Image:DiabetesOutcome.jpg]]
 
 
 
Using the data directly as the regression basis, the logistic regression coefficients are computed using:
 
 
 
  Variable c := Logistic_regression( Treatment_effective, Test_results, Patient_ID, Lab_test )
 
 
 
We can obtain the predicted probability for each patient in this data set using:
 
 
 
  Variable Prob_Effective := [[InvLogit]]( [[Sum]]( c*Test_results, Lab_Test ))
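
Here [[InvLogit]] inverts the logit link, converting the linear predictor back into a probability between 0 and 1:

<math>
\textrm{InvLogit}(x) = \frac{1}{1+e^{-x}}
</math>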
 
 
 
If we have lab tests for a new patient, say New_Patient_Tests, in the form of a vector indexed by Lab_Test, we can predict the probability that treatment will be effective using:
 
 
 
  [[InvLogit]]( [[Sum]]( c*New_patient_tests, Lab_test ) )
 
 
 
It is often possible to improve the predictions dramatically by including a y-offset (intercept) term in the linear basis.  Using the test data directly as the regression basis forces the linear combination to pass through the origin.  To incorporate the y-offset term, we would add a column to the basis having the constant value 1 across all ''Patient_ID''s:
 
 
 
  Index K := [[Concat]]([1],Lab_Test)
 
  Variable B := if K=1 then 1 else Test_results[Lab_test=K]
 
  Variable C2 := Logistic_Regression( Treatment_effective, B, Patient_ID, K )
 
  Variable Prob_Effective2 := [[InvLogit]]( [[Sum]]( C2*B, K ))
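
To apply this intercept-augmented model to a new patient, the new lab tests must first be mapped onto the same basis index ''K''. A sketch, reusing ''New_patient_tests'' from above (the names ''New_basis'' and ''Prob_Effective_new'' are illustrative):

  {Place the constant 1 in the K=1 (intercept) slot and the new lab tests in the remaining slots}
  Variable New_basis := if K = 1 then 1 else New_patient_tests[Lab_test = K]
  Variable Prob_Effective_new := [[InvLogit]]( [[Sum]]( C2*New_basis, K ) )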
 
 
 
 
 
 
 
 
 
= History =

In [[Analytica 4.5]], this library function [[Logistic_Regression]]() has been superseded by the built-in [[LogisticRegression]] function, which does not require the Optimizer edition.

= See Also =

* [[LogisticRegression]]
* [[Probit_Regression]]
* [[Regression]], [[RegressionDist]] : When Y is continuous, with normally-distributed error
* [[Poisson_Regression]] : When Y models a count (number of events that occur)
 
