LogitModel Class

The LogitModel class provides logistic regression modeling, performance evaluation, and prediction. It wraps the logistic regression implementations of both scikit-learn and statsmodels, so users can work with whichever library suits their requirements.

Attributes

  • sk_model: The trained scikit-learn logistic regression model.
  • sm_model: The trained statsmodels logistic regression model.
  • x_train: The training data features (pd.DataFrame).
  • y_train: The training data labels (pd.Series).
  • x_test: The test data features (pd.DataFrame).
  • y_test: The test data labels (pd.Series).
  • intercept: Whether to fit an intercept term (default True).

Methods

__init__(self, x_train, y_train, x_test, y_test, intercept=True)

Initialize the logistic regression model with training and test data.

Model_SK(self)

Train and return the scikit-learn logistic regression model.
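As a rough illustration of what training the scikit-learn model might involve, the sketch below fits LogisticRegression on synthetic data. The data, the variable names, and the mapping of the intercept flag to fit_intercept are assumptions made for illustration, not the confirmed internals of Model_SK.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic training data (assumption: two numeric features, binary label).
rng = np.random.default_rng(0)
x_train = pd.DataFrame(rng.normal(size=(200, 2)), columns=["x1", "x2"])
logits = 0.5 + 1.0 * x_train["x1"].to_numpy() - 2.0 * x_train["x2"].to_numpy()
y_train = pd.Series(
    (rng.random(200) < 1.0 / (1.0 + np.exp(-logits))).astype(int), name="y"
)

# Assumption: the class's intercept flag is passed through as fit_intercept.
intercept = True
sk_model = LogisticRegression(fit_intercept=intercept)
sk_model.fit(x_train, y_train)

# Column 1 of predict_proba holds the probability of the positive class.
proba_train = sk_model.predict_proba(x_train)[:, 1]
print(sk_model.coef_, sk_model.intercept_)
```

Note that LogisticRegression applies L2 regularization by default, so its coefficients will generally differ somewhat from an unregularized fit on the same data.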

Model_SM(self)

Train and return the statsmodels logistic regression model.

Gini_Train(self) -> float

Calculate and return the Gini coefficient on the training set.
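In classification settings the Gini coefficient is commonly derived from the AUC as Gini = 2 * AUC - 1. Whether Gini_Train uses exactly this definition is an assumption; the sketch below shows the relationship on a small hand-made example.

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted scores (illustrative values only).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC is the share of (positive, negative) pairs ranked correctly:
# here 3 of the 4 pairs are in the right order.
auc_value = roc_auc_score(y_true, y_score)

# Common definition of the Gini coefficient for a classifier.
gini = 2 * auc_value - 1
print(auc_value, gini)
```

Under this definition a model that ranks at random has a Gini of 0 and a perfect ranking has a Gini of 1.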

Accuracy_Train(self, cutoff: float) -> float

Calculate and return the accuracy (the share of correctly classified observations) on the training set at the given cutoff value.
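To make the role of the cutoff concrete, the sketch below turns predicted probabilities into labels and measures accuracy. The helper name and the choice of ">=" at the boundary are assumptions; the class may treat a probability equal to the cutoff differently.

```python
import numpy as np

def accuracy_at_cutoff(y_true, proba, cutoff):
    """Hypothetical helper: share of observations classified correctly."""
    # Assumption: a probability equal to the cutoff counts as positive.
    labels = (proba >= cutoff).astype(int)
    return float((labels == y_true).mean())

y_true = np.array([0, 0, 1, 1, 1])
proba = np.array([0.2, 0.6, 0.4, 0.7, 0.9])

acc_050 = accuracy_at_cutoff(y_true, proba, cutoff=0.5)
acc_030 = accuracy_at_cutoff(y_true, proba, cutoff=0.3)
print(acc_050, acc_030)
```

Lowering the cutoff from 0.5 to 0.3 reclassifies the third observation as positive, which is why the two accuracies differ.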

AUC_Train(self) -> float

Calculate and return the AUC (Area Under the Curve) on the training set.

Predict_Proba_Train(self) -> np.ndarray

Calculate and return the predicted probabilities on the training set.

Predict_LogProba_Train(self) -> np.ndarray

Calculate and return the logarithm of predicted probabilities on the training set.

PredictLabel_Train(self, cutoff: float) -> list

Calculate and return the predicted labels on the training set given a cutoff value.

Brier_Train(self) -> float

Calculate and return the Brier score on the training set.
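The Brier score is the mean squared difference between the predicted probability and the observed 0/1 outcome, so lower is better. A minimal sketch on toy values, not taken from the class itself:

```python
import numpy as np

y_true = np.array([0, 1, 1, 0])
proba = np.array([0.1, 0.9, 0.8, 0.3])

# Mean of squared errors: (0.01 + 0.01 + 0.04 + 0.09) / 4.
brier = float(np.mean((proba - y_true) ** 2))
print(brier)
```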

F1_Train(self) -> float

Calculate and return the F1 score on the training set.

Recall_Train(self, cutoff: float) -> float

Calculate and return the recall on the training set given a cutoff value.

Precision_Train(self, cutoff: float) -> float

Calculate and return the precision on the training set given a cutoff value.

Confusion_Matrix_Train(self, cutoff: float) -> np.ndarray

Calculate and return the confusion matrix on the training set given a cutoff value.
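Recall, precision, and the confusion matrix all depend on the labels produced at the chosen cutoff. The sketch below uses the scikit-learn metric functions on toy data; treating a probability equal to the cutoff as positive is an assumption, and the layout shown is scikit-learn's, which the class may or may not return unchanged.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = np.array([0, 0, 1, 1, 1])
proba = np.array([0.2, 0.6, 0.4, 0.7, 0.9])
cutoff = 0.5

# Assumption: probabilities at or above the cutoff are labeled 1.
labels = (proba >= cutoff).astype(int)

# scikit-learn layout for binary labels: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, labels)
precision = precision_score(y_true, labels)  # TP / (TP + FP)
recall = recall_score(y_true, labels)        # TP / (TP + FN)
print(cm, precision, recall)
```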

FPR_Train(self) -> np.ndarray

Calculate and return the false positive rate on the training set.

TPR_Train(self) -> np.ndarray

Calculate and return the true positive rate on the training set.
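The false and true positive rates are returned as arrays because they are evaluated over a range of thresholds rather than at one cutoff. Assuming they come from scikit-learn's roc_curve (which the example imports suggest), the computation looks roughly like this:

```python
from sklearn.metrics import auc, roc_curve

# Toy labels and predicted scores (illustrative values only).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (fpr, tpr) point per threshold, ordered from strictest to loosest.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the resulting curve via the trapezoidal rule.
area = auc(fpr, tpr)
print(fpr, tpr, area)
```

Plotting tpr against fpr gives the ROC curve.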

ROC_Curve_Train(self)

Plot the ROC (Receiver Operating Characteristic) curve for the training set.

Gini_Test(self) -> float

Calculate and return the Gini coefficient on the test set.

Accuracy_Test(self, cutoff: float) -> float

Calculate and return the accuracy (the share of correctly classified observations) on the test set at the given cutoff value.

AUC_Test(self) -> float

Calculate and return the AUC (Area Under the Curve) on the test set.

Predict_Proba_Test(self) -> np.ndarray

Calculate and return the predicted probabilities on the test set.

Predict_LogProba_Test(self) -> np.ndarray

Calculate and return the logarithm of predicted probabilities on the test set.

PredictLabel_Test(self, cutoff: float) -> list

Calculate and return the predicted labels on the test set given a cutoff value.

Brier_Test(self) -> float

Calculate and return the Brier score on the test set.

F1_Test(self) -> float

Calculate and return the F1 score on the test set.

Recall_Test(self, cutoff: float) -> float

Calculate and return the recall on the test set given a cutoff value.

Precision_Test(self, cutoff: float) -> float

Calculate and return the precision on the test set given a cutoff value.

Confusion_Matrix_Test(self, cutoff: float) -> np.ndarray

Calculate and return the confusion matrix on the test set given a cutoff value.

FPR_Test(self) -> np.ndarray

Calculate and return the false positive rate on the test set.

TPR_Test(self) -> np.ndarray

Calculate and return the true positive rate on the test set.

ROC_Curve_Test(self)

Plot the ROC (Receiver Operating Characteristic) curve for the test set.

GetCoefficients_SK(self) -> np.ndarray

Return an array of coefficients from the scikit-learn model.

GetIntercept_SK(self) -> np.ndarray

Return the intercept from the scikit-learn model.

Summary(self)

Return a summary of the statsmodels model.

GetCoefficients_SM(self) -> pd.DataFrame

Return a DataFrame of coefficients from the statsmodels model.

Prediction(self, x_data: pd.DataFrame, logprob: bool = False) -> np.ndarray

Make predictions on external data.
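The logprob flag presumably switches between probabilities and their logarithms. The sketch below shows one way such a method could behave on top of a fitted scikit-learn model; the helper name, the synthetic data, and the choice to return only the positive-class column are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def predict_external(model, x_data, logprob=False):
    """Hypothetical helper: positive-class (log-)probabilities for new data."""
    if logprob:
        return model.predict_log_proba(x_data)[:, 1]
    return model.predict_proba(x_data)[:, 1]

# Synthetic training data and a fitted model, for illustration only.
rng = np.random.default_rng(1)
x_train = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x1", "x2"])
y_train = pd.Series((x_train["x1"] + rng.normal(size=100) > 0).astype(int))
model = LogisticRegression().fit(x_train, y_train)

# External data must carry the same feature columns as the training data.
external_data = pd.DataFrame(rng.normal(size=(5, 2)), columns=["x1", "x2"])
p = predict_external(model, external_data)
log_p = predict_external(model, external_data, logprob=True)
print(p, log_p)
```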

Raises

  • ValueError:
    Raised if the columns of x_train and x_test are not identical
    Raised if the lengths of x_train and x_test are not identical

  • TypeError:
    Raised if x_train is not a pandas DataFrame object
    Raised if x_test is not a pandas DataFrame object
    Raised if y_train is not a pandas Series object
    Raised if y_test is not a pandas Series object
    Raised if intercept is not boolean
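The type and column checks listed above can be sketched as a standalone function. The function name and error messages are hypothetical, and the length check is left out because its exact rule is not clear from the description.

```python
import pandas as pd

def validate_inputs(x_train, y_train, x_test, y_test, intercept=True):
    """Hypothetical validator mirroring the documented TypeError/ValueError cases."""
    for name, value in (("x_train", x_train), ("x_test", x_test)):
        if not isinstance(value, pd.DataFrame):
            raise TypeError(f"{name} must be a pandas DataFrame")
    for name, value in (("y_train", y_train), ("y_test", y_test)):
        if not isinstance(value, pd.Series):
            raise TypeError(f"{name} must be a pandas Series")
    if not isinstance(intercept, bool):
        raise TypeError("intercept must be a boolean")
    if list(x_train.columns) != list(x_test.columns):
        raise ValueError("x_train and x_test must have identical columns")

x_ok = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
y_ok = pd.Series([0, 1])

# Valid inputs pass silently.
validate_inputs(x_ok, y_ok, x_ok, y_ok)

# Mismatched columns trigger the ValueError branch.
try:
    validate_inputs(x_ok, y_ok, x_ok.rename(columns={"b": "c"}), y_ok)
except ValueError as err:
    print(err)
```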

Example Usage

from combat.models import LogitModel
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    roc_auc_score,
    precision_score,
    recall_score,
    f1_score,
    auc,
    roc_curve,
    accuracy_score,
    brier_score_loss,
    confusion_matrix,
)

import statsmodels.api as sm
from statsmodels.tools.tools import add_constant

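# Initialize LogitModel
# (x_train and x_test are pandas DataFrames, y_train and y_test are pandas Series, prepared beforehand)
model = LogitModel(x_train, y_train, x_test, y_test)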

# Train scikit-learn model
model.Model_SK()

# Train statsmodels model
model.Model_SM()

# Evaluate model accuracy on training and testing sets
print("Accuracy on training set: ", model.Accuracy_Train(cutoff=0.5))
print("Accuracy on test set: ", model.Accuracy_Test(cutoff=0.5))

print("Gini on training set: ", model.Gini_Train())
print("Gini on testing set: ", model.Gini_Test())

# Make predictions on external data
# (external_data is a pandas DataFrame, assumed to have the same columns as x_train)
predictions = model.Prediction(external_data)