LogitModel Class
The LogitModel class provides logistic regression modeling, performance evaluation, and prediction. It wraps implementations from both scikit-learn and statsmodels, so users can choose whichever library suits their requirements.
Attributes
sk_model: The trained scikit-learn logistic regression model.
sm_model: The trained statsmodels logistic regression model.
x_train: The training data features (pd.DataFrame).
y_train: The training data labels (pd.Series).
x_test: The test data features (pd.DataFrame).
y_test: The test data labels (pd.Series).
intercept: Whether to fit an intercept term (default True).
Methods
__init__(self, x_train, y_train, x_test, y_test, intercept=True)
Initialize the logistic regression model with training and test data.
Model_SK(self)
Train and return the scikit-learn logistic regression model.
Model_SM(self)
Train and return the statsmodels logistic regression model.
Gini_Train(self) -> float
Calculate and return the Gini coefficient on the training set.
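For a binary classifier, the Gini coefficient is commonly derived from the AUC as Gini = 2 * AUC - 1. The source does not show how Gini_Train computes its value, so the following is only a sketch of that common convention using scikit-learn's roc_auc_score; gini_from_scores is a hypothetical helper, not part of LogitModel.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def gini_from_scores(y_true, y_score):
    """Hypothetical helper (not part of LogitModel): Gini = 2 * AUC - 1."""
    return 2.0 * roc_auc_score(y_true, y_score) - 1.0


# Toy labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

gini = gini_from_scores(y_true, y_score)
print(gini)  # AUC is 0.75 for this data, so Gini is 0.5
```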
Accuracy_Train(self, cutoff: float) -> float
Calculate and return the accuracy on the training set given a cutoff value.
AUC_Train(self) -> float
Calculate and return the AUC (Area Under the Curve) on the training set.
Predict_Proba_Train(self) -> np.ndarray
Calculate and return the predicted probabilities on the training set.
Predict_LogProba_Train(self) -> np.ndarray
Calculate and return the logarithm of predicted probabilities on the training set.
PredictLabel_Train(self, cutoff: float) -> list
Calculate and return the predicted labels on the training set given a cutoff value.
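A cutoff turns predicted probabilities into class labels. The sketch below assumes that a probability at or above the cutoff maps to label 1; whether LogitModel uses >= or > at the boundary is not stated in the source, and labels_from_proba is a hypothetical helper.

```python
import numpy as np


def labels_from_proba(proba, cutoff):
    """Hypothetical helper: label 1 when probability >= cutoff (assumed rule)."""
    return (np.asarray(proba) >= cutoff).astype(int).tolist()


labels = labels_from_proba([0.2, 0.5, 0.73], cutoff=0.5)
print(labels)  # [0, 1, 1]
```

Raising the cutoff generally trades recall for precision, which is why the cutoff-dependent metrics take it as an argument.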
Brier_Train(self) -> float
Calculate and return the Brier score on the training set.
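The Brier score is the mean squared difference between predicted probabilities and the observed 0/1 labels, so lower values are better. As a sketch on toy data (not the class's own code), the manual formula can be compared with scikit-learn's brier_score_loss:

```python
import numpy as np
from sklearn.metrics import brier_score_loss

# Toy labels and predicted probabilities, for illustration only
y_true = np.array([0, 1, 1, 0])
proba = np.array([0.1, 0.9, 0.8, 0.3])

# Mean of squared errors: (0.01 + 0.01 + 0.04 + 0.09) / 4
manual = float(np.mean((proba - y_true) ** 2))
library = brier_score_loss(y_true, proba)

print(manual, library)
```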
F1_Train(self) -> float
Calculate and return the F1 score on the training set.
Recall_Train(self, cutoff: float) -> float
Calculate and return the recall on the training set given a cutoff value.
Precision_Train(self, cutoff: float) -> float
Calculate and return the precision on the training set given a cutoff value.
Confusion_Matrix_Train(self, cutoff: float) -> np.ndarray
Calculate and return the confusion matrix on the training set given a cutoff value.
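Precision and recall at a given cutoff can both be read off the confusion matrix. The sketch below uses toy data and assumes the same thresholding rule as above (probability >= cutoff gives label 1); it illustrates the relationship and is not LogitModel's implementation. scikit-learn's confusion_matrix orders the binary result as [[TN, FP], [FN, TP]].

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1, 1])
proba = np.array([0.2, 0.6, 0.4, 0.7, 0.9])
cutoff = 0.5

# Assumed rule: probability >= cutoff is labelled 1
y_pred = (proba >= cutoff).astype(int)

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found

print(cm)
print(precision, recall)
```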
FPR_Train(self) -> np.ndarray
Calculate and return the false positive rate on the training set.
TPR_Train(self) -> np.ndarray
Calculate and return the true positive rate on the training set.
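The false and true positive rates are arrays because they are evaluated at a series of score thresholds; plotting one against the other gives the ROC curve, and the area under it is the AUC. A minimal sketch with scikit-learn's roc_curve and auc on toy data, assuming (the source does not confirm it) that the class relies on these functions:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Toy labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (fpr, tpr) point per threshold, starting at (0, 0) and ending at (1, 1)
fpr, tpr, thresholds = roc_curve(y_true, y_score)
area = auc(fpr, tpr)

print(fpr)
print(tpr)
print(area)
```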
ROC_Curve_Train(self)
Plot the ROC (Receiver Operating Characteristic) curve for the training set.
Gini_Test(self) -> float
Calculate and return the Gini coefficient on the test set.
Accuracy_Test(self, cutoff: float) -> float
Calculate and return the accuracy on the test set given a cutoff value.
AUC_Test(self) -> float
Calculate and return the AUC (Area Under the Curve) on the test set.
Predict_Proba_Test(self) -> np.ndarray
Calculate and return the predicted probabilities on the test set.
Predict_LogProba_Test(self) -> np.ndarray
Calculate and return the logarithm of predicted probabilities on the test set.
PredictLabel_Test(self, cutoff: float) -> list
Calculate and return the predicted labels on the test set given a cutoff value.
Brier_Test(self) -> float
Calculate and return the Brier score on the test set.
F1_Test(self) -> float
Calculate and return the F1 score on the test set.
Recall_Test(self, cutoff: float) -> float
Calculate and return the recall on the test set given a cutoff value.
Precision_Test(self, cutoff: float) -> float
Calculate and return the precision on the test set given a cutoff value.
Confusion_Matrix_Test(self, cutoff: float) -> np.ndarray
Calculate and return the confusion matrix on the test set given a cutoff value.
FPR_Test(self) -> np.ndarray
Calculate and return the false positive rate on the test set.
TPR_Test(self) -> np.ndarray
Calculate and return the true positive rate on the test set.
ROC_Curve_Test(self)
Plot the ROC (Receiver Operating Characteristic) curve for the test set.
GetCoefficients_SK(self) -> np.ndarray
Return an array of coefficients from the scikit-learn model.
GetIntercept_SK(self) -> np.ndarray
Return the intercept from the scikit-learn model.
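Both return types are arrays because a fitted scikit-learn LogisticRegression stores its parameters in coef_ (shape (1, n_features) for a binary problem) and intercept_ (shape (1,)). The sketch below fits a plain LogisticRegression on toy data to show those shapes; it assumes GetCoefficients_SK and GetIntercept_SK expose these attributes, which the source does not state explicitly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data with two features, for illustration only
X = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 0.0],
              [3.0, 1.5], [4.0, 1.0], [5.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression(fit_intercept=True).fit(X, y)

coefficients = clf.coef_    # one row, one column per feature
intercept = clf.intercept_  # one value for a binary problem

print(coefficients.shape, intercept.shape)  # (1, 2) (1,)
```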
Summary(self)
Return a summary of the statsmodels model.
GetCoefficients_SM(self) -> pd.DataFrame
Return a DataFrame of coefficients from the statsmodels model.
Prediction(self, x_data: pd.DataFrame, logprob: bool = False) -> np.ndarray
Make predictions on external data.
Raises
ValueError
Raised if the columns of x_train and x_test are not identical.
Raised if the lengths of x_train and x_test are not identical.
TypeError
Raised if x_train is not a pandas DataFrame object.
Raised if x_test is not a pandas DataFrame object.
Raised if y_train is not a pandas Series object.
Raised if y_test is not a pandas Series object.
Raised if intercept is not a boolean.
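The type and column checks listed above could be sketched as follows. validate_inputs is a hypothetical function written for illustration; the actual checks, their order, and their error messages inside LogitModel may differ, and the length check is left out here.

```python
import pandas as pd


def validate_inputs(x_train, y_train, x_test, y_test, intercept=True):
    """Hypothetical sketch of the documented type and column checks."""
    for name, value in (("x_train", x_train), ("x_test", x_test)):
        if not isinstance(value, pd.DataFrame):
            raise TypeError(f"{name} must be a pandas DataFrame")
    for name, value in (("y_train", y_train), ("y_test", y_test)):
        if not isinstance(value, pd.Series):
            raise TypeError(f"{name} must be a pandas Series")
    if not isinstance(intercept, bool):
        raise TypeError("intercept must be a boolean")
    if list(x_train.columns) != list(x_test.columns):
        raise ValueError("x_train and x_test must have identical columns")


# Mismatched columns ("b" versus "c") trigger the ValueError branch
x_train = pd.DataFrame({"a": [1.0, 2.0], "b": [0.5, 0.1]})
x_test = pd.DataFrame({"a": [3.0], "c": [0.2]})
y_train = pd.Series([0, 1])
y_test = pd.Series([1])

try:
    validate_inputs(x_train, y_train, x_test, y_test)
    outcome = "ok"
except ValueError as exc:
    outcome = f"ValueError: {exc}"

print(outcome)
```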
Example Usage
from combat.models import LogitModel
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (roc_auc_score
, precision_score
, recall_score
, f1_score
, auc
, roc_curve
, accuracy_score
, brier_score_loss
, confusion_matrix
)
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
# Initialize LogitModel (x_train, x_test: pandas DataFrames; y_train, y_test: pandas Series)
model = LogitModel(x_train, y_train, x_test, y_test)
# Train scikit-learn model
model.Model_SK()
# Train statsmodels model
model.Model_SM()
# Evaluate model accuracy on training and testing sets
print("Accuracy on training set: ", model.Accuracy_Train(cutoff=0.5))
print("Accuracy on test set: ", model.Accuracy_Test(cutoff=0.5))
print("Gini on training set: ", model.Gini_Train())
print("Gini on test set: ", model.Gini_Test())
# Make predictions on external data (external_data is a pandas DataFrame)
predictions = model.Prediction(external_data)
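The example above assumes that x_train, y_train, x_test, y_test, and external_data already exist. One way to build them for experimentation is from synthetic data, as sketched below; the column names, sample sizes, and the choice of external_data are arbitrary assumptions. The split is even (test_size=0.5) so that x_train and x_test have the same length, which the documented ValueError condition appears to require.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, for illustration only
features, labels = make_classification(
    n_samples=200,
    n_features=4,
    n_informative=3,
    n_redundant=1,
    random_state=0,
)

# LogitModel expects DataFrames for features and Series for labels
X = pd.DataFrame(features, columns=["f0", "f1", "f2", "f3"])
y = pd.Series(labels, name="target")

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Stand-in for new, unseen data with the same columns as x_train
external_data = x_test.head(5)

print(x_train.shape, x_test.shape)  # (100, 4) (100, 4)
```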