ModelCombination

Creates up to max_model_number models from the explanatory variables and checks each one against the expected coefficient signs, the significance level, and the Gini or AUC cutoff.
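As a rough illustration of how candidate models could be enumerated, the sketch below lists subsets of column names of a fixed size and stops at a cap. The helper name candidate_subsets is hypothetical, and treating max_model_number as a cap on the enumerated subsets is an assumption; the library may select or order candidates differently.

```python
from itertools import combinations, islice


def candidate_subsets(columns, dependent_number, max_model_number):
    """Return at most max_model_number subsets of dependent_number column names.

    Hypothetical helper: an assumption about how candidates might be listed,
    not the library's actual implementation.
    """
    if dependent_number > len(columns):
        raise ValueError("dependent_number exceeds the number of columns")
    # combinations() is lazy, so islice() stops once the cap is reached
    return list(islice(combinations(columns, dependent_number), max_model_number))


subsets = candidate_subsets(["var1", "var2", "var3"], dependent_number=2, max_model_number=5)
print(subsets)
# [('var1', 'var2'), ('var1', 'var3'), ('var2', 'var3')]
```

With three columns and two variables per model there are only three subsets, so a cap of 5 is not reached here.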

Parameters:

y_train: pd.Series
The binary dependent variable of the train sample.
x_train: pd.DataFrame
A dataframe with explanatory variables of the train sample.
y_test: pd.Series
The binary dependent variable of the test sample.
x_test: pd.DataFrame
A dataframe with explanatory variables of the test sample.
max_model_number: int
Maximum number of models to be created.
dependent_number: int
Number of variables to include in each model. It must not exceed the number of columns in x_train.
coef_expectation: pd.DataFrame
A dataframe with variable names and their sign expectations. The variable names must match the column names of x_train and x_test.
intercept: bool, default = True
Whether to include an intercept in the model.
p_value: float, default = 0.05
Maximum significance level (p-value) allowed for the model coefficients.
check_sample: str, {'test', 'train'}, default = 'test'
Sample on which the key metrics are calculated.
metric: str, {'gini', 'auc'}, default = 'gini'
Metric to calculate.
gini_cutoff: float, default = 0.5
A cutoff value of Gini.
auc_cutoff: float, default = 0.7
A cutoff value of AUC.
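To make the metric and cutoff parameters concrete, here is a minimal sketch of a cutoff check. It assumes the conventional relation Gini = 2 * AUC - 1 and a "greater than or equal" comparison against the cutoff; neither is confirmed by this documentation, and the helper names auc_score and passes_cutoff are hypothetical.

```python
def auc_score(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def passes_cutoff(y_true, scores, metric="gini", gini_cutoff=0.5, auc_cutoff=0.7):
    """Hypothetical check: does the chosen metric reach its cutoff?"""
    auc = auc_score(y_true, scores)
    if metric == "auc":
        return auc >= auc_cutoff  # assumption: non-strict comparison
    if metric == "gini":
        return 2 * auc - 1 >= gini_cutoff  # assumption: Gini = 2 * AUC - 1
    raise ValueError("metric must be 'gini' or 'auc'")


y = [0, 1, 0, 1, 0]
scores = [0.1, 0.8, 0.4, 0.35, 0.2]  # made-up predicted probabilities
auc = auc_score(y, scores)
print(round(auc, 3), round(2 * auc - 1, 3))
# 0.833 0.667
print(passes_cutoff(y, scores, metric="gini"), passes_cutoff(y, scores, metric="auc"))
# True True
```

Under this relation the default cutoffs are not equivalent: an AUC of 0.7 corresponds to a Gini of 0.4, so the default gini_cutoff of 0.5 is the stricter of the two.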

Returns:

final_data: dict
key: the model number
value: the model

Exceptions:

ValueError:
Raised if the x_train columns do not match the coef_expectation index
Raised if dependent_number is greater than the number of x_train columns
Raised if intercept is not a bool
Raised if check_sample is not in ['train', 'test']
Raised if metric is not in ['gini', 'auc']
Raised if p_value is not a float or not in (0, 0.5)
Raised if gini_cutoff is not a float or not in (0, 1)
Raised if auc_cutoff is not a float or not in (0, 1)
TypeError:
Raised if x_train is not a pandas DataFrame object
Raised if x_test is not a pandas DataFrame object
Raised if y_train is not a pandas Series object
Raised if y_test is not a pandas Series object
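The scalar checks listed above can be sketched as follows. This is only an illustration of the documented rules: the function name validate_options is hypothetical, the pandas type checks are left out to keep the sketch dependency-free, and the intervals are treated as open, as written in the list.

```python
def validate_options(intercept, p_value, check_sample, metric, gini_cutoff, auc_cutoff):
    """Hypothetical validator mirroring the documented ValueError conditions."""
    if not isinstance(intercept, bool):
        raise ValueError("intercept must be a bool")
    if check_sample not in ("train", "test"):
        raise ValueError("check_sample must be 'train' or 'test'")
    if metric not in ("gini", "auc"):
        raise ValueError("metric must be 'gini' or 'auc'")
    # Each numeric option must be a float inside an open interval (0, upper)
    bounds = (
        ("p_value", p_value, 0.5),
        ("gini_cutoff", gini_cutoff, 1.0),
        ("auc_cutoff", auc_cutoff, 1.0),
    )
    for name, value, upper in bounds:
        if not isinstance(value, float) or not 0.0 < value < upper:
            raise ValueError(f"{name} must be a float in (0, {upper})")


# The documented defaults pass without raising
validate_options(True, 0.05, "test", "gini", 0.5, 0.7)

# An unsupported metric is rejected
try:
    validate_options(True, 0.05, "test", "f1", 0.5, 0.7)
except ValueError as err:
    print(err)
# metric must be 'gini' or 'auc'
```

Note that, following the list above, a wrongly typed intercept or cutoff raises ValueError rather than TypeError; TypeError is reserved for the pandas inputs.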

Example:

from combat.models import LogitModel
from combat.combat import ModelCombination
import pandas as pd

# Sample input data
y_train = pd.Series([0, 1, 0, 1, 0])
x_train = pd.DataFrame({'var1': [1, 2, 3, 4, 5], 'var2': [6, 7, 8, 9, 10], 'var3': [11, 12, 13, 14, 15]})
y_test = pd.Series([0, 1, 0, 1, 0])
x_test = pd.DataFrame({'var1': [1, 2, 3, 4, 5], 'var2': [6, 7, 8, 9, 10], 'var3': [11, 12, 13, 14, 15]})
max_model_number = 5
dependent_number = 2
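# One row per variable; the index holds the variable names so it matches the x_train columns
coef_expectation = pd.DataFrame(
    {'dtype': ['int', 'int', 'int'], 'sign_expectation': [1, -1, 0]},
    index=['var1', 'var2', 'var3'],
)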
intercept = True
p_value = 0.05
check_sample = 'test'
metric = 'gini'
gini_cutoff = 0.5
auc_cutoff = 0.7

# Create and validate models
final_models = ModelCombination(
    y_train,
    x_train,
    y_test,
    x_test,
    max_model_number,
    dependent_number,
    coef_expectation,
    intercept,
    p_value,
    check_sample,
    metric,
    gini_cutoff,
    auc_cutoff,
)
print("Final models:", final_models)