ModelCombination

Creates multiple models and checks each one against the chosen accuracy metric (Gini or AUC).

Parameters:

  • y_train: pd.Series
    A series with the binary dependent variable of the train sample.
  • x_train: pd.DataFrame
    A dataframe with explanatory variables of the train sample.
  • y_test: pd.Series
    A series with the binary dependent variable of the test sample.
  • x_test: pd.DataFrame
    A dataframe with explanatory variables of the test sample.
  • max_model_number: int
    Maximum number of models to be created.
  • dependent_number: int
    Number of variables to include in each model. It must not exceed the number of columns in x_train.
  • coef_expectation: pd.DataFrame
    A dataframe with variable names and their sign expectations. The name of variables must be the same as x_train and x_test column names.
  • intercept: bool, default = True
    Whether to include an intercept in the model.
  • p_value: float, default = 0.05
    Maximum significance level.
  • check_sample: str, {'test', 'train'}, default = 'test'
    The sample on which the key metrics are calculated.
  • metric: str, {'gini', 'auc'}, default = 'gini'
    Metric to calculate.
  • gini_cutoff: float, default = 0.5
    A cutoff value of Gini.
  • auc_cutoff: float, default = 0.7
    A cutoff value of AUC.
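
The two cutoffs refer to related quantities. Under the standard definition, Gini = 2 * AUC - 1, so an AUC of 0.7 corresponds to a Gini of 0.4. The sketch below illustrates that relationship in plain Python. It assumes the library follows the standard definition, which this page does not confirm, and `auc_score` and `gini_score` are hypothetical helper names, not part of the package.

```python
def auc_score(y_true, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_score(y_true, scores):
    """Gini under the standard definition (an assumption here): 2 * AUC - 1."""
    return 2.0 * auc_score(y_true, scores) - 1.0


# Illustrative labels and predicted probabilities (made up for this sketch)
y = [0, 1, 0, 1, 0]
p = [0.1, 0.8, 0.4, 0.35, 0.2]
auc = auc_score(y, p)
gini = gini_score(y, p)
print(f"AUC = {auc:.4f}, Gini = {gini:.4f}")
```

With these illustrative values, five of the six positive/negative pairs are ordered correctly, so both the default auc_cutoff and the default gini_cutoff would be exceeded.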

Returns:

  • final_data: dict
    key: the model number
    value: the model
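
To picture how a dictionary keyed by model number could arise, the sketch below enumerates variable subsets of size dependent_number and caps them at max_model_number. This is an assumption about how candidates might be enumerated, not the library's confirmed behaviour. `enumerate_candidates` is a hypothetical helper, numbering from 1 is a guess, and tuples of variable names stand in for the fitted models.

```python
from itertools import combinations, islice


def enumerate_candidates(columns, dependent_number, max_model_number):
    """Number candidate variable subsets, keeping at most max_model_number.

    Hypothetical helper: the real function returns fitted models, while
    this sketch only returns the variable names each model would use.
    """
    if dependent_number > len(columns):
        raise ValueError("dependent_number exceeds the number of columns")
    combos = islice(combinations(columns, dependent_number), max_model_number)
    # Numbering from 1 is an assumption; the page does not say where keys start
    return {number: combo for number, combo in enumerate(combos, start=1)}


candidates = enumerate_candidates(["var1", "var2", "var3"], 2, 5)
for number, variables in candidates.items():
    print(number, variables)
```

Three columns taken two at a time give only three subsets, so the cap of five is not reached in this case.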

Exceptions:

  • ValueError:
    Raised if the x_train columns are not identical to the coef_expectation index
    Raised if dependent_number is greater than the number of x_train columns
    Raised if intercept is not bool
    Raised if check_sample is not in ['train', 'test']
    Raised if metric is not in ['gini', 'auc']
    Raised if p_value is not float or not in (0, 0.5)
    Raised if gini_cutoff is not float or not in (0, 1)
    Raised if auc_cutoff is not float or not in (0, 1)

  • TypeError:
    Raised if x_train is not a pandas DataFrame object
    Raised if x_test is not a pandas DataFrame object
    Raised if y_train is not a pandas Series object
    Raised if y_test is not a pandas Series object
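
The scalar checks listed under ValueError can be sketched as follows. This is an illustration of the documented rules, not the library's actual code. `validate_scalar_args` is a hypothetical name, the intervals are assumed to be open, and the pandas type checks are left out to keep the sketch dependency-free.

```python
def validate_scalar_args(intercept=True, p_value=0.05, check_sample="test",
                         metric="gini", gini_cutoff=0.5, auc_cutoff=0.7):
    """Raise ValueError for scalar arguments that break the documented rules."""
    if not isinstance(intercept, bool):
        raise ValueError("intercept must be a bool")
    if check_sample not in ("train", "test"):
        raise ValueError("check_sample must be 'train' or 'test'")
    if metric not in ("gini", "auc"):
        raise ValueError("metric must be 'gini' or 'auc'")
    # Upper bounds follow the page: (0, 0.5) for p_value, (0, 1) for the cutoffs.
    # Treating the intervals as open is an assumption.
    bounds = (
        ("p_value", p_value, 0.5),
        ("gini_cutoff", gini_cutoff, 1.0),
        ("auc_cutoff", auc_cutoff, 1.0),
    )
    for name, value, upper in bounds:
        if not isinstance(value, float) or not 0.0 < value < upper:
            raise ValueError(f"{name} must be a float in (0, {upper})")


validate_scalar_args()  # the documented defaults pass silently

try:
    validate_scalar_args(metric="f1")
except ValueError as err:
    print("Rejected:", err)
```

Note that under these rules an integer such as `p_value=1` is rejected for its type as well as its range, since the page lists "not float" as a ValueError condition.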

Example:

from combat.models import LogitModel
from combat.combat import ModelCombination
import pandas as pd

# Sample input data
y_train = pd.Series([0, 1, 0, 1, 0])
x_train = pd.DataFrame({'var1': [1, 2, 3, 4, 5], 'var2': [6, 7, 8, 9, 10], 'var3': [11, 12, 13, 14, 15]})
y_test = pd.Series([0, 1, 0, 1, 0])
x_test = pd.DataFrame({'var1': [1, 2, 3, 4, 5], 'var2': [6, 7, 8, 9, 10], 'var3': [11, 12, 13, 14, 15]})
max_model_number = 5
dependent_number = 2
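# One row per variable; the index must match the x_train column names
coef_expectation = pd.DataFrame({'dtype': ['int', 'int', 'int'], 'sign_expectation': [1, -1, 0]}, index=['var1', 'var2', 'var3'])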
intercept = True
p_value = 0.05
check_sample = 'test'
metric = 'gini'
gini_cutoff = 0.5
auc_cutoff = 0.7

# Create and validate models
final_models = ModelCombination(y_train
                                , x_train
                                , y_test
                                , x_test
                                , max_model_number
                                , dependent_number
                                , coef_expectation
                                , intercept
                                , p_value
                                , check_sample
                                , metric
                                , gini_cutoff
                                , auc_cutoff
                                )
print("Final models:", final_models)