ModelCombination
Creates up to max_model_number models and checks each one against the chosen metric cutoff (Gini or AUC), calculated on the train or test sample.
Parameters:
- y_train:
pd.Series
The binary dependent variable of the train sample.
- x_train:
pd.DataFrame
The explanatory variables of the train sample.
- y_test:
pd.Series
The binary dependent variable of the test sample.
- x_test:
pd.DataFrame
The explanatory variables of the test sample.
- max_model_number:
int
Maximum number of models to be created.
- dependent_number:
int
Number of variables to input in the model. It must not exceed the number of columns in x_train.
- coef_expectation:
pd.DataFrame
A dataframe with variable names and their sign expectations. The variable names must be the same as the x_train and x_test column names.
- intercept:
bool, default = True
Whether to include an intercept in the model.
- p_value:
float, default = 0.05
Maximum significance level.
- check_sample:
str, {'test', 'train'}, default = 'test'
The sample on which the key metrics are calculated.
- metric:
str, {'gini', 'auc'}, default = 'gini'
Metric to calculate.
- gini_cutoff:
float, default = 0.5
A cutoff value of Gini.
- auc_cutoff:
float, default = 0.7
A cutoff value of AUC.
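The metric and cutoff parameters can be illustrated with a minimal sketch. It assumes the common convention Gini = 2 * AUC - 1 and assumes that a model passes when its metric is at least the cutoff; neither is confirmed by this page. The helper names auc_score, gini_score and passes_cutoff are hypothetical and are not part of the library.

```python
def auc_score(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def gini_score(y_true, y_score):
    """Gini under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc_score(y_true, y_score) - 1.0


def passes_cutoff(y_true, y_score, metric="gini",
                  gini_cutoff=0.5, auc_cutoff=0.7):
    """Assumption: a model passes when the metric is >= its cutoff."""
    if metric == "gini":
        return gini_score(y_true, y_score) >= gini_cutoff
    if metric == "auc":
        return auc_score(y_true, y_score) >= auc_cutoff
    raise ValueError("metric must be 'gini' or 'auc'")


# Illustrative labels and predicted scores (made up for this sketch)
y = [0, 1, 0, 1, 0]
scores = [0.1, 0.8, 0.4, 0.35, 0.2]
print(auc_score(y, scores), gini_score(y, scores))
print(passes_cutoff(y, scores, metric="gini"))
```

Under this convention the two defaults are not equivalent: a Gini cutoff of 0.5 corresponds to an AUC of 0.75, which is stricter than the default auc_cutoff of 0.7.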
Returns:
- final_data:
dict
key: the number of model
value: model
Exceptions:
- ValueError:
Raised if the x_train columns are not identical to the coef_expectation index
Raised if dependent_number is greater than the number of x_train columns
Raised if intercept is not bool
Raised if check_sample is not in ['train', 'test']
Raised if metric is not in ['gini', 'auc']
Raised if p_value is not float or not in (0, 0.5)
Raised if gini_cutoff is not float or not in (0, 1)
Raised if auc_cutoff is not float or not in (0, 1)
- TypeError:
Raised if x_train is not a pandas DataFrame object
Raised if x_test is not a pandas DataFrame object
Raised if y_train is not a pandas Series object
Raised if y_test is not a pandas Series object
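The ValueError conditions listed above can be sketched as a standalone check. This is not the library's implementation: validate_options is a hypothetical helper, it takes plain lists of names instead of pandas objects (so the TypeError checks are left out), and it assumes that "identical" means the same set of names regardless of order and that the intervals are open, as written.

```python
def validate_options(columns, expectation_index, dependent_number,
                     intercept=True, p_value=0.05, check_sample="test",
                     metric="gini", gini_cutoff=0.5, auc_cutoff=0.7):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    # Assumption: "identical" means the same names, ignoring order.
    if set(columns) != set(expectation_index):
        raise ValueError("x_train columns must match the coef_expectation index")
    if dependent_number > len(columns):
        raise ValueError("dependent_number exceeds the number of x_train columns")
    if not isinstance(intercept, bool):
        raise ValueError("intercept must be bool")
    if check_sample not in ("train", "test"):
        raise ValueError("check_sample must be 'train' or 'test'")
    if metric not in ("gini", "auc"):
        raise ValueError("metric must be 'gini' or 'auc'")
    # Each value must be a float strictly inside (0, upper).
    bounded = (("p_value", p_value, 0.5),
               ("gini_cutoff", gini_cutoff, 1.0),
               ("auc_cutoff", auc_cutoff, 1.0))
    for name, value, upper in bounded:
        if not isinstance(value, float) or not 0.0 < value < upper:
            raise ValueError(f"{name} must be a float in (0, {upper})")


# Passes silently: names match and 2 <= 3 columns.
names = ["var1", "var2", "var3"]
validate_options(names, names, dependent_number=2)
```

With pandas inputs, the first two arguments would be list(x_train.columns) and list(coef_expectation.index).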
Example:
from combat.models import LogitModel
from combat.combat import ModelCombination
import pandas as pd
# Sample input data
y_train = pd.Series([0, 1, 0, 1, 0])
x_train = pd.DataFrame({'var1': [1, 2, 3, 4, 5], 'var2': [6, 7, 8, 9, 10], 'var3': [11, 12, 13, 14, 15]})
y_test = pd.Series([0, 1, 0, 1, 0])
x_test = pd.DataFrame({'var1': [1, 2, 3, 4, 5], 'var2': [6, 7, 8, 9, 10], 'var3': [11, 12, 13, 14, 15]})
max_model_number = 5
dependent_number = 2
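# One row per variable; the index must match the x_train column names
coef_expectation = pd.DataFrame({'dtype': ['int', 'int', 'int'], 'sign_expectation': [1, -1, 0]}, index=x_train.columns)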
intercept = True
p_value = 0.05
check_sample = 'test'
metric = 'gini'
gini_cutoff = 0.5
auc_cutoff = 0.7
# Create and validate models
final_models = ModelCombination(y_train
, x_train
, y_test
, x_test
, max_model_number
, dependent_number
, coef_expectation
, intercept
, p_value
, check_sample
, metric
, gini_cutoff
, auc_cutoff
)
print("Final models:", final_models)