VarExpPower

The function assesses the explanatory power of each independent variable with respect to a binary dependent variable.

Parameters:

  • y_train: pd.Series()
    Binary dependent variable of the train set.
  • x_train: pd.DataFrame()
    All independent variables of the train set.
  • y_test: pd.Series()
    Binary dependent variable of the test set.
  • x_test: pd.DataFrame()
    All independent variables of the test set.
  • discriminatory: str {'ttest', 'kruskal'}, default = 'ttest'
    Statistical test used to compare sample means.
  • vif: bool, default = True
    Whether to calculate variance inflation factors (VIFs). If False, VIFs are not calculated.
  • individual_accuracy: str {'gini', 'auc', 'f1_score'}, default = 'gini'
    Metric used to measure each variable's individual accuracy in pairwise regression.
  • check_sample: str {'test', 'train'}, default = 'test'
    Data sample on which the individual accuracy test is performed.
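
The 'gini' and 'auc' options for individual_accuracy are closely related under their usual definitions. The following is a minimal sketch, assuming the conventional relation Gini = 2 * AUC - 1; the source does not state how VarExpPower computes either metric. The helper names auc_from_scores and gini_from_auc are hypothetical and not part of the library.

```python
def auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative observation")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Conventional Gini coefficient derived from AUC (assumed definition)."""
    return 2.0 * auc - 1.0


# Toy labels and scores, for illustration only
y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = auc_from_scores(y, scores)   # 3 of 4 pairs are ranked correctly
gini = gini_from_auc(auc)
print(auc, gini)
```

Under this definition, a variable with no discriminatory power has an AUC of 0.5 and a Gini of 0.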

Outputs:

  • final_data: pd.DataFrame()
    Pandas DataFrame containing the analysis of all variables.

Exceptions:

  • TypeError:
    Raised if x_train parameter is not a pandas DataFrame object.
    Raised if x_test parameter is not a pandas DataFrame object.
    Raised if y_train parameter is not a pandas Series.
    Raised if y_test parameter is not a pandas Series.
    Raised if vif parameter is not a bool.

  • ValueError:
    Raised if discriminatory parameter is not in ('ttest', 'kruskal').
    Raised if individual_accuracy parameter is not in ('gini', 'auc', 'f1_score').
    Raised if check_sample parameter is not in ('test', 'train').
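
The checks listed above could be sketched as follows. This illustrates the documented behaviour and is not the library's actual code; the function name validate_var_exp_power_args and the error messages are assumptions.

```python
import pandas as pd


def validate_var_exp_power_args(y_train, x_train, y_test, x_test,
                                discriminatory="ttest", vif=True,
                                individual_accuracy="gini",
                                check_sample="test"):
    """Hypothetical helper mirroring the documented TypeError/ValueError rules."""
    # Type checks: independent variables must be DataFrames
    for name, value in (("x_train", x_train), ("x_test", x_test)):
        if not isinstance(value, pd.DataFrame):
            raise TypeError(f"{name} must be a pandas DataFrame")
    # Type checks: dependent variables must be Series
    for name, value in (("y_train", y_train), ("y_test", y_test)):
        if not isinstance(value, pd.Series):
            raise TypeError(f"{name} must be a pandas Series")
    if not isinstance(vif, bool):
        raise TypeError("vif must be a bool")

    # Value checks: string options must be one of the documented choices
    allowed = {
        "discriminatory": (discriminatory, ("ttest", "kruskal")),
        "individual_accuracy": (individual_accuracy, ("gini", "auc", "f1_score")),
        "check_sample": (check_sample, ("test", "train")),
    }
    for name, (value, choices) in allowed.items():
        if value not in choices:
            raise ValueError(f"{name} must be one of {choices}, got {value!r}")


# Valid inputs pass silently; an unsupported option raises ValueError
x = pd.DataFrame({"A": [1, 2, 3]})
y = pd.Series([0, 1, 0])
validate_var_exp_power_args(y, x, y, x)
try:
    validate_var_exp_power_args(y, x, y, x, discriminatory="anova")
except ValueError as err:
    print(err)
```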

Example:

import pandas as pd
from combat.short_list import VarExpPower

# Sample input data
y_train = pd.Series([0, 1, 0, 1, 0])
x_train = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})
y_test = pd.Series([0, 1, 0, 1, 0])
x_test = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})
discriminatory = 'ttest'
vif = True
individual_accuracy = 'gini'
check_sample = 'test'

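# Calculate variable explanatory power (keyword arguments make the option order explicit)
result = VarExpPower(
    y_train, x_train, y_test, x_test,
    discriminatory=discriminatory,
    vif=vif,
    individual_accuracy=individual_accuracy,
    check_sample=check_sample,
)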
print(result)