WoETransform Function
The WoETransform function performs the Weight of Evidence (WoE) transformation on explanatory variables.
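For reference, a common definition of the WoE of bin i is ln((nonevents_i / total nonevents) / (events_i / total events)). The sketch below computes it from per-bin counts using only the standard library. The function name `woe_per_bin`, this sign convention, and the `eps` guard against log(0) are assumptions for illustration. They are not taken from WoETransform.

```python
import math


def woe_per_bin(events, nonevents, eps=1e-9):
    """Weight of Evidence per bin (hypothetical helper, assumed convention).

    events[i] and nonevents[i] count target == 1 and target == 0 in bin i.
    WoE_i = ln(share of non-events in bin i / share of events in bin i).
    eps (an assumption) keeps empty cells from producing log(0).
    """
    total_events = sum(events)
    total_nonevents = sum(nonevents)
    woe = []
    for e, ne in zip(events, nonevents):
        dist_event = max(e / total_events, eps)
        dist_nonevent = max(ne / total_nonevents, eps)
        woe.append(math.log(dist_nonevent / dist_event))
    return woe


# Bin 0: 1 event / 3 non-events; bin 1: 3 events / 1 non-event.
print(woe_per_bin([1, 3], [3, 1]))  # ≈ [1.0986, -1.0986]
```

Under this convention, a positive WoE marks a bin whose share of non-events exceeds its share of events.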
Parameters:
- x: pd.Series
  A pandas Series of the explanatory variable.
- y: pd.Series
  A pandas Series of the binary target variable.
- mon_constraint: int, {-1, 0, 1}
  The monotonic trend constraint, encoded as an integer.
- special_codes: list
  Special codes in the data.
- var_name: str
  The variable name.
- var_type: str, {'numerical', 'categorical'}
  The type of the explanatory variable.
- metric: str, {'woe', 'event_rate'}, default='woe'
  The metric used for the transformation.
- prebinning_method: str, {'cart', 'mdlp', 'quantile', 'uniform', None}, default="cart"
  The pre-binning method.
- solver: str, {'cp', 'mip', 'ls'}, default="cp"
  The optimizer used to solve the optimal binning problem.
- divergence: str, {'iv', 'js', 'hellinger', 'triangular'}, default="iv"
  The divergence measure that the objective function maximizes.
- max_n_prebins: int, default=20
  The maximum number of bins after pre-binning (prebins).
- min_prebin_size: float, default=0.05
  The minimum number of records in each prebin, as a fraction of the total.
- min_n_bins: int or None, optional, default=None
  The minimum number of bins.
- max_n_bins: int or None, optional, default=None
  The maximum number of bins.
- min_bin_size: float or None, optional, default=None
  The minimum number of records in each bin, as a fraction of the total.
- max_bin_size: float or None, optional, default=None
  The maximum number of records in each bin, as a fraction of the total.
- min_bin_n_nonevent: int or None, optional, default=None
  The minimum number of non-event records in each bin.
- max_bin_n_nonevent: int or None, optional, default=None
  The maximum number of non-event records in each bin.
- min_bin_n_event: int or None, optional, default=None
  The minimum number of event records in each bin.
- max_bin_n_event: int or None, optional, default=None
  The maximum number of event records in each bin.
- min_event_rate_diff: float, default=0
  The minimum event rate difference between consecutive bins.
- max_pvalue: float or None, optional, default=None
  The maximum p-value allowed among bins.
- max_pvalue_policy: str, {'all', 'consecutive'}, default="consecutive"
  The method used to determine which bins do not satisfy the p-value constraint.
- gamma: float, default=0
  Regularization strength that reduces the number of dominating bins.
- outlier_detector: str, {'range', 'zscore'} or None, optional, default=None
  The outlier detection method.
- outlier_params: dict or None, optional, default=None
  A dictionary of parameters passed to the outlier detection method.
- class_weight: dict, "balanced" or None, optional, default=None
  Weights associated with the classes.
- cat_cutoff: float or None, optional, default=None
  Categories whose fraction of occurrences is below the cutoff value are grouped into a single bin, "others".
- cat_unknown: float, str or None, default=None
  The value assigned to categories that were not observed during training but occur during transform.
- user_splits: array-like or None, optional, default=None
  The list of pre-binning split points.
- user_splits_fixed: array-like or None, default=None
  A list of booleans, one per element of user_splits, indicating which user-defined split points must be fixed.
- special_codes: array-like, dict or None, optional, default=None
  List of special codes.
- split_digits: int or None, optional, default=None
  The number of significant digits of the split points.
- mip_solver: str, {'bop', 'cbc'}, default="bop"
  The mixed-integer programming solver.
- time_limit: int, default=100
  The maximum time, in seconds, allowed for the optimization solver.
- verbose: bool, default=False
  Enable verbose output.
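The divergence parameter selects the measure that the binning objective maximizes. The sketch below uses the standard textbook definitions of the four measures. They are computed between the per-bin event distribution p and non-event distribution q, each summing to 1. The function name `divergence` is hypothetical. The normalizing constants, in particular the 0.5 factor on the squared Hellinger distance, are assumptions and may differ from what WoETransform uses internally.

```python
import math


def divergence(p, q, kind="iv"):
    """Divergence between event shares p and non-event shares q (hypothetical helper).

    p[i], q[i] are the fractions of events and non-events falling in bin i.
    'iv' assumes every p[i] and q[i] is strictly positive.
    """
    if kind == "iv":  # information value (Jeffreys divergence)
        return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
    if kind == "js":  # Jensen-Shannon divergence, natural log
        def kl(a, b):
            return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)
    if kind == "hellinger":  # squared Hellinger distance, assumed 0.5 factor
        return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    if kind == "triangular":  # triangular discrimination
        return sum((pi - qi) ** 2 / (pi + qi) for pi, qi in zip(p, q) if pi + qi > 0)
    raise ValueError("kind must be one of 'iv', 'js', 'hellinger', 'triangular'")


p, q = [0.25, 0.75], [0.75, 0.25]
for kind in ("iv", "js", "hellinger", "triangular"):
    print(kind, round(divergence(p, q, kind), 4))
```

All four measures are zero when p equals q and grow as the bins separate events from non-events more sharply. That is why maximizing any of them favors discriminating bins.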
Returns:
- final_data: dict
  A dictionary containing the transformed data, the status, the binning table, and the WoE transformation.
Exceptions:
- TypeError:
  - Raised if the parameter x is not a pandas Series.
  - Raised if the parameter y is not a pandas Series.
  - Raised if the parameter var_name is not a string.
  - Raised if the parameter plot is not a boolean value.
  - Raised if the parameter solver is not one of 'cp', 'ls', or 'mip'.
  - Raised if the parameter max_n_prebins is not an integer greater than 1.
  - Raised if the parameter min_prebin_size is not a float in the range (0, 0.5].
  - Raised if the parameter min_n_bins is not a positive integer.
  - Raised if the parameter max_n_bins is not a positive integer.
  - Raised if the parameter min_bin_size is not a float in the range (0, 0.5].
  - Raised if the parameter max_bin_size is not a float in the range (0, 1].
  - Raised if the parameter min_bin_n_nonevent is not a positive integer.
  - Raised if the parameter max_bin_n_nonevent is not a positive integer.
  - Raised if the parameter min_bin_n_event is not a positive integer.
  - Raised if the parameter max_bin_n_event is not a positive integer.
  - Raised if the parameter min_event_rate_diff is not a float in the range [0, 1].
  - Raised if the parameter max_pvalue is not a float in the range (0, 1].
  - Raised if the parameter max_pvalue_policy is not one of 'all' or 'consecutive'.
  - Raised if the parameter gamma is not a non-negative float.
  - Raised if the parameter outlier_detector is provided and is not one of 'range' or 'zscore'.
  - Raised if the parameter outlier_params is provided and is not a dictionary.
  - Raised if the parameter class_weight is provided and is not a dictionary or 'balanced'.
  - Raised if the parameter class_weight is a string other than 'balanced'.
  - Raised if the parameter cat_cutoff is provided and is not a float in the range (0, 1].
  - Raised if the parameter cat_unknown is provided and is not a float or a string.
  - Raised if the parameter user_splits is provided and is not a numpy.ndarray or a list.
  - Raised if the parameter user_splits_fixed is provided and any of the following holds:
    - user_splits is None.
    - It is not a numpy.ndarray or a list.
    - It is not a list of booleans.
    - Its length does not match the length of user_splits.
  - Raised if the parameter special_codes is provided and is not a numpy.ndarray, list, or dictionary.
- ValueError:
  - Raised if the parameter mon_constraint is not one of -1, 0, or 1.
  - Raised if the parameter var_type is not one of 'categorical' or 'numerical'.
  - Raised if the parameter metric is not one of 'woe' or 'event_rate'.
  - Raised if the parameter prebinning_method is not one of 'cart', 'mdlp', 'quantile', 'uniform', or None.
  - Raised if the parameter divergence is not one of 'iv', 'js', 'hellinger', or 'triangular'.
  - Raised if the lengths of the user_splits and user_splits_fixed parameters do not match.
  - Raised if the parameter outlier_detector is provided but is not one of 'range' or 'zscore'.
  - Raised if the parameter split_digits is provided but is not an integer in the range [0, 8].
  - Raised if the parameter max_pvalue_policy is provided but is not one of 'all' or 'consecutive'.
  - Raised if the parameter special_codes is provided as an empty dictionary. The special_codes dictionary must contain at least one special code.
- TypeError/ValueError (Special Cases):
  - Raised if the parameter class_weight is provided as a string but is not equal to 'balanced'.
  - Raised if the parameter user_splits_fixed is provided without user_splits.
  - Raised if the parameter user_splits_fixed is not a list of booleans.
  - Raised if the lengths of the user_splits and user_splits_fixed parameters do not match.
- ValueError (Inconsistent Parameters):
  - Raised if both min_n_bins and max_n_bins are provided but min_n_bins is greater than max_n_bins.
  - Raised if both min_bin_size and max_bin_size are provided but min_bin_size is greater than max_bin_size.
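The checks listed above can be pictured as a validation pass that runs before any binning. The sketch below mirrors only a subset of them. The names `check_binning_params` and `first_error` are hypothetical, and the sketch accepts plain lists only so that it needs no third-party packages. The document lists "user_splits_fixed without user_splits" under both TypeError and the mixed TypeError/ValueError heading. The sketch raises ValueError for it, and that choice is an assumption.

```python
def check_binning_params(mon_constraint, var_type, metric="woe",
                         user_splits=None, user_splits_fixed=None,
                         min_n_bins=None, max_n_bins=None,
                         min_bin_size=None, max_bin_size=None):
    """Hypothetical helper mirroring a subset of the documented checks."""
    if mon_constraint not in (-1, 0, 1):
        raise ValueError("mon_constraint must be one of -1, 0, or 1.")
    if var_type not in ("categorical", "numerical"):
        raise ValueError("var_type must be 'categorical' or 'numerical'.")
    if metric not in ("woe", "event_rate"):
        raise ValueError("metric must be 'woe' or 'event_rate'.")
    if user_splits_fixed is not None:
        if user_splits is None:  # assumed ValueError, see lead-in
            raise ValueError("user_splits_fixed requires user_splits.")
        if not isinstance(user_splits_fixed, list) or not all(
                isinstance(flag, bool) for flag in user_splits_fixed):
            raise TypeError("user_splits_fixed must be a list of booleans.")
        if len(user_splits_fixed) != len(user_splits):
            raise ValueError("user_splits and user_splits_fixed lengths differ.")
    # Inconsistent parameters: lower bound above upper bound.
    if (min_n_bins is not None and max_n_bins is not None
            and min_n_bins > max_n_bins):
        raise ValueError("min_n_bins must not exceed max_n_bins.")
    if (min_bin_size is not None and max_bin_size is not None
            and min_bin_size > max_bin_size):
        raise ValueError("min_bin_size must not exceed max_bin_size.")


def first_error(**kwargs):
    """Return the exception class raised for kwargs, or None if all checks pass."""
    try:
        check_binning_params(**kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc)
    return None


print(first_error(mon_constraint=2, var_type="numerical"))  # <class 'ValueError'>
```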
Example:
```python
import pandas as pd
from combat.transform import WoETransform

# Sample data
data = {
    'age': [25, 35, 45, 55, 65],
    'income': [50000, 60000, 70000, 80000, 90000],
    'target': [0, 1, 0, 1, 0],  # Binary target variable
}
df = pd.DataFrame(data)

# Define explanatory and target variables
x = df[['age', 'income']]
y = df['target']

# Perform the WoE transformation on the 'age' column
woe_transformer = WoETransform()
result = woe_transformer(
    x=x['age'],
    y=y,
    mon_constraint=1,
    var_name='age',
    var_type='numerical',
    # Add other parameters as needed
)

# Display the transformed data
print(result['woe_transform'])
```
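The sketch below shows what the transform step does once split points and per-bin WoE values have been fitted. Each raw value is mapped to the WoE of the bin it falls in. It assumes left-closed numeric bins (-inf, s0), [s0, s1), ..., [sk, inf) and assigns every special code a single fixed value. The function `apply_woe`, the split points, and the WoE values are all hypothetical. They are not results fitted by WoETransform on the sample data above, and WoETransform's own handling of special codes and missing values may differ.

```python
from bisect import bisect_right


def apply_woe(values, splits, bin_woe, special_codes=(), special_woe=0.0):
    """Map raw numeric values to WoE (hypothetical helper).

    splits must be sorted ascending; bins are left-closed, so a value equal to
    a split point falls in the bin to its right. len(bin_woe) must equal
    len(splits) + 1. Values listed in special_codes receive special_woe.
    """
    if len(bin_woe) != len(splits) + 1:
        raise ValueError("need exactly one WoE value per bin")
    transformed = []
    for v in values:
        if v in special_codes:
            transformed.append(special_woe)
        else:
            transformed.append(bin_woe[bisect_right(splits, v)])
    return transformed


# Illustrative inputs only: -1 stands in for a special code.
ages = [25, 35, 45, 55, 65, -1]
print(apply_woe(ages, splits=[40.0, 60.0], bin_woe=[0.4, -0.2, 0.1],
                special_codes={-1}))
# [0.4, 0.4, -0.2, -0.2, 0.1, 0.0]
```

With metric='event_rate', the same lookup would apply, using per-bin event rates in place of WoE values.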