UBaymodel
UBayFS
- class UBaymodel.UBaymodel(data, target, feat_names=[], M=100, tt_split=0.75, nr_features='auto', method=['mrmr'], prior_model='dirichlet', weights=[1], constraints=None, l=1, optim_method='GA', popsize=100, maxiter=100, random_state=None)
Initialization of a UBaymodel.
- Parameters:
data (<numpy array> or <pandas dataframe>) – Dataset on which feature selection shall be performed. Variable types must be numeric or integer.
target (<numpy array> or <pandas dataframe>) – Response variable of data. Variable types must be numeric or integer.
feat_names (<list>) – List holding feature names. Preferably a list of string values. If empty, feature names will be generated automatically. Default: feat_names=[].
M (<int>) – Positive integer determining the number of ensemble models. Default: M=100.
tt_split (<float>) – Ratio of samples used for training a single ensemble model. Default: tt_split=0.75.
nr_features (<string or int>) – Number of features selected in a single ensemble. Default: nr_features="auto".
- "auto" (string): A random number between 1 and the total number of features.
- int: A positive integer.
method (<list of strings>) – List of feature selectors used as ensemble feature selectors. Default: method=["mrmr"]. Current options are:
- mrmr: minimum Redundancy Maximum Relevance criterion. This method supports classification and regression tasks.
- chi: chi-squared statistic.
- fisher: Fisher score (classification only).
prior_model (<string>) – Type of prior. Default: prior_model="dirichlet". So far, "dirichlet" is the only implemented prior model type.
weights (<list>) – A list of integers defining the prior weights of the features. If a list with only one entry is used, this value is assigned to each feature as prior weight. Default: weights=[1].
constraints (<UBayconstraint>) – A UBayconstraint object describing user-defined constraints. See the description of UBayconstraint. Default: constraints=None.
l (<float>) – Positive float. The Lagrange parameter defining the penalization strength imposed on a feature set violating the constraints. Default: l=1.
optim_method (<string>) – Optimizer. Currently only the genetic algorithm ("GA") is available. Default: optim_method="GA".
popsize (<integer>) – Positive integer for the population size in GA. Default: popsize=100.
maxiter (<integer>) – Positive integer for the maximal number of GA iterations. Default: maxiter=100.
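The semantics of nr_features and tt_split described above can be sketched in plain Python. This is only an illustration: the helper names `resolve_nr_features` and `training_indices` are hypothetical and not part of UBayFS, and drawing the training rows without replacement is an assumption, since this page does not state how the split is sampled.

```python
import random


def resolve_nr_features(nr_features, n_total, rng):
    """Hypothetical helper: turn the nr_features argument into an integer."""
    if nr_features == "auto":
        # "auto": a random number between 1 and the total number of features
        return rng.randint(1, n_total)
    if isinstance(nr_features, int) and nr_features > 0:
        return nr_features
    raise ValueError("nr_features must be 'auto' or a positive integer")


def training_indices(n_samples, tt_split, rng):
    """Hypothetical helper: pick the training rows for one ensemble model.

    Assumption: rows are drawn without replacement.
    """
    n_train = int(round(tt_split * n_samples))
    return sorted(rng.sample(range(n_samples), n_train))


rng = random.Random(0)
k = resolve_nr_features("auto", n_total=10, rng=rng)
idx = training_indices(n_samples=20, tt_split=0.75, rng=rng)
print(k, len(idx))
```

With M ensemble models, a procedure like this would be repeated M times, each time on a fresh subsample.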
- admissibility(state, log=True)
Get admissibility of a feature set.
- Parameters:
state (<np.array>) – Binary 1-d array indicating which features are selected (1) and which are not selected (0).
log (<boolean>) – Use of log-scale.
- Return type:
A numeric value.
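To illustrate what a soft admissibility score can look like, the sketch below scores a binary state against linear constraints of the form `A @ state <= b`. Both the constraint representation and the logistic-style penalty are assumptions made for this example; the exact formula used by the library is not given on this page. The `penalty` argument plays the role of the Lagrange parameter l.

```python
import math


def admissibility(state, A, b, penalty=1.0, log=True):
    """Hypothetical soft admissibility for linear constraints A @ state <= b.

    A satisfied constraint contributes a factor of 1; a violated one
    contributes 2 / (1 + exp(penalty * violation)), which is below 1.
    """
    log_kappa = 0.0
    for a_row, b_i in zip(A, b):
        violation = sum(a * s for a, s in zip(a_row, state)) - b_i
        if violation > 0:
            log_kappa += math.log(2.0) - math.log1p(math.exp(penalty * violation))
    return log_kappa if log else math.exp(log_kappa)


# One constraint: select at most two of four features.
A = [[1, 1, 1, 1]]
b = [2]
print(admissibility([1, 1, 0, 0], A, b, log=False))  # constraint satisfied
print(admissibility([1, 1, 1, 1], A, b, log=False))  # constraint violated
```

A larger `penalty` pushes the score of a violating state closer to zero, so the constraint behaves more like a hard one.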
- evaluateFS(state, method='spearman', log=False)
Evaluate a feature set.
- Return type:
A <dictionary> with different key parameters of the selected feature set.
- getConstraints()
Get side constraints.
- Return type:
A list.
- getOptim()
Get optimization parameters.
- Return type:
A dictionary with the optimization parameters.
- getWeights()
Get prior weights.
- Return type:
A numpy array with the prior weights.
- posteriorExpectation()
Posterior expectation score.
- Return type:
A numeric value.
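For orientation, the standard conjugate update for a Dirichlet prior adds observed counts to the prior weights and normalizes. The sketch below applies that textbook rule to per-feature selection counts from an ensemble. Whether posteriorExpectation() returns exactly this quantity is not stated here, and the function name `dirichlet_posterior_mean` is hypothetical.

```python
def dirichlet_posterior_mean(prior_weights, counts):
    """Textbook Dirichlet posterior mean: (alpha_i + n_i) / sum_j (alpha_j + n_j)."""
    posterior = [w + c for w, c in zip(prior_weights, counts)]
    total = sum(posterior)
    return [p / total for p in posterior]


# Three features with uniform prior weight 1; counts are how often each
# feature was selected across the ensemble models (made-up numbers).
print(dirichlet_posterior_mean([1, 1, 1], [8, 2, 0]))
```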
- sampleInitial(post_scores, size)
Sample an initial feature set based on a search heuristic.
- Return type:
A binary <numpy array> feature set.
- setConstraints(constraints, append=False)
Set side constraints.
- Parameters:
constraints (<UBayconstraint>) – A UBayconstraint object describing user-defined constraints. See description UBayconstraint.
append (<boolean>) –
True: Append a new constraint to the list of present constraints
False: Replace all present constraints with the new constraint
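The difference between the two append modes can be shown with a small stand-in. The function `set_constraints` below is hypothetical and operates on a plain list of placeholder strings rather than UBayconstraint objects; it only mirrors the append/replace behaviour described above.

```python
def set_constraints(current, new, append=False):
    """Hypothetical stand-in: return the updated list of constraints."""
    if append:
        # True: keep the present constraints and add the new one
        return current + [new]
    # False: discard the present constraints and keep only the new one
    return [new]


present = ["max_size"]
print(set_constraints(present, "must_link", append=True))
print(set_constraints(present, "must_link", append=False))
```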
- setOptim(optim_method, popsize, maxiter)
Set parameters for optimization.
- Parameters:
optim_method (<string>) – Currently only genetic algorithm (“GA”) possible.
popsize (<integer>) – Positive integer for the population size in GA.
maxiter (<integer>) – Positive integer for the maximal number of GA iterations.
- setWeights(weights, block_list=None, block_matrix=None)
Set prior weights.
- Parameters:
weights (<list>) – A list of integers defining the prior weights of the features. If a list with only one entry is used, this value is assigned to each feature as prior weight.
block_list – Block assignment information for features.
block_matrix (<np.array>) – Numpy array matrix defining the block assignment information for features.
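The single-entry rule for weights amounts to broadcasting one value over all features. A minimal sketch, assuming a hypothetical helper `expand_weights` and assuming that a length mismatch should be rejected (the page does not say how the library handles that case):

```python
def expand_weights(weights, n_features):
    """Hypothetical helper: one prior weight per feature."""
    if len(weights) == 1:
        # A single entry is assigned to every feature
        return [weights[0]] * n_features
    if len(weights) != n_features:
        # Assumption: any other length must match the number of features
        raise ValueError("expected 1 or %d weights, got %d" % (n_features, len(weights)))
    return list(weights)


print(expand_weights([1], 4))
print(expand_weights([5, 1, 1, 2], 4))
```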
- train()
Train the UBaymodel.
- Returns:
<pandas dataframe> with the optimal feature set and their names as index
<list> of selected feature names