UBaymodel

UBayFS

class UBaymodel.UBaymodel(data, target, feat_names=[], M=100, tt_split=0.75, nr_features='auto', method=['mrmr'], prior_model='dirichlet', weights=[1], constraints=None, l=1, optim_method='GA', popsize=100, maxiter=100, random_state=None)

Initialization of a UBaymodel.

Parameters:
  • data (<numpy array> or <pandas dataframe>) – Dataset on which feature selection shall be performed. Variable types must be numeric or integer.

  • target (<numpy array> or <pandas dataframe>) – Response variable of data. Variable types must be numeric or integer.

  • feat_names (<list>) – List holding feature names. Preferably a list of string values. If empty, feature names will be generated automatically. Default: feat_names=[].

  • M (<int>) – Positive integer determining the number of ensemble models. Default: M=100.

  • tt_split (<float>) – Ratio of samples used for training a single ensemble model. Default: tt_split=0.75.

  • nr_features (<string or int>) –

    Number of features selected in a single ensemble. Default: nr_features="auto".
    • string="auto" : A random number between 1 and the total number of features.

    • int : A positive integer.

  • method (<list of strings>) –

    List of feature selectors used as ensemble feature selectors. Currently available options are:
    • mrmr : minimum Redundancy maximal Relevance criterion. This method supports classification and regression tasks.

    • chi : Chi-square score.

    • fisher : Fisher score (classification only)

  • prior_model (<string>) – Type of prior. Default: prior_model="dirichlet". So far, “dirichlet” is the only implemented prior model type.

  • weights (<list>) – A list of integers defining the prior weights of the features. If a list with only one entry is used, this value is assigned to each feature as prior weight. Default: weights=[1]

  • constraints (<UBayconstraint>) – A UBayconstraint object describing user-defined constraints. See description UBayconstraint. Default: constraints=None

  • l (<float>) – Positive float. The Lagrange parameter defining the penalization strength imposed on a feature set violating the constraints. Default: l=1

  • optim_method (<string>) – Optimization method. Currently, only the genetic algorithm (“GA”) is available. Default: optim_method="GA"

  • popsize (<integer>) – Positive integer for the population size in GA. Default: popsize=100

  • maxiter (<integer>) – Positive integer for the maximal number of GA iterations. Default: maxiter=100
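The interplay of M, tt_split, nr_features, and weights can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the UBayFS implementation: the helper names (expand_weights, ensemble_counts, dirichlet_posterior_mean) are hypothetical, the default selector is a random placeholder standing in for mrmr, chi, or fisher, and the posterior mean uses the standard Dirichlet-multinomial update (prior weights plus selection counts, normalized).

```python
import random

def expand_weights(weights, n_features):
    """Broadcast a single-entry weight list to every feature."""
    if len(weights) == 1:
        return [weights[0]] * n_features
    if len(weights) != n_features:
        raise ValueError("weights needs one entry, or one entry per feature")
    return list(weights)

def ensemble_counts(n_samples, n_features, M=100, tt_split=0.75,
                    nr_features="auto", selector=None, seed=None):
    """Count how often each feature is picked across M ensemble models."""
    rng = random.Random(seed)
    if selector is None:
        # Placeholder selector (assumption): ignores the data, picks k features at random.
        selector = lambda train_idx, k: rng.sample(range(n_features), k)
    counts = [0] * n_features
    n_train = int(tt_split * n_samples)
    for _ in range(M):
        # Each ensemble model sees its own random training subset.
        train_idx = rng.sample(range(n_samples), n_train)
        # "auto": a random number between 1 and the total number of features.
        k = rng.randint(1, n_features) if nr_features == "auto" else nr_features
        for j in selector(train_idx, k):
            counts[j] += 1
    return counts

def dirichlet_posterior_mean(prior_weights, counts):
    """Posterior mean of a Dirichlet prior updated with selection counts."""
    total = sum(prior_weights) + sum(counts)
    return [(a + c) / total for a, c in zip(prior_weights, counts)]

counts = ensemble_counts(n_samples=40, n_features=5, M=10, nr_features=2, seed=0)
posterior = dirichlet_posterior_mean(expand_weights([1], 5), counts)
```

Features that are selected in many ensemble models, or that carry a large prior weight, end up with a larger share of the posterior mass.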

admissibility(state, log=True)

Get admissibility of a feature set.

Parameters:
  • state (<np.array>) – Binary 1-d array indicating which features are selected (1) and which are not selected (0).

  • log (<boolean>) – Whether to use the log scale. Default: log=True

Return type:

A numeric value.
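To make the idea of a penalized admissibility concrete, here is a minimal sketch. It is an assumption for illustration only and not the formula used by UBayFS: constraints are represented as hypothetical (coefficients, bound) pairs meaning sum(a_i * state_i) <= bound, and each violated constraint subtracts l on the log scale.

```python
import math

def admissibility_sketch(state, constraints, l=1.0, log=True):
    """Illustrative penalty: each violated constraint costs l on the log scale."""
    violated = 0
    for coeffs, bound in constraints:
        lhs = sum(a * s for a, s in zip(coeffs, state))
        if lhs > bound:
            violated += 1
    log_value = 0.0 if violated == 0 else -l * violated
    return log_value if log else math.exp(log_value)

# Hypothetical constraint: select at most two of four features.
max_two = ([1, 1, 1, 1], 2)

ok = admissibility_sketch([1, 1, 0, 0], [max_two])              # constraint satisfied
bad = admissibility_sketch([1, 1, 1, 0], [max_two], l=2.0)      # constraint violated
bad_linear = admissibility_sketch([1, 1, 1, 0], [max_two], l=2.0, log=False)
```

In this sketch a larger l makes a violating feature set less admissible, which matches the role of the Lagrange parameter described for the constructor.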

evaluateFS(state, method='spearman', log=False)

Evaluate a feature set.

Return type:

A <dictionary> with different key parameters of the selected feature set.

getConstraints()

Get side constraints.

Return type:

A list.

getOptim()

Get optimization parameters.

Return type:

A dictionary with the optimization parameters.

getWeights()

Get prior weights.

Return type:

A numpy array with the prior weights.

posteriorExpectation()

Posterior expectation score.

Return type:

A numeric value.

sampleInitial(post_scores, size)

Sample an initial feature set based on a search heuristic.

Return type:

A binary <numpy array> feature set.
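One simple search heuristic is to include each feature with a probability proportional to its posterior score. The sketch below assumes this heuristic and assumes that size is the number of candidate feature sets to draw; neither is confirmed by the description above. The function name sample_initial_sketch is hypothetical, and plain lists stand in for numpy arrays.

```python
import random

def sample_initial_sketch(post_scores, size, seed=None):
    """Draw `size` binary feature sets; higher scores are included more often."""
    rng = random.Random(seed)
    top = max(post_scores)
    if top <= 0:
        raise ValueError("at least one posterior score must be positive")
    # Scale so that the best-scoring feature has inclusion probability 1.
    probs = [max(s, 0) / top for s in post_scores]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(size)]

candidates = sample_initial_sketch([0.5, 0.3, 0.0, 0.2], size=4, seed=1)
```

With this scaling the top-scoring feature appears in every candidate, and a feature with score 0 is never selected.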

setConstraints(constraints, append=False)

Set side constraints.

Parameters:
  • constraints (<UBayconstraint>) – A UBayconstraint object describing user-defined constraints. See description UBayconstraint.

  • append (<boolean>) –

    • True: Append a new constraint to the list of present constraints

    • False: Replace all present constraints with the new constraint
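The difference between the two append modes can be sketched with a small stand-in class. ConstraintHolder is hypothetical and only mirrors the behavior described above; the strings stand in for UBayconstraint objects.

```python
class ConstraintHolder:
    """Stand-in for a model that stores a list of constraints."""

    def __init__(self):
        self.constraints = []

    def set_constraints(self, constraint, append=False):
        if append:
            # Keep the present constraints and add the new one.
            self.constraints.append(constraint)
        else:
            # Discard the present constraints and keep only the new one.
            self.constraints = [constraint]

holder = ConstraintHolder()
holder.set_constraints("max_size")                 # replace: ["max_size"]
holder.set_constraints("must_link", append=True)   # append: two constraints
after_append = list(holder.constraints)
holder.set_constraints("cannot_link")              # replace: one constraint again
after_replace = list(holder.constraints)
```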

setOptim(optim_method, popsize, maxiter)

Set parameters for optimization.

Parameters:
  • optim_method (<string>) – Currently, only the genetic algorithm (“GA”) is available.

  • popsize (<integer>) – Positive integer for the population size in GA.

  • maxiter (<integer>) – Positive integer for the maximal number of GA iterations.

setWeights(weights, block_list=None, block_matrix=None)

Set prior weights.

Parameters:
  • weights (<list>) – A list of integers defining the prior weights of the features. If a list with only one entry is used, this value is assigned to each feature as prior weight.

  • block_list – Block assignment information for features.

  • block_matrix (<np.array>) – Numpy array matrix defining the block assignment information for features.
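The two block arguments can be read as two encodings of the same information. The sketch below assumes that block_list holds one list of feature indices per block, that block_matrix has one row per block and one column per feature, and that weights given per block are spread to the member features. These layouts and the helper names are assumptions for illustration, not confirmed by the description above.

```python
def block_list_to_matrix(block_list, n_features):
    """Turn per-block feature index lists into a 0/1 block-by-feature matrix."""
    matrix = [[0] * n_features for _ in block_list]
    for b, members in enumerate(block_list):
        for j in members:
            matrix[b][j] = 1
    return matrix

def block_weights_to_feature_weights(block_weights, block_matrix):
    """Give every feature the summed weight of the blocks it belongs to."""
    n_features = len(block_matrix[0])
    return [
        sum(w * row[j] for w, row in zip(block_weights, block_matrix))
        for j in range(n_features)
    ]

# Two hypothetical blocks over five features: {0, 1} and {2, 3, 4}.
block_matrix = block_list_to_matrix([[0, 1], [2, 3, 4]], n_features=5)
feature_weights = block_weights_to_feature_weights([10, 1], block_matrix)
```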

train()

Train the UBaymodel.

Returns:

  • <pandas dataframe> with the optimal feature set and their names as index

  • <list> of selected feature names
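The relation between a binary feature set and the returned list of names can be sketched as follows. The function selected_feature_names and the fallback naming scheme ("f0", "f1", ...) are hypothetical; the actual automatically generated names may differ.

```python
def selected_feature_names(state, feat_names=None):
    """Map a binary feature set to the names of the selected features."""
    if not feat_names:
        # Hypothetical fallback when no feature names were supplied.
        feat_names = [f"f{i}" for i in range(len(state))]
    if len(feat_names) != len(state):
        raise ValueError("state and feat_names must have the same length")
    return [name for name, flag in zip(feat_names, state) if flag == 1]

names = selected_feature_names([1, 0, 1, 0], ["age", "height", "weight", "income"])
auto_names = selected_feature_names([0, 1, 1])
```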