
Build a data structure for UBayFS and train an ensemble of elementary feature selectors.

Usage

build.UBaymodel(
  data,
  target,
  M = 100,
  tt_split = 0.75,
  nr_features = "auto",
  method = "mRMR",
  prior_model = "dirichlet",
  weights = 1,
  constraints = NULL,
  lambda = 1,
  optim_method = "GA",
  popsize = 50,
  maxiter = 100,
  shiny = FALSE,
  ...
)

Arguments

data

a matrix of input data

target

a vector of input labels; for binary problems a factor variable should be used

M

the number of elementary models to be trained in the ensemble

tt_split

the ratio of samples drawn for building an elementary model (train-test-split)

nr_features

number of features to select in each elementary model; if "auto" a randomized number of features is used in each elementary model

method

a vector denoting the method(s) used as elementary models; options: `mRMR`, `laplace` (Laplacian score). Self-defined functions can also be used as methods; they must have the arguments `X` (data), `y` (target), `n` (number of features) and `name` (name of the function). For more details, see the examples.
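A self-defined method might look like the following minimal sketch. The function `cor_selector` and the toy data are hypothetical and not part of UBayFS; only the argument names (`X`, `y`, `n`, `name`) and the returned list with elements `ranks` and `name` follow the decision-tree example in this documentation.

```r
# Hypothetical custom selector: picks the n features with the largest
# absolute correlation with the target. The signature (X, y, n, name) and
# the returned list(ranks, name) mirror the decision-tree example below.
cor_selector <- function(X, y, n, name = "abs_cor") {
  y_num <- as.numeric(y)                     # factor labels become integer codes
  scores <- abs(apply(X, 2, function(col) stats::cor(col, y_num)))
  scores[is.na(scores)] <- 0                 # constant columns yield NA correlations
  ranks <- order(scores, decreasing = TRUE)[seq_len(min(n, ncol(X)))]
  list(ranks = ranks, name = name)           # indices of the selected features
}

# Toy data (assumption, for illustration only): feature 3 carries the signal
set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
y <- factor(ifelse(X[, 3] > 0, "pos", "neg"))
res <- cor_selector(X, y, n = 2)
res$ranks
```

Such a function would then be passed as `method = cor_selector`, in the same way the decision-tree function is passed in the examples.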

prior_model

a string denoting the prior model to use; options: `dirichlet`, `wong`, `hankin`; `hankin` is the most general prior model, but also the most time-consuming

weights

the vector of user-defined prior weights for each feature

constraints

a list containing a relaxed system `Ax<=b` of user constraints, given as matrix `A`, vector `b` and vector or scalar `rho` (relaxation parameter). At least one max-size constraint must be included. For details, see `buildConstraints`.
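To make the form `Ax<=b` concrete, the following base R sketch writes a max-size constraint by hand: a single row of ones in `A` sums the selected features, and `b` bounds that sum. The list `constraints_sketch` and the helper `satisfies` are hypothetical and serve only as illustration. The object returned by `buildConstraints` may contain further elements, so in practice it should be used to create the constraints.

```r
# Hand-written max-size constraint (illustration only, not the package's code)
p   <- 6                               # number of features (toy value)
A   <- matrix(1, nrow = 1, ncol = p)   # one row of ones: counts selected features
b   <- 3                               # at most 3 features may be selected
rho <- 1                               # relaxation parameter for this constraint
constraints_sketch <- list(A = A, b = b, rho = rho)

# Hypothetical helper: checks the unrelaxed system A x <= b for a binary
# feature membership vector x
satisfies <- function(x, cons) all(cons$A %*% x <= cons$b)

x_ok  <- c(1, 0, 1, 0, 0, 1)           # 3 features selected
x_bad <- c(1, 1, 1, 1, 0, 0)           # 4 features selected
satisfies(x_ok, constraints_sketch)    # TRUE
satisfies(x_bad, constraints_sketch)   # FALSE
```

The check above treats the constraint as hard. In the relaxed system, `rho` controls how strongly a violation is penalized rather than forbidding it outright.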

lambda

a positive scalar denoting the overall strength of the constraints

optim_method

the method to evaluate the posterior distribution. Currently, only the option `GA` (genetic algorithm) is supported.

popsize

size of the initial population of the genetic algorithm for model optimization

maxiter

maximum number of iterations of the genetic algorithm for model optimization

shiny

`TRUE` indicates that the function is called from the Shiny dashboard

...

additional arguments

Value

a `UBaymodel` object containing the following list elements:

  • `data` - the input dataset

  • `target` - the input target

  • `lambda` - the input lambda value (constraint strength)

  • `prior_model` - the chosen prior model

  • `ensemble.params` - information about input and output of ensemble feature selection

  • `constraint.params` - parameters representing the constraints

  • `user.params` - parameters representing the user's prior knowledge

  • `optim.params` - optimization parameters

Details

The function aggregates input parameters for UBayFS - including data, parameters defining ensemble and user knowledge and parameters specifying the optimization procedure - and trains the ensemble model.
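The roles of `M`, `tt_split` and `nr_features` in the ensemble can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the toy data, the correlation-based stand-in selector and the variable `counts` are hypothetical, and the actual sampling scheme and elementary selectors (such as `mRMR`) differ.

```r
# Simplified ensemble of elementary feature selectors (illustration only)
set.seed(42)
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)   # toy data: 100 samples, 5 features
y <- X[, 2] + rnorm(100, sd = 0.1)                  # toy target driven by feature 2

M           <- 20     # number of elementary models
tt_split    <- 0.75   # fraction of samples drawn for each elementary model
nr_features <- 2      # features selected by each elementary model

counts <- integer(ncol(X))                          # selection count per feature
for (m in seq_len(M)) {
  # draw a random training subset for this elementary model
  train_idx <- sample(nrow(X), size = floor(tt_split * nrow(X)))
  # stand-in selector: absolute correlation of each feature with the target
  scores <- abs(stats::cor(X[train_idx, ], y[train_idx]))[, 1]
  selected <- order(scores, decreasing = TRUE)[seq_len(nr_features)]
  counts[selected] <- counts[selected] + 1L
}
counts   # how often each feature was selected across the M models
```

Features that are selected by most elementary models receive high counts. Selection frequencies of this kind are the evidence that the ensemble contributes alongside the user's prior weights and constraints.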

Examples

# build a UBayFS model using the Breast Cancer Wisconsin dataset
data(bcw) # dataset
c <- buildConstraints(constraint_types = "max_size",
                      constraint_vars = list(10),
                      num_elements = ncol(bcw$data),
                      rho = 1) # prior constraints
w <- rep(1, ncol(bcw$data)) # weights
model <- build.UBaymodel(
                     data = bcw$data,
                     target = bcw$labels,
                     constraints = c,
                     weights = w
)

# use a function computing a decision tree as input
library("rpart")
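decision_tree <- function(X, y, n, name = "tree"){
  # combine target and features into one data frame for rpart
  rf_data <- as.data.frame(cbind(y, X))
  colnames(rf_data) <- make.names(colnames(rf_data))
  tree <- rpart::rpart(y~., data = rf_data)
  # names of the (at most n) most important features
  top <- head(names(tree$variable.importance), n)
  # match against the sanitized column names used when fitting the tree
  return(list(ranks = which(make.names(colnames(X)) %in% top),
              name = name))
}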

model <- build.UBaymodel(
                     data = bcw$data,
                     target = bcw$labels,
                     constraints = c,
                     weights = w,
                     method = decision_tree
)

# include block-constraints
c_block <- buildConstraints(constraint_types = "max_size",
                            constraint_vars = list(2),
                            num_elements = length(bcw$blocks),
                            rho = 10,
                            block_list = bcw$blocks)

model <- setConstraints(model, c_block)