Build an ensemble for UBayFS
Build a data structure for UBayFS and train an ensemble of elementary feature selectors.
Usage
build.UBaymodel(
data,
target,
M = 100,
tt_split = 0.75,
nr_features = "auto",
method = "mRMR",
prior_model = "dirichlet",
weights = 1,
constraints = NULL,
lambda = 1,
optim_method = "GA",
popsize = 50,
maxiter = 100,
shiny = FALSE,
...
)
Arguments
- data
a matrix of input data
- target
a vector of input labels; for binary problems a factor variable should be used
- M
the number of elementary models to be trained in the ensemble
- tt_split
the ratio of samples drawn for building an elementary model (train-test-split)
- nr_features
number of features to select in each elementary model; if "auto", a randomized number of features is used in each elementary model
- method
a vector denoting the method(s) used as elementary models; options: `mRMR`, `laplace` (Laplacian score). Self-defined functions are also accepted; they must take the arguments `X` (data), `y` (target), `n` (number of features) and `name` (name of the function). For more details, see the examples.
- prior_model
a string denoting the prior model to use; options: `dirichlet`, `wong`, `hankin`; `hankin` is the most general prior model, but also the most time-consuming
- weights
the vector of user-defined prior weights for each feature
- constraints
a list containing a relaxed system `Ax<=b` of user constraints, given as matrix `A`, vector `b` and vector or scalar `rho` (relaxation parameter). At least one max-size constraint must be included. For details, see `buildConstraints`.
- lambda
a positive scalar denoting the overall strength of the constraints
- optim_method
the method to evaluate the posterior distribution. Currently, only the option `GA` (genetic algorithm) is supported.
- popsize
size of the initial population of the genetic algorithm for model optimization
- maxiter
maximum number of iterations of the genetic algorithm for model optimization
- shiny
`TRUE` indicates that the function is called from the Shiny dashboard
- ...
additional arguments
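The `method` argument accepts self-defined functions with the arguments `X`, `y`, `n` and `name`. The following is a minimal sketch of such a function, not part of the package: `cor_selector` is a hypothetical name, and the return value (a list with the column indices of the selected features in `ranks` and the selector's `name`) is inferred from the decision-tree example on this page rather than from a formal specification.

```r
# Hypothetical elementary selector: keep the n features with the largest
# absolute correlation to the target. Signature follows the `method` argument.
cor_selector <- function(X, y, n, name = "abs_cor") {
  y_num <- as.numeric(y)                      # factor labels become integer codes
  scores <- abs(apply(X, 2, function(col) stats::cor(col, y_num)))
  scores[is.na(scores)] <- 0                  # constant columns yield NA
  top <- order(scores, decreasing = TRUE)[seq_len(min(n, ncol(X)))]
  list(ranks = sort(top), name = name)        # indices of the selected columns
}

# Toy data (assumption for illustration): feature 2 determines the label
set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5, dimnames = list(NULL, paste0("f", 1:5)))
y <- factor(X[, 2] > 0)
res <- cor_selector(X, y, n = 2)
res$ranks
```

A function of this shape would be passed as `method = cor_selector`, in the same way as `decision_tree` in the examples.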
Value
a `UBaymodel` object containing the following list elements:
`data` - the input dataset
`target` - the input target
`lambda` - the input lambda value (constraint strength)
`prior_model` - the chosen prior model
`ensemble.params` - information about input and output of ensemble feature selection
`constraint.params` - parameters representing the constraints
`user.params` - parameters representing the user's prior knowledge
`optim.params` - optimization parameters
Details
The function aggregates input parameters for UBayFS - including data, parameters defining ensemble and user knowledge and parameters specifying the optimization procedure - and trains the ensemble model.
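The ensemble idea behind `M` and `tt_split` can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the stand-in selector `select_top`, the toy data, and the simple tally in `counts` are all hypothetical. Each of the `M` runs draws a fraction `tt_split` of the samples, selects `nr_features` features on that subsample, and the selections are counted per feature.

```r
# Illustrative ensemble of elementary feature selectors (not package internals)
set.seed(42)
n_samples <- 120
n_feats <- 6
X <- matrix(rnorm(n_samples * n_feats), ncol = n_feats)
y <- factor(X[, 1] + X[, 3] > 0)              # features 1 and 3 carry the signal

M <- 20                                       # number of elementary models
tt_split <- 0.75                              # fraction of samples per model
nr_features <- 2                              # features selected per model

# Stand-in selector: top features by absolute correlation with the label
select_top <- function(X, y, n) {
  scores <- abs(stats::cor(X, as.numeric(y)))[, 1]
  order(scores, decreasing = TRUE)[seq_len(n)]
}

counts <- integer(n_feats)
for (m in seq_len(M)) {
  idx <- sample(n_samples, size = floor(tt_split * n_samples))  # training subsample
  sel <- select_top(X[idx, , drop = FALSE], y[idx], nr_features)
  counts[sel] <- counts[sel] + 1L             # tally how often each feature is chosen
}
counts
```

Features that are selected in many runs are those the ensemble agrees on; the counts play the role of the data-driven evidence that is later combined with the user's prior weights and constraints.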
Examples
# build a UBayFS model using Breast Cancer Wisconsin dataset
data(bcw) # dataset
c <- buildConstraints(constraint_types = "max_size",
                      constraint_vars = list(10),
                      num_elements = ncol(bcw$data),
                      rho = 1) # prior constraints
w <- rep(1, ncol(bcw$data)) # weights
model <- build.UBaymodel(
  data = bcw$data,
  target = bcw$labels,
  constraints = c,
  weights = w
)
# use a function computing a decision tree as input
library("rpart")
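decision_tree <- function(X, y, n, name = "tree"){
  # combine target and features; make.names() ensures syntactic column names
  rf_data <- as.data.frame(cbind(y, X))
  colnames(rf_data) <- make.names(colnames(rf_data))
  tree <- rpart::rpart(y~., data = rf_data)
  # match on the sanitized names, since these are the names the tree has seen
  top_vars <- names(tree$variable.importance)[1:n]
  return(list(ranks = which(make.names(colnames(X)) %in% top_vars),
              name = name))
}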
model <- build.UBaymodel(
  data = bcw$data,
  target = bcw$labels,
  constraints = c,
  weights = w,
  method = decision_tree
)
# include block-constraints
c_block <- buildConstraints(constraint_types = "max_size",
                            constraint_vars = list(2),
                            num_elements = length(bcw$blocks),
                            rho = 10,
                            block_list = bcw$blocks)
model <- setConstraints(model, c_block)
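The max-size constraint used in the examples can be read as a single row of the system `Ax<=b` described under `constraints`. The sketch below builds that row by hand in base R, assuming 30 features and a maximum of 10 selected features; the plain list `max_size` and the helper `is_admissible` are hypothetical illustrations and do not reproduce the exact object returned by `buildConstraints`. The relaxation parameter `rho` is stored but not used here, since only the unrelaxed inequality is checked.

```r
# Hand-built max-size constraint: at most 10 of p features may be selected
p <- 30                                   # assumed number of features
A <- matrix(1, nrow = 1, ncol = p)        # one row of ones: sums the selected features
b <- 10                                   # upper bound on that sum
rho <- 1                                  # relaxation parameter (unused in this check)
max_size <- list(A = A, b = b, rho = rho) # plain list for illustration only

# Hypothetical helper: TRUE if the binary selection x satisfies every row of Ax <= b
is_admissible <- function(x, cons) all(cons$A %*% x <= cons$b)

x_small <- as.numeric(seq_len(p) <= 8)    # selects 8 features
x_large <- as.numeric(seq_len(p) <= 12)   # selects 12 features
is_admissible(x_small, max_size)
is_admissible(x_large, max_size)
```

Further constraints would add rows to `A` and entries to `b`; a block constraint applies the same idea to groups of features rather than to individual features.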