{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8f4fe37e",
   "metadata": {},
   "source": [
    "# A quick tour through UBayFS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c50798f",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "The UBayFS package implements the framework proposed in [Jenul et al. (2022)](https://link.springer.com/article/10.1007/s10994-022-06221-9). UBayFS is an ensemble feature selection technique embedded in a Bayesian statistical framework. The method combines data and user knowledge, where the first is extracted via data-driven ensemble feature selection. The user can control the feature selection by assigning prior weights to features and penalizing specific feature combinations. In particular, the user can define a maximum number of selected features and must-link constraints (features must be selected together) or cannot-link constraints (features must not be selected together). A parameter $\\rho$ regulates the shape of a penalty term accounting for side constraints, where feature sets that violate constraints lead to a lower target value. \n",
    "\n",
    "In this notebook, we use the [Breast Cancer Wisconsin dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html) for demonstration. Specifically, the dataset consists of 569 samples and 30 features. The dataset describes a classification problem, where the aim is to distinguish between malignant and benign cancer based on image data. Features are derived from 10 image characteristics, where each characteristic is represented by three features (summary statistics) in the dataset. For instance, the characteristic *radius* is represented by features *radius mean*, *radius standard deviation*, and *radius worst*.\n",
    "\n",
    "\n",
    "## Requirements and dependencies\n",
    "- numpy>=1.23.5\n",
    "- pandas>=1.5.3\n",
    "- scikit-learn>=1.2.2\n",
    "- scipy>=1.10.0\n",
    "- random\n",
    "- sklearn-features>=1.1.0\n",
    "- mrmr>=0.2.6\n",
    "- pygad>=3.0.1\n",
    "- math\n",
    "\n",
    "To run UBayFS in Python we must import the classes UBaymodel and UBayconstraint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0c0dddd5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import sys\n",
    "sys.path.append(\"../../src/UBayFS\")\n",
    "from UBaymodel import UBaymodel\n",
    "from UBayconstraint import UBayconstraint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "80d7a1ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv(\"./data/data.csv\")\n",
    "labels = pd.read_csv(\"./data/labels.csv\").replace((\"M\",\"B\"),(0,1)).astype(int)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cbea67d",
   "metadata": {},
   "source": [
    "## Background\n",
    "This section summarizes the core parts of UBayFS, where a central part is Bayes' Theorem for two random variables $\\boldsymbol{\\theta}$ and $\\boldsymbol{y}$:\n",
    "$$p(\\boldsymbol{\\theta}|\\boldsymbol{y})\\propto p(\\boldsymbol{y}|\\boldsymbol{\\theta})\\cdot p(\\boldsymbol{\\theta}),$$ where $\\boldsymbol{\\theta}$ represents an importance parameter of single features and $\\boldsymbol{y}$ collects evidence about $\\boldsymbol{\\theta}$ from an ensemble of elementary feature selectors. In the following, the concept will be outlined. \n",
    "\n",
    "### Ensemble feature selection as likelihood\n",
    "The first step in UBayFS is to build $M$ ensembles of elementary feature selectors. Each elementary feature selector $m=1,\\dots,M$ selects features, denoted by a binary membership vector $\\boldsymbol{\\delta}^{(m)} \\in \\{0,1\\}^N$, based on a randomly selected training dataset, where $N$ denotes the total number of features in the dataset. In the binary membership vector $\\boldsymbol{\\delta}^{(m)}$, a component $\\delta_i^{(m)}=1$ indicates that feature $i\\in\\{1,\\dots,N\\}$ is selected, and $\\delta_i^{(m)}=0$ otherwise. Statistically, we interpret the result from each elementary feature selector as a realization from a multinomial distribution with parameters $\\boldsymbol{\\theta}$ and $l$, where $\\boldsymbol{\\theta}\\in[0,1]^N$ defines the success probabilities of sampling each feature in an individual feature selection and $l$ corresponds to the number of features selected in $\\boldsymbol{\\delta}^{(m)}$. Therefore, the joint probability density of the observed data $\\boldsymbol{y} = \\sum\\limits_{m=1}^{M}\\boldsymbol{\\delta}^{(m)}\\in\\{0,\\dots,M\\}^N$ --- the likelihood function --- has the form \n",
    "$$ p(\\boldsymbol{y}|\\boldsymbol{\\theta}) = \\prod\\limits_{m=1}^{M} f_{\\text{mult}}(\\boldsymbol{\\delta}^{(m)};\\boldsymbol{\\theta},l),$$\n",
    "where $f_{\\text{mult}}$ is the probability density function of the multinomial distribution.\n",
    "\n",
    "### Expert knowledge as prior\n",
    "UBayFS includes two types of expert knowledge: prior feature weights and feature set constraints. \n",
    "\n",
    "#### Prior feature weights\n",
    "\n",
    "To introduce expert knowledge about the importance of features, the user may define a vector $\\boldsymbol{\\alpha} = (\\alpha_1,\\dots,\\alpha_N)$, $\\alpha_i>0$ for all $i=1,\\dots,N$, assigning a weight to each feature. High weights indicate that a feature is important. By default, if all features are equally important or no prior weighting is used, $\\boldsymbol{\\alpha}$ is set to the 1-vector of length $N$. With the weighting in place, we assume the a-priori feature importance parameter $\\boldsymbol{\\theta}$ follows a Dirichlet distribution [@R:DirichletReg]\n",
    "$$p(\\boldsymbol{\\theta}) = f_{\\text{Dir}}(\\boldsymbol{\\theta};\\boldsymbol{\\alpha}),$$\n",
    "where the probability density function of the Dirichlet distribution is given as\n",
    "$$f_{\\text{Dir}}(\\boldsymbol{\\theta};\\boldsymbol{\\alpha}) = \\frac{1}{\\text{B}(\\boldsymbol{\\alpha})} \\prod\\limits_{n=1}^N \\theta_n^{\\alpha_n-1},$$\n",
    "where $\\text{B}(.)$ denotes the multivariate Beta function. Generalizations of the Dirichlet distribution [@wong:gdirichlet,@hankin:hyperdirichlet] are also implemented in UBayFS.\n",
    "\n",
    "Since the Dirichlet distribution is the conjugate prior with respect to a multivariate likelihood, the posterior density is given as\n",
    "$$p(\\boldsymbol{\\theta}|\\boldsymbol{y}) \\propto f_{\\text{Dir}}(\\boldsymbol{\\theta};\\boldsymbol{\\alpha}^\\circ),$$\n",
    "with\n",
    "$$\\boldsymbol{\\alpha}^\\circ = \\left( \\alpha_1 + \\sum\\limits_{m=1}^M \\delta_1^{(m)}, \\dots, \\alpha_N + \\sum\\limits_{m=1}^M \\delta_N^{(m)}  \\right)$$\n",
    "representing the posterior parameter vector $\\boldsymbol{\\alpha}^\\circ$.\n",
    "\n",
    "#### Feature set constraints\n",
    "In addition to the prior weighting of features, the UBayFS user can also add different types of constraints to the feature selection:\n",
    "\n",
    "- *max-size constraint*: Maximum number of features that shall be selected.\n",
    "- *must-link constraint*: For a pair of features, either both or none is selected (defined as pairwise constraints, one for each pair of features).\n",
    "- *cannot-link constraint*: Used if a pair of features must not be selected jointly.\n",
    "\n",
    "All constraints can be defined *block-wise* between feature blocks (instead of individual features).\n",
    "Constraints are represented as a linear system of linear inequalities $\\boldsymbol{A}\\boldsymbol{\\delta}-\\boldsymbol{b}\\leq \\boldsymbol{0}$, where $\\boldsymbol{A}\\in\\mathbb{R}^{K\\times N}$ and $\\boldsymbol{b}\\in\\mathbb{R}^K$. $K$ denotes the total number of constraints. For constraint $k \\in 1,..,K$, a feature set $\\boldsymbol{\\delta}$ is admissible only if $\\left(\\boldsymbol{a}^{(k)}\\right)^T\\boldsymbol{\\delta} - b^{(k)} \\leq 0$, leading to the inadmissibility function (penalty term)\n",
    "\n",
    "$$\n",
    "\\kappa_{k,\\rho}(\\boldsymbol{\\delta}) = \\left\\{\n",
    "    \\begin{array}{l l}\n",
    "    0 & \\text{if}~\\left(\\boldsymbol{a}^{(k)}\\right)^T\\boldsymbol{\\delta}\\leq b^{(k)}\\\\\n",
    "    1 & \\text{if}~ \\left(\\boldsymbol{a}^{(k)}\\right)^T\\boldsymbol{\\delta}> b^{(k)} \\land \\rho =\\infty\\\\\n",
    "    \\frac{1-\\xi_{k,\\rho}}{1 + \\xi_{k,\\rho}} & \\text{otherwise},\n",
    "    \\end{array}\n",
    "    \\right.\n",
    "$$\n",
    "    \n",
    "where $\\rho\\in\\mathbb{R}^+ \\cup \\{\\infty\\}$ denotes a relaxation parameter and\n",
    "$\\xi_{k,\\rho} = \\exp\\left(-\\rho \\left(\\left( \\boldsymbol{a}^{(k)}\\right)^T\\boldsymbol{\\delta} - b^{(k)}\\right)\\right)$ defines the exponential term of a logistic function. To handle $K$ different constraints for one feature selection problem, the joint inadmissibility function is given as\n",
    "$$ \\kappa(\\boldsymbol{\\delta})\n",
    "    = 1 - \\prod\\limits_{k=1}^{K} \\left(1 -\\kappa_{k,\\rho}(\\boldsymbol{\\delta})\\right)$$\n",
    "which originates from the idea that $\\kappa = 1$ (maximum penalization) if at least one $\\kappa_{k,\\rho}=1$, while $\\kappa=0$ (no penalization) if all $\\kappa_{k,\\rho}=0$. \n",
    "\n",
    "To obtain an optimal feature set $\\boldsymbol{\\delta}^\\star$, we use a target function $U(\\boldsymbol{\\delta}, \\boldsymbol{\\theta})$ which represents a posterior expected utility of feature sets $\\boldsymbol{\\delta}$ given the posterior feature importance parameter $\\boldsymbol{\\theta}$, regularized by the inadmissibility function $\\kappa(.)$.\n",
    "\n",
    "$$\\mathbb{E}_{\\boldsymbol{\\theta}|\\boldsymbol{y}}[U(\\boldsymbol{\\delta}, \\boldsymbol{\\theta}(\\boldsymbol{y}))] = \\boldsymbol{\\delta}^T \\mathbb{E}_{\\boldsymbol{\\boldsymbol{\\delta}}|\\boldsymbol{y}}[\\boldsymbol{\\theta}(\\boldsymbol{y})]-\\lambda\\kappa(\\boldsymbol{\\delta})\\longrightarrow \\underset{\\boldsymbol{\\delta}\\in\\{0,1\\}^N}{\\text{arg max}}\n",
    "$$\n",
    "    \n",
    "Since an exact optimization is impossible due to the non-linear function $\\kappa$, we use a genetic algorithm to find an appropriate feature set. In detail, the genetic algorithm is initialized via a Greedy algorithm and computes combinations of the given feature sets with regard to a fitness function in each iteration.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adaf3378",
   "metadata": {},
   "source": [
    "## Application of UBayFS\n",
    " \n",
    "### Ensemble Training\n",
    "The class ``UBaymodel()`` initializes the UBayFS model and trains an ensemble of elementary feature selectors. The training dataset and target are initialized with the arguments ``data`` and ``target``. Although the UBayFS concept permits unsupervised, multiclass, or regression setups, the current implementation supports binary target and regression variables only. While ``M`` defines the ensemble size (number of elementary feature selectors), the types of the elementary feature selectors is set via ``method``. The mRMR feature selector is implemented as baseline and can be called directlz with \"mrmr\". In general, the ``method`` argument allows for each self-implemented feature selection function with the arguments ``X`` (numpy array describing the data), ``y`` (numpy array describing the target), and ``n`` (describing the number of features that shall be selected. The function must return the indices of the selected features. Some examples are shown below.\n",
    "\n",
    "Each ensemble model is trained on a random subset comprising ``tt_split``$\\cdot 100$ percent of the train data. Using the argument ``prior_model`` the user specifies whether the standard Dirichlet distribution or a generalized variant should be used as prior model. Furthermore, the number of features selected in each ensemble can be controlled by the parameter ``nr_features``.\n",
    "\n",
    "The class ``UBayconstraint`` provides an easy way to define side constraints for the model. The generated object can be easily added to the UBaymodel by assigning it to the argument ``constraints`` which is by default ``None``.\n",
    "\n",
    "For the standard UBayFS initialization, all prior feature weights are set to 0.01, and only the required ``max_size`` constraint is included. Ensemble counts indicate how often a feature was selected over the ensemble feature selections. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "32e591a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = UBaymodel(data = data,\n",
    "                  target = labels,\n",
    "                  feat_names = data.columns,\n",
    "                  M = 100,tt_split = 0.75,\n",
    "                  nr_features = 10,\n",
    "                  method = [\"mrmr\"],\n",
    "                  weights = [0.01],l = 1,\n",
    "                  constraints = UBayconstraint(rho=np.array([1]), \n",
    "                                               constraint_types=[\"max_size\"], \n",
    "                                               constraint_vars=[3], \n",
    "                                               num_elements=data.shape[1]),\n",
    "                  optim_method = \"GA\",\n",
    "                  popsize = 100,\n",
    "                  maxiter = 100,\n",
    "                  random_state = 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "06a2ea5d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(30,)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.shape(model.getWeights())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "3bfb085b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean.radius</th>\n",
       "      <th>mean.texture</th>\n",
       "      <th>mean.perimeter</th>\n",
       "      <th>mean.area</th>\n",
       "      <th>mean.smoothness</th>\n",
       "      <th>mean.compactness</th>\n",
       "      <th>mean.concavity</th>\n",
       "      <th>mean.concave.points</th>\n",
       "      <th>mean.symmetry</th>\n",
       "      <th>mean.fractal.dimension</th>\n",
       "      <th>...</th>\n",
       "      <th>worst.radius</th>\n",
       "      <th>worst.texture</th>\n",
       "      <th>worst.perimeter</th>\n",
       "      <th>worst.area</th>\n",
       "      <th>worst.smoothness</th>\n",
       "      <th>worst.compactness</th>\n",
       "      <th>worst.concavity</th>\n",
       "      <th>worst.concave.points</th>\n",
       "      <th>worst.symmetry</th>\n",
       "      <th>worst.fractal.dimension</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>ensemble count</th>\n",
       "      <td>100</td>\n",
       "      <td>0</td>\n",
       "      <td>100</td>\n",
       "      <td>100</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>100</td>\n",
       "      <td>100</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>100</td>\n",
       "      <td>0</td>\n",
       "      <td>100</td>\n",
       "      <td>100</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>98</td>\n",
       "      <td>100</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>prior weight</th>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>...</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.01</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2 rows × 30 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "               mean.radius mean.texture mean.perimeter mean.area  \\\n",
       "ensemble count         100            0            100       100   \n",
       "prior weight          0.01         0.01           0.01      0.01   \n",
       "\n",
       "               mean.smoothness mean.compactness mean.concavity  \\\n",
       "ensemble count               0                0            100   \n",
       "prior weight              0.01             0.01           0.01   \n",
       "\n",
       "               mean.concave.points mean.symmetry mean.fractal.dimension  ...  \\\n",
       "ensemble count                 100             0                      0  ...   \n",
       "prior weight                  0.01          0.01                   0.01  ...   \n",
       "\n",
       "               worst.radius worst.texture worst.perimeter worst.area  \\\n",
       "ensemble count          100             0             100        100   \n",
       "prior weight           0.01          0.01            0.01       0.01   \n",
       "\n",
       "               worst.smoothness worst.compactness worst.concavity  \\\n",
       "ensemble count                0                 0              98   \n",
       "prior weight               0.01              0.01            0.01   \n",
       "\n",
       "               worst.concave.points worst.symmetry worst.fractal.dimension  \n",
       "ensemble count                  100              0                       0  \n",
       "prior weight                   0.01           0.01                    0.01  \n",
       "\n",
       "[2 rows x 30 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(np.stack([np.sum(model.ensemble_matrix, axis=0), model.getWeights()], axis=0), \n",
    "             columns = model.feat_names, index = [\"ensemble count\", \"prior weight\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to ``mrmr``, we add a function ``decision_tree()``, that computes features based on decision tree importances. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.tree import DecisionTreeClassifier\n",
    "def decision_tree(X,y,n):\n",
    "    clf = DecisionTreeClassifier()\n",
    "    clf.fit(X, y)\n",
    "\n",
    "    feat_importance = clf.tree_.compute_feature_importances(normalize=False)\n",
    "    return np.flip(np.argsort(feat_importance))[:n]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = UBaymodel(data = data,\n",
    "                  target = labels,\n",
    "                  feat_names = data.columns,\n",
    "                  M = 100,tt_split = 0.75,\n",
    "                  nr_features = 10,\n",
    "                  method = [\"mrmr\", decision_tree],\n",
    "                  weights = [0.01],l = 1,\n",
    "                  constraints = UBayconstraint(rho=np.array([1]), \n",
    "                                               constraint_types=[\"max_size\"], \n",
    "                                               constraint_vars=[3], \n",
    "                                               num_elements=data.shape[1]),\n",
    "                  optim_method = \"GA\",\n",
    "                  popsize = 100,\n",
    "                  maxiter = 100,\n",
    "                  random_state = 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Examples for more feature selection methods are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# mrmr (is already implemented as baseline - this code is just to illustrate the concept of using own functions)\n",
    "import mrmr\n",
    "def mrmr_fs(X,y,n):\n",
    "    ranks = mrmr.mrmr_classif(pd.DataFrame(X), y, n, show_progress=False)\n",
    "    return ranks\n",
    "\n",
    "# recursive feature elimination with logistic regression classifier\n",
    "from sklearn.feature_selection import RFECV\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import StratifiedKFold\n",
    "\n",
    "def rfe(X,y,n):\n",
    "\n",
    "    clf = LogisticRegression()\n",
    "    cv = StratifiedKFold(5)\n",
    "\n",
    "    rfecv = RFECV(\n",
    "        estimator=clf,\n",
    "        step=1,\n",
    "        cv=cv,\n",
    "        scoring=\"accuracy\",\n",
    "        min_features_to_select=n,\n",
    "    )\n",
    "    rfecv.fit(X, y)\n",
    "    return np.where(rfecv.ranking_==1)[0]\n",
    "\n",
    "# RENT feature selection with default setup - n is not used in this method\n",
    "from RENT import RENT\n",
    "\n",
    "def RENT_fs(X,y,n):\n",
    "    my_C_params = [0.1, 1]\n",
    "    my_l1_ratios = [0.5, 0.9]\n",
    "\n",
    "    model = RENT.RENT_Classification(data=pd.DataFrame(X), \n",
    "                                     target=y, \n",
    "                                     C=my_C_params, \n",
    "                                     l1_ratios=my_l1_ratios)\n",
    "    model.train()\n",
    "    selected_features = model.select_features()\n",
    "    return selected_features"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8545dfb0",
   "metadata": {},
   "source": [
    "### User knowledge\n",
    "Using the function ``setWeights()`` the user is able to change the feature weights from the standard initialization to desired values. In our example, we assign equal weights to features originating from the same image characteristic. Weights can be on an arbitrary scale. As it is difficult to specify prior weights in real-life applications, we suggest to define them on a normalized scale.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "4acfd145",
   "metadata": {},
   "outputs": [],
   "source": [
    "weights = np.tile(np.array([10,15,20,16,15,10,12,17,21,14]),3)\n",
    "strength = 1\n",
    "weights = weights * strength / np.sum(weights)\n",
    "model.setWeights(weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f066ec9",
   "metadata": {},
   "source": [
    "In addition to prior weights, feature set constraints may be specified. Internally, constraints are implemented via the class ``UBayconstraint``. The input ``rho`` corresponds to the relaxation parameter of the admissibility function. Further, ``constraint_types`` consists of a list, where all constraint types are defined. Then, with ``constraint_vars``, the user specifies details about the constraint: for max-size, the number of features to select is provided, while for must-link and cannot-link, the list of feature indices to be linked must be provided. Each list entry corresponds to one constraint in ``constraint_types``. For block constraints, information about the block structure is included either with ``block_list``or ``block_matrix`` - if both arguments are ``None``, feature-wise constraints are generated.\n",
    "\n",
    "Applying ``get_constraints()`` demonstrates that, the matrix ``A`` has ten rows to represent four constraints. While *max-size* and *cannot-link* can be expressed in one equation each, *must-link* is a pairwise constraint. In specific, the *must-link* constraint between $n$ features produces $\\frac{n!}{(n-2)!}$ elementary constraints. Hence, six equations represent the *must-link* constraint. The ``UBaymodel``function ``setConstraints()`` integrates the constraints into the UBayFS model. With the argument ``append=True``, the constraints are added to existing constraints. Otherwise the constraints are overwritten. In this case we define a new ``max_size`` constraint and overwrite the old one. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "522e4441",
   "metadata": {},
   "outputs": [],
   "source": [
    "constraints = UBayconstraint(rho=np.array([np.Inf, 0.1, 1, 1]), \n",
    "                             constraint_types=[\"max_size\", \"must_link\", \"cannot_link\", \"cannot_link\"], \n",
    "                             constraint_vars=[10, [0,10,20], [0,9], [19,22,23]], \n",
    "                             num_elements=data.shape[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "0cd87260",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.setConstraints(constraints, append=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "c5a0527b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'A': array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,\n",
       "          1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,\n",
       "          1.,  1.,  1.,  1.],\n",
       "        [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.],\n",
       "        [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.],\n",
       "        [-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.],\n",
       "        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.],\n",
       "        [-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.],\n",
       "        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.],\n",
       "        [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.],\n",
       "        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  1.,  0.,  0.,\n",
       "          0.,  0.,  0.,  0.]]),\n",
       " 'b': array([10.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.]),\n",
       " 'rho': array([inf, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 1. , 1. ]),\n",
       " 'block_matrix': array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "constraints.get_constraints()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6416a763",
   "metadata": {},
   "source": [
    "### Optimization and evaluation\n",
    "A genetic algorithm, described by [@givens:compstat], searches for the optimal feature set in the UBayFS framework. Using ``setOptim()`` we update the genetic algorithm parameters. Furthermore, ``popsize`` indicates the number of candidate feature sets created in each iteration, and ``maxiter`` is the number of iterations. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "412cac99",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.setOptim(optim_method=\"GA\", \n",
    "               popsize=100, \n",
    "               maxiter=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6ddf935",
   "metadata": {},
   "source": [
    "At this point, we have initialized prior weights, constraints, and the optimization procedure --- we can now train the UBayFS model using the function ``train()``, relying on a genetic algorithm.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "988c3d26",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = model.train()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['mean.radius',\n",
       " 'mean.area',\n",
       " 'mean.concave.points',\n",
       " 'radius.error',\n",
       " 'area.error',\n",
       " 'worst.radius',\n",
       " 'worst.area',\n",
       " 'worst.concavity',\n",
       " 'worst.concave.points',\n",
       " 'worst.fractal.dimension']"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3defbd6a",
   "metadata": {},
   "source": [
    "After training the model, we receive a feature selection result. The final feature set and its additional properties can be evaluated with ``evaluateFS()``:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "3c9622ab",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'cardinality': 10,\n",
       " 'total utility': 0.567,\n",
       " 'posterior feature utility': 0.567,\n",
       " 'admissibility': 1.0,\n",
       " 'number of violated constraints': 0,\n",
       " 'average feature correlation': 0.643}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.evaluateFS(result[0].values[:,0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "146fc8d4",
   "metadata": {},
   "source": [
    "The output contains the following information:\n",
    "\n",
    "* **cardinality**: number of selected features\n",
    "* **log total utility**: value of the target function for optimization\n",
    "* **log posterior feature utility**: cumulated importances of selected features before substracting a penalization term\n",
    "* **log admissibility**: if 0, all constraints are fulfilled, otherwise at least one constraint is violated\n",
    "* **number violated constraints**: number of violated constraints\n",
    "* **avg feature correlation**: average correlation between features in dataset"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}