Feature selection is a key task in statistical pattern recognition. Most feature selection algorithms are built on specific objective functions that are intuitively reasonable but can be far from the more basic objectives of feature selection. This paper describes how to select features such that the basic objectives, e.g., classification or clustering accuracy, can be optimized more directly. The analysis requires that the contribution of each feature to the evaluation metric can be quantitatively described by some score function. Motivated by the conditional independence structure in probability distributions, the analysis uses a leave-one-out feature selection algorithm that provides an approximate solution. The leave-one-out algorithm improves on the conventional greedy backward elimination algorithm by preserving more interactions among features in the selection process, so that the various feature selection objectives can be optimized in a unified way. Experiments on six real-world datasets with different feature evaluation metrics show that this algorithm outperforms popular feature selection algorithms in most situations.
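For orientation, the conventional greedy backward elimination baseline named in this abstract can be sketched as follows. This is a minimal illustration of the baseline only, not the leave-one-out algorithm the paper proposes. The function names, the feature labels, and the toy score function (additive weights plus a bonus when features "c" and "d" appear together) are all hypothetical and chosen only to show how a greedy removal can break up useful feature interactions.

```python
from typing import Callable, FrozenSet, List, Sequence


def backward_elimination(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Greedy backward elimination: repeatedly drop the single feature
    whose removal leaves the highest-scoring remaining subset."""
    selected = list(features)
    while len(selected) > k:
        best_subset = None
        best_score = float("-inf")
        for f in selected:
            candidate = [g for g in selected if g != f]
            s = score(frozenset(candidate))
            if s > best_score:  # ties keep the earliest candidate
                best_score = s
                best_subset = candidate
        selected = best_subset
    return selected


# Hypothetical toy score: per-feature weights plus an interaction bonus.
WEIGHTS = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5}


def toy_score(subset: FrozenSet[str]) -> float:
    total = sum(WEIGHTS[f] for f in subset)
    if {"c", "d"} <= subset:  # "c" and "d" are only useful together
        total += 2.0
    return total


selected = backward_elimination(["a", "b", "c", "d"], toy_score, k=2)
```

On this toy score the greedy procedure first drops "b" to protect the "c"/"d" bonus, then drops "d", and ends with a subset scoring 4.0, while the pair {"a", "b"} scores 5.0. Early removals that commit to one interaction and later destroy it are the kind of behaviour that motivates preserving more interactions during selection.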
Principal component analysis (PCA) is fundamental in many pattern recognition applications. Because conventional L2-norm based PCA (L2-PCA) is sensitive to outliers, much research has addressed L1-norm based reconstruction error minimization (L1-PCA-REM). Recently, the variance maximization formulation of PCA with the L1-norm (L1-PCA-VM) has been proposed, for which new greedy and nongreedy solutions have been developed. Taking a gradient ascent perspective on the optimization, we show that the L1-PCA-VM formulation is problematic for learning principal components and that only a greedy solution can fulfil the robustness motivation. Both claims are verified by experiments on synthetic and real-world datasets.
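As background for the L1-PCA-VM formulation, the objective is commonly written as maximizing the sum of |w^T x_i| over unit-norm directions w. One widely used greedy solution, assumed here to be the kind of greedy approach the abstract refers to, finds a single direction by a sign-based fixed-point iteration and then deflates the data before searching for the next direction. The sketch below assumes centered data, uses only the standard library, and all function names are illustrative rather than taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def l1_direction(data: List[Vector], w0: Vector, max_iter: int = 100) -> Vector:
    """Fixed-point iteration for max_w sum_i |w . x_i| subject to ||w|| = 1."""
    w = normalize(w0)
    for _ in range(max_iter):
        # Polarity of each sample with respect to the current direction.
        signs = [1.0 if dot(w, x) >= 0.0 else -1.0 for x in data]
        # Sign-weighted sum of the samples, then renormalize.
        summed = [sum(p * x[j] for p, x in zip(signs, data)) for j in range(len(w))]
        w_new = normalize(summed)
        if all(abs(a - b) < 1e-12 for a, b in zip(w, w_new)):
            break
        w = w_new
    return w


def greedy_l1_pca(data: List[Vector], n_components: int) -> List[Vector]:
    """Extract directions one at a time, deflating the data after each."""
    residual = [list(x) for x in data]
    components = []
    for _ in range(n_components):
        # Assumption: initialize with the residual sample of largest norm.
        w0 = max(residual, key=lambda x: dot(x, x))
        w = l1_direction(residual, w0)
        components.append(w)
        # Deflation: remove the component of every sample along w.
        residual = [[a - dot(w, x) * b for a, b in zip(x, w)] for x in residual]
    return components


# Small centered toy dataset (hypothetical values).
points = [[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
components = greedy_l1_pca(points, n_components=2)
```

Because each direction is computed on data already deflated by the earlier ones, the greedy components are orthogonal by construction. The iteration only reaches a local maximum of the objective, so the result can depend on the initialization.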