Mining from ambiguous data is an important task in data mining. This paper discusses one such task, known as the multi-instance problem. In the multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all instances in it are negative. A bag is positive if it has at least one positive instance. Because the instances in a positive bag are not labeled, each positive bag is ambiguous. The aim is to classify unseen bags. The main idea of existing multi-instance algorithms is to find the true positive instances in positive bags, convert the multi-instance problem into a supervised problem, and then label test bags by predicting the labels of their unknown instances. In this paper, we approach multi-instance data from another point of view, i.e., excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining from multi-instance data. Briefly, a constructive covering algorithm is first used to reorganize the structure of the original multi-instance data. Then, the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label the test bag directly according to the similarity between the unseen bag and the sphere neighbors obtained from the previous two steps. Experimental results demonstrate that the proposed algorithm is competitive with most state-of-the-art multi-instance methods in both classification accuracy and running time.
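The bag-labeling rule stated above (a bag is negative if all of its instances are negative, and positive if at least one is positive) can be illustrated with a minimal sketch. The function name `bag_label` and the 0/1 label encoding are assumptions made for illustration only. This is not the MICkNN algorithm, and in practice the instance labels inside a positive bag are unknown.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Return 1 (positive bag) if any instance is positive, else 0.

    Hypothetical helper: assumes instance labels are encoded as
    1 (positive) and 0 (negative).
    """
    return int(any(label == 1 for label in instance_labels))


# A bag whose instances are all negative is a negative bag.
negative_bag = bag_label([0, 0, 0])
# A single positive instance is enough to make the bag positive.
positive_bag = bag_label([0, 1, 0])
print(negative_bag, positive_bag)
```

The sketch also shows where the ambiguity comes from: given only the output 1, it is not possible to tell which instances in the bag were positive.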
The concept of deep learning has been applied to many domains, but how to define a suitable depth for a given problem has not been sufficiently explored. In this study, we propose a new method, the Hierarchical Covering Algorithm (HCA), which determines the levels of a hierarchical structure based on the Covering Algorithm (CA). The CA constructs neural networks from the characteristics of the samples themselves, and can effectively handle multi-category classification and large-scale data. Further, we abstract features based on the CA so that the characteristics of a deep structure are captured automatically. We apply the CA to construct hidden nodes at the lower level, and define a fuzzy equivalence relation R on the upper spaces to form a hierarchical architecture based on fuzzy quotient space theory. The covering tree arises naturally from R. Experiments with HCA on the MNIST dataset show that the covering tree embodies the deep architecture of the problem, and that a deep structure performs better than a single-level one.
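One way to see how a fuzzy equivalence relation can induce a hierarchy is through its cuts: thresholding R at a level λ yields a crisp partition, and raising λ refines that partition, so the cuts form nested levels of a tree. The sketch below illustrates only this general idea. The function name `lambda_cut_partition`, the matrix values, and the chosen thresholds are hypothetical, and the abstract does not say how HCA actually defines R or builds the covering tree.

```python
from typing import Dict, List


def lambda_cut_partition(R: List[List[float]], lam: float) -> List[List[int]]:
    """Partition indices so that i and j share a block when R[i][j] >= lam.

    Assumes R is a fuzzy equivalence relation (reflexive, symmetric,
    max-min transitive), so every cut is a crisp equivalence relation
    and each row directly gives the block of its index.
    """
    n = len(R)
    assigned = [False] * n
    blocks: List[List[int]] = []
    for i in range(n):
        if assigned[i]:
            continue
        block = [j for j in range(n) if R[i][j] >= lam]
        for j in block:
            assigned[j] = True
        blocks.append(block)
    return blocks


# Hypothetical relation over four lower-level nodes (illustrative values).
R = [
    [1.0, 0.8, 0.4, 0.4],
    [0.8, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.6],
    [0.4, 0.4, 0.6, 1.0],
]

# Each threshold gives one level; higher thresholds give finer partitions.
levels: Dict[float, List[List[int]]] = {
    lam: lambda_cut_partition(R, lam) for lam in (0.4, 0.6, 0.8, 1.0)
}
for lam, blocks in levels.items():
    print(lam, blocks)
```

Reading the levels from the lowest threshold to the highest gives a root containing all nodes, followed by successively finer groupings down to single nodes.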