Browsing by Keyword "Label noise"
Now showing 1 - 3 of 3
- Effect of label noise in the complexity of classification problems (Elsevier B.V., 2015-07-21) [metadata only]
  Garcia, Luis P. F.; Carvalho, Andre C. P. L. F. de; Lorena, Ana C. [UNIFESP]
  Universidade de São Paulo (USP); Universidade Federal de São Paulo (UNIFESP)
  Noisy data are common in real-world problems and may have several causes, such as inaccuracies, distortions or contamination during data collection, storage and/or transmission. The presence of noise in data can affect the complexity of classification problems, making the discrimination of objects from different classes more difficult and requiring more complex decision boundaries for data separation. In this paper, we investigate how noise affects the complexity of classification problems by monitoring the sensitivity of several indices of data complexity in the presence of different label noise levels. To characterize the complexity of a classification dataset, we use geometric, statistical and structural measures extracted from data. The experimental results show that some measures are more sensitive than others to the addition of noise in a dataset. These measures can be used in the development of new preprocessing techniques for noise identification and novel label noise tolerant algorithms. We also show preliminary results on a new filter for noise identification, which is based on two of the complexity measures that were most sensitive to the presence of label noise. (C) 2015 Elsevier B.V. All rights reserved.
- Ensembles of label noise filters: a ranking approach (Springer, 2016) [metadata only]
  Garcia, Luis P. F.; Lorena, Ana C. [UNIFESP]; Matwin, Stan; de Carvalho, Andre C. P. L. F.
  Label noise can be a major problem in classification tasks, since most machine learning algorithms rely on data labels in their inductive process. Consequently, various techniques for label noise identification have been investigated in the literature. The bias of each technique defines how suitable it is for each dataset. Moreover, while some techniques identify a large number of examples as noisy and have a high false positive rate, others are very restrictive and therefore unable to identify all noisy examples. This paper investigates how label noise detection can be improved by using an ensemble of noise filtering techniques. These filters, both individual and ensemble, are experimentally compared. Another concern in this paper is the computational cost of ensembles, since, for a particular dataset, an individual technique can have the same predictive performance as an ensemble. In this case the individual technique should be preferred. To deal with this situation, this study also proposes the use of meta-learning to recommend the best filter for a new dataset. An extensive experimental evaluation of the use of individual filters, ensemble filters and meta-learning was performed using public datasets with imputed label noise. The results show that ensembles of noise filters can improve noise filtering performance and that a recommendation system based on meta-learning can successfully recommend the best filtering technique for new datasets. A case study using a real dataset from the ecological niche modeling domain is also presented and evaluated, with the results validated by an expert.
- Particle competition and cooperation for semi-supervised learning with label noise (Elsevier B.V., 2015-07-21) [metadata only]
  Breve, Fabricio A.; Zhao, Liang; Quiles, Marcos G. [UNIFESP]
  São Paulo State Univ UNESP; Universidade de São Paulo (USP); Universidade Federal de São Paulo (UNIFESP)
  Semi-supervised learning methods are usually employed in the classification of data sets where only a small subset of the data items is labeled. In these scenarios, label noise is a crucial issue, since the noise may easily spread to a large portion or even the entire data set, leading to major degradation in classification accuracy. Therefore, the development of new techniques to reduce the harmful effects of label noise in semi-supervised learning is a vital issue. Recently, a graph-based semi-supervised learning approach based on particle competition and cooperation was developed. In this model, particles walk in the graphs constructed from the data sets. Competition takes place among particles representing different class labels, while cooperation occurs among particles with the same label. This paper presents a new particle competition and cooperation algorithm, specifically designed to increase robustness to the presence of label noise. Unlike other methods, the proposed one does not require a separate technique to deal with label noise. It performs classification of unlabeled nodes and reclassification of the nodes affected by label noise in a single process. Computer simulations show the classification accuracy of the proposed method when applied to some artificial and real-world data sets, in which we introduce increasing amounts of label noise. The classification accuracy is compared to that achieved by previous particle competition and cooperation algorithms and other representative graph-based semi-supervised learning methods using the same scenarios. Results show the effectiveness of the proposed method. (C) 2015 Elsevier B.V. All rights reserved.
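
The abstracts listed here all evaluate methods by introducing controlled amounts of label noise into a dataset. A minimal sketch of that experimental setup follows, under loud assumptions: the synthetic two-blob data, the uniform random flipping scheme, and the choice of the leave-one-out 1-NN error rate as a neighborhood-based complexity measure are all illustrative choices made here. None of the abstracts names its datasets, noise model, or measures, so this is not a reproduction of any of the papers' methods.

```python
import random


def make_two_blobs(n_per_class, rng):
    """Two well-separated 2-D Gaussian blobs (synthetic, illustration only)."""
    points, labels = [], []
    for label, center in enumerate([(0.0, 0.0), (6.0, 6.0)]):
        for _ in range(n_per_class):
            points.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            labels.append(label)
    return points, labels


def flip_labels(labels, noise_rate, rng, n_classes=2):
    """Return a copy with a fixed fraction of labels flipped to another class.

    Assumption: examples to corrupt are chosen uniformly at random.
    """
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        noisy[i] = rng.choice([c for c in range(n_classes) if c != labels[i]])
    return noisy


def loo_1nn_error(points, labels):
    """Leave-one-out 1-NN error rate, used here as a simple complexity measure."""
    errors = 0
    for i, (xi, yi) in enumerate(points):
        best_j, best_d = None, float("inf")
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = (xi - xj) ** 2 + (yi - yj) ** 2
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / len(points)


rng = random.Random(0)  # fixed seed so runs are repeatable
points, clean = make_two_blobs(100, rng)
for rate in (0.0, 0.1, 0.2, 0.4):
    noisy = flip_labels(clean, rate, rng)
    print(f"noise={rate:.1f}  loo_1nn_error={loo_1nn_error(points, noisy):.3f}")
```

On data like this, the measure should grow as the noise rate increases, which is the kind of sensitivity the first abstract describes monitoring; the exact values depend on the seed and the synthetic data.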