Automatic feature engineering for regression models with machine learning: An evolutionary computation and statistics hybrid

dc.citation.volume: 430
dc.contributor.author: de Melo, Vinicius Veloso [UNIFESP]
dc.contributor.author: Banzhaf, Wolfgang
dc.coverage: New York
dc.date.accessioned: 2020-07-08T13:09:36Z
dc.date.available: 2020-07-08T13:09:36Z
dc.date.issued: 2018
dc.description.abstract: Symbolic Regression (SR) is a well-studied task in Evolutionary Computation (EC), where adequate free-form mathematical models must be automatically discovered from observed data. Statisticians, engineers, and general data scientists still prefer traditional regression methods over EC methods because of the solid mathematical foundations, the interpretability of the models, and the lack of randomness, even though such deterministic methods tend to provide lower quality prediction than stochastic EC methods. On the other hand, while EC solutions can be big and uninterpretable, they can be created with less bias, finding high-quality solutions that would be avoided by human researchers. Another interesting possibility is using EC methods to perform automatic feature engineering for a deterministic regression method instead of evolving a single model; this may lead to smaller solutions that can be easy to understand. In this contribution, we evaluate an approach called Kaizen Programming (KP) to develop a hybrid method employing EC and Statistics. While the EC method builds the features, the statistical method efficiently builds the models, which are also used to provide the importance of the features; thus, features are improved over the iterations resulting in better models. Here we examine a large set of benchmark SR problems known from the EC literature. Our experiments show that KP out-performs traditional Genetic Programming - a popular EC method for SR - and also shows improvements over other methods, including other hybrids and well-known statistical and Machine Learning (ML) ones. More in line with ML than EC approaches, KP is able to provide high-quality solutions while requiring only a small number of function evaluations. (C) 2017 Elsevier Inc. All rights reserved. (en)
dc.description.affiliation: Fed Univ Sao Paulo UNIFESP, Inst Sci & Technol ICT, Sao Jose Dos Campos, SP, Brazil
dc.description.affiliation: Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48864 USA
dc.description.affiliation: Michigan State Univ, BEACON Ctr Study Evolut Act, E Lansing, MI 48864 USA
dc.description.affiliationUnifesp: Fed Univ Sao Paulo UNIFESP, Inst Sci & Technol ICT, Sao Jose Dos Campos, SP, Brazil
dc.description.source: Web of Science
dc.description.sponsorship: Brazilian Government CNPq (Universal) [486950/2013-1]
dc.description.sponsorship: CAPES (Science without Borders) [12180-13-0]
dc.description.sponsorship: Canada's NSERC Discovery grant RGPIN [283304-2012]
dc.format.extent: 287-313
dc.identifier: http://dx.doi.org/10.1016/j.ins.2017.11.041
dc.identifier.citation: Information Sciences. New York, v. 430, p. 287-313, 2018.
dc.identifier.doi: 10.1016/j.ins.2017.11.041
dc.identifier.issn: 0020-0255
dc.identifier.uri: https://repositorio.unifesp.br/handle/11600/54092
dc.identifier.wos: WOS:000424174700020
dc.language.iso: eng
dc.publisher: Elsevier Science Inc
dc.relation.ispartof: Information Sciences
dc.rights: info:eu-repo/semantics/restrictedAccess
dc.subject: Feature engineering (en)
dc.subject: Machine learning (en)
dc.subject: Symbolic regression (en)
dc.subject: Kaizen programming (en)
dc.subject: Linear regression (en)
dc.subject: Genetic programming (en)
dc.subject: Hybrid (en)
dc.title: Automatic feature engineering for regression models with machine learning: An evolutionary computation and statistics hybrid (en)
dc.type: info:eu-repo/semantics/article