Semantic description and internal validation of clusters for applications in categorical data sets
dc.contributor.advisor | Curtis, Vitor Venceslau | |
dc.contributor.advisor-co | Verri, Filipe Alves Neto | |
dc.contributor.advisor-coLattes | http://lattes.cnpq.br/0145582312635382 | |
dc.contributor.advisorLattes | http://lattes.cnpq.br/1785341067396776 | |
dc.contributor.author | Aquino, Roberto Douglas Guimarães de [UNIFESP] | |
dc.contributor.authorLattes | http://lattes.cnpq.br/2373005809061037 | |
dc.coverage.spatial | Instituto Tecnológico de Aeronáutica | |
dc.date.accessioned | 2024-07-23T10:53:01Z | |
dc.date.available | 2024-07-23T10:53:01Z | |
dc.date.issued | 2024-06-19 | |
dc.description.abstract | In clustering problems whose objective is based on feature patterns rather than on spatial proximity, traditional cluster validation indices may not be appropriate. This work proposes a tool that describes clusters and can also serve as an internal validation index to suggest the most appropriate number of clusters in categorical data sets. To evaluate the index, we also propose a synthetic categorical data generator designed specifically for this application. We evaluated the proposed index on synthetic and real data sets under different configurations, comparing its performance with well-known indices from the literature. The results indicate that the index has strong potential to describe clusters and to identify the most suitable number of clusters, and that the synthetic data generator produces data sets relevant to the internal validation process. | |
dc.description.sponsorship | Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) | |
dc.emailadvisor.custom | curtis@ita.br | |
dc.format.extent | 76 leaves | |
dc.identifier.uri | https://hdl.handle.net/11600/71444 | |
dc.language | eng | |
dc.publisher | Universidade Federal de São Paulo | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject | cluster analysis | |
dc.subject | semantic description | |
dc.subject | internal clustering validation index | |
dc.subject | synthetic data | |
dc.title | Semantic description and internal validation of clusters for applications in categorical data sets | |
dc.type | info:eu-repo/semantics/doctoralThesis | |
unifesp.campus | Instituto de Ciência e Tecnologia (ICT) | |
unifesp.graduateProgram | Operations Research (Pesquisa Operacional) | |
unifesp.knowledgeArea | Data science | |
unifesp.researchArea | Data science |