Semantic description and internal validation of clusters for applications in categorical data sets

dc.contributor.advisor: Curtis, Vitor Venceslau
dc.contributor.advisor-co: Verri, Filipe Alves Neto
dc.contributor.advisor-coLattes: http://lattes.cnpq.br/0145582312635382
dc.contributor.advisorLattes: http://lattes.cnpq.br/1785341067396776
dc.contributor.author: Aquino, Roberto Douglas Guimarães de [UNIFESP]
dc.contributor.authorLattes: http://lattes.cnpq.br/2373005809061037
dc.coverage.spatial: Instituto Tecnológico de Aeronáutica
dc.date.accessioned: 2024-07-23T10:53:01Z
dc.date.available: 2024-07-23T10:53:01Z
dc.date.issued: 2024-06-19
dc.description.abstract: In clustering problems whose objective is based on feature patterns rather than on spatial proximity, traditional cluster validation indices may not be appropriate. This work proposes a tool that describes clusters and can also serve as an internal validation index to suggest the most appropriate number of clusters in categorical data sets. To evaluate the index, we also propose a synthetic categorical data generator designed specifically for this application. We ran tests on synthetic and real data sets with different configurations to compare the performance of the proposed index with that of well-known indices from the literature. The results indicate that the index has strong potential to describe clusters and to identify the most suitable number of clusters, and that the synthetic data generator can produce data sets relevant to the internal validation process.
dc.description.sponsorship: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
dc.emailadvisor.custom: curtis@ita.br
dc.format.extent: 76 f.
dc.identifier.uri: https://hdl.handle.net/11600/71444
dc.language: eng
dc.publisher: Universidade Federal de São Paulo
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: cluster analysis
dc.subject: semantic description
dc.subject: internal clustering validation index
dc.subject: synthetic data
dc.title: Semantic description and internal validation of clusters for applications in categorical data sets
dc.type: info:eu-repo/semantics/doctoralThesis
unifesp.campus: Instituto de Ciência e Tecnologia (ICT)
unifesp.graduateProgram: Pesquisa Operacional
unifesp.knowledgeArea: Data science
unifesp.researchArea: Data science
Files
Original Bundle
Name: PhD Thesis vITA.pdf
Size: 3.05 MB
Format: Adobe Portable Document Format
License Bundle
Name: license.txt
Size: 5.55 KB
Format: Item-specific license agreed upon to submission