Applying optimized hierarchical NCM classification to public purchases of products in Brazil

dc.contributor.advisor: Xavier Júnior, João Carlos
dc.contributor.advisorLattes: http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4758203U5
dc.contributor.author: Alves Sobrinho, Pitágoras de Azevedo
dc.contributor.authorLattes: http://lattes.cnpq.br/0435510237375618
dc.contributor.referees1: Oliveira, Marcel Vinicius Medeiros
dc.contributor.referees1Lattes: http://lattes.cnpq.br/1756952696097255
dc.contributor.referees2: Santos, Ilueny Constâncio Chaves dos
dc.contributor.referees2Lattes: http://lattes.cnpq.br/8930351118408164
dc.date.accessioned: 2022-07-04T14:51:29Z
dc.date.available: 2022-07-04T14:51:29Z
dc.date.issued: 2022-06-15
dc.description.abstract: Using free text to categorize entities usually makes those entities hard to identify. In the Electronic Fiscal Receipt (“Nota Fiscal Eletrônica”, NF-e), issued for all public purchases in Brazil, products are categorized according to the Mercosul Common Nomenclature (NCM). This identifier is needed to calculate taxes, but it is often filled in incorrectly, which makes it difficult to detect price irregularities and to monitor public expenditures. In this context, an automatic product categorization system was developed based on the textual descriptions present in the NF-e. It consists of a categorization tree that follows the NCM product hierarchy, using the Local Classifier per Parent Node pattern. Each node in the tree is trained to encode the textual descriptions as document embeddings and then to apply a supervised classification algorithm to decide the NCM code. Tree nodes are optimized by testing the performance of various random configurations of classification algorithm and parameters. In the experiments, hierarchical classification achieved a higher F1 score than flat classification, and the error propagation problem was mitigated.
dc.identifier.citation: ALVES SOBRINHO, Pitágoras de Azevedo. Applying optimized hierarchical NCM classification to public purchases of products in Brazil. 2022. 19f. Trabalho de Conclusão de Curso (Residência em Tecnologia da Informação). Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2022.
dc.identifier.uri: https://repositorio.ufrn.br/handle/123456789/48321
dc.language: en
dc.publisher: Universidade Federal do Rio Grande do Norte
dc.publisher.country: Brazil
dc.publisher.department: Instituto Metrópole Digital
dc.publisher.initials: UFRN
dc.publisher.program: Residência em Tecnologia da Informação
dc.subject: Supervised classification
dc.subject: Machine learning
dc.subject: Hierarchical classification
dc.subject: Nota fiscal eletrônica
dc.subject: Product classification
dc.title: Applying optimized hierarchical NCM classification to public purchases of products in Brazil
dc.title.alternative: Applying optimized hierarchical NCM classification to public purchases of products in Brazil
dc.type: bachelorThesis
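
The Local Classifier per Parent Node pattern named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the prefix lengths in `LEVELS`, the token-overlap classifier `CentroidNode` (a stand-in for document embeddings plus an optimized supervised classifier), the helper names, the sample descriptions, and the 8-digit codes are all hypothetical placeholders. The per-node random search over algorithms and parameters is omitted.

```python
from collections import Counter, defaultdict

# Assumption: the tree splits 8-digit codes by these prefix lengths.
# The levels actually used in the thesis may differ.
LEVELS = (2, 4, 8)


def tokens(text):
    """Lowercase whitespace tokenization (stand-in for a document embedding)."""
    return text.lower().split()


class CentroidNode:
    """Toy local classifier: picks the child whose token profile overlaps most."""

    def __init__(self):
        self.profiles = defaultdict(Counter)

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.profiles[label].update(tokens(text))
        return self

    def predict(self, text):
        toks = tokens(text)

        def score(label):
            profile = self.profiles[label]
            return sum(profile[t] for t in toks) / sum(profile.values())

        # sorted() makes tie-breaking deterministic
        return max(sorted(self.profiles), key=score)


def train_tree(texts, codes):
    """Train one classifier per parent node; each chooses among its children."""
    groups = defaultdict(lambda: ([], []))
    for text, code in zip(texts, codes):
        parent = ""  # the empty prefix is the root
        for length in LEVELS:
            child = code[:length]
            groups[parent][0].append(text)
            groups[parent][1].append(child)
            parent = child
    return {p: CentroidNode().fit(t, l) for p, (t, l) in groups.items()}


def predict_code(tree, text):
    """Descend from the root, letting each parent node pick the next prefix."""
    node = ""
    while node in tree:
        node = tree[node].predict(text)
    return node


# Made-up descriptions and placeholder codes, not real NCM assignments.
texts = [
    "paracetamol tablet 500mg",
    "ibuprofen tablet 400mg",
    "surgical glove latex",
    "office paper a4 ream",
]
codes = ["11110001", "11110002", "11220001", "22110001"]

tree = train_tree(texts, codes)
print(predict_code(tree, "paracetamol 500mg tablet box"))
```

Because each prediction is a chain of local decisions, a wrong choice near the root cannot be corrected further down. This is the error propagation problem the abstract refers to.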

Files

Original bundle

Name: ApplyingOptimizedHierarchical_AlvesSobrinho_2022.pdf
Size: 4.24 MB
Format: Adobe Portable Document Format
Description: TCC - Final

License bundle

Name: license.txt
Size: 1.45 KB
Format: Item-specific license agreed upon to submission