Classification of consumer goods into 5-digit COICOP 2018 codes

Abstract

The survey of consumer expenditure is a national survey conducted by Statistics Norway (SSB) to collect detailed data on Norwegian households’ annual consumption of goods and services. Up to and including its most recent publication in 2012, the survey has relied on SSB employees to manually categorise all registered expenditures into COICOP (Classification of Individual Consumption by Purpose) item codes to produce consumption statistics. This has involved large workloads and high implementation costs, so SSB wants to modernise the survey and improve its efficiency for its next planned implementation in 2022. This study is the result of a 3-month collaboration with SSB to explore the application of supervised machine learning to the classification of consumer goods into 5-digit COICOP codes. Its purpose has been to explore the potential of using machine learning to automate parts of the survey of consumer expenditure. This thesis demonstrates how data sets from separate sources can be combined into a COICOP training data set that can be used to develop and evaluate COICOP classification models. Furthermore, this study explores how these models can be incorporated into a “human-in-the-loop” classification system to facilitate automatic classification of consumer goods while maintaining sufficient data quality. The findings indicate that supervised machine learning is a suitable method for classifying consumer goods into 5-digit COICOP codes. Additionally, the results show that the models’ prediction probabilities are good indicators of where misclassifications occur. Together, these findings show promising potential for implementing a “human-in-the-loop” classification system for reliable classification of consumer goods.
At the same time, the findings uncover important limitations of the data used in this thesis, as the models were trained on data that the survey of consumer expenditure will not be based on. This thesis used the data sets that were available, and these were not necessarily the most relevant. Therefore, the developed models are not expected to provide immediate value to SSB’s objectives without first being trained on more relevant data.
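
The abstract does not say how the “human-in-the-loop” system decides which items a person should check. One common way to use prediction probabilities for this is a confidence threshold: predictions at or above the threshold are accepted automatically, and the rest are sent to manual review. The sketch below illustrates that idea only and is not the thesis's implementation. The names `Prediction` and `route_predictions`, the threshold of 0.9, the item texts, and the label values are all hypothetical, and the labels are placeholders rather than real COICOP 2018 codes.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_text: str      # free-text description of the purchased good
    coicop_code: str    # predicted label (placeholder values, not real codes)
    probability: float  # model's probability for the predicted label


def route_predictions(predictions, threshold=0.9):
    """Split predictions into auto-accepted and manual-review lists.

    The threshold is an assumed tuning parameter: raising it sends more
    items to human review, lowering it automates more of the workload.
    """
    accepted, review = [], []
    for p in predictions:
        if p.probability >= threshold:
            accepted.append(p)
        else:
            review.append(p)
    return accepted, review


# Illustrative inputs only; these are not results from the thesis.
predictions = [
    Prediction("wholemeal bread", "code_1", 0.97),
    Prediction("gift card", "code_2", 0.41),
    Prediction("orange juice 1l", "code_3", 0.90),
]

accepted, review = route_predictions(predictions, threshold=0.9)
print("auto-accepted:", [p.item_text for p in accepted])
print("manual review:", [p.item_text for p in review])
```

In practice the threshold would have to be chosen on held-out labelled data, trading the share of items classified automatically against the error rate among accepted items.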

Last time updated on 12/05/2022

This paper was published in NORA - Norwegian Open Research Archives.
