Language Identification for South African Bantu Languages Using Rank Order Statistics

Abstract

Language identification is an important preprocessing step in many data management, information retrieval, and transformation systems. However, Bantu languages are known to be difficult to identify because training data is scarce and the languages closely resemble one another. This paper investigates the performance of n-gram counting with rank orders for discriminating among the Bantu languages spoken in South Africa, using varying test and training data sizes. The highest average accuracy obtained was 99.3%, with a test size of 495 characters and a training size of 600,000 characters. The lowest average accuracy obtained was 78.72%, with a test size of 15 characters and a training size of 200,000 characters.
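
To make the idea of rank-order n-gram identification concrete, here is a minimal Python sketch. It assumes an "out-of-place" distance between ranked character n-gram profiles, a common way to compare rank orders; the abstract does not say which rank-order statistic the paper actually uses. The function names, the parameters `n_max` and `top_k`, and the toy training strings are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Map each of the top_k most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    padded = " " + text.lower() + " "  # pad so word boundaries show up as n-grams
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language profile get a fixed penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def identify(text, lang_profiles, n_max=3, top_k=300):
    """Return the label whose profile is closest to the profile of the test text."""
    doc = ngram_profile(text, n_max, top_k)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Toy, made-up "training" strings (not real language data).
profiles = {
    "lang_a": ngram_profile("aaaa bbbb aaaa"),
    "lang_b": ngram_profile("xyz xyz xyz"),
}
guess = identify("aab", profiles)
```

In this sketch the training size corresponds to the length of the text passed to `ngram_profile` for each language, and the test size to the length of the string passed to `identify`. Very short test strings yield small, noisy profiles, which is consistent with the lower accuracy the abstract reports for 15-character tests.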

UCT Computer Science Research Document Archive

Last time updated on 28/10/2019
