An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

Abstract

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer--Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm's running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of \emph{differences} in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call \emph{deterministic arithmetic automata} (DAAs) and \emph{probabilistic arithmetic automata} (PAAs)~\cite{Marschall2008}. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool's algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.
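
To make the analyzed quantity concrete, the following Python sketch counts the text character accesses made by Horspool's algorithm and tabulates their exact distribution over all texts of a fixed length under a uniform text model. It is not the DAA/PAA construction described in the abstract: it swaps in brute-force enumeration, which is only feasible for very short texts, purely to illustrate what "the distribution of the number of character accesses" means. The function names `horspool_accesses` and `access_distribution` are hypothetical, and the cost accounting (every compared text character counts once, and the shift lookup reuses the character already read) is an assumption.

```python
from collections import Counter
from itertools import product


def horspool_accesses(pattern, text):
    """Count text character accesses made by Horspool's algorithm (assumed cost model)."""
    m, n = len(pattern), len(text)
    # Shift table: distance from the last occurrence of c in pattern[:-1] to the pattern end.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    accesses = 0
    pos = m - 1  # text index aligned with the last pattern character
    while pos < n:
        # Compare right to left; each text character read counts as one access.
        j = 0
        while j < m:
            accesses += 1
            if text[pos - j] != pattern[m - 1 - j]:
                break
            j += 1
        # Shift by the table entry of the rightmost window character (already read above).
        pos += shift.get(text[pos], m)
    return accesses


def access_distribution(pattern, alphabet, n):
    """Exact access-count distribution over all texts of length n, uniform i.i.d. model.

    Brute-force enumeration (|alphabet|**n texts); an illustration only.
    """
    counts = Counter(
        horspool_accesses(pattern, "".join(t)) for t in product(alphabet, repeat=n)
    )
    total = len(alphabet) ** n
    return {cost: c / total for cost, c in sorted(counts.items())}


if __name__ == "__main__":
    for cost, prob in access_distribution("ab", "ab", 6).items():
        print(cost, prob)
```

Enumeration grows exponentially in the text length, which is the cost that an automaton-based dynamic program of the kind the abstract describes is designed to avoid.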

This paper was published in CWI's Institutional Repository.
