An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

Marschall, T. (Tobias); Rahmann, S. (Sven)

Authors: T. (Tobias) Marschall; S. (Sven) Rahmann
Publication date: 1 October 2011
Publisher: 'MDPI AG'

Abstract

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer-Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm's running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of differences in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call deterministic arithmetic automata (DAAs) and probabilistic arithmetic automata (PAAs) [Marschall2008]. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool's algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.
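
To make the idea in the abstract concrete, the following is a minimal sketch of how a cost distribution can be obtained by dynamic programming for Horspool's algorithm. It is not the authors' implementation. It assumes an i.i.d. uniform text model, counts one text character access per comparison, and uses an unminimized state (the last m text characters plus a countdown to the end of the next window) in place of a minimal DAA. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product


def shift_table(pattern):
    """Horspool shifts: distance from the rightmost occurrence of each
    character in pattern[:-1] to the last pattern position."""
    m = len(pattern)
    return {c: m - 1 - i for i, c in enumerate(pattern[:-1])}


def window_cost(pattern, window):
    """Comparisons made right to left until the first mismatch (or full match)."""
    cost = 0
    for j in range(len(pattern) - 1, -1, -1):
        cost += 1
        if window[j] != pattern[j]:
            break
    return cost


def horspool_cost(pattern, text):
    """Reference simulation: total character accesses on one concrete text."""
    m, shift = len(pattern), shift_table(pattern)
    pos, cost = 0, 0
    while pos + m <= len(text):
        window = text[pos:pos + m]
        cost += window_cost(pattern, window)
        pos += shift.get(window[-1], m)
    return cost


def cost_distribution(pattern, alphabet, n):
    """Exact distribution of the total cost over uniform random texts of length n.

    State: (last m characters, characters left until the next window ends,
    accumulated cost). Each text character moves probability mass from
    one state to its successor, as in a PAA-style dynamic program.
    """
    m, shift = len(pattern), shift_table(pattern)
    p = Fraction(1, len(alphabet))  # assumption: i.i.d. uniform characters
    dist = {("", m, 0): Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for (ctx, wait, cost), prob in dist.items():
            for a in alphabet:
                new_ctx = (ctx + a)[-m:]
                w, c = wait - 1, cost
                if w == 0:  # a window ends here: charge its cost, then shift
                    c += window_cost(pattern, new_ctx)
                    w = shift.get(a, m)
                nxt[(new_ctx, w, c)] += prob * p
        dist = nxt
    out = defaultdict(Fraction)
    for (_, _, cost), prob in dist.items():
        out[cost] += prob
    return dict(out)


def brute_force_distribution(pattern, alphabet, n):
    """Enumerate every text of length n; only feasible for tiny inputs."""
    counts = Counter(horspool_cost(pattern, "".join(t))
                     for t in product(alphabet, repeat=n))
    total = len(alphabet) ** n
    return {c: Fraction(k, total) for c, k in counts.items()}


if __name__ == "__main__":
    for cost, prob in sorted(cost_distribution("aba", "ab", 8).items()):
        print(cost, prob)
```

The state space of this sketch grows as |alphabet|^m times m, so it only suits short patterns; it also does not cover Markov or HMM text models. For small inputs, `brute_force_distribution` can be used to cross-check the dynamic program.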

CWI's Institutional Repository

oai:cwi.nl:18708

Last time updated on 18/04/2020

This paper was published in CWI's Institutional Repository.
