Quantifying similarity in animal vocal sequences: Which metric performs best?

Kershenbaum, A; Garland, EC

Repository URI

https://www.repository.cam.ac.uk/handle/1810/248821

Files

Kershenbaum_Garland-2015-Methods_in_Ecology_and_Evolution.pdf (487.72 KB)

Type

Article

Authors

Kershenbaum, Arik

https://orcid.org/0000-0003-0464-0243

Garland, EC

Abstract

Summary

1. Many animals communicate using sequences of discrete acoustic elements which can be complex, vary in their degree of stereotypy, and are potentially open‐ended. Variation in sequences can provide important ecological, behavioural or evolutionary information about the structure and connectivity of populations, mechanisms for vocal cultural evolution and the underlying drivers responsible for these processes. Various mathematical techniques have been used to form a realistic approximation of sequence similarity for such tasks.

2. Here, we use both simulated and empirical data sets from animal vocal sequences (rock hyrax, Procavia capensis; humpback whale, Megaptera novaeangliae; bottlenose dolphin, Tursiops truncatus; and Carolina chickadee, Poecile carolinensis) to test which of eight sequence analysis metrics are more likely to reconstruct the information encoded in the sequences, and to test the fidelity of estimation of model parameters, when the sequences are assumed to conform to particular statistical models.

3. Results from the simulated data indicated that multiple metrics were equally successful in reconstructing the information encoded in the sequences of simulated individuals (Markov chains, n‐gram models, repeat distribution and edit distance) and data generated by different stochastic processes (entropy rate and n‐grams). However, the string edit (Levenshtein) distance performed consistently and significantly better than all other tested metrics (including entropy, Markov chains, n‐grams, mutual information) for all empirical data sets, despite being less commonly used in the field of animal acoustic communication.

4. The Levenshtein distance metric provides a robust analytical approach that should be considered in the comparison of animal acoustic sequences in preference to other commonly employed techniques (such as Markov chains, hidden Markov models or Shannon entropy). The recent discovery that non‐Markovian vocal sequences may be more common in animal communication than previously thought provides a rich area for future research that requires non‐Markovian‐based analysis techniques to investigate animal grammars and potentially the origin of human language.
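
For readers unfamiliar with the string edit (Levenshtein) distance named in the abstract, the following is a minimal sketch of the standard dynamic-programming calculation applied to sequences of element labels. It is an illustration only and not the authors' code: the function names, the example label sequences, and the choice to normalise by the longer sequence length are all assumptions made here.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] is the distance between the elements of `a` seen so far
    # (before the current one) and the first j elements of `b`.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i elements into an empty sequence costs i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y, or match
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Edit distance scaled to [0, 1] by the longer length (an assumed convention)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Hypothetical sequences of classified vocal elements (labels are made up).
song_a = ["A", "B", "B", "C", "D"]
song_b = ["A", "B", "C", "C", "D", "E"]

print(levenshtein(song_a, song_b))
print(normalized_levenshtein(song_a, song_b))
```

Because the comparison operates on whole sequences rather than on transition probabilities, it makes no assumption that the sequences were generated by a Markov process.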

Keywords

animal communication, edit distance, Markov, sequence, stochastic processes, vocal

Journal Title

Methods in Ecology and Evolution

Journal ISSN

2041-210X

Publisher

Wiley

Publisher DOI

https://doi.org/10.1111/2041-210X.12433

Rights

http://www.rioxx.net/licenses/all-rights-reserved

Sponsorship

We thank Melinda Rekdahl, Todd Freeberg and his graduate students, Amiyaal Ilany, Elizabeth Hobson, and Jessica Crance for providing comments on a previous version of this manuscript. We thank Mike Noad, Melinda Rekdahl, and Claire Garrigue for assistance with humpback whale song collection and initial categorisation of the song, Vincent Janik and Laela Sayigh for assistance with signature whistle collection, Todd Freeberg with chickadee recordings, and Eli Geffen and Amiyaal Ilany for assistance with hyrax song collection and analysis. E.C.G. is supported by a Newton International Fellowship. Part of this work was conducted while E.C.G. was supported by a National Research Council (National Academy of Sciences) Postdoctoral Fellowship at the National Marine Mammal Laboratory, AFSC, NMFS, NOAA. The findings and conclusions in this paper are those of the authors and do not necessarily represent the views of the National Marine Fisheries Service. We would also like to thank Randall Wells and the Sarasota Dolphin Research Program for the opportunity to record the Sarasota dolphins, where data were collected under a series of National Marine Fisheries Service Scientific Research Permits issued to Randall Wells. A.K. is supported by the Herchel Smith Postdoctoral Fellowship Fund. Part of this work was conducted while A.K. was a Postdoctoral Fellow at the National Institute for Mathematical and Biological Synthesis, an Institute sponsored by the National Science Foundation through NSF Award #DBI-1300426, with additional support from The University of Tennessee, Knoxville.

Collections

Scholarly Works - Zoology
Symplectic mapped items for data match