Approximate online pattern matching in sublinear time

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Approximate online pattern matching in sublinear time. / Chakraborty, Diptarka; Das, Debarati; Koucký, Michal.

39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019. ed. / Arkadev Chattopadhyay; Paul Gastin. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2019. 10 (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 150).

Harvard

Chakraborty, D, Das, D & Koucký, M 2019, Approximate online pattern matching in sublinear time. in A Chattopadhyay & P Gastin (eds), 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019, 10, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, Leibniz International Proceedings in Informatics, LIPIcs, vol. 150, 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019, Bombay, India, 11/12/2019. https://doi.org/10.4230/LIPIcs.FSTTCS.2019.10

APA

Chakraborty, D., Das, D., & Koucký, M. (2019). Approximate online pattern matching in sublinear time. In A. Chattopadhyay, & P. Gastin (Eds.), 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019 [10]. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. Leibniz International Proceedings in Informatics, LIPIcs, Vol. 150. https://doi.org/10.4230/LIPIcs.FSTTCS.2019.10

Vancouver

Chakraborty D, Das D, Koucký M. Approximate online pattern matching in sublinear time. In Chattopadhyay A, Gastin P, editors, 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2019. 10. (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 150). https://doi.org/10.4230/LIPIcs.FSTTCS.2019.10

Author

Chakraborty, Diptarka ; Das, Debarati ; Koucký, Michal. / Approximate online pattern matching in sublinear time. 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019. editor / Arkadev Chattopadhyay ; Paul Gastin. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2019. (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 150).

Bibtex

@inproceedings{673429df948e44cb9a1588be3a9c1f91,
title = "Approximate online pattern matching in sublinear time",
abstract = "We consider the approximate pattern matching problem under edit distance. In this problem we are given a pattern $P$ of length $m$ and a text $T$ of length $n$ over some alphabet $\Sigma$, and a positive integer $k$. The goal is to find all the positions $j$ in $T$ such that there is a substring of $T$ ending at $j$ which has edit distance at most $k$ from the pattern $P$. Recall, the edit distance between two strings is the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. For a position $t \in \{1,\ldots,n\}$, let $k_t$ be the smallest edit distance between $P$ and any substring of $T$ ending at $t$. In this paper we give a constant factor approximation to the sequence $k_1, k_2, \ldots, k_n$. We consider both offline and online settings. In the offline setting, where both $P$ and $T$ are available, we present an algorithm that for all $t \in \{1,\ldots,n\}$ computes the value of $k_t$ approximately within a constant factor. The worst-case running time of our algorithm is $\tilde{O}(n m^{3/4})$. In the online setting, we are given $P$ and then $T$ arrives one symbol at a time. We design an algorithm that upon arrival of the $t$-th symbol of $T$ computes $k_t$ approximately within an $O(1)$ multiplicative factor and $m^{8/9}$ additive error. Our algorithm takes $\tilde{O}(m^{1-(7/54)})$ amortized time per symbol arrival and takes $\tilde{O}(m^{1-(1/54)})$ additional space apart from storing the pattern $P$. Both of our algorithms are randomized and produce the correct answer with high probability. To the best of our knowledge this is the first algorithm that takes worst-case sublinear (in the length of the pattern) time and sublinear extra space for the online approximate pattern matching problem. To get our result we build on the technique of Chakraborty, Das, Goldenberg, Kouck{\'y} and Saks [FOCS'18] for computing a constant factor approximation of edit distance in sub-quadratic time.",
keywords = "Approximate Pattern Matching, Edit Distance, Online Pattern Matching, Streaming Algorithm, Sublinear Algorithm",
author = "Diptarka Chakraborty and Debarati Das and Michal Kouck{\'y}",
year = "2019",
doi = "10.4230/LIPIcs.FSTTCS.2019.10",
language = "English",
series = "Leibniz International Proceedings in Informatics, LIPIcs",
volume = "150",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",
editor = "Arkadev Chattopadhyay and Paul Gastin",
booktitle = "39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019",
note = "39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019 ; Conference date: 11-12-2019 through 13-12-2019",

}
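The exponents in the abstract's bounds can be simplified as follows. The comparison against the classical exact dynamic program (Θ(m) time and space per text symbol online, Θ(nm) time offline) is added here only for context and is not a statement taken from the paper.

```latex
\begin{align*}
1 - \tfrac{7}{54} &= \tfrac{47}{54} \approx 0.870
  && \Rightarrow\ \tilde{O}(m^{47/54}) = o(m) \text{ time per symbol (exact DP: } \Theta(m)\text{)}\\
1 - \tfrac{1}{54} &= \tfrac{53}{54} \approx 0.981
  && \Rightarrow\ \tilde{O}(m^{53/54}) = o(m) \text{ extra space (exact DP: } \Theta(m)\text{)}\\
\tfrac{8}{9} &\approx 0.889
  && \Rightarrow\ \text{additive error } m^{8/9} = o(m)\\
\tilde{O}(n\,m^{3/4}) &= o(nm)
  && \text{offline time (exact DP: } \Theta(nm)\text{)}
\end{align*}
```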

RIS

TY - GEN

T1 - Approximate online pattern matching in sublinear time

AU - Chakraborty, Diptarka

AU - Das, Debarati

AU - Koucký, Michal

PY - 2019

Y1 - 2019

N2 - We consider the approximate pattern matching problem under edit distance. In this problem we are given a pattern P of length m and a text T of length n over some alphabet Σ, and a positive integer k. The goal is to find all the positions j in T such that there is a substring of T ending at j which has edit distance at most k from the pattern P. Recall, the edit distance between two strings is the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. For a position t in {1,...,n}, let k_t be the smallest edit distance between P and any substring of T ending at t. In this paper we give a constant factor approximation to the sequence k_1, k_2, ..., k_n. We consider both offline and online settings. In the offline setting, where both P and T are available, we present an algorithm that for all t in {1,...,n} computes the value of k_t approximately within a constant factor. The worst-case running time of our algorithm is Õ(nm^(3/4)). In the online setting, we are given P and then T arrives one symbol at a time. We design an algorithm that upon arrival of the t-th symbol of T computes k_t approximately within an O(1) multiplicative factor and m^(8/9) additive error. Our algorithm takes Õ(m^(1−(7/54))) amortized time per symbol arrival and takes Õ(m^(1−(1/54))) additional space apart from storing the pattern P. Both of our algorithms are randomized and produce the correct answer with high probability. To the best of our knowledge this is the first algorithm that takes worst-case sublinear (in the length of the pattern) time and sublinear extra space for the online approximate pattern matching problem. To get our result we build on the technique of Chakraborty, Das, Goldenberg, Koucký and Saks [FOCS'18] for computing a constant factor approximation of edit distance in sub-quadratic time.

AB - We consider the approximate pattern matching problem under edit distance. In this problem we are given a pattern P of length m and a text T of length n over some alphabet Σ, and a positive integer k. The goal is to find all the positions j in T such that there is a substring of T ending at j which has edit distance at most k from the pattern P. Recall, the edit distance between two strings is the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. For a position t in {1,...,n}, let k_t be the smallest edit distance between P and any substring of T ending at t. In this paper we give a constant factor approximation to the sequence k_1, k_2, ..., k_n. We consider both offline and online settings. In the offline setting, where both P and T are available, we present an algorithm that for all t in {1,...,n} computes the value of k_t approximately within a constant factor. The worst-case running time of our algorithm is Õ(nm^(3/4)). In the online setting, we are given P and then T arrives one symbol at a time. We design an algorithm that upon arrival of the t-th symbol of T computes k_t approximately within an O(1) multiplicative factor and m^(8/9) additive error. Our algorithm takes Õ(m^(1−(7/54))) amortized time per symbol arrival and takes Õ(m^(1−(1/54))) additional space apart from storing the pattern P. Both of our algorithms are randomized and produce the correct answer with high probability. To the best of our knowledge this is the first algorithm that takes worst-case sublinear (in the length of the pattern) time and sublinear extra space for the online approximate pattern matching problem. To get our result we build on the technique of Chakraborty, Das, Goldenberg, Koucký and Saks [FOCS'18] for computing a constant factor approximation of edit distance in sub-quadratic time.

KW - Approximate Pattern Matching

KW - Edit Distance

KW - Online Pattern Matching

KW - Streaming Algorithm

KW - Sublinear Algorithm

U2 - 10.4230/LIPIcs.FSTTCS.2019.10

DO - 10.4230/LIPIcs.FSTTCS.2019.10

M3 - Article in proceedings

AN - SCOPUS:85077470629

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019

A2 - Chattopadhyay, Arkadev

A2 - Gastin, Paul

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

T2 - 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2019

Y2 - 11 December 2019 through 13 December 2019

ER -

ID: 241101333
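For reference, the quantity k_t defined in the abstract can be computed exactly with the classical column-by-column dynamic program (often attributed to Sellers), which reads T one symbol at a time and spends Θ(m) time and Θ(m) extra space per symbol. The sketch below is a minimal Python illustration of that exact baseline under unit-cost edit distance; it is not the authors' sublinear approximation algorithm, and the function names `edit_distance` and `online_k_values` are hypothetical choices made for this example.

```python
from typing import Iterable, Iterator


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions), Wagner-Fischer DP."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            )
        prev = cur
    return prev[-1]


def online_k_values(pattern: str, text: Iterable[str]) -> Iterator[int]:
    """Yield k_t for t = 1, 2, ... as the symbols of the text arrive.

    After reading t symbols, col[i] is the smallest edit distance between
    pattern[:i] and any substring of the text ending at position t (the empty
    substring included). Each step costs Theta(m) time; col uses Theta(m) space.
    """
    m = len(pattern)
    col = list(range(m + 1))  # t = 0: only the empty substring is available
    for c in text:
        new = [0] * (m + 1)  # new[0] = 0: the empty pattern prefix costs nothing
        for i in range(1, m + 1):
            new[i] = min(
                col[i] + 1,                          # c is left unaligned (inserted)
                new[i - 1] + 1,                      # pattern[i - 1] is deleted
                col[i - 1] + (pattern[i - 1] != c),  # align pattern[i - 1] with c
            )
        col = new
        yield col[m]  # k_t: best alignment of the whole pattern ending at t


if __name__ == "__main__":
    ks = list(online_k_values("abc", "xabdc"))
    print(ks)  # [3, 2, 1, 1, 1]
    k = 1
    print([t for t, kt in enumerate(ks, start=1) if kt <= k])  # [3, 4, 5]
```

For example, with P = "abc" and T = "xabdc" the sketch yields k_1, ..., k_5 = 3, 2, 1, 1, 1, so for threshold k = 1 the reported positions are t = 3, 4, 5. Keeping only one DP column is what makes the baseline online; its per-symbol Θ(m) cost is the linear dependence on the pattern length that the paper's bounds improve on.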