Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

Research output: Contribution to journalJournal articleResearchpeer-review

Standard

Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor. / Cohen-Addad, Vincent; Das, Debarati; Kipouridis, Evangelos; Parotsidis, Nikos; Thorup, Mikkel.

In: Journal of the ACM, Vol. 71, No. 2, 10, 2024.

Research output: Contribution to journalJournal articleResearchpeer-review

Harvard

Cohen-Addad, V, Das, D, Kipouridis, E, Parotsidis, N & Thorup, M 2024, 'Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor', Journal of the ACM, vol. 71, no. 2, 10. https://doi.org/10.1145/3639453

APA

Cohen-Addad, V., Das, D., Kipouridis, E., Parotsidis, N., & Thorup, M. (2024). Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor. Journal of the ACM, 71(2), [10]. https://doi.org/10.1145/3639453

Vancouver

Cohen-Addad V, Das D, Kipouridis E, Parotsidis N, Thorup M. Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor. Journal of the ACM. 2024;71(2). 10. https://doi.org/10.1145/3639453

Author

Cohen-Addad, Vincent ; Das, Debarati ; Kipouridis, Evangelos ; Parotsidis, Nikos ; Thorup, Mikkel. / Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor. In: Journal of the ACM. 2024 ; Vol. 71, No. 2.

Bibtex

@article{8d288d48abc7414fbcefc7ac16ee5937,
title = "Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor",
abstract = "We consider the numerical taxonomy problem of fitting a positive distance function D: (S2) → R>0 by a tree metric. We want a tree T with positive edge weights and including S among the vertices so that their distances in T match those in D. A nice application is in evolutionary biology where the tree T aims to approximate thebranching process leading to the observed distances in D [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics with a root having the same distance to all vertices in S. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)) by Ailon and Charikar [2005], who wrote “determining whether an O(1) approximation can be obtained is a fascinating question.”",
keywords = "Approximation algorithms, hierarchical clustering, phylogenic reconstructions, tree metrics, ultrametrics",
author = "Vincent Cohen-Addad and Debarati Das and Evangelos Kipouridis and Nikos Parotsidis and Mikkel Thorup",
note = "Publisher Copyright: {\textcopyright} 2024 Copyright held by the owner/author(s).",
year = "2024",
doi = "10.1145/3639453",
language = "English",
volume = "71",
journal = "Journal of the ACM",
issn = "0004-5411",
publisher = "Association for Computing Machinery",
number = "2",

}

RIS

TY - JOUR

T1 - Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

AU - Cohen-Addad, Vincent

AU - Das, Debarati

AU - Kipouridis, Evangelos

AU - Parotsidis, Nikos

AU - Thorup, Mikkel

N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).

PY - 2024

Y1 - 2024

N2 - We consider the numerical taxonomy problem of fitting a positive distance function D: (S2) → R>0 by a tree metric. We want a tree T with positive edge weights and including S among the vertices so that their distances in T match those in D. A nice application is in evolutionary biology where the tree T aims to approximate thebranching process leading to the observed distances in D [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics with a root having the same distance to all vertices in S. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)) by Ailon and Charikar [2005], who wrote “determining whether an O(1) approximation can be obtained is a fascinating question.”

AB - We consider the numerical taxonomy problem of fitting a positive distance function D: (S2) → R>0 by a tree metric. We want a tree T with positive edge weights and including S among the vertices so that their distances in T match those in D. A nice application is in evolutionary biology where the tree T aims to approximate thebranching process leading to the observed distances in D [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics with a root having the same distance to all vertices in S. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)) by Ailon and Charikar [2005], who wrote “determining whether an O(1) approximation can be obtained is a fascinating question.”

KW - Approximation algorithms

KW - hierarchical clustering

KW - phylogenic reconstructions

KW - tree metrics

KW - ultrametrics

U2 - 10.1145/3639453

DO - 10.1145/3639453

M3 - Journal article

AN - SCOPUS:85190943260

VL - 71

JO - Journal of the ACM

JF - Journal of the ACM

SN - 0004-5411

IS - 2

M1 - 10

ER -

ID: 391118904