Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor. / Cohen-Addad, Vincent; Das, Debarati; Kipouridis, Evangelos; Parotsidis, Nikos; Thorup, Mikkel.
In: Journal of the ACM, Vol. 71, No. 2, 10, 2024.Research output: Contribution to journal › Journal article › Research › peer-review
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - JOUR
T1 - Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor
AU - Cohen-Addad, Vincent
AU - Das, Debarati
AU - Kipouridis, Evangelos
AU - Parotsidis, Nikos
AU - Thorup, Mikkel
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024
Y1 - 2024
N2 - We consider the numerical taxonomy problem of fitting a positive distance function D: (S2) → R>0 by a tree metric. We want a tree T with positive edge weights and including S among the vertices so that their distances in T match those in D. A nice application is in evolutionary biology where the tree T aims to approximate thebranching process leading to the observed distances in D [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics with a root having the same distance to all vertices in S. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)) by Ailon and Charikar [2005], who wrote “determining whether an O(1) approximation can be obtained is a fascinating question.”
AB - We consider the numerical taxonomy problem of fitting a positive distance function D: (S2) → R>0 by a tree metric. We want a tree T with positive edge weights and including S among the vertices so that their distances in T match those in D. A nice application is in evolutionary biology where the tree T aims to approximate thebranching process leading to the observed distances in D [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics with a root having the same distance to all vertices in S. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)) by Ailon and Charikar [2005], who wrote “determining whether an O(1) approximation can be obtained is a fascinating question.”
KW - Approximation algorithms
KW - hierarchical clustering
KW - phylogenic reconstructions
KW - tree metrics
KW - ultrametrics
U2 - 10.1145/3639453
DO - 10.1145/3639453
M3 - Journal article
AN - SCOPUS:85190943260
VL - 71
JO - Journal of the ACM
JF - Journal of the ACM
SN - 0004-5411
IS - 2
M1 - 10
ER -
ID: 391118904