On the accuracy of short read mapping

Research output: Chapter in Book/Report/Conference proceeding, Book chapter, Research, peer-review

Standard

On the accuracy of short read mapping. / Menzel, Karl Peter; Frellsen, Jes; Plass, Mireya; Rasmussen, Simon Horskjær; Krogh, Anders.

Deep Sequencing Data Analysis. ed. / Noam Shomron. Springer Science+Business Media, 2013. p. 39-59 (Methods in Molecular Biology, Vol. 1038).

Harvard

Menzel, KP, Frellsen, J, Plass, M, Rasmussen, SH & Krogh, A 2013, On the accuracy of short read mapping. in N Shomron (ed.), Deep Sequencing Data Analysis. Springer Science+Business Media, Methods in Molecular Biology, vol. 1038, pp. 39-59. https://doi.org/10.1007/978-1-62703-514-9_3

APA

Menzel, K. P., Frellsen, J., Plass, M., Rasmussen, S. H., & Krogh, A. (2013). On the accuracy of short read mapping. In N. Shomron (Ed.), Deep Sequencing Data Analysis (pp. 39-59). Springer Science+Business Media. Methods in Molecular Biology Vol. 1038 https://doi.org/10.1007/978-1-62703-514-9_3

Vancouver

Menzel KP, Frellsen J, Plass M, Rasmussen SH, Krogh A. On the accuracy of short read mapping. In Shomron N, editor, Deep Sequencing Data Analysis. Springer Science+Business Media. 2013. p. 39-59. (Methods in Molecular Biology, Vol. 1038). https://doi.org/10.1007/978-1-62703-514-9_3

Author

Menzel, Karl Peter ; Frellsen, Jes ; Plass, Mireya ; Rasmussen, Simon Horskjær ; Krogh, Anders. / On the accuracy of short read mapping. Deep Sequencing Data Analysis. editor / Noam Shomron. Springer Science+Business Media, 2013. pp. 39-59 (Methods in Molecular Biology, Vol. 1038).

Bibtex

@inbook{45c0618e44fa43de893edcdbdfd28d02,
title = "On the accuracy of short read mapping",
abstract = "The development of high-throughput sequencing technologies has revolutionized the way we study genomes and gene regulation. In a single experiment, millions of reads are produced. To gain knowledge from these experiments the first thing to be done is finding the genomic origin of the reads, i.e., mapping the reads to a reference genome. In this new situation, conventional alignment tools are obsolete, as they cannot handle this huge amount of data in a reasonable amount of time. Thus, new mapping algorithms have been developed, which are fast at the expense of a small decrease in accuracy. In this chapter we discuss the current problems in short read mapping and show that mapping reads correctly is a nontrivial task. Through simple experiments with both real and synthetic data, we demonstrate that different mappers can give different results depending on the type of data, and that a considerable fraction of uniquely mapped reads is potentially mapped to an incorrect location. Furthermore, we provide simple statistical results on the expected number of random matches in a genome (E-value) and the probability of a random match as a function of read length. Finally, we show that quality scores contain valuable information for mapping and why mapping quality should be evaluated in a probabilistic manner. In the end, we discuss the potential of improving the performance of current methods by considering these quality scores in a probabilistic mapping program.",
author = "Menzel, {Karl Peter} and Jes Frellsen and Mireya Plass and Rasmussen, {Simon Horskj{\ae}r} and Anders Krogh",
year = "2013",
doi = "10.1007/978-1-62703-514-9_3",
language = "English",
isbn = "978-1-62703-513-2",
series = "Methods in Molecular Biology",
publisher = "Springer Science+Business Media",
pages = "39--59",
editor = "Noam Shomron",
booktitle = "Deep Sequencing Data Analysis",
address = "Singapore",
volume = "1038",
}

RIS

TY - CHAP
T1 - On the accuracy of short read mapping
AU - Menzel, Karl Peter
AU - Frellsen, Jes
AU - Plass, Mireya
AU - Rasmussen, Simon Horskjær
AU - Krogh, Anders
PY - 2013
Y1 - 2013
N2 - The development of high-throughput sequencing technologies has revolutionized the way we study genomes and gene regulation. In a single experiment, millions of reads are produced. To gain knowledge from these experiments the first thing to be done is finding the genomic origin of the reads, i.e., mapping the reads to a reference genome. In this new situation, conventional alignment tools are obsolete, as they cannot handle this huge amount of data in a reasonable amount of time. Thus, new mapping algorithms have been developed, which are fast at the expense of a small decrease in accuracy. In this chapter we discuss the current problems in short read mapping and show that mapping reads correctly is a nontrivial task. Through simple experiments with both real and synthetic data, we demonstrate that different mappers can give different results depending on the type of data, and that a considerable fraction of uniquely mapped reads is potentially mapped to an incorrect location. Furthermore, we provide simple statistical results on the expected number of random matches in a genome (E-value) and the probability of a random match as a function of read length. Finally, we show that quality scores contain valuable information for mapping and why mapping quality should be evaluated in a probabilistic manner. In the end, we discuss the potential of improving the performance of current methods by considering these quality scores in a probabilistic mapping program.
AB - The development of high-throughput sequencing technologies has revolutionized the way we study genomes and gene regulation. In a single experiment, millions of reads are produced. To gain knowledge from these experiments the first thing to be done is finding the genomic origin of the reads, i.e., mapping the reads to a reference genome. In this new situation, conventional alignment tools are obsolete, as they cannot handle this huge amount of data in a reasonable amount of time. Thus, new mapping algorithms have been developed, which are fast at the expense of a small decrease in accuracy. In this chapter we discuss the current problems in short read mapping and show that mapping reads correctly is a nontrivial task. Through simple experiments with both real and synthetic data, we demonstrate that different mappers can give different results depending on the type of data, and that a considerable fraction of uniquely mapped reads is potentially mapped to an incorrect location. Furthermore, we provide simple statistical results on the expected number of random matches in a genome (E-value) and the probability of a random match as a function of read length. Finally, we show that quality scores contain valuable information for mapping and why mapping quality should be evaluated in a probabilistic manner. In the end, we discuss the potential of improving the performance of current methods by considering these quality scores in a probabilistic mapping program.
U2 - 10.1007/978-1-62703-514-9_3
DO - 10.1007/978-1-62703-514-9_3
M3 - Book chapter
C2 - 23872968
SN - 978-1-62703-513-2
T3 - Methods in Molecular Biology
SP - 39
EP - 59
BT - Deep Sequencing Data Analysis
A2 - Shomron, Noam
PB - Springer Science+Business Media
ER -
