MRPack: Multi-algorithm execution using compute-intensive approach in MapReduce

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

MRPack : Multi-algorithm execution using compute-intensive approach in MapReduce. / Idris, Muhammad; Hussain, Shujaat; Siddiqi, Muhammad Hameed; Hassan, Waseem; Bilal, Hafiz Syed Muhammad; Lee, Sungyoung.

In: PLoS ONE, Vol. 10, No. 8, e0136259, 25.08.2015.

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Harvard

Idris, M, Hussain, S, Siddiqi, MH, Hassan, W, Bilal, HSM & Lee, S 2015, 'MRPack: Multi-algorithm execution using compute-intensive approach in MapReduce', PLoS ONE, vol. 10, no. 8, e0136259. https://doi.org/10.1371/journal.pone.0136259

APA

Idris, M., Hussain, S., Siddiqi, M. H., Hassan, W., Bilal, H. S. M., & Lee, S. (2015). MRPack: Multi-algorithm execution using compute-intensive approach in MapReduce. PLoS ONE, 10(8), [e0136259]. https://doi.org/10.1371/journal.pone.0136259

Vancouver

Idris M, Hussain S, Siddiqi MH, Hassan W, Bilal HSM, Lee S. MRPack: Multi-algorithm execution using compute-intensive approach in MapReduce. PLoS ONE. 2015 Aug 25;10(8):e0136259. https://doi.org/10.1371/journal.pone.0136259

Author

Idris, Muhammad ; Hussain, Shujaat ; Siddiqi, Muhammad Hameed ; Hassan, Waseem ; Bilal, Hafiz Syed Muhammad ; Lee, Sungyoung. / MRPack : Multi-algorithm execution using compute-intensive approach in MapReduce. In: PLoS ONE. 2015 ; Vol. 10, No. 8.

Bibtex

@article{8564a019769a4b788a878af21a21f9f8,
title = "MRPack: Multi-algorithm execution using compute-intensive approach in MapReduce",
abstract = "Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real time and streaming data in variety of formats. These characteristics give rise to challenges in its modeling, computation, and processing. Hadoop MapReduce (MR) is a well known data-intensive distributed processing framework using the distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. It uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement.",
author = "Muhammad Idris and Shujaat Hussain and Siddiqi, {Muhammad Hameed} and Waseem Hassan and Bilal, {Hafiz Syed Muhammad} and Sungyoung Lee",
note = "Publisher Copyright: {\textcopyright} 2015 Idris et al.This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",
year = "2015",
month = aug,
day = "25",
doi = "10.1371/journal.pone.0136259",
language = "English",
volume = "10",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "8",

}
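
The abstract above describes MRPack's core idea: running a set of related algorithms inside a single MapReduce job and keeping their intermediate data apart with a multi-key scheme. The sketch below is only an illustration of that idea under stated assumptions, not the authors' implementation: the class name MultiAlgorithmMapper, the "A1:"/"A2:" key prefixes, and the two toy algorithms (word count and per-line character count) are hypothetical, chosen to show how a single Hadoop mapper could emit tagged intermediate keys for several algorithms at once.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative sketch (not the paper's code): one mapper runs two hypothetical
// algorithms over every input record and prefixes each intermediate key with an
// algorithm tag, so a single MapReduce job carries intermediate data for both.
public class MultiAlgorithmMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text taggedKey = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String record = line.toString();

        // Hypothetical algorithm A1: word count; keys look like "A1:word".
        for (String token : record.split("\\s+")) {
            if (!token.isEmpty()) {
                taggedKey.set("A1:" + token);
                context.write(taggedKey, ONE);
            }
        }

        // Hypothetical algorithm A2: characters per line; single key "A2:chars".
        taggedKey.set("A2:chars");
        context.write(taggedKey, new IntWritable(record.length()));
    }
}

A matching reducer would split the tag off each incoming key and dispatch to the corresponding per-algorithm reduce logic; how MRPack actually assigns tasks and mitigates key skew is detailed in the article itself.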

RIS

TY - JOUR

T1 - MRPack

T2 - Multi-algorithm execution using compute-intensive approach in MapReduce

AU - Idris, Muhammad

AU - Hussain, Shujaat

AU - Siddiqi, Muhammad Hameed

AU - Hassan, Waseem

AU - Bilal, Hafiz Syed Muhammad

AU - Lee, Sungyoung

N1 - Publisher Copyright: © 2015 Idris et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2015/8/25

Y1 - 2015/8/25

N2 - Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real time and streaming data in variety of formats. These characteristics give rise to challenges in its modeling, computation, and processing. Hadoop MapReduce (MR) is a well known data-intensive distributed processing framework using the distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. It uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement.

AB - Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real time and streaming data in variety of formats. These characteristics give rise to challenges in its modeling, computation, and processing. Hadoop MapReduce (MR) is a well known data-intensive distributed processing framework using the distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. It uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement.

UR - http://www.scopus.com/inward/record.url?scp=84942881313&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0136259

DO - 10.1371/journal.pone.0136259

M3 - Journal article

C2 - 26305223

AN - SCOPUS:84942881313

VL - 10

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 8

M1 - e0136259

ER -
