eScience 2019
Thursday, September 26 • 2:00pm - 2:30pm
On Distributed Information Composition in Big Data Systems


Haifa AlQuwaiee (New Jersey Institute of Technology), Songlin He (New Jersey Institute of Technology), Chase Wu (New Jersey Institute of Technology), Qiang Tang (New Jersey Institute of Technology), and Xuewen Shen (New Jersey Institute of Technology)

Modern big data computing systems, exemplified by Hadoop, employ parallel processing on top of distributed storage. The results produced by parallel tasks, such as computing modules in scientific workflows or reducers in the MapReduce framework, are typically spread across different data nodes. However, most existing systems provide no mechanism to compose such distributed information, even though many big data applications require it. We construct analytical cost models and formulate a Distributed Information Composition problem in Big Data Systems, referred to as DIC-BDS. The problem is to aggregate multiple datasets stored as data blocks in the Hadoop Distributed File System (HDFS) into one final output, using a composition operator of a given complexity. We rigorously prove that DIC-BDS is NP-complete and propose two heuristic algorithms: the Fixed-window Distributed Composition Scheme (FDCS) and the Dynamic-window Distributed Composition Scheme with Delay (DDCS-D). We conduct extensive experiments in Google Cloud with composition operators of commonly considered complexities, namely O(n), O(n log n), and O(n²). The experimental results show that the proposed solutions outperform existing methods. Specifically, FDCS outperforms all other algorithms under comparison when the composition operator has complexity O(n) or O(n log n), while DDCS-D achieves the minimum total composition time when the operator has complexity O(n²). The proposed algorithms provide an additional level of data processing for efficient information aggregation in existing workflow and big data systems.
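The abstract does not describe how FDCS or DDCS-D work, so the sketch below is not the authors' method or cost model. It is a minimal, hypothetical illustration of why the operator's complexity affects the total composition time. It rests on several assumptions: each merge of two datasets produces an output whose size is the sum of its inputs, a merge costs f(output size) with f being linear, n log n, or quadratic, data-transfer time is ignored, and all merges within one level of a pairwise tree run concurrently on separate nodes. The names `COST_MODELS`, `sequential_cost`, and `tree_cost` are made up for this example.

```python
import math
from typing import Callable, Sequence

# Hypothetical per-merge cost models: the cost of producing an output of m records.
COST_MODELS: dict[str, Callable[[int], float]] = {
    "O(n)": lambda m: float(m),
    "O(n log n)": lambda m: m * math.log2(m),
    "O(n^2)": lambda m: float(m) ** 2,
}


def sequential_cost(sizes: Sequence[int], f: Callable[[int], float]) -> float:
    """Fold all datasets left to right on a single node; return the total work."""
    total, acc = 0.0, sizes[0]
    for s in sizes[1:]:
        acc += s          # the running result grows with every merge
        total += f(acc)   # each merge is charged on its output size
    return total


def tree_cost(sizes: Sequence[int], f: Callable[[int], float]) -> tuple[float, float]:
    """Merge adjacent pairs level by level.

    Returns (total work, parallel makespan). The makespan assumes every merge
    in a level runs concurrently, so a level costs as much as its largest merge.
    """
    level = list(sizes)
    work = makespan = 0.0
    while len(level) > 1:
        nxt: list[int] = []
        level_max = 0.0
        for i in range(0, len(level) - 1, 2):
            m = level[i] + level[i + 1]
            c = f(m)
            work += c
            level_max = max(level_max, c)
            nxt.append(m)
        if len(level) % 2:  # an unpaired dataset moves up one level unchanged
            nxt.append(level[-1])
        makespan += level_max
        level = nxt
    return work, makespan


if __name__ == "__main__":
    blocks = [1000] * 8  # eight equally sized result blocks (illustrative only)
    for name, f in COST_MODELS.items():
        print(name, sequential_cost(blocks, f), tree_cost(blocks, f))
```

For eight blocks of 1,000 records and a linear operator, the sequential fold does 35,000 units of work, while the pairwise tree does 24,000 with a makespan of 14,000. Under the quadratic model, the gap between the two strategies widens. This toy model only shows that the best composition order depends on the operator's complexity. It says nothing about the windowing or delay strategies used by FDCS and DDCS-D.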


Haifa AlQuwaiee (New Jersey Institute of Technology)

Thursday September 26, 2019 2:00pm - 2:30pm PDT
Kon Tiki Room