ISSN 0253-2778

CN 34-1054/N


PipelineJoin: A new MapReduce-based multi-table join algorithm

  • MapReduce, a parallel and distributed computing model, has been widely used to process join operations over two or more large tables. Existing MapReduce-based multi-table join algorithms all have limitations when handling chain joins: some cannot join multiple large tables, and others run too many MapReduce tasks sequentially, which leads to low efficiency. This paper proposes PipelineJoin, a new MapReduce-based multi-table join algorithm for chain joins over a number of tables. PipelineJoin adopts a pipeline model and a scheduler so that a series of Map tasks and Reduce tasks can overlap in execution throughout the join process, which improves the efficiency of multi-table joins while overcoming the deficiencies of existing methods. Extensive experiments on various synthetic datasets show that the proposed algorithm greatly reduces join time compared with existing chain join algorithms.
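
To make the term "chain join" concrete, the following is a minimal single-process sketch in Python, not the PipelineJoin algorithm itself and not a MapReduce program. It assumes each table is a list of tuples and that adjacent tables join on the last attribute of the left table and the first attribute of the right table. The names `hash_join_stage` and `chain_join` are hypothetical and chosen for illustration only. Lazy generators are used so that a later join stage can start consuming rows before an earlier stage has finished, which loosely mirrors the idea of overlapping stages described in the abstract.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple  # a row is a plain tuple of attribute values


def hash_join_stage(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Row]:
    """Join rows where left's last attribute equals right's first attribute.

    The right table is grouped by its join key first; left rows are then
    streamed through, and each joined row is yielded as soon as it is found.
    """
    index = defaultdict(list)
    for rrow in right:
        index[rrow[0]].append(rrow)
    for lrow in left:
        # .get avoids inserting empty entries for keys with no match
        for rrow in index.get(lrow[-1], []):
            # drop the duplicated join attribute from the right row
            yield lrow + rrow[1:]


def chain_join(tables: Sequence[List[Row]]) -> Iterator[Row]:
    """Chain join R1 JOIN R2 JOIN ... JOIN Rn as a sequence of lazy stages."""
    result: Iterator[Row] = iter(tables[0])
    for table in tables[1:]:
        # each stage consumes the previous stage's output lazily
        result = hash_join_stage(result, table)
    return result


if __name__ == "__main__":
    r1 = [(1, "x"), (2, "y")]
    r2 = [("x", 10), ("y", 20), ("x", 30)]
    r3 = [(10, "p"), (30, "q")]
    for row in chain_join([r1, r2, r3]):
        print(row)
```

In this sketch the row `(2, "y")` joins with `("y", 20)` in the first stage but is dropped in the second, because no row of `r3` has key `20`. A real MapReduce implementation would partition rows across machines by join key rather than build an in-memory index.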
