Please use this identifier to cite or link to this item:
|Title:||Synchronous vs Asynchronous Computation on Graph Blocks: a Comparative Study|
|Department:||Department of Computer Science|
|Supervisor:||Prof. Jia, Xiaohua; First Reader: Dr. Wong, Ka Chun; Second Reader: Prof. Li, Qing|
|Abstract:||Large graphs are omnipresent in today's information-based world. Examples that we encounter in our day-to-day lives include the hyperlink structure of the web, spanning billions of interlinked web pages; social networks that connect billions of people from all over the world; and other technological networks such as power grids, telephone networks and transportation networks. The sheer size of these graphs, often spanning millions or billions of nodes and edges, makes them difficult to process or analyse. Graph algorithms at this scale are beyond the capabilities of individual machines (single-node systems), and researchers are therefore increasingly turning towards distributed computational architectures to tackle the problem of analysing large graphs. Until very recently, relational database systems were an all-in-one solution for storing and even analysing all kinds of data. But as we move into the era of Big Data, where organizations are dealing with terabytes and petabytes of data, these traditional relational database systems cannot solve all the problems that arise from the volume, velocity and variety of the data being produced. Organizations have found that relational database management systems (RDBMS) suffer from performance bottlenecks, especially when dealing with the highly iterative, analytical operations required for large-scale graph data. The rise of big data, advances in cloud computing and the inability of traditional RDBMS tools to keep up with the scale of current data, especially graph data, have led to the development of specialized tools, namely distributed large-scale graph processing systems, the most popular of which are Apache Giraph, GraphX and GraphLab. The most recent development in this area is Blogel, a system that relies on Bulk Synchronous Parallel (BSP) message passing, much like Pregel.
Blogel partitions the graph into blocks, which reduces the messaging (communication) overhead that is very expensive over commodity network interconnects. The system provided orders-of-magnitude performance improvements over previous frameworks for certain algorithms and datasets. An asynchronous mode was recently released, which allows for asynchronous computation on graph blocks. The aim of this project is to assess whether asynchronous computation on graph blocks provides better results than synchronous computation, how large the performance gains or penalties are, and which datasets and algorithms are better suited to asynchronous computing on graph blocks. The problem that I wish to address through this project arises from the fundamental nature of the BSP paradigm: the synchronization barrier. I aim to show, via experimental results, that certain graph problems can be solved more efficiently using an asynchronous parallel programming model. The graph problem that I have decided to explore arises very frequently in geolocation systems: finding the K nearest points of interest (POI) in a spatial network. Through this project, I aim to show that, given certain conditions and prerequisites, an asynchronous parallel programming model performs better than the Bulk Synchronous Parallel model for computing the K nearest points of interest in a spatial network graph.|
|Appears in Collections:||Computer Science - Undergraduate Final Year Projects |
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.