Conception And Analysis Of A Raspberry Pi Cluster With Apache Spark

Kuhaupt, Nicolas (2017) Conception And Analysis Of A Raspberry Pi Cluster With Apache Spark. Masters thesis, Ulm University.

PDF (registered users only)


Due to recent developments in the Internet of Things, the amount of generated and collected data is increasing. Business and science applications are interested in finding patterns and correlations in the recorded data sets. To generate immediate insights, we need fast algorithms that take advantage of distributed computation. Since the performance growth of single computers is stagnating, there is more potential in tackling big data problems by combining computers to scale computing power. Therefore, computers are connected to form clusters. Cluster management, which is responsible for dividing the work among the individual nodes, is handled by new tools such as Apache Spark. Spark set a record for large-scale data sorting in 2014 and is widely used. It offers in-memory computing for faster calculations and an easy-to-use, high-level Machine Learning API, and it fits well into the Hadoop ecosystem for big data. We evaluate the performance of such a cluster. The test setup consists of a set of Raspberry Pi mini computers with a Hadoop and Spark environment installed. We examine the scaling performance of selected algorithms, such as Wordcount, Kolmogorov-Smirnov Test, Frequent Pattern Growth, Support Vector Machines, Linear Regression, and K-Means. The parameters for these tests are the dataset size and the number of computation nodes. The results give an indication of the number of nodes required for a given problem. Furthermore, we analyze these algorithms and the data structures they use to explain their performance, represented by scaling patterns. Finally, we examine the implementation and abstractions of Apache Spark for potential bottlenecks.
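Wordcount, the first benchmark named above, is the canonical Spark example. What follows is a minimal, Spark-free sketch of its data flow. It assumes the input has already been split into partitions, roughly one per node. Each partition counts its words locally, and the partial counts are then merged into a global result. This mirrors the split, pair and sum steps that Spark performs with `flatMap`, `map` and `reduceByKey` on an RDD. The function names `count_partition` and `wordcount` are hypothetical, and this is not the benchmark code used in the thesis.

```python
from collections import Counter
from functools import reduce


def count_partition(lines):
    """Local (per-node) step: split each line into words and count them.

    This plays the role of a map-side combine: each partition produces
    partial counts without talking to other partitions.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def wordcount(partitions):
    """Global step: merge the partial counts of all partitions.

    Hypothetical helper; in Spark, this merge corresponds to the shuffle
    performed by reduceByKey.
    """
    partials = [count_partition(p) for p in partitions]
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    # Two illustrative partitions, as if the text were spread over two nodes.
    partitions = [
        ["spark runs on the cluster", "the cluster has nodes"],
        ["each node counts words", "spark merges the counts"],
    ]
    result = wordcount(partitions)
    print(result["the"], result["spark"], result["cluster"])  # prints: 3 2 2
```

On a real cluster, the same pipeline would typically be written in PySpark as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(operator.add)`. Only the merge step needs data from other nodes, which is where network overhead enters as the number of nodes grows.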

Item Type: Thesis (Masters)
Subjects: DBIS Research > Master and Phd-Thesis
ID Code: 1490
Deposited By: Mr. Burkhard Hoppenstedt
Deposited On: 16 May 2017 11:06
Last Modified: 16 May 2017 11:06
