This repository was archived by the owner on Nov 24, 2018. It is now read-only.

Commit 16a44d4

Update README.md
1 parent cac79a9 commit 16a44d4

1 file changed: +16 -10 lines changed

README.md

Lines changed: 16 additions & 10 deletions
@@ -1,17 +1,23 @@
 # "Big Data" processing with MapReduce framework
 ![](mapreduce.jpg)<sup>1</sup>
 
-## Runnables
-* The latest binaries for all implementations are found zipped on the [releases page](https://github.com/qhua948/SE751Research/releases)
-* This includes inputs and instructions to run so you can reproduce our results
+## Research
+This was a research project where we learnt the MapReduce programming model and investigated a few frameworks. We have two implementations:
+
+### K-means (main) implementation
+* This processes images to determine the most common colour values.
+* The source code for the k-means implementation is found under the [k-means directory](https://github.com/wilmol/MapReduce-K-means-image-processing/tree/master/k-means/spark-scala-kmeans).
+* This includes instructions to run.
 
-## K-means (main) implementation
-* The source code for the k-means implementation is found under the [k-means directory](https://github.com/qhua948/SE751Research/tree/master/k-means/spark-scala-kmeans)
-* This includes instructions to run
+### reddit comment implementation
+* This calculates the average comment score per subreddit and was used to compare frameworks.
+* The source code for the reddit comment implementations is found under the [reddit-comments directory](https://github.com/wilmol/MapReduce-K-means-image-processing/tree/master/reddit-comments).
+* This has been grouped by framework (couchDB, Hadoop, Spark, Cloud Haskell).
+* The sequential Java version is found within the Hadoop source code or [here](https://github.com/wilmol/MapReduce-K-means-image-processing/tree/master/reddit-comments/hadoop-reddit/src/main/java/nz/ac/auckland/mapreduce/NoFrameWorkMain.java).
 
-## reddit comment implementation
-* The source code for the reddit comment implementations is found under the [reddit-comments directory](https://github.com/qhua948/SE751Research/tree/master/reddit-comments)
-* This has been grouped by framework (couchDB, Hadoop, Spark, Cloud Haskell)
-* The sequential Java version is found within the Hadoop source code or [here](https://github.com/qhua948/SE751Research/blob/master/reddit-comments/hadoop-reddit/src/main/java/nz/ac/auckland/mapreduce/NoFrameWorkMain.java)
+## Runnables
+* The latest binaries for all implementations are found zipped on the [releases page](https://github.com/wilmol/MapReduce-K-means-image-processing/releases).
+* This includes input images/video (see the resources directory) and instructions to run so you can reproduce our results.
+* The reddit comment data set was taken from [here](https://www.kaggle.com/reddit/reddit-comments-may-2015). We uncompressed it and took the first 20,000,000 lines (approx 11GB of JSON).
 
 <sup>1</sup> Image credit: http://www.well-typed.com/blog/73/
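
For readers unfamiliar with the reddit comment task described in the README, the computation (average comment score per subreddit) can be framed as grouping scores by key and then reducing each group to a mean. The following is a minimal sequential sketch in Java, written for illustration only. It is not the repository's `NoFrameWorkMain.java`, and the `SubredditAverages` class, the `Comment` type, and the `averageScores` method are hypothetical names. It also assumes the JSON has already been parsed into (subreddit, score) pairs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: class, field, and method names are illustrative
// and are not taken from the repository.
final class SubredditAverages {

    // Minimal stand-in for one already-parsed comment record.
    static final class Comment {
        final String subreddit;
        final long score;

        Comment(String subreddit, long score) {
            this.subreddit = subreddit;
            this.score = score;
        }
    }

    // Computes the mean score per subreddit in one sequential pass.
    static Map<String, Double> averageScores(List<Comment> comments) {
        // Grouping step: accumulate [score sum, comment count] per subreddit.
        Map<String, long[]> totals = new HashMap<>();
        for (Comment c : comments) {
            long[] acc = totals.computeIfAbsent(c.subreddit, k -> new long[2]);
            acc[0] += c.score;
            acc[1] += 1;
        }
        // Reduce step: turn each (sum, count) pair into an average.
        // A TreeMap keeps the output sorted by subreddit name.
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            long[] acc = e.getValue();
            averages.put(e.getKey(), (double) acc[0] / acc[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        List<Comment> sample = Arrays.asList(
            new Comment("java", 4),
            new Comment("java", 2),
            new Comment("scala", 10));
        averageScores(sample).forEach(
            (subreddit, avg) -> System.out.println(subreddit + "\t" + avg));
    }
}
```

Carrying a (sum, count) pair per key, rather than a running average, is what lets the same logic be split across workers in a MapReduce framework: partial sums and counts can be merged in any order before the final division.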
