# "Big Data" processing with the MapReduce framework
<sup>1</sup>
## Research
This was a research project in which we learnt the MapReduce programming model and investigated a few MapReduce frameworks. We have two implementations:

### K-means (main) implementation
* This uses k-means clustering to determine the most common colour values in images.
* The source code for the k-means implementation is found under the [k-means directory](https://github.com/wilmol/MapReduce-K-means-image-processing/tree/master/k-means/spark-scala-kmeans).
* This includes instructions to run.
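The k-means step can be pictured as repeated MapReduce rounds: a map assigns each pixel's RGB value to its nearest centroid, and a reduce averages the pixels in each cluster to produce the next centroid. The sketch below is a minimal single-machine illustration, not the project's Spark/Scala code. It assumes pixels are already decoded into RGB triples and that initial centroids are supplied; the class and method names (`KMeansColours`, `nearest`, `iterate`, `cluster`) are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: k-means over RGB pixel values, each iteration written
 * as a map step (assign pixel to nearest centroid) and a reduce step
 * (average the pixels assigned to each centroid).
 */
public class KMeansColours {

    // Squared Euclidean distance between two RGB colours.
    public static double distSq(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < 3; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Map: index of the centroid closest to this pixel.
    public static int nearest(double[] pixel, double[][] centroids) {
        int best = 0;
        for (int k = 1; k < centroids.length; k++) {
            if (distSq(pixel, centroids[k]) < distSq(pixel, centroids[best])) {
                best = k;
            }
        }
        return best;
    }

    // One iteration: map every pixel to its cluster, then reduce each
    // cluster to the mean of its pixels.
    public static double[][] iterate(double[][] pixels, double[][] centroids) {
        int k = centroids.length;
        double[][] sums = new double[k][3];
        int[] counts = new int[k];
        for (double[] p : pixels) {
            int c = nearest(p, centroids);
            for (int i = 0; i < 3; i++) sums[c][i] += p[i];
            counts[c]++;
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // An empty cluster keeps its previous centroid.
                next[c] = centroids[c].clone();
            } else {
                next[c] = new double[3];
                for (int i = 0; i < 3; i++) next[c][i] = sums[c][i] / counts[c];
            }
        }
        return next;
    }

    // Repeat until no centroid moves (or maxIter rounds have run).
    public static double[][] cluster(double[][] pixels, double[][] initial, int maxIter) {
        double[][] centroids = initial;
        for (int it = 0; it < maxIter; it++) {
            double[][] next = iterate(pixels, centroids);
            boolean converged = true;
            for (int c = 0; c < next.length; c++) {
                if (distSq(next[c], centroids[c]) > 1e-9) converged = false;
            }
            centroids = next;
            if (converged) break;
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] pixels = {
            {250, 10, 10}, {240, 20, 15}, {245, 5, 0}, // reddish pixels
            {10, 10, 240}, {20, 30, 250}               // bluish pixels
        };
        double[][] initial = {{255, 0, 0}, {0, 0, 255}};
        for (double[] c : cluster(pixels, initial, 10)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

The centroid with the most assigned pixels approximates the dominant colour. In a distributed version the per-pixel assignment parallelises freely, while the per-cluster averaging is the keyed reduce; the choice of initial centroids still affects which colours are found.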
### reddit comment implementation
* This calculates the average comment score per subreddit and was used to compare frameworks.
* The source code for the reddit comment implementations is found under the [reddit-comments directory](https://github.com/wilmol/MapReduce-K-means-image-processing/tree/master/reddit-comments).
* The implementations are grouped by framework (CouchDB, Hadoop, Spark, Cloud Haskell).
* The sequential Java version is found within the Hadoop source code, in [NoFrameWorkMain.java](https://github.com/wilmol/MapReduce-K-means-image-processing/tree/master/reddit-comments/hadoop-reddit/src/main/java/nz/ac/auckland/mapreduce/NoFrameWorkMain.java).
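The average-score job maps naturally onto MapReduce: the map phase emits a (subreddit, score) pair per comment, the framework groups the pairs by subreddit, and the reduce phase averages each group. Below is a minimal plain-Java sketch of those three phases. It assumes comments have already been parsed from JSON into records with `subreddit` and `score` fields (field names assumed from the reddit comment dumps), and the `SubredditAverages` class and its methods are hypothetical, not the project's Hadoop, Spark, CouchDB or Cloud Haskell code. It needs Java 16+ for the record type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of average score per subreddit as explicit
 * map, shuffle and reduce phases. JSON parsing is assumed done already.
 */
public class SubredditAverages {

    // Hypothetical parsed comment holding only the fields the job needs.
    public record Comment(String subreddit, int score) {}

    // Map: each comment emits one (subreddit, score) key-value pair.
    static Map.Entry<String, Integer> map(Comment c) {
        return Map.entry(c.subreddit(), c.score());
    }

    // Shuffle: group values by key (a framework does this between phases).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: average all scores seen for one subreddit.
    static double reduce(List<Integer> scores) {
        long sum = 0;
        for (int s : scores) sum += s;
        return (double) sum / scores.size();
    }

    public static Map<String, Double> averageScores(List<Comment> comments) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Comment c : comments) pairs.add(map(c));
        Map<String, Double> result = new TreeMap<>();
        shuffle(pairs).forEach((sub, scores) -> result.put(sub, reduce(scores)));
        return result;
    }

    public static void main(String[] args) {
        List<Comment> comments = List.of(
            new Comment("AskReddit", 10),
            new Comment("AskReddit", 2),
            new Comment("programming", 5));
        System.out.println(averageScores(comments)); // {AskReddit=6.0, programming=5.0}
    }
}
```

Because an average of partial averages is not the overall average, a distributed version that pre-aggregates on each node (for example with a Hadoop combiner) would need to pass partial (sum, count) pairs to the reducer rather than partial means.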
## Runnables
* The latest binaries for all implementations are available zipped on the [releases page](https://github.com/wilmol/MapReduce-K-means-image-processing/releases).
* The releases include input images/video (see the resources directory) and instructions to run, so you can reproduce our results.
* The reddit comment data set was taken from [Kaggle's Reddit Comments (May 2015) data set](https://www.kaggle.com/reddit/reddit-comments-may-2015). We uncompressed it and took the first 20,000,000 lines (approximately 11 GB of JSON).
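Truncating the data set by line count is safe if each line of the uncompressed dump holds exactly one JSON comment object, which is assumed here. A minimal Java sketch of that step streams the file line by line instead of loading 11 GB into memory; the class name `TakeFirstLines` and the default file names are placeholders, not files from the release.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch: copy the first N lines of a newline-delimited
 * JSON dump into a smaller input file, streaming to keep memory use flat.
 */
public class TakeFirstLines {

    // Copies at most maxLines lines from input to output; returns how many were copied.
    public static long takeFirst(Path input, Path output, long maxLines) throws IOException {
        long copied = 0;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while (copied < maxLines && (line = in.readLine()) != null) {
                out.write(line);
                out.write('\n'); // keep Unix line endings on every platform
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names; pass real paths as arguments.
        Path input = Paths.get(args.length > 0 ? args[0] : "comments.json");
        Path output = Paths.get(args.length > 1 ? args[1] : "comments-20m.json");
        long n = takeFirst(input, output, 20_000_000L);
        System.out.println("Copied " + n + " lines");
    }
}
```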

<sup>1</sup> Image credit: http://www.well-typed.com/blog/73/