Commit c96cd6b

Create README.md
Author: Calum Bell
1 parent 3a47636 commit c96cd6b

1 file changed

+59
-0
lines changed

README.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# GoLang Keyword Density Analyser

Implemented in Go, this application uses concurrency to speed up keyword density analysis on large strings of **50k - 500k** words.

Compared to a basic sequential implementation in C++, this application runs **500x faster** on a dataset of 70k words on an Intel i5 3.3GHz CPU.

## About

This application was built alongside a CUDA (C++) implementation, also available on my GitHub, which takes advantage of GPU threads. Of the two, this solution ran faster and scaled further.

## Getting Started

**Prerequisites**

1. Install Go by following the instructions at https://golang.org/doc/install

----------

**Get Started:**

1. Clone the repo.
2. Ensure your server has a route which expects a `{ keyword: "sample" }` payload and returns a string.
3. Open `textprocessing.go` in a text editor and change line 381 to your server API address.
4. Open a terminal/CMD and run `go run textprocessing.go` from within the directory.
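
The server contract in step 2 can be sketched as follows. This is a minimal illustration, not the code in `textprocessing.go`: it assumes the keyword is sent as a JSON body via HTTP POST and that the response body is plain text. The names `keywordRequest`, `fetchText`, `stubHandler` and `fetchFromStub` are hypothetical, and the stub server only stands in for a real one so the example runs on its own.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// keywordRequest mirrors the { keyword: "sample" } payload from step 2.
// Encoding it as JSON is an assumption made for this sketch.
type keywordRequest struct {
	Keyword string `json:"keyword"`
}

// fetchText posts the keyword to url and returns the response body as a string.
func fetchText(url, keyword string) (string, error) {
	body, err := json.Marshal(keywordRequest{Keyword: keyword})
	if err != nil {
		return "", err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	text, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(text), nil
}

// stubHandler is a hypothetical stand-in for the real server route:
// it decodes the keyword and replies with a short text containing it.
func stubHandler(w http.ResponseWriter, r *http.Request) {
	var req keywordRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "text about %s and more %s", req.Keyword, req.Keyword)
}

// fetchFromStub runs fetchText against a throwaway local test server.
func fetchFromStub(keyword string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(stubHandler))
	defer srv.Close()
	return fetchText(srv.URL, keyword)
}

func main() {
	text, err := fetchFromStub("sample")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```

To try it against a real server, call `fetchText` with that server's address in place of the stub URL.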

## Files

- Test Strings.zip - Contains sample input files to provide guidance on data format.
- results.txt - Sample output from the application.
- sampleresponse.txt - Example of the response format this application expects to receive from the server.
- textprocessing.go - Go file containing the parseManager and parser goroutines.

## Performance
The performance of this application has been measured against a sequential C++ solution and a shared memory solution in NVIDIA's CUDA.

![Performance Analysis](https://i.imgur.com/xddRvbN.png)

**Note that for particularly small datasets this solution is slower, due to its increased overhead.**

## Data Flow Diagram

The data flow for this program is shown in the diagram below.

**The diagram can be viewed in stackedit.io.**

```mermaid
graph LR
A[parseManager] -- Keyword --> G[server]
G[server] -- Bulk String --> A[parseManager]
A[parseManager] -- Sub-string --> B((parser))
B((parser)) -- Frequency Map --> A[parseManager]
A[parseManager] -- Sub-string --> C((parser))
C((parser)) -- Frequency Map --> A[parseManager]
A[parseManager] -- Sub-string --> D((parser))
D((parser)) -- Frequency Map --> A[parseManager]
A[parseManager] -- Sub-string --> E((parser))
E((parser)) -- Frequency Map --> A[parseManager]
A[parseManager] -- Prints Sorted Map --> F(Screen)
```
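
The fan-out/fan-in pattern in the diagram can be sketched roughly as below. This is an illustrative sketch rather than the contents of `textprocessing.go`: the function names `countWords`, `analyse` and `sorted` are hypothetical, and splitting on whitespace and lower-casing words are assumptions about how the text is tokenised. `analyse` plays the parseManager role and each goroutine plays a parser.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// countWords is the per-chunk worker (a "parser" in the diagram):
// it builds a frequency map for its slice of words.
func countWords(words []string) map[string]int {
	freq := make(map[string]int)
	for _, w := range words {
		freq[strings.ToLower(w)]++
	}
	return freq
}

// analyse plays the parseManager role: it splits the text into at most
// `workers` chunks, counts each chunk in its own goroutine, and merges
// the partial frequency maps into one.
func analyse(text string, workers int) map[string]int {
	words := strings.Fields(text)
	if workers < 1 {
		workers = 1
	}
	chunk := (len(words) + workers - 1) / workers // ceiling division

	results := make(chan map[string]int, workers)
	var wg sync.WaitGroup
	for start := 0; start < len(words); start += chunk {
		end := start + chunk
		if end > len(words) {
			end = len(words)
		}
		wg.Add(1)
		go func(part []string) {
			defer wg.Done()
			results <- countWords(part)
		}(words[start:end])
	}
	wg.Wait()
	close(results)

	total := make(map[string]int)
	for partial := range results {
		for w, n := range partial {
			total[w] += n
		}
	}
	return total
}

// entry pairs a word with its count so the map can be sorted.
type entry struct {
	word  string
	count int
}

// sorted returns the entries ordered by descending count, then alphabetically.
func sorted(freq map[string]int) []entry {
	entries := make([]entry, 0, len(freq))
	for w, n := range freq {
		entries = append(entries, entry{w, n})
	}
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].count != entries[j].count {
			return entries[i].count > entries[j].count
		}
		return entries[i].word < entries[j].word
	})
	return entries
}

func main() {
	text := "go is fast and go is fun"
	for _, e := range sorted(analyse(text, 4)) {
		fmt.Printf("%s: %d\n", e.word, e.count)
	}
}
```

Each worker writes only to its own map, so no locking is needed until the merge, which runs in a single goroutine. The cost of starting goroutines and merging maps is one plausible source of the overhead seen on small inputs.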
