|
| 1 | +# GoLang Keyword Density Analyser |
| 2 | + |
| 3 | +Written in Go, this application uses concurrency to speed up keyword density analysis on large strings of **50k to 500k** words. |
| 4 | + |
| 5 | +Compared to a basic sequential implementation in C++, this application runs **500x faster** on a dataset of 70k words on an Intel i5 3.3 GHz CPU. |
| 6 | + |
| 7 | +## About |
| 8 | + |
| 9 | +This application was built alongside a CUDA (C++) implementation, also available on my GitHub, which takes advantage of GPU threads. Of the two, this solution ran faster and scaled further. |
| 10 | + |
| 11 | +## Getting Started |
| 12 | + |
| 13 | +**Prerequisites** |
| 14 | + |
| 15 | +1. Install Go by following the instructions at https://golang.org/doc/install |
| 16 | + |
| 17 | +---------- |
| 18 | + |
| 19 | +**Get Started:** |
| 20 | + |
| 21 | +1. Clone the repo. |
| 22 | +2. Ensure your server has a route that accepts `{ keyword: "sample" }` and returns a string. |
| 23 | +3. Open `textprocessing.go` in a text editor and change line 381 to your server API address. |
| 24 | +4. Open a terminal (or CMD) and run `go run textprocessing.go` from within the repository directory. |
| 25 | + |
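The server contract in step 2 can be sketched as a small client call. This is only an illustration, not the code in `textprocessing.go`: it assumes the keyword is sent as a JSON body in a POST request and that the response body is plain text. `fetchText` and `newStubServer` are hypothetical names introduced here, and the stub server exists only so the sketch runs without a real backend.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchText sends {"keyword": keyword} as JSON to url and returns the
// response body as a string. POST + JSON is an assumption, not something
// the README specifies.
func fetchText(url, keyword string) (string, error) {
	payload, err := json.Marshal(map[string]string{"keyword": keyword})
	if err != nil {
		return "", err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// newStubServer is a hypothetical stand-in for your own server: it decodes
// the keyword and answers with a plain string.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Keyword string `json:"keyword"`
		}
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "sample text about %s", req.Keyword)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	text, err := fetchText(srv.URL, "sample")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(text)
}
```

If your route uses a different method or encoding, only `fetchText` needs to change; see `sampleresponse.txt` for the response format the application actually expects.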
| 26 | +## Files |
| 27 | + |
| 28 | + - Test Strings.zip - Sample input files that illustrate the expected data format. |
| 29 | + - results.txt - Sample output from the application. |
| 30 | + - sampleresponse.txt - An example of the server response format this application expects to receive. |
| 31 | + - textprocessing.go - Go source file containing the parseManager and parser goroutines. |
| 32 | + |
| 33 | +## Performance |
| 34 | +The performance of this application has been measured against a sequential C++ solution and a shared memory solution in NVIDIA's CUDA. |
| 35 | + |
| 36 | + |
| 37 | + |
| 38 | +**Note: for particularly small datasets, this solution is slower because of its increased overhead.** |
| 39 | + |
| 40 | +## Data Flow Diagram |
| 41 | + |
| 42 | +The data flow for this program looks like: |
| 43 | + |
| 44 | +**The diagram can be viewed in stackedit.io.** |
| 45 | + |
| 46 | +```mermaid |
| 47 | +graph LR |
| 48 | +A[parseManager] -- Keyword --> G[server] |
| 49 | +G[server] -- Bulk String --> A[parseManager] |
| 50 | +A[parseManager] -- Sub-string --> B((parser)) |
| 51 | +B((parser)) -- Frequency Map --> A[parseManager] |
| 52 | +A[parseManager] -- Sub-string --> C((parser)) |
| 53 | +C((parser)) -- Frequency Map --> A[parseManager] |
| 54 | +A[parseManager] -- Sub-string --> D((parser)) |
| 55 | +D((parser)) -- Frequency Map --> A[parseManager] |
| 56 | +A[parseManager] -- Sub-string --> E((parser)) |
| 57 | +E((parser)) -- Frequency Map --> A[parseManager] |
| 58 | +A[parseManager] -- Prints Sorted Map --> F(Screen) |
| 59 | +``` |
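
The fan-out/fan-in flow in the diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation in `textprocessing.go`: `parseConcurrently` and `countWords` are hypothetical names, the text is split on whitespace, words are lower-cased before counting, and the number of workers is chosen by the caller.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// countWords plays the role of one parser: it builds a frequency map
// for its share of the words (lower-casing is an assumption).
func countWords(words []string) map[string]int {
	freq := make(map[string]int)
	for _, w := range words {
		freq[strings.ToLower(w)]++
	}
	return freq
}

// parseConcurrently plays the role of parseManager: it splits the text
// into at most `workers` chunks, counts each chunk in its own goroutine,
// and merges the partial frequency maps.
func parseConcurrently(text string, workers int) map[string]int {
	words := strings.Fields(text)
	if workers < 1 {
		workers = 1
	}
	chunk := (len(words) + workers - 1) / workers // ceiling division

	results := make(chan map[string]int, workers) // buffered: sends never block
	var wg sync.WaitGroup
	for start := 0; start < len(words); start += chunk {
		end := start + chunk
		if end > len(words) {
			end = len(words)
		}
		wg.Add(1)
		go func(part []string) {
			defer wg.Done()
			results <- countWords(part)
		}(words[start:end])
	}
	wg.Wait()
	close(results)

	total := make(map[string]int)
	for partial := range results {
		for w, n := range partial {
			total[w] += n
		}
	}
	return total
}

func main() {
	freq := parseConcurrently("the quick brown fox jumps over the lazy dog the end", 4)

	// Sort by descending count, then alphabetically, before printing.
	keys := make([]string, 0, len(freq))
	for w := range freq {
		keys = append(keys, w)
	}
	sort.Slice(keys, func(i, j int) bool {
		if freq[keys[i]] != freq[keys[j]] {
			return freq[keys[i]] > freq[keys[j]]
		}
		return keys[i] < keys[j]
	})
	for _, w := range keys {
		fmt.Printf("%s: %d\n", w, freq[w])
	}
}
```

Each goroutine writes only to its own map and hands it back over a channel, so the merge step needs no locking. On very small inputs the cost of starting goroutines and merging maps can outweigh the gain, which fits the note in the Performance section.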