[License: Artistic-2.0](https://opensource.org/licenses/Artistic-2.0)

Raku package with distance functions implemented in C.
Apple's Accelerate library is used if available.

The primary motivation for this library is fast sorting and nearest-neighbor computations
over collections of LLM-embedding vectors.

------

## Usage examples

### Regular vectors

Make a largish collection of large vectors and compute the Euclidean distance from each of them to a search vector:

```perl6
use Math::DistanceFunctions::Native;

# 1000 random vectors, each with 1000 elements in [0, 1)
my @vecs = (^1000).map({ (^1000).map({1.rand}).cache.Array }).Array;
# A random search vector of the same length
my @searchVector = (^1000).map({1.rand});

# Time the distance computations from each vector to the search vector
my $start = now;
my @dists = @vecs.map({ euclidean-distance($_, @searchVector) });
my $tend = now;
say "Total time of computing {@vecs.elems} distances: {round($tend - $start, 10 ** -6)} s";
say "Average time of a single distance computation: {($tend - $start) / @vecs.elems} s";
```
```
# Total time of computing 1000 distances: 0.63326 s
# Average time of a single distance computation: 0.0006332598499999999 s
```

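To spot-check the native results, a plain-Raku Euclidean distance can be handy. The sketch below uses a hypothetical helper, `euclidean-ref`, which is not part of this package; it computes the square root of the sum of squared element-wise differences directly.

```perl6
# Hypothetical helper, not part of this package: plain-Raku Euclidean
# distance, i.e. the square root of the sum of squared differences.
sub euclidean-ref(@a, @b --> Numeric) {
    die 'Vectors must have the same length' unless @a.elems == @b.elems;
    # Zip-subtract element-wise, square each difference, sum, take the root
    (@a Z- @b).map({ $_ * $_ }).sum.sqrt
}

say euclidean-ref([0, 0], [3, 4]);    # 5
```

Assuming `euclidean-distance` uses the standard definition and the variables from the example above are still in scope, `euclidean-ref(@vecs[0], @searchVector)` should agree with `@dists[0]` up to floating-point rounding.
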
### `CArray` vectors

Here is the same computation with the vectors converted to `CArray[num64]` objects:

```perl6
use NativeCall;

# Convert the Raku arrays into C arrays of 64-bit floats
my @cvecs = @vecs.map({ CArray[num64].new($_) });
my $cSearchVector = CArray[num64].new(@searchVector);

# Time the same distance computations over the CArray vectors
$start = now;
my @cdists = @cvecs.map({ euclidean-distance($_, $cSearchVector) });
$tend = now;
say "Total time of computing {@cvecs.elems} distances: {round($tend - $start, 10 ** -6)} s";
say "Average time of a single distance computation: {($tend - $start) / @cvecs.elems} s";
```
```
# Total time of computing 1000 distances: 0.002994 s
# Average time of a single distance computation: 2.994124e-06 s
```

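That is, using `CArray` vectors with the functions of this package gives a speed-up of roughly 200 times
(0.633 s vs. 0.003 s for 1000 distances). Note that the time for converting the vectors into `CArray` objects
is not included in the measurement.
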
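Since the motivation is nearest-neighbor search, here is a minimal sketch of ranking a collection by distance. The helper `nearest-indices` is hypothetical (not part of this package); it takes the distance function as a named argument, so the self-contained demo below uses a plain-Raku squared Euclidean distance instead of the native one.

```perl6
# Hypothetical helper, not part of this package: indices of the $k vectors
# in @vecs closest to $query under the distance function &dist.
sub nearest-indices(@vecs, $query, UInt $k = 3, :&dist!) {
    # Compute each distance once, then sort the indices by distance
    my @d = @vecs.map({ dist($_, $query) });
    (^@d.elems).sort({ @d[$_] }).head($k).List
}

# Squared Euclidean distance gives the same ranking as Euclidean distance
my &sq-dist = -> @a, @b { (@a Z- @b).map({ $_ * $_ }).sum };

my @points = [0, 0], [5, 5], [1, 1], [10, 0];
say nearest-indices(@points, [0, 0], 2, dist => &sq-dist);    # (0 2)
```

Assuming the variables from the `CArray` example are in scope, the same helper could be called with this package's function, e.g. `nearest-indices(@cvecs, $cSearchVector, 5, dist => &euclidean-distance)`. The sketch sorts all indices, which keeps it simple; when `$k` is much smaller than the collection, a partial selection would avoid sorting the whole list.
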
### Edit distance

Loading this package also loads the C-implemented function `edit-distance` from
["Math::DistanceFunctions::Edit"](https://github.com/antononcube/Raku-Math-DistanceFunctions-Edit).
Here is an example:

```perl6
edit-distance('racoon', 'raccoon')
```
```
# 1
```
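
For reference, here is a plain-Raku sketch of the classic dynamic-programming Levenshtein distance, where insertions, deletions, and substitutions each cost 1. It assumes that `edit-distance` computes this same Levenshtein distance; `levenshtein` is a hypothetical helper, not this package's C implementation.

```perl6
# Hypothetical helper, not part of this package: Levenshtein distance
# computed row by row with dynamic programming.
sub levenshtein(Str:D $s, Str:D $t --> Int) {
    my @a = $s.comb;
    my @b = $t.comb;
    # @prev[$j] = distance between the current prefix of $s and the
    # first $j characters of $t (initially the empty prefix of $s)
    my @prev = 0 .. @b.elems;
    for @a.kv -> $i, $ca {
        my @curr = $i + 1;
        for @b.kv -> $j, $cb {
            @curr.push: min(
                @prev[$j + 1] + 1,                    # deletion
                @curr[$j] + 1,                        # insertion
                @prev[$j] + ($ca eq $cb ?? 0 !! 1),   # substitution or match
            );
        }
        @prev = @curr;
    }
    @prev[*-1]
}

say levenshtein('racoon', 'raccoon');    # 1
```

The sketch keeps only two rows of the dynamic-programming table, so its memory use is linear in the length of the second string.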