 
 Betfair Data is a very fast Betfair historical data file parsing library for Python. It currently supports tar archives containing BZ2-compressed NLJSON files (the standard format provided by [Betfair's historic data portal](https://historicdata.betfair.com/#/home)).
 
-The library is written in Rust and uses advanced performance-enhancing techniques, such as in-place JSON deserialization and decompressing Bz2-encoded data on worker threads, and is ideal for parsing large quantities of historic data that could otherwise take hours or days to parse.
+The library is written in Rust and uses advanced performance-enhancing techniques, such as in-place JSON deserialization and decompressing Bz2/Gzip-encoded data on worker threads, and is ideal for parsing large quantities of historic data that could otherwise take hours or days to parse.
+
+This library is a work in progress and is still subject to breaking changes.
 
 ## Installation
 
@@ -38,80 +40,76 @@ for market in betfair_data.TarBz2(paths).mutable():
 print(f"Markets {market_count} Updates {update_count}")
 
 ```
-## Types
-IDEs should automatically detect the types and provide type checking and autocomplete. See the [pyi stub file](betfair_data.pyi) for a comprehensive view of the types and methods available.
-
-<br />
-
-## Benchmarks
-
-| Betfair Data (this) | [Betfairlightweight](https://github.com/liampauling/betfair/) |
-| ---------------------|---------------------|
-| 3 min 37 sec | 1 hour 1 min 45 sec |
-| ~101 markets/sec | ~6 markets/sec |
-| ~768,000 updates/sec | ~45,500 updates/sec |
-
-Benchmarks were run against 3 months of Australian racing markets comprising roughly 22,000 markets, on an M1 MacBook Pro with 32GB RAM.
 
-These results should only be used as a rough comparison: different machines, different sports and even different months can affect the performance and overall markets/updates per second.
+## Loading Files
 
-No disrespect is intended towards betfairlightweight, which remains an amazing library and a top choice for working with the Betfair API. Every effort was made to have its benchmark below run as fast as possible, and any improvements are welcome.
-
-<br>
+You can read in self-recorded stream files. Make sure to set `cumulative_runner_tv` to `False` for self-recorded files so that you get the correct runner and market volumes.
+```python
+import betfair_data
+import glob
 
-The Betfair Data benchmark is shown in the example above.
-<details><summary>Betfairlightweight Benchmark</summary>
+paths = glob.glob("data/*.gz")
+files = betfair_data.Files(paths, cumulative_runner_tv=False)
+```
+Or you can read official Betfair tar archives with BZ2-encoded market files.
 
 ```python
-from typing import Sequence
-
-import unittest.mock
-import tarfile
-import bz2
-import betfairlightweight
+import betfair_data
+import glob
 
-trading = betfairlightweight.APIClient("username", "password", "appkey")
-listener = betfairlightweight.StreamListener(
-    max_latency=None, lightweight=True, update_clk=False, output_queue=None, cumulative_runner_tv=True, calculate_market_tv=True
-)
+paths = glob.glob("data/*.tar")
+files = betfair_data.TarBz2(paths, cumulative_runner_tv=True)
+```
 
-paths = [
-    "data/2021_10_OctRacingAUPro.tar",
-    "data/2021_11_NovRacingAUPro.tar",
-    "data/2021_12_DecRacingAUPro.tar"
-]
+Or load the file through any other means and pass the bytes and name into the object constructors.
 
-def load_tar(file_paths: Sequence[str]):
-    for file_path in file_paths:
-        with tarfile.TarFile(file_path) as archive:
-            for file in archive:
-                yield bz2.open(archive.extractfile(file))
-    return None
+```python
+import glob
+
+from betfair_data import bflw  # assumed import path for BflwIter
+
+# generator that yields (path, bytes) for every file matching a glob pattern
+def load_files(pattern: str):
+    for path in glob.glob(pattern, recursive=True):
+        with open(path, "rb") as file:
+            yield (path, file.read())
+
+# iterate over the files and convert each one into a bflw iterator
+for name, bs in load_files("markets/*.json"):
+    for market_books in bflw.BflwIter(name, bs):
+        for market_book in market_books:
+            # do stuff
+            pass
+```
 
-market_count = 0
-update_count = 0
+## Object Types
 
-for file_obj in load_tar(paths):
-    with unittest.mock.patch("builtins.open", lambda f, _: f):
-        stream = trading.streaming.create_historical_generator_stream(
-            file_path=file_obj,
-            listener=listener,
-        )
-        gen = stream.get_generator()
+You can use different styles of objects, each with pros and cons depending on your needs.
 
-    market_count += 1
-    for market_books in gen():
-        for market_book in market_books:
-            update_count += 1
+Mutable objects: generally the fastest, but can be hard to use. If you find yourself calling `market.copy()` a lot, you may find immutable objects faster.
+```python
+# where files is loaded from a TarBz2 or Files source like above
+mut_iter = files.mutable()
+for market in mut_iter:  # different markets per file
+    while market.update():  # update the market in place
+        pass
+```
 
-    print(f"Markets {market_count} Updates {update_count}", end='\r')
-print(f"Markets {market_count} Updates {update_count}")
+Immutable objects: slightly slower, but can be easier to use. Equivalent to calling `market.copy()` on every update, but faster, as only objects that change make new copies. ```NOT YET FINISHED```
+```python
+immut_iter = files.immutable()
+for market_iter in immut_iter:  # different files
+    for market in market_iter:  # each update of a market/file
+        pass
+```
 
+Betfairlightweight-compatible version: a drop-in replacement for bflw objects.
+```python
+bflw_iter = files.bflw()
+for file in bflw_iter:  # different files
+    for market_books in file:  # different books per update
+        for market in market_books:  # each update of a market
+            pass
 ```
-</details>
 
-<br>
-<br>
+## Types
+IDEs should automatically detect the types and provide type checking and autocomplete. See the [pyi stub file](betfair_data.pyi) for a comprehensive view of the types and methods available.
 
 
 ## Logging
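The README describes its input as tar archives containing BZ2-compressed NLJSON files, and the removed `load_tar` benchmark helper opened that layout with `tarfile` and `bz2`. Below is a rough sketch of reading the same layout with only the Python standard library. It is a slow reference for what the format looks like, not a model of how betfair_data parses it in Rust. `iter_tar_bz2_nljson` and `build_sample_archive` are hypothetical helpers, and the member path and message fields in the sample archive are invented for illustration.

```python
import bz2
import io
import json
import tarfile
from typing import Iterator, List, Tuple


def iter_tar_bz2_nljson(archive_bytes: bytes) -> Iterator[Tuple[str, List[dict]]]:
    """Hypothetical helper: yield (member name, decoded JSON messages) per tar member.

    Assumes the tar itself is uncompressed and every regular member is a
    BZ2-compressed newline-delimited JSON file.
    """
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directory entries
            compressed = archive.extractfile(member).read()
            text = bz2.decompress(compressed).decode("utf-8")
            messages = [json.loads(line) for line in text.splitlines() if line.strip()]
            yield member.name, messages


def build_sample_archive() -> bytes:
    """Build a tiny in-memory tar holding one BZ2-compressed NLJSON file (hypothetical data)."""
    messages = [
        {"op": "mcm", "pt": 1, "mc": [{"id": "1.234"}]},
        {"op": "mcm", "pt": 2, "mc": [{"id": "1.234"}]},
    ]
    payload = bz2.compress("\n".join(json.dumps(m) for m in messages).encode("utf-8"))
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name="PRO/2021/Oct/1/1.234.bz2")  # hypothetical member path
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    for name, messages in iter_tar_bz2_nljson(build_sample_archive()):
        print(name, len(messages), "messages")
```

Every member is fully decompressed in memory here. That keeps the sketch short, but it is exactly the sort of per-file cost a native parser with worker threads is built to avoid.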
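You can sanity-check the removed benchmark table against its own figures. Divide the roughly 22,000 markets by each run time and you get the quoted markets/sec. Multiply each updates/sec figure by its run time and both runs come out at about the same number of updates, roughly 167 million. The 22,000 figure is itself an approximation, so these results are approximate as well.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert an h/m/s duration into whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


MARKETS = 22_000  # "roughly 22,000 markets" according to the benchmark notes

betfair_data_secs = to_seconds(minutes=3, seconds=37)  # 217 s
bflw_secs = to_seconds(hours=1, minutes=1, seconds=45)  # 3705 s

# Markets per second: total markets divided by wall-clock time
betfair_data_rate = MARKETS / betfair_data_secs  # about 101.4
bflw_rate = MARKETS / bflw_secs  # about 5.9

# Relative speed-up of betfair_data over betfairlightweight on this run
speedup = bflw_secs / betfair_data_secs  # about 17.1

# Implied total updates processed: updates/sec multiplied by run time
implied_updates_bd = 768_000 * betfair_data_secs  # 166,656,000
implied_updates_bflw = 45_500 * bflw_secs  # 168,577,500

if __name__ == "__main__":
    print(f"{betfair_data_rate:.1f} vs {bflw_rate:.1f} markets/sec, {speedup:.1f}x faster")
    print(f"implied updates: {implied_updates_bd:,} vs {implied_updates_bflw:,}")
```

The two implied update totals differ by about 1%, which is consistent with both runs processing the same data set.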
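The new "Loading Files" section reads self-recorded `.gz` stream files. It also suggests loading files "through any other means" and passing the name and bytes on. Below is a standard-library sketch of such a loader. `load_files`, `count_stream_lines` and `demo` are hypothetical helpers. Counting lines only stands in for handing the bytes to a betfair_data constructor, which is not shown here. The helper checks for the two-byte gzip magic number (`1f 8b`), so it accepts both compressed and plain NLJSON.

```python
import glob
import gzip
import os
import tempfile
from typing import Iterator, List, Tuple


def load_files(pattern: str) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical helper: yield (path, raw bytes) for each file matching a glob pattern, sorted."""
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, "rb") as file:
            yield path, file.read()


def count_stream_lines(raw: bytes) -> int:
    """Hypothetical helper: count non-empty NLJSON lines, gunzipping first if needed."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return sum(1 for line in raw.splitlines() if line.strip())


def demo() -> List[Tuple[str, int]]:
    """Write hypothetical sample files to a temp dir and summarise the .gz ones."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "1.234.gz"), "wb") as file:
            file.write(gzip.compress(b'{"op":"mcm","pt":1}\n{"op":"mcm","pt":2}\n'))
        with open(os.path.join(tmp, "notes.txt"), "wb") as file:
            file.write(b"not matched by the pattern\n")
        # consume the generator inside the with-block, while the files still exist
        return [
            (os.path.basename(path), count_stream_lines(raw))
            for path, raw in load_files(os.path.join(tmp, "*.gz"))
        ]


if __name__ == "__main__":
    print(demo())
```

Sorting the glob results makes the iteration order deterministic. `glob.glob` on its own returns paths in arbitrary, filesystem-dependent order.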
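The README says the immutable style is equivalent to calling `market.copy()` on every update, but cheaper, because only objects that change are copied. That idea is copy-on-write with structural sharing, and it can be shown in plain Python with frozen dataclasses. The `Market` and `Runner` classes and `apply_update` below are hypothetical and do not mirror betfair_data's actual types. They only show how unchanged children can be shared between snapshots.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Runner:
    """Hypothetical runner snapshot (not betfair_data's type)."""
    selection_id: int
    last_price_traded: float


@dataclass(frozen=True)
class Market:
    """Hypothetical market snapshot holding an immutable tuple of runners."""
    market_id: str
    runners: Tuple[Runner, ...]


def apply_update(market: Market, prices: Dict[int, float]) -> Market:
    """Return the next snapshot, copying only runners whose price actually changed."""
    new_runners = tuple(
        replace(runner, last_price_traded=prices[runner.selection_id])
        if prices.get(runner.selection_id, runner.last_price_traded) != runner.last_price_traded
        else runner  # unchanged runners are shared, not copied
        for runner in market.runners
    )
    if all(new is old for new, old in zip(new_runners, market.runners)):
        return market  # nothing changed, so the whole snapshot is reused
    return replace(market, runners=new_runners)


before = Market("1.234", (Runner(1, 2.5), Runner(2, 4.0)))
after = apply_update(before, {2: 3.8})
```

After the update, `before` still reports the old price. `after.runners[0] is before.runners[0]` also holds, because the unchanged runner was shared rather than copied. In the mutable style the market is updated in place, so a reference kept from an earlier update presumably reflects later changes. That is likely where the `market.copy()` calls mentioned in the README come in.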