✨ Ensure file chunks are uploaded concurrently and improve PaginationIterator
#220
Before this PR, here's what a profile of the upload function looked like:
╰─$ pyinstrument --show='*/osparc/*' -m pytest -k test_upload_download_file_ram_usage
================================================================================= test session starts =================================================================================
platform linux -- Python 3.10.11, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/mads/Development/oSparc/osparc-simcore-clients/clients/python/test/e2e
configfile: pytest.ini
plugins: anyio-4.6.2.post1, respx-0.21.1, asyncio-0.23.8, mock-3.14.0, env-1.1.3, Faker-33.0.0, metadata-3.1.1, html-4.1.1
asyncio: mode=auto
collected 15 items / 14 deselected / 1 selected
test_files_api.py . [100%]
================================================================================== warnings summary ===================================================================================
../../../../.venv/lib/python3.10/site-packages/jupyter_client/connect.py:22
/home/mads/Development/oSparc/osparc-simcore-clients/.venv/lib/python3.10/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
given by the platformdirs library. To remove this warning and
see the appropriate new directories, set the environment variable
`JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
The use of platformdirs will be the default in `jupyter_core` v6
from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
----- generated xml file: /tmp/osparc-simcore-client/e2e/e2e_tutorial_tests/0.8.3.post0.dev20_api.osparc-master.speag.com/junit_0.8.3.post0.dev20_api.osparc-master.speag.com.xml -----
- Generated html report: file:///tmp/osparc-simcore-client/e2e/e2e_tutorial_tests/0.8.3.post0.dev20_api.osparc-master.speag.com/pytest_0.8.3.post0.dev20_api.osparc-master.speag.com.html -
=============================================================== 1 passed, 14 deselected, 1 warning in 86.71s (0:01:26) ================================================================
_ ._ __/__ _ _ _ _ _/_ Recorded: 10:01:16 Samples: 29134
/_//_/// /_\ / //_// / //_'/ // Duration: 87.436 CPU time: 32.808
/ _/ v5.0.0
Program: pytest -k test_upload_download_file_ram_usage
87.368 <module> pytest/__main__.py:1
└─ 87.368 console_main _pytest/config/__init__.py:185
[34 frames hidden] _pytest, pluggy
60.470 call_fixture_func _pytest/fixtures.py:886
└─ 60.440 large_server_file conftest.py:145
└─ 60.439 FilesApi.upload_file osparc/_api_files_api.py:115
└─ 60.439 run nest_asyncio.py:25
[5 frames hidden] nest_asyncio, selectors, asyncio
53.711 epoll.poll <built-in>
4.985 Task.__step asyncio/tasks.py:215
└─ 4.896 FilesApi.upload_file_async osparc/_api_files_api.py:125
├─ 3.238 FilesApi.get_upload_links ../artifacts/client/osparc_client/api/files_api.py:592
│ └─ 3.238 FilesApi.get_upload_links_with_http_info ../artifacts/client/osparc_client/api/files_api.py:617
│ └─ 3.238 ApiClient.call_api ../artifacts/client/osparc_client/api_client.py:310
│ └─ 3.238 ApiClient.__call_api ../artifacts/client/osparc_client/api_client.py:120
│ └─ 3.237 ApiClient.request ../artifacts/client/osparc_client/api_client.py:373
│ └─ 3.237 RESTClientObject.POST ../artifacts/client/osparc_client/rest.py:272
│ └─ 3.237 RESTClientObject.request ../artifacts/client/osparc_client/rest.py:109
│ └─ 3.237 PoolManager.request urllib3/_request_methods.py:69
│ [12 frames hidden] urllib3, http, socket, ssl, <built-in>
└─ 1.226 compute_sha256 osparc/_utils.py:110
└─ 0.877 HASH.update <built-in>
1.175 Task.__step asyncio/tasks.py:215
└─ 1.167 FilesApi.upload_file_async osparc/_api_files_api.py:125
└─ 1.157 FilesApi._upload_chunck osparc/_api_files_api.py:229
└─ 1.144 AsyncHttpClient.put osparc/_http_client.py:111
└─ 1.144 tuple._request osparc/_http_client.py:69
└─ 1.144 async_wrapped tenacity/asyncio/__init__.py:181
└─ 1.144 AsyncRetrying.__call__ tenacity/asyncio/__init__.py:104
└─ 1.141 _ osparc/_http_client.py:75
└─ 1.141 AsyncClient.put httpx/_client.py:1921
[14 frames hidden] httpx, httpcore, anyio
25.643 pytest_pyfunc_call _pytest/python.py:187
└─ 25.643 test_upload_download_file_ram_usage test_files_api.py:49
├─ 24.544 memory_usage memory_profiler.py:269
│ ├─ 22.450 FilesApi.download_file osparc/_api_files_api.py:62
│ │ └─ 22.450 run nest_asyncio.py:25
│ │ [8 frames hidden] nest_asyncio, asyncio, selectors, <bu...
│ │ 9.804 Task.__step asyncio/tasks.py:215
│ │ └─ 9.296 FilesApi.download_file_async osparc/_api_files_api.py:79
│ │ ├─ 7.014 Response.aiter_bytes httpx/_models.py:916
│ │ │ [13 frames hidden] httpx, httpcore, contextlib, anyio, h11
│ │ └─ 1.774 AsyncBufferedIOBase.method aiofiles/threadpool/utils.py:41
│ │ └─ 1.514 _UnixSelectorEventLoop.run_in_executor asyncio/base_events.py:807
│ │ 1.326 Task.__step asyncio/tasks.py:215
│ │ └─ 1.241 FilesApi.download_file_async osparc/_api_files_api.py:79
│ └─ 2.060 FilesApi.upload_file osparc/_api_files_api.py:115
│ └─ 2.060 run nest_asyncio.py:25
│ [4 frames hidden] nest_asyncio, asyncio
│ 1.749 Task.__step asyncio/tasks.py:215
│ └─ 1.728 FilesApi.upload_file_async osparc/_api_files_api.py:125
│ └─ 1.210 compute_sha256 osparc/_utils.py:110
│ └─ 1.021 HASH.update <built-in>
└─ 1.100 _hash_file test_files_api.py:21
└─ 0.887 HASH.update <built-in>
And here is the profile after doing the upload concurrently:
╰─$ pyinstrument --show='*/osparc/*' -m pytest -k test_upload_download_file_ram_usage
================================================================================= test session starts =================================================================================
platform linux -- Python 3.10.11, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/mads/Development/oSparc/osparc-simcore-clients/clients/python/test/e2e
configfile: pytest.ini
plugins: anyio-4.6.2.post1, respx-0.21.1, asyncio-0.23.8, mock-3.14.0, env-1.1.3, Faker-33.0.0, metadata-3.1.1, html-4.1.1
asyncio: mode=auto
collected 15 items / 14 deselected / 1 selected
test_files_api.py . [100%]
================================================================================== warnings summary ===================================================================================
../../../../.venv/lib/python3.10/site-packages/jupyter_client/connect.py:22
/home/mads/Development/oSparc/osparc-simcore-clients/.venv/lib/python3.10/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
given by the platformdirs library. To remove this warning and
see the appropriate new directories, set the environment variable
`JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
The use of platformdirs will be the default in `jupyter_core` v6
from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
----- generated xml file: /tmp/osparc-simcore-client/e2e/e2e_tutorial_tests/0.8.3.post0.dev20_api.osparc-master.speag.com/junit_0.8.3.post0.dev20_api.osparc-master.speag.com.xml -----
- Generated html report: file:///tmp/osparc-simcore-client/e2e/e2e_tutorial_tests/0.8.3.post0.dev20_api.osparc-master.speag.com/pytest_0.8.3.post0.dev20_api.osparc-master.speag.com.html -
==================================================================== 1 passed, 14 deselected, 1 warning in 48.52s =====================================================================
_ ._ __/__ _ _ _ _ _/_ Recorded: 11:09:13 Samples: 31556
/_//_/// /_\ / //_// / //_'/ // Duration: 49.249 CPU time: 35.423
/ _/ v5.0.0
Program: pytest -k test_upload_download_file_ram_usage
49.179 <module> pytest/__main__.py:1
└─ 49.179 console_main _pytest/config/__init__.py:185
[44 frames hidden] _pytest, pluggy
26.711 pytest_pyfunc_call _pytest/python.py:187
└─ 26.711 test_upload_download_file_ram_usage test_files_api.py:49
├─ 25.543 memory_usage memory_profiler.py:269
│ ├─ 23.295 FilesApi.download_file osparc/_api_files_api.py:64
│ │ └─ 23.295 run nest_asyncio.py:25
│ │ [15 frames hidden] nest_asyncio, asyncio, <built-in>, se...
│ │ 10.347 Task.__step asyncio/tasks.py:215
│ │ └─ 9.811 FilesApi.download_file_async osparc/_api_files_api.py:81
│ │ ├─ 7.435 Response.aiter_bytes httpx/_models.py:916
│ │ │ [20 frames hidden] httpx, httpcore, anyio, ssl, <built-i...
│ │ └─ 1.843 AsyncBufferedIOBase.method aiofiles/threadpool/utils.py:41
│ │ [4 frames hidden] asyncio, concurrent
│ │ 1.290 Task.__step asyncio/tasks.py:215
│ │ └─ 1.231 FilesApi.download_file_async osparc/_api_files_api.py:81
│ │ └─ 0.866 Response.aiter_bytes httpx/_models.py:916
│ │ [7 frames hidden] httpx, httpcore
│ └─ 2.147 FilesApi.upload_file osparc/_api_files_api.py:117
│ └─ 2.147 run nest_asyncio.py:25
│ [6 frames hidden] nest_asyncio, asyncio, selectors, <bu...
│ 0.875 Task.__step asyncio/tasks.py:215
│ └─ 0.817 FilesApi.upload_file_async osparc/_api_files_api.py:133
│ └─ 0.614 compute_sha256 osparc/_utils.py:110
└─ 1.168 _hash_file test_files_api.py:21
└─ 0.854 HASH.update <built-in>
20.801 call_fixture_func _pytest/fixtures.py:886
└─ 20.770 large_server_file conftest.py:145
└─ 20.769 FilesApi.upload_file osparc/_api_files_api.py:117
└─ 20.769 run nest_asyncio.py:25
[5 frames hidden] nest_asyncio, selectors, asyncio
12.702 epoll.poll <built-in>
4.896 Task.__step asyncio/tasks.py:215
└─ 4.561 FilesApi.upload_file_async osparc/_api_files_api.py:133
├─ 3.224 FilesApi.get_upload_links ../artifacts/client/osparc_client/api/files_api.py:592
│ └─ 3.224 FilesApi.get_upload_links_with_http_info ../artifacts/client/osparc_client/api/files_api.py:617
│ └─ 3.224 ApiClient.call_api ../artifacts/client/osparc_client/api_client.py:310
│ └─ 3.224 ApiClient.__call_api ../artifacts/client/osparc_client/api_client.py:120
│ └─ 3.223 ApiClient.request ../artifacts/client/osparc_client/api_client.py:373
│ └─ 3.223 RESTClientObject.POST ../artifacts/client/osparc_client/rest.py:272
│ └─ 3.223 RESTClientObject.request ../artifacts/client/osparc_client/rest.py:109
│ └─ 3.223 PoolManager.request urllib3/_request_methods.py:69
│ [12 frames hidden] urllib3, http, socket, ssl, <built-in>
└─ 1.125 compute_sha256 osparc/_utils.py:110
├─ 0.531 HASH.update <built-in>
└─ 0.528 AsyncBufferedReader.method aiofiles/threadpool/utils.py:41
2.383 Task.__step asyncio/tasks.py:215
└─ 2.339 FilesApi._upload_chunck osparc/_api_files_api.py:249
└─ 2.312 AsyncHttpClient.put osparc/_http_client.py:111
└─ 2.311 tuple._request osparc/_http_client.py:69
└─ 2.291 async_wrapped tenacity/asyncio/__init__.py:181
└─ 2.284 AsyncRetrying.__call__ tenacity/asyncio/__init__.py:104
└─ 2.266 _ osparc/_http_client.py:75
└─ 2.265 AsyncClient.put httpx/_client.py:1921
[16 frames hidden] httpx, httpcore, anyio, ssl, <built-in>
0.607 _teardown_yield_fixture _pytest/fixtures.py:906
└─ 0.607 large_server_file conftest.py:145
└─ 0.607 FilesApi.delete_file ../artifacts/client/osparc_client/api/files_api.py:272
└─ 0.607 FilesApi.delete_file_with_http_info ../artifacts/client/osparc_client/api/files_api.py:296
└─ 0.607 ApiClient.call_api ../artifacts/client/osparc_client/api_client.py:310
└─ 0.607 ApiClient.__call_api ../artifacts/client/osparc_client/api_client.py:120
└─ 0.607 ApiClient.request ../artifacts/client/osparc_client/api_client.py:373
└─ 0.607 RESTClientObject.DELETE ../artifacts/client/osparc_client/rest.py:263
└─ 0.607 RESTClientObject.request ../artifacts/client/osparc_client/rest.py:109
└─ 0.607 PoolManager.request urllib3/_request_methods.py:69
[12 frames hidden] urllib3, http, socket, ssl, <built-in>
To view this report with different options, run:
pyinstrument --load-prev 2024-11-18T11-09-13 [options]
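As the two profiles show, uploading the chunks concurrently cuts the time spent in FilesApi.upload_file inside the large_server_file fixture from about 60.4 s to about 20.8 s, roughly a third of the original. The total run time of the test drops from 87.4 s to 49.2 s, so it is almost cut in half.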
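The idea behind the speed-up can be sketched as follows. This is only a minimal illustration and not the code in this PR: `upload_chunks_concurrently`, `upload_one`, `max_concurrency` and `_fake_upload` are hypothetical names, and the fake upload just sleeps instead of issuing an HTTP PUT. It assumes each chunk can be uploaded independently, and it bounds the number of in-flight requests with a semaphore.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def upload_chunks_concurrently(
    chunks: Sequence[bytes],
    upload_one: Callable[[int, bytes], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Upload all chunks with at most `max_concurrency` uploads in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def _bounded(index: int, chunk: bytes) -> str:
        # The semaphore keeps the number of simultaneous uploads bounded
        async with semaphore:
            return await upload_one(index, chunk)

    # gather() returns results in the order the awaitables were passed in,
    # independent of the order in which the uploads complete
    return await asyncio.gather(
        *(_bounded(i, chunk) for i, chunk in enumerate(chunks))
    )


async def _fake_upload(index: int, chunk: bytes) -> str:
    # Hypothetical stand-in for an HTTP PUT of a single chunk
    await asyncio.sleep(0.01)
    return f"etag-{index}-{len(chunk)}"


def demo() -> List[str]:
    chunks = [b"aa", b"bbb", b"c"]
    return asyncio.run(
        upload_chunks_concurrently(chunks, _fake_upload, max_concurrency=2)
    )
```

With sequential `await`s the event loop mostly sits in `epoll.poll` waiting for one request at a time, which is consistent with the 53.7 s of `epoll.poll` in the first profile. Running the uploads as concurrent tasks lets those waits overlap.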
👍
👍
Pair-reviewed.
What do these changes do?
This PR improves several features of the `osparc` client:
- `PaginationGenerator` -> `PaginationIterable`, to better fit proper Python patterns: https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes
Related issue/s
How to test
For internal developers
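As a rough illustration of why an iterable fits the `collections.abc` patterns better than a one-shot generator for the pagination change described above, here is a minimal sketch. It is not the `PaginationIterable` from this PR: `PagedItems`, `fetch_page`, `page_size` and `_fake_fetch` are hypothetical names, and the page source is an in-memory list rather than an API call.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class PagedItems(Iterable[T]):
    """Iterable over items that are fetched lazily, one page at a time."""

    def __init__(
        self, fetch_page: Callable[[int, int], List[T]], page_size: int = 2
    ) -> None:
        self._fetch_page = fetch_page
        self._page_size = page_size

    def __iter__(self) -> Iterator[T]:
        # Every call starts from offset 0 again, so the object can be
        # iterated more than once, unlike a plain generator
        offset = 0
        while True:
            page = self._fetch_page(offset, self._page_size)
            if not page:
                return
            yield from page
            offset += len(page)


_DATA = ["a", "b", "c", "d", "e"]


def _fake_fetch(offset: int, limit: int) -> List[str]:
    # Hypothetical stand-in for a paginated API request
    return _DATA[offset : offset + limit]


items = PagedItems(_fake_fetch, page_size=2)
first_pass = list(items)
second_pass = list(items)
```

Because `__iter__` returns a fresh iterator each time, both passes see all items, whereas a generator object would be exhausted after the first pass.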