✨ Ensure file chunks are uploaded concurrently and improve PaginationIterator #220

@bisgaard-itis bisgaard-itis commented Nov 18, 2024

What do these changes do?

This PR improves several features of the osparc client:

Related issue/s

How to test

For internal developers

@bisgaard-itis bisgaard-itis changed the title ✨ start concurrent upload ✨ Ensure file chunks are uploaded concurrently and improve PaginationIterator Nov 18, 2024
@bisgaard-itis

Before this PR, here is what a profile of the upload function looked like:

╰─$ pyinstrument --show='*/osparc/*' -m pytest -k test_upload_download_file_ram_usage
================================================================================= test session starts =================================================================================
platform linux -- Python 3.10.11, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/mads/Development/oSparc/osparc-simcore-clients/clients/python/test/e2e
configfile: pytest.ini
plugins: anyio-4.6.2.post1, respx-0.21.1, asyncio-0.23.8, mock-3.14.0, env-1.1.3, Faker-33.0.0, metadata-3.1.1, html-4.1.1
asyncio: mode=auto
collected 15 items / 14 deselected / 1 selected                                                                                                                                       

test_files_api.py .                                                                                                                                                             [100%]

================================================================================== warnings summary ===================================================================================
../../../../.venv/lib/python3.10/site-packages/jupyter_client/connect.py:22
  /home/mads/Development/oSparc/osparc-simcore-clients/.venv/lib/python3.10/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
----- generated xml file: /tmp/osparc-simcore-client/e2e/e2e_tutorial_tests/0.8.3.post0.dev20_api.osparc-master.speag.com/junit_0.8.3.post0.dev20_api.osparc-master.speag.com.xml -----
- Generated html report: file:///tmp/osparc-simcore-client/e2e/e2e_tutorial_tests/0.8.3.post0.dev20_api.osparc-master.speag.com/pytest_0.8.3.post0.dev20_api.osparc-master.speag.com.html -
=============================================================== 1 passed, 14 deselected, 1 warning in 86.71s (0:01:26) ================================================================

  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:01:16  Samples:  29134
 /_//_/// /_\ / //_// / //_'/ //     Duration: 87.436    CPU time: 32.808
/   _/                      v5.0.0

Program: pytest -k test_upload_download_file_ram_usage

87.368 <module>  pytest/__main__.py:1
└─ 87.368 console_main  _pytest/config/__init__.py:185
      [34 frames hidden]  _pytest, pluggy
         60.470 call_fixture_func  _pytest/fixtures.py:886
         └─ 60.440 large_server_file  conftest.py:145
            └─ 60.439 FilesApi.upload_file  osparc/_api_files_api.py:115
               └─ 60.439 run  nest_asyncio.py:25
                     [5 frames hidden]  nest_asyncio, selectors, asyncio
                        53.711 epoll.poll  <built-in>
                        4.985 Task.__step  asyncio/tasks.py:215
                        └─ 4.896 FilesApi.upload_file_async  osparc/_api_files_api.py:125
                           ├─ 3.238 FilesApi.get_upload_links  ../artifacts/client/osparc_client/api/files_api.py:592
                           │  └─ 3.238 FilesApi.get_upload_links_with_http_info  ../artifacts/client/osparc_client/api/files_api.py:617
                           │     └─ 3.238 ApiClient.call_api  ../artifacts/client/osparc_client/api_client.py:310
                           │        └─ 3.238 ApiClient.__call_api  ../artifacts/client/osparc_client/api_client.py:120
                           │           └─ 3.237 ApiClient.request  ../artifacts/client/osparc_client/api_client.py:373
                           │              └─ 3.237 RESTClientObject.POST  ../artifacts/client/osparc_client/rest.py:272
                           │                 └─ 3.237 RESTClientObject.request  ../artifacts/client/osparc_client/rest.py:109
                           │                    └─ 3.237 PoolManager.request  urllib3/_request_methods.py:69
                           │                          [12 frames hidden]  urllib3, http, socket, ssl, <built-in>
                           └─ 1.226 compute_sha256  osparc/_utils.py:110
                              └─ 0.877 HASH.update  <built-in>
                        1.175 Task.__step  asyncio/tasks.py:215
                        └─ 1.167 FilesApi.upload_file_async  osparc/_api_files_api.py:125
                           └─ 1.157 FilesApi._upload_chunck  osparc/_api_files_api.py:229
                              └─ 1.144 AsyncHttpClient.put  osparc/_http_client.py:111
                                 └─ 1.144 tuple._request  osparc/_http_client.py:69
                                    └─ 1.144 async_wrapped  tenacity/asyncio/__init__.py:181
                                       └─ 1.144 AsyncRetrying.__call__  tenacity/asyncio/__init__.py:104
                                          └─ 1.141 _  osparc/_http_client.py:75
                                             └─ 1.141 AsyncClient.put  httpx/_client.py:1921
                                                   [14 frames hidden]  httpx, httpcore, anyio
         25.643 pytest_pyfunc_call  _pytest/python.py:187
         └─ 25.643 test_upload_download_file_ram_usage  test_files_api.py:49
            ├─ 24.544 memory_usage  memory_profiler.py:269
            │  ├─ 22.450 FilesApi.download_file  osparc/_api_files_api.py:62
            │  │  └─ 22.450 run  nest_asyncio.py:25
            │  │        [8 frames hidden]  nest_asyncio, asyncio, selectors, <bu...
            │  │           9.804 Task.__step  asyncio/tasks.py:215
            │  │           └─ 9.296 FilesApi.download_file_async  osparc/_api_files_api.py:79
            │  │              ├─ 7.014 Response.aiter_bytes  httpx/_models.py:916
            │  │              │     [13 frames hidden]  httpx, httpcore, contextlib, anyio, h11
            │  │              └─ 1.774 AsyncBufferedIOBase.method  aiofiles/threadpool/utils.py:41
            │  │                 └─ 1.514 _UnixSelectorEventLoop.run_in_executor  asyncio/base_events.py:807
            │  │           1.326 Task.__step  asyncio/tasks.py:215
            │  │           └─ 1.241 FilesApi.download_file_async  osparc/_api_files_api.py:79
            │  └─ 2.060 FilesApi.upload_file  osparc/_api_files_api.py:115
            │     └─ 2.060 run  nest_asyncio.py:25
            │           [4 frames hidden]  nest_asyncio, asyncio
            │              1.749 Task.__step  asyncio/tasks.py:215
            │              └─ 1.728 FilesApi.upload_file_async  osparc/_api_files_api.py:125
            │                 └─ 1.210 compute_sha256  osparc/_utils.py:110
            │                    └─ 1.021 HASH.update  <built-in>
            └─ 1.100 _hash_file  test_files_api.py:21
               └─ 0.887 HASH.update  <built-in>

@bisgaard-itis
bisgaard-itis commented Nov 18, 2024

And here is the profile after doing the upload concurrently:

╰─$ pyinstrument --show='*/osparc/*' -m pytest -k test_upload_download_file_ram_usage
================================================================================= test session starts =================================================================================
platform linux -- Python 3.10.11, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/mads/Development/oSparc/osparc-simcore-clients/clients/python/test/e2e
configfile: pytest.ini
plugins: anyio-4.6.2.post1, respx-0.21.1, asyncio-0.23.8, mock-3.14.0, env-1.1.3, Faker-33.0.0, metadata-3.1.1, html-4.1.1
asyncio: mode=auto
collected 15 items / 14 deselected / 1 selected                                                                                                                                       

test_files_api.py .                                                                                                                                                             [100%]

================================================================================== warnings summary ===================================================================================
../../../../.venv/lib/python3.10/site-packages/jupyter_client/connect.py:22
  /home/mads/Development/oSparc/osparc-simcore-clients/.venv/lib/python3.10/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
----- generated xml file: /tmp/osparc-simcore-client/e2e/e2e_tutorial_tests/0.8.3.post0.dev20_api.osparc-master.speag.com/junit_0.8.3.post0.dev20_api.osparc-master.speag.com.xml -----
- Generated html report: file:///tmp/osparc-simcore-client/e2e/e2e_tutorial_tests/0.8.3.post0.dev20_api.osparc-master.speag.com/pytest_0.8.3.post0.dev20_api.osparc-master.speag.com.html -
==================================================================== 1 passed, 14 deselected, 1 warning in 48.52s =====================================================================

  _     ._   __/__   _ _  _  _ _/_   Recorded: 11:09:13  Samples:  31556
 /_//_/// /_\ / //_// / //_'/ //     Duration: 49.249    CPU time: 35.423
/   _/                      v5.0.0

Program: pytest -k test_upload_download_file_ram_usage

49.179 <module>  pytest/__main__.py:1
└─ 49.179 console_main  _pytest/config/__init__.py:185
      [44 frames hidden]  _pytest, pluggy
         26.711 pytest_pyfunc_call  _pytest/python.py:187
         └─ 26.711 test_upload_download_file_ram_usage  test_files_api.py:49
            ├─ 25.543 memory_usage  memory_profiler.py:269
            │  ├─ 23.295 FilesApi.download_file  osparc/_api_files_api.py:64
            │  │  └─ 23.295 run  nest_asyncio.py:25
            │  │        [15 frames hidden]  nest_asyncio, asyncio, <built-in>, se...
            │  │           10.347 Task.__step  asyncio/tasks.py:215
            │  │           └─ 9.811 FilesApi.download_file_async  osparc/_api_files_api.py:81
            │  │              ├─ 7.435 Response.aiter_bytes  httpx/_models.py:916
            │  │              │     [20 frames hidden]  httpx, httpcore, anyio, ssl, <built-i...
            │  │              └─ 1.843 AsyncBufferedIOBase.method  aiofiles/threadpool/utils.py:41
            │  │                    [4 frames hidden]  asyncio, concurrent
            │  │           1.290 Task.__step  asyncio/tasks.py:215
            │  │           └─ 1.231 FilesApi.download_file_async  osparc/_api_files_api.py:81
            │  │              └─ 0.866 Response.aiter_bytes  httpx/_models.py:916
            │  │                    [7 frames hidden]  httpx, httpcore
            │  └─ 2.147 FilesApi.upload_file  osparc/_api_files_api.py:117
            │     └─ 2.147 run  nest_asyncio.py:25
            │           [6 frames hidden]  nest_asyncio, asyncio, selectors, <bu...
            │              0.875 Task.__step  asyncio/tasks.py:215
            │              └─ 0.817 FilesApi.upload_file_async  osparc/_api_files_api.py:133
            │                 └─ 0.614 compute_sha256  osparc/_utils.py:110
            └─ 1.168 _hash_file  test_files_api.py:21
               └─ 0.854 HASH.update  <built-in>
         20.801 call_fixture_func  _pytest/fixtures.py:886
         └─ 20.770 large_server_file  conftest.py:145
            └─ 20.769 FilesApi.upload_file  osparc/_api_files_api.py:117
               └─ 20.769 run  nest_asyncio.py:25
                     [5 frames hidden]  nest_asyncio, selectors, asyncio
                        12.702 epoll.poll  <built-in>
                        4.896 Task.__step  asyncio/tasks.py:215
                        └─ 4.561 FilesApi.upload_file_async  osparc/_api_files_api.py:133
                           ├─ 3.224 FilesApi.get_upload_links  ../artifacts/client/osparc_client/api/files_api.py:592
                           │  └─ 3.224 FilesApi.get_upload_links_with_http_info  ../artifacts/client/osparc_client/api/files_api.py:617
                           │     └─ 3.224 ApiClient.call_api  ../artifacts/client/osparc_client/api_client.py:310
                           │        └─ 3.224 ApiClient.__call_api  ../artifacts/client/osparc_client/api_client.py:120
                           │           └─ 3.223 ApiClient.request  ../artifacts/client/osparc_client/api_client.py:373
                           │              └─ 3.223 RESTClientObject.POST  ../artifacts/client/osparc_client/rest.py:272
                           │                 └─ 3.223 RESTClientObject.request  ../artifacts/client/osparc_client/rest.py:109
                           │                    └─ 3.223 PoolManager.request  urllib3/_request_methods.py:69
                           │                          [12 frames hidden]  urllib3, http, socket, ssl, <built-in>
                           └─ 1.125 compute_sha256  osparc/_utils.py:110
                              ├─ 0.531 HASH.update  <built-in>
                              └─ 0.528 AsyncBufferedReader.method  aiofiles/threadpool/utils.py:41
                        2.383 Task.__step  asyncio/tasks.py:215
                        └─ 2.339 FilesApi._upload_chunck  osparc/_api_files_api.py:249
                           └─ 2.312 AsyncHttpClient.put  osparc/_http_client.py:111
                              └─ 2.311 tuple._request  osparc/_http_client.py:69
                                 └─ 2.291 async_wrapped  tenacity/asyncio/__init__.py:181
                                    └─ 2.284 AsyncRetrying.__call__  tenacity/asyncio/__init__.py:104
                                       └─ 2.266 _  osparc/_http_client.py:75
                                          └─ 2.265 AsyncClient.put  httpx/_client.py:1921
                                                [16 frames hidden]  httpx, httpcore, anyio, ssl, <built-in>
         0.607 _teardown_yield_fixture  _pytest/fixtures.py:906
         └─ 0.607 large_server_file  conftest.py:145
            └─ 0.607 FilesApi.delete_file  ../artifacts/client/osparc_client/api/files_api.py:272
               └─ 0.607 FilesApi.delete_file_with_http_info  ../artifacts/client/osparc_client/api/files_api.py:296
                  └─ 0.607 ApiClient.call_api  ../artifacts/client/osparc_client/api_client.py:310
                     └─ 0.607 ApiClient.__call_api  ../artifacts/client/osparc_client/api_client.py:120
                        └─ 0.607 ApiClient.request  ../artifacts/client/osparc_client/api_client.py:373
                           └─ 0.607 RESTClientObject.DELETE  ../artifacts/client/osparc_client/rest.py:263
                              └─ 0.607 RESTClientObject.request  ../artifacts/client/osparc_client/rest.py:109
                                 └─ 0.607 PoolManager.request  urllib3/_request_methods.py:69
                                       [12 frames hidden]  urllib3, http, socket, ssl, <built-in>

To view this report with different options, run:
    pyinstrument --load-prev 2024-11-18T11-09-13 [options]

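As these profiles show, uploading the chunks concurrently cuts the upload in the `large_server_file` fixture from about 60.4 s to about 20.8 s, and the total test run from 86.71 s to 48.52 s, i.e. almost in half.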
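
For readers who want to see the idea in isolation, here is a minimal sketch of sequential versus concurrent chunk upload with `asyncio`. It is not the code in `osparc/_api_files_api.py`: `upload_chunk`, `upload_sequentially`, `upload_concurrently` and the `max_concurrency` parameter are hypothetical names, and the network request is simulated with `asyncio.sleep`.

```python
import asyncio
from typing import List, Tuple


async def upload_chunk(index: int, chunk: bytes, delay: float = 0.05) -> Tuple[int, int]:
    """Hypothetical stand-in for one chunk PUT request (assumption, not osparc code)."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return index, len(chunk)


async def upload_sequentially(chunks: List[bytes]) -> List[Tuple[int, int]]:
    # Each upload is awaited before the next one starts, so the waits add up.
    return [await upload_chunk(i, c) for i, c in enumerate(chunks)]


async def upload_concurrently(
    chunks: List[bytes], max_concurrency: int = 4
) -> List[Tuple[int, int]]:
    # The semaphore caps how many uploads are in flight at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(index: int, chunk: bytes) -> Tuple[int, int]:
        async with semaphore:
            return await upload_chunk(index, chunk)

    # gather() schedules all coroutines and returns results in input order.
    return list(await asyncio.gather(*(bounded(i, c) for i, c in enumerate(chunks))))
```

Both functions return the same results in the same order. In the concurrent version the waits overlap, which would be consistent with the drop in `epoll.poll` time between the two profiles (53.711 s before, 12.702 s after).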

@bisgaard-itis bisgaard-itis self-assigned this Nov 18, 2024
@bisgaard-itis bisgaard-itis added this to the Event Horizon milestone Nov 18, 2024
@bisgaard-itis bisgaard-itis marked this pull request as ready for review November 18, 2024 13:49
@sanderegg sanderegg left a comment

👍

@matusdrobuliak66 matusdrobuliak66 left a comment

👍

@pcrespov pcrespov left a comment

Pair-reviewed.

@bisgaard-itis bisgaard-itis merged commit bb942e4 into ITISFoundation:master Nov 19, 2024
8 checks passed
@bisgaard-itis bisgaard-itis deleted the 219-iterator-and-concurrent-file-upload branch November 19, 2024 11:58
Successfully merging this pull request may close these issues.

Check https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes to define the returing value
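
The linked issue asks for the return value to be described with an abstract base class from `collections.abc`. The sketch below shows one way that could look for a paginated iterator; it is not the `PaginationIterator` from this PR. The function `iterate_pages`, the `fetch_page(offset, limit)` callback and its `(items, total)` return shape are all assumptions made for illustration.

```python
from collections.abc import Callable, Iterator
from typing import TypeVar

T = TypeVar("T")


def iterate_pages(
    fetch_page: Callable[[int, int], tuple[list[T], int]],
    limit: int = 20,
) -> Iterator[T]:
    """Yield items one by one, fetching a new page only when needed.

    fetch_page is a hypothetical callback returning (items, total).
    """
    offset = 0
    while True:
        items, total = fetch_page(offset, limit)
        yield from items
        offset += len(items)
        # Stop on an empty page or once every item has been seen.
        if not items or offset >= total:
            return
```

Annotating the return type as `Iterator[T]` tells callers they can loop over the result once and lazily, without promising `len()` or indexing.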
4 participants