Skip to content

Feature/SK-1575 | Test local updates on Dev Studio #898

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 70 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
70 commits
Select commit Hold shift + click to select a range
b7e7b4b
changes to log handling and num workers to allow for 1000 clients con…
Apr 24, 2025
4777cc8
Updates for handling upto 1000 clients
Apr 24, 2025
75188f4
Updates to prolong keep alive timeout in conjunction with benjamins u…
Apr 29, 2025
8d859de
updates to combiner, in order to remove the issue with combinerunavai…
May 14, 2025
b768590
Update to gunicorn_app.py for worker timeout increased and for a limi…
May 19, 2025
36468b9
removed fork from gunicorn_app.py
May 19, 2025
0fb9cff
Added cache for nr_active_clients to reduce the list_active_clients c…
May 19, 2025
1478072
Setting _active_clients_cache and COMBINER_CACHE_COOLDOWN as instance…
May 19, 2025
163650a
Update to remove possible infinite loop for 2+ combiners
May 21, 2025
fbb84d6
fix
Wrede Apr 24, 2025
9976a4a
Bug/SK-1577 | Path bug fix in mnist-pytorch example (#899)
Hovstadius Apr 24, 2025
87bbe3b
Bug/SK-1467 | Mismatch in click.option input and method argument for …
KatHellg Apr 24, 2025
fe0c6b2
bump fedn
Wrede Apr 24, 2025
a00bc99
Feature/SK-1449 | List projects with info in tabular format (#876)
KatHellg Apr 25, 2025
d801f75
Feature/SK-1566 | Added Telemetry Store/DTO/Route/GRPC (#897)
carl-andersson Apr 25, 2025
7fe934e
Bug/SK-1582 | Running client.get_model_trail() returns only one model…
niklastheman Apr 28, 2025
32e12c2
Feature/SK-1361 | Write test for all API GET routes (#904)
niklastheman Apr 28, 2025
6bf76af
fix header
Wrede Apr 28, 2025
03cc11b
Bug/SK-1584 | Start session not working (api client) (#905)
niklastheman Apr 29, 2025
5aa6fbe
Feature/SK-1585 | Start comparing response in api client tests (count…
niklastheman Apr 29, 2025
6006075
Bug/SK-1581 | Telemetry_bp is missing from routes (#901)
carl-andersson Apr 29, 2025
08d1eaf
Bug/SK-1589 | Race condition in project create (#906)
KatHellg Apr 29, 2025
7a2e8f3
Feature/SK-1590 | Add ability to filter for null and grater than (etc…
niklastheman May 2, 2025
9d07d4c
Bugfix/SK-1591 | MongoClient was initialized before fork (#908)
carl-andersson May 2, 2025
f834010
Feature/SK-1579 | Add a field 'updated_at' similar to 'committed_at' …
niklastheman May 5, 2025
acd1673
Feature/SK-1596 | Added routes and ClientAPI methods (#911)
carl-andersson May 6, 2025
14e7bc7
Feature/SK-1598 | Integrate OpenTelemetry with controller, combiner (…
stefanhellander May 7, 2025
20b8c84
Bugfix/SK-1602 | Telemetries are not getting cleared
carl-andersson May 8, 2025
3df0819
Feature/SK-1578 | Add tests for server functions (#900)
viktorvaladi May 9, 2025
4789c23
Feature/SK-1595 | Api client for server functions + example (#912)
viktorvaladi May 9, 2025
bf5fff0
fix (#914)
niklastheman May 9, 2025
5735ef3
Feature/SK-1364 | Start existing session via API Client (#909)
niklastheman May 9, 2025
d1a62d3
Feature/SK-1605 | Improved error handling and grpc retries in python …
carl-andersson May 9, 2025
32d3180
Feature/SK-1353 | Extended tests in async-clients (#844)
benjaminastrand May 9, 2025
aa77a44
Feature/SK-1261 | Split Learning in FEDn (#776)
FrankJonasmoelle May 9, 2025
fa8417a
Feature/SK-1609 | Only init OTEL if configured, init tracer (#920)
stefanhellander May 12, 2025
d653ef7
deps: update protobuf requirement (#845)
dependabot[bot] May 12, 2025
da18b33
bump fedn
Wrede May 12, 2025
ad8a23c
Docs/SK-1613 | Updating Split Learning Example Docs (#921)
FrankJonasmoelle May 16, 2025
244fee3
Fix/SK-1615 | Update load-test to take float factor for array size (…
Wrede May 19, 2025
1414584
deps: bump flask from 3.1.0 to 3.1.1 in the pip group (#923)
dependabot[bot] May 19, 2025
a5bf1d8
Bug/SK-1628 | sender_id index should be sender.client_id #927
niklastheman May 20, 2025
9d4933b
bump fedn
niklastheman May 20, 2025
f16a5a0
Feature/SK-1586 | Switch from MinIO to Boto3 for storage management (…
Wrede May 21, 2025
3a83959
Updates for handling upto 1000 clients
Apr 24, 2025
6eaea2f
Updates to prolong keep alive timeout in conjunction with benjamins u…
Apr 29, 2025
69a9fe0
Update to gunicorn_app.py for worker timeout increased and for a limi…
May 19, 2025
b3e7243
removed old flags
May 21, 2025
54c9324
Update to how combiners write to round database, in order to avoid co…
May 30, 2025
b31ddef
MongoDBRoundStore had no attribute update_one, so testing with self.s…
May 30, 2025
77c52eb
Added method to roundstore, to be able to update the entries on Round…
Jun 9, 2025
c2d46b7
Fixing typo in update_one method for MongoDBstore
Jun 10, 2025
97ab421
Handling combinerunavailable error, in order not to break the thread,…
Jun 11, 2025
e2bb30d
Updated round logic to use session timeout, instead of waiting indefi…
Jun 11, 2025
85a9713
added a if condition if aggregator had not been configured in roundha…
Jun 11, 2025
1fb8361
disabled otel logging, to remove transient error relating to hoenycom…
Jun 13, 2025
9b726c5
Every combiner needs to report their results before moving on to next…
Jun 16, 2025
f7e0f79
Added logging for identifying slow parts of the script, and added rob…
Jun 17, 2025
3749a66
formatting logs
Jun 17, 2025
3ec4f8d
try except on list_active_clients, if client id would be None, the ro…
Jun 17, 2025
2e9341a
Update to _list_active_clients, skipping if None
Jun 17, 2025
615eb24
Added a slight buffer to the timeout to allow combiners to respond sl…
Jun 19, 2025
a45cfad
Skipping None clients in combiner and using Victors update on the ses…
Jun 19, 2025
9c46146
Restored to use check_combiners_done_reporting
Jun 19, 2025
18bdf93
Optional argument client_ids in _get_number_of_available_clients
Jun 19, 2025
4cc9c44
Removing skip clients in _list_active_clients
Jun 22, 2025
28e402d
Added max_concurrent_streams to the grpc server 10000, and updated th…
Jul 3, 2025
736620f
Adding robustness to model loading and log time taken
Jul 8, 2025
c43d779
Shorten the retry timeout and retry staging only once, 3x2 seconds wi…
Jul 8, 2025
65f65ab
Shorten the retry delay in load_model_update_bytesIO and retry only o…
Jul 8, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 43 additions & 1 deletion .ci/tests/examples/api_test.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import fire
import yaml
from server_functions import ServerFunctions

from fedn import APIClient

Expand Down Expand Up @@ -32,6 +33,15 @@ def test_api_get_methods():
assert clients_count
print("Clients count: ", clients_count, flush=True)

client_id = clients["result"][0]["client_id"]
client_obj = client.get_client(client_id)
assert client_obj
assert client_id == client_obj["client_id"]
print("Client: ", client_obj, flush=True)

assert clients_count == len(clients["result"])
assert clients_count == clients["count"]

# --- Combiners --- #

combiners = client.get_combiners()
Expand All @@ -46,6 +56,9 @@ def test_api_get_methods():
assert combiner
print("Combiner: ", combiner, flush=True)

assert combiners_count == len(combiners["result"])
assert combiners_count == combiners["count"]

# --- Controllers --- #

status = client.get_controller_status()
Expand All @@ -64,12 +77,16 @@ def test_api_get_methods():

models_from_trail = client.get_model_trail()
assert models_from_trail
print("Models: ", models_from_trail, flush=True)
assert len(models_from_trail) == models_count
print("Models (model trail): ", models_from_trail, flush=True)

active_model = client.get_active_model()
assert active_model
print("Active model: ", active_model, flush=True)

assert models_count == len(models["result"])
assert models_count == models["count"]

# --- Packages --- #

packages = client.get_packages()
Expand All @@ -88,6 +105,9 @@ def test_api_get_methods():
assert checksum
print("Checksum: ", checksum, flush=True)

assert packages_count == len(packages["result"])
assert packages_count == packages["count"]

# --- Rounds --- #

rounds = client.get_rounds()
Expand All @@ -98,6 +118,9 @@ def test_api_get_methods():
assert rounds_count
print("Rounds count: ", rounds_count, flush=True)

assert rounds_count == len(rounds["result"])
assert rounds_count == rounds["count"]

# --- Sessions --- #

sessions = client.get_sessions()
Expand All @@ -108,6 +131,14 @@ def test_api_get_methods():
assert sessions_count
print("Sessions count: ", sessions_count, flush=True)

session = client.get_session(id=sessions["result"][0]["session_id"])
assert session
assert session["session_id"] == sessions["result"][0]["session_id"]
print("Session: ", session, flush=True)

assert sessions_count == len(sessions["result"])
assert sessions_count == sessions["count"]

# --- Statuses --- #

statuses = client.get_statuses()
Expand All @@ -118,6 +149,9 @@ def test_api_get_methods():
assert statuses_count
print("Statuses count: ", statuses_count, flush=True)

assert statuses_count == len(statuses["result"])
assert statuses_count == statuses["count"]

# --- Validations --- #

validations = client.get_validations()
Expand All @@ -128,6 +162,13 @@ def test_api_get_methods():
assert validations_count
print("Validations count: ", validations_count, flush=True)

assert validations_count == len(validations["result"])
assert validations_count == validations["count"]


def start_sf_session(name, rounds, helper):
client = APIClient(host="localhost", port=8092)
client.start_session(name=name, rounds=rounds, helper=helper, server_functions=ServerFunctions)

if __name__ == '__main__':

Expand All @@ -136,6 +177,7 @@ def test_api_get_methods():
'set_seed': client.set_active_model,
'set_package': client.set_active_package,
'start_session': client.start_session,
'start_sf_session': start_sf_session,
'get_client_config': _download_config,
'test_api_get_methods': test_api_get_methods,
})
13 changes: 11 additions & 2 deletions .ci/tests/examples/run.sh
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,9 @@ else
up -d --build combiner api-server mongo minio client1
fi

# add server functions to python path to import server functions code
export PYTHONPATH="$PYTHONPATH:../server-functions"

>&2 echo "Wait for reducer to start"
python ../../.ci/tests/examples/wait_for.py reducer

Expand All @@ -46,8 +49,14 @@ python ../../.ci/tests/examples/wait_for.py clients
>&2 echo "Upload seed"
python ../../.ci/tests/examples/api_test.py set_seed --path seed.npz

>&2 echo "Start session"
python ../../.ci/tests/examples/api_test.py start_session --name "session" --rounds 3 --helper "$helper"
if [ "$example" == "server-functions" ]; then
>&2 echo "Start serverfunctions session"
python ../../.ci/tests/examples/api_test.py start_sf_session --name "session" --rounds 3 --helper "$helper"
else
>&2 echo "Start session"
python ../../.ci/tests/examples/api_test.py start_session --name "session" --rounds 3 --helper "$helper"
fi


>&2 echo "Checking rounds success"
python ../../.ci/tests/examples/wait_for.py rounds
Expand Down
11 changes: 9 additions & 2 deletions .ci/tests/studio/studio.sh
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ fi
fedn studio login -u $STUDIO_USER -P $STUDIO_PASSWORD -H $STUDIO_HOST
fedn project create -n citest -H $STUDIO_HOST --no-interactive
sleep 5
FEDN_PROJECT=$(fedn project list -H $STUDIO_HOST | awk 'NR>=1 {print $1; exit}')
FEDN_PROJECT=$(fedn project list -H $STUDIO_HOST --no-header | awk 'NR>=1 {print $3; exit}')
fedn project set-context -id $FEDN_PROJECT -H $STUDIO_HOST
pushd examples/$FEDN_EXAMPLE
fedn client get-config -n test -g $FEDN_NR_CLIENTS -H $STUDIO_HOST
Expand All @@ -43,4 +43,11 @@ for i in $(seq 0 $(($FEDN_NR_CLIENTS - 1))); do
done
popd
sleep 5
pytest .ci/tests/studio/tests.py
# add server functions so we can import it in start_session
export PYTHONPATH="$PYTHONPATH:$(pwd)/examples/server-functions"
pytest .ci/tests/studio/tests.py -x
sleep 5
# run with server functions
export FEDN_SERVER_FUNCTIONS="1"
export SESSION_NUMBER="2"
pytest .ci/tests/studio/tests.py -x
29 changes: 18 additions & 11 deletions .ci/tests/studio/tests.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,15 @@
import os
import os, sys
import time
import pytest
from fedn import APIClient
from fedn.cli.shared import get_token, get_project_url
from server_functions import ServerFunctions
from fedn.common.log_config import logger

@pytest.fixture(scope="module")
def fedn_client():
token = get_token(token=None, usr_token=False)
host = get_project_url("", "", None, False)
print(f"Connecting to {host}")
client = APIClient(host=host, token=token, secure=True, verify=True)
return client

Expand All @@ -19,11 +20,13 @@ def fedn_env():
"FEDN_ROUND_TIMEOUT": int(os.environ.get("FEDN_ROUND_TIMEOUT", 180)),
"FEDN_BUFFER_SIZE": int(os.environ.get("FEDN_BUFFER_SIZE", -1)),
"FEDN_NR_CLIENTS": int(os.environ.get("FEDN_NR_CLIENTS", 2)),
"FEDN_CLIENT_TIMEOUT": int(os.environ.get("FEDN_CLIENT_TIMEOUT", 60)),
"FEDN_CLIENT_TIMEOUT": int(os.environ.get("FEDN_CLIENT_TIMEOUT", 600)),
"FEDN_FL_ALG": os.environ.get("FEDN_FL_ALG", "fedavg"),
"FEDN_NR_EXPECTED_AGG": int(os.environ.get("FEDN_NR_EXPECTED_AGG", 2)), # Number of expected aggregated models per combiner
"FEDN_SESSION_TIMEOUT": int(os.environ.get("FEDN_SESSION_TIMEOUT", 300)), # Session timeout in seconds, all rounds must be finished within this time
"FEDN_SESSION_NAME": os.environ.get("FEDN_SESSION_NAME", "test")
"FEDN_SESSION_NAME": os.environ.get("FEDN_SESSION_NAME", "test"),
"FEDN_SERVER_FUNCTIONS": os.environ.get("FEDN_SERVER_FUNCTIONS", 0),
"SESSION_NUMBER": os.environ.get("SESSION_NUMBER", 1)
}

@pytest.mark.order(1)
Expand All @@ -49,14 +52,16 @@ def test_start_session(self, fedn_client, fedn_env):
rounds=fedn_env["FEDN_NR_ROUNDS"],
round_buffer_size=fedn_env["FEDN_BUFFER_SIZE"],
min_clients=fedn_env["FEDN_NR_CLIENTS"],
requested_clients=fedn_env["FEDN_NR_CLIENTS"]
requested_clients=fedn_env["FEDN_NR_CLIENTS"],
server_functions=ServerFunctions if fedn_env["FEDN_SERVER_FUNCTIONS"] else None
)
assert result["message"] == "Session started", f"Expected status 'Session started', got {result['message']}"

@pytest.mark.order(3)
def test_session_completion(self, fedn_client, fedn_env):
session_obj = fedn_client.get_sessions()
assert session_obj["count"] == 1, f"Expected 1 session, got {session_obj['count']}"
session_number = int(fedn_env["SESSION_NUMBER"])
assert session_obj["count"] == session_number, f"Expected {session_number} session/s, got {session_obj['count']}"
session_result = session_obj["result"][0]

start_time = time.time()
Expand All @@ -77,13 +82,14 @@ def test_session_completion(self, fedn_client, fedn_env):
@pytest.mark.order(4)
def test_rounds_completion(self, fedn_client, fedn_env):
start_time = time.time()
session_number = int(fedn_env["SESSION_NUMBER"])
while time.time() - start_time < fedn_env["FEDN_SESSION_TIMEOUT"]:
rounds_obj = fedn_client.get_rounds()
if rounds_obj["count"] == fedn_env["FEDN_NR_ROUNDS"]:
if rounds_obj["count"] == session_number * fedn_env["FEDN_NR_ROUNDS"]:
break
time.sleep(5)
else:
raise TimeoutError(f"Expected {fedn_env['FEDN_NR_ROUNDS']} rounds, but got {rounds_obj['count']} within {fedn_env['FEDN_SESSION_TIMEOUT']} seconds")
raise TimeoutError(f"Expected {session_number * fedn_env['FEDN_NR_ROUNDS']} rounds, but got {rounds_obj['count']} within {fedn_env['FEDN_SESSION_TIMEOUT']} seconds")

rounds_result = rounds_obj["result"]
for round in rounds_result:
Expand All @@ -96,14 +102,15 @@ def test_rounds_completion(self, fedn_client, fedn_env):
@pytest.mark.order(5)
def test_validations(self, fedn_client, fedn_env):
start_time = time.time()
session_number = int(fedn_env["SESSION_NUMBER"])
while time.time() - start_time < fedn_env["FEDN_SESSION_TIMEOUT"]:
validation_obj = fedn_client.get_validations()
if validation_obj["count"] == fedn_env["FEDN_NR_ROUNDS"] * fedn_env["FEDN_NR_CLIENTS"]:
if validation_obj["count"] == session_number * fedn_env["FEDN_NR_ROUNDS"] * fedn_env["FEDN_NR_CLIENTS"]:
break
time.sleep(5)
else:
raise TimeoutError(f"Expected {fedn_env['FEDN_NR_ROUNDS'] * fedn_env['FEDN_NR_CLIENTS']} validations, but got {validation_obj['count']} within {fedn_env['FEDN_SESSION_TIMEOUT']} seconds")
raise TimeoutError(f"Expected {session_number * fedn_env['FEDN_NR_ROUNDS'] * fedn_env['FEDN_NR_CLIENTS']} validations, but got {validation_obj['count']} within {fedn_env['FEDN_SESSION_TIMEOUT']} seconds")

# We could assert or test model convergence here

print("All tests passed!", flush=True)
logger.info("All tests passed!")
1 change: 1 addition & 0 deletions .github/workflows/integration-tests.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ jobs:
to_test:
- "mnist-keras numpyhelper"
- "mnist-pytorch numpyhelper"
- "server-functions numpyhelper"
python_version: ["3.9", "3.10", "3.11", "3.12"]
os:
- ubuntu-24.04
Expand Down
9 changes: 1 addition & 8 deletions .github/workflows/test-api-endpoints.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -27,14 +27,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install .

- name: Start mongo and minio
run: |
docker compose up -d mongo minio

- name: Start FEDn controller
run: |
fedn controller start &

- name: Run tests
run: |
python3 -m unittest fedn.network.api.tests
4 changes: 2 additions & 2 deletions .vscode/launch.json
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
"type": "python",
"request": "launch",
"module": "pytest",
"justMyCode": true
"justMyCode": false
},
{
"args": [
Expand All @@ -17,7 +17,7 @@
"type": "python",
"request": "launch",
"module": "pytest",
"justMyCode": true
"justMyCode": false
},
{
"name": "Run current file",
Expand Down
Loading
Loading