
Commit b7174b2

Merge branch 'lp/ray-distributed' into main
2 parents: 46892c4 + 48b43f1

File tree

2 files changed: +66 -0 lines

docs/RayClusterSetup.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# Ray Cluster Setup

## Manual Ray Cluster

For a manual cluster in particular, we need to keep a few basic principles in mind:

- All machines need to have **the same** basic setup, with the package pre-installed and all dependencies present

We can then initialize the head node:

```bash
ray start --head
```
and subsequently attach all the worker nodes:

```bash
# Replace IP_OF_THE_HEAD_NODE with the head node's address; the default port is 6379
ray start --address=IP_OF_THE_HEAD_NODE:6379
```
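Once the workers have joined, a quick sanity check is to run `ray status` on the head node and confirm that every machine appears with its resources:

```bash
# On the head node: list the nodes in the cluster and their available resources
ray status
```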
> If a fast interconnect is available, e.g. higher-bandwidth Ethernet, Omni-Path, or InfiniBand, consider using the IP addresses of that interconnect to force the use of the faster connections.
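As a sketch of that idea, each node can be pinned to its fast-interconnect address via Ray's `--node-ip-address` option (the IPs below are placeholders for the interconnect's addresses):

```bash
# Head node, bound to its fast-interconnect IP (placeholder address)
ray start --head --node-ip-address=10.10.0.1

# Each worker node, bound to its own fast-interconnect IP
ray start --address=10.10.0.1:6379 --node-ip-address=10.10.0.2
```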
With the cluster now initialized, we can use it in any of our scripts with:

```python
import ray
# Connect to the running cluster via the head node's address (default port 6379)
ray.init(address="IP_OF_THE_HEAD_NODE:6379")
```
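As a minimal smoke test, assuming the `ray.init` call above has already connected to the cluster, we can dispatch a few trivial remote tasks and confirm they execute:

```python
import ray

@ray.remote
def ping(x: int) -> int:
    # Executed on whichever cluster node Ray schedules it on
    return 2 * x

# Should print [0, 2, 4, 6] once the cluster is reachable
print(ray.get([ping.remote(i) for i in range(4)]))
```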
## Ray Cluster on Kubernetes
Ray can easily be installed on Kubernetes with [Helm](https://helm.sh) using the Ray Kubernetes Operator, the detailed documentation for which can be found in [Ray's documentation](https://docs.ray.io/en/latest/cluster/kubernetes.html).
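A rough sketch of that installation, assuming the KubeRay Helm charts are used (repository URL and chart names follow the KubeRay project and may differ between versions):

```bash
# Register the KubeRay Helm repository and install the operator
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator

# Deploy a RayCluster custom resource for the operator to manage
helm install raycluster kuberay/ray-cluster
```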
> On clouds like AWS, Azure, and GCP this is entirely straightforward and largely automated.
## Ray Cluster on Slurm
Ray support on Slurm is currently still experimental, but the progress and current approach can be viewed in [Ray's documentation](https://docs.ray.io/en/latest/cluster/slurm.html). The biggest differences come from the different ways in which Ray and Slurm handle

1. port binding
2. IP binding
There is, however, an example [launch script](https://docs.ray.io/en/latest/cluster/examples/slurm-launch.html#slurm-launch), as well as [templated launch scripts](https://docs.ray.io/en/latest/cluster/examples/slurm-template.html#slurm-template).
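Their overall shape is roughly the following condensed sketch (partition, node counts, timings, and the driver script name `my_driver.py` are placeholders that depend on the cluster):

```bash
#!/bin/bash
#SBATCH --job-name=ray-cluster
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1

# Resolve the allocated nodes and pick the first one as the Ray head
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
head_node=$(echo "$nodes" | head -n 1)
head_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)
port=6379

# Start the head node, then one worker per remaining node
srun --nodes=1 --ntasks=1 -w "$head_node" \
    ray start --head --node-ip-address="$head_ip" --port=$port --block &
sleep 10

for node in $(echo "$nodes" | tail -n +2); do
    srun --nodes=1 --ntasks=1 -w "$node" \
        ray start --address="$head_ip:$port" --block &
    sleep 5
done

# Run the driver script against the freshly started cluster; inside it,
# ray.init(address="auto") picks up the cluster started above
python my_driver.py
```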

hydrogym/core.py

Lines changed: 21 additions & 0 deletions
@@ -196,6 +196,27 @@ def render(self, **kwargs):
        raise NotImplementedError


@ray.remote
class EvaluationActor:
    """To remotely evaluate Firedrake solutions."""

    def __init__(self, firedrake_instance: "Firedrake_instance", index: int, seeds: Union[ActorSeeds, tuple], state: dict):
        """
        Initialize a remote runner for a Firedrake problem instance.

        Args:
            firedrake_instance: The Firedrake problem instance to be run.
            index: The index of the actor in question.
            seeds: The seeds to be used by this actor.
            state: The state dictionary to be loaded.
        """

        # TODO: Add this whole class and the two that follow to hold the evaluation logic.
        # Pipe the Firedrake problem into Evotorch's harness, then pare down the logic
        # needed to get that code running here.


class CallbackBase:
    def __init__(self, interval: int = 1):
        """
