
On Offline-First Approach #779

@pandatix


Where CM offline install 1-click ? 🐒

This issue is a discussion around having procedures and tests for Offline-First deployments. As of today, there are 3 models:

  1. Build & Run online (works fine until you eventually hit rate limiting);
  2. Build online, Run offline (for continuity even in case of network failure);
  3. Build & Run offline (air-gapped environments).

Only model 1 is currently supported.
One reason for this is the complexity of Chall-Manager's internals and the varied nature of its dependencies. Nevertheless, we are currently studying and discussing, internally, model 2 for resiliency purposes and model 3 for air-gapped environments. We need to improve so that a technology such as Chall-Manager can be operated with no Internet connection.

Dependencies

One problem is how to manage dependencies.
In the case of Chall-Manager, dependencies are required by scenarios. There are currently two kinds: Go modules and Pulumi providers.

Handling them offline nonetheless requires an exhaustive list of the required Go modules and Pulumi providers ahead of Run. It is feasible, yet not as easy as Build & Run online.
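As an illustration of what building that exhaustive list could look like on the Go side, here is a minimal sketch. It assumes the output of `go list -m -json all` (a stream of concatenated JSON objects) is piped into the program; the `module` type and the `parseModules` helper are hypothetical names, not part of Chall-Manager, and Pulumi providers would need their own, separate inventory.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
)

// module mirrors a subset of the fields printed by `go list -m -json all`.
type module struct {
	Path    string
	Version string
	Main    bool
}

// parseModules (hypothetical helper) decodes the concatenated JSON objects
// and returns "path@version" for every module except the main one.
// Replace directives are ignored in this sketch.
func parseModules(data []byte) ([]string, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var out []string
	for {
		var m module
		err := dec.Decode(&m)
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		if m.Main || m.Version == "" {
			continue // the scenario's own module has nothing to fetch
		}
		out = append(out, m.Path+"@"+m.Version)
	}
}

func main() {
	// Intended usage: go list -m -json all | go run .
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	mods, err := parseModules(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, m := range mods {
		fmt.Println(m)
	}
}
```

The resulting list could then feed whatever mirroring or vendoring step is chosen for the offline Run; that step is deliberately left out here.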

A note for future work: #760 discusses improvements toward other ecosystems (Python, JS/TS, ...). Each of these would also require its own procedure.

SDK

The SDK is currently maintained as a set of Pulumi programs published for external use, and is consumed as a Go dependency.
We have since shifted toward OCI-based distribution of scenarios, with improvements for reuse, "Build & Run offline", integration within our Hauler pipelines, ...

These changes are mainly the recipes and the OCI PR #625.
Moving to this OCI model strengthened the "Build & Run offline" procedure, and helps produce reproducible environments from a single Hauler archive.

NetworkPolicies

Note that the Kubernetes SDK creates NetworkPolicies for external traffic as per RFC 1918 Section 3. This relies on IP-range assumptions that come from the "Build & Run online" model.
Configuration must be improved on this aspect for the alternative models (e.g. only one range is accepted, and it lies in a private range because the environment is air-gapped).
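To make the air-gap case concrete, here is a minimal sketch of how a configuration check could detect that an accepted range falls entirely inside one of the three private blocks of RFC 1918 Section 3 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The `withinRFC1918` helper is a hypothetical name and is not taken from the SDK; how the SDK actually builds its NetworkPolicies is not shown here.

```go
package main

import (
	"fmt"
	"net/netip"
)

// rfc1918 lists the three private IPv4 blocks defined in RFC 1918 Section 3.
var rfc1918 = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
}

// withinRFC1918 (hypothetical helper) reports whether the whole CIDR range
// sits inside a single private block.
func withinRFC1918(cidr string) (bool, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return false, err
	}
	p = p.Masked() // normalize to the network address
	for _, block := range rfc1918 {
		// The range is covered when it is at least as specific as the
		// block and its network address belongs to the block.
		if p.Bits() >= block.Bits() && block.Contains(p.Addr()) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, cidr := range []string{"10.42.0.0/16", "172.32.0.0/16", "203.0.113.0/24"} {
		private, err := withinRFC1918(cidr)
		if err != nil {
			fmt.Println(cidr, "is invalid:", err)
			continue
		}
		fmt.Println(cidr, "private:", private)
	}
}
```

A check of this kind could let an operator declare a single private range as the allowed destination, instead of relying on the defaults inherited from the online model.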

Tests

Here is the million-dollar question: how do we test the Offline-First approach, in a time-efficient way, on an online platform such as GitHub?

I believe it should run within our CI, but I'm unsure how to do so.
Maybe this will be a limiting factor, and we would need CTFer.io's on-premise infrastructure, plus a scheduler for remote work over the "XXX offline" models.

I would be happy to have a sponsor willing to host basic infrastructure for these tests, or to share experience with similar work, but I don't know of anyone who could do so for now...


Feel free to discuss or debate this root message.
It will be updated to track the design work toward solving this issue.

I don't think a single PR would be a good thing, as it is a big job. I'll let the future speak for itself, but I hope there will be iterations with Model 2 first, then Model 3.

Labels: chall-manager, documentation, enhancement, help wanted, sdk
