Simulator experiment runner #102
Closed
1ntEgr8 wants to merge 149 commits into erdos-project:main from gatech-sysml:simulator-experiment-runner
Conversation
…ling-simulator into tpch-loader
…it in tpch_loader
We found that task graph deadlines weren't consistent between the simulator and Spark even when the same RNG seed was used. The cause is that EventTime keeps a single global RNG for all of its fuzzing, and both deadlines and runtime variances are fuzzed from it. In simulator runs, all task deadlines are computed up front and runtime variances later; in Spark runs, deadlines and runtime variances are computed throughout the experiment lifecycle. The two modes therefore draw from the RNG in different orders and end up with different deadline variances for the same experiment. Our fix is to pass the fuzzer a dedicated RNG used only for deadline variances. For now this RNG is hardcoded with a seed of 42, which is fine for experiments, but it should probably be driven by the random_seed command line flag.
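The idea is easier to see with a small sketch. This is illustrative only: EventTime's real fuzzing helpers have different names and signatures, and `fuzz`, `fuzz_deadline`, and `fuzz_runtime` below are stand-ins.

```python
import random

# A dedicated RNG reserved for deadline variances, as described above
# (seed hardcoded to 42; ideally taken from the --random_seed flag instead).
DEADLINE_RNG = random.Random(42)

_GLOBAL_RNG = random.Random()  # stands in for EventTime's global fuzzing RNG


def fuzz(base_us: int, variance_pct: tuple, rng: random.Random) -> int:
    """Scale base_us by a percentage drawn uniformly from variance_pct."""
    low, high = variance_pct
    return int(base_us * (100 + rng.uniform(low, high)) / 100)


def fuzz_deadline(runtime_us: int, variance_pct=(0, 20)) -> int:
    # Deadline fuzzing always draws from the dedicated RNG, so the order in
    # which runtime variances are fuzzed cannot perturb the deadline values.
    return fuzz(runtime_us, variance_pct, DEADLINE_RNG)


def fuzz_runtime(runtime_us: int, variance_pct=(0, 10)) -> int:
    # Runtime variances keep drawing from the global RNG as before.
    return fuzz(runtime_us, variance_pct, _GLOBAL_RNG)
```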
Originally, the service named job graphs in the form Q<query>[<spark-app-id>], where Spark sets the app id to app-<timestamp>-<index>, while the TPC-H data loader named job graphs in the form Q<query>[<index>]. This commit changes how the service names job graphs: the index is passed as an argument to the TpchQuery Spark application, which forwards it to the Servicer through RegisterTaskGraph as part of the query name, and RegisterTaskGraph then uses the index to name the job graph. This ensures that job graph names are always identical between a Spark run and a simulator run, irrespective of when the task graphs are actually released during a Spark run (which can be nondeterministic). The intent is to use these names to generate deadlines for the task graphs, so that deadlines are always consistent between Spark and simulator runs. This change requires a corresponding change to tpch-spark to forward the index to the Servicer.
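Together with the later "Deterministic task graph deadlines based on task graph names" commit, the naming scheme makes deadline generation a pure function of the name. The sketch below is a hypothetical illustration of that idea; `deadline_variance_for` and its parameters are not the repo's actual API.

```python
import hashlib
import random


def job_graph_name(query: int, index: int) -> str:
    # The form shared by the service and the TPC-H loader: Q<query>[<index>].
    return f"Q{query}[{index}]"


def deadline_variance_for(name: str, base_runtime_us: int, variance_pct=(0, 20)) -> int:
    # Hypothetical helper: seed an RNG from the job graph name, so a Spark run
    # and a simulator run derive the same deadline for, say, "Q4[12]" no matter
    # when that task graph is actually released.
    seed = int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    low, high = variance_pct
    return int(base_runtime_us * (100 + rng.uniform(low, high)) / 100)


print(deadline_variance_for(job_graph_name(4, 12), 30_000_000))
```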
To avoid needless reruns, TetriSchedScheduler skips a scheduling cycle unless there is at least one task that is unscheduled, belongs to a task graph that has not been previously considered, and belongs to a task graph that has not been cancelled. We remove the second condition (the "not previously considered" check) to handle situations in which a task graph has been considered and its tasks scheduled, but the tasks failed to be placed (for instance, because another task on the same worker finished late and was still holding the resources). In such cases the task graph is not cancelled and may still be completable, so the scheduler needs to run again to try to place the tasks that could not be placed before.
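A small before/after sketch of the skip predicate may help; the class, attribute, and variable names below are assumptions for illustration, not TetriSchedScheduler's actual API.

```python
from dataclasses import dataclass

@dataclass
class TaskInfo:
    name: str
    task_graph: str
    is_scheduled: bool


def should_run_scheduler(tasks, previously_considered: set, cancelled: set) -> bool:
    def triggers_old(t: TaskInfo) -> bool:
        # Old rule (shown for comparison): only an unscheduled task from a
        # never-considered, non-cancelled task graph forces a scheduler run.
        return (
            not t.is_scheduled
            and t.task_graph not in previously_considered
            and t.task_graph not in cancelled
        )

    def triggers_new(t: TaskInfo) -> bool:
        # New rule: drop the "never considered" check, so a task whose earlier
        # placement failed (e.g. a late-finishing neighbor still held the
        # worker's resources) triggers another scheduling attempt.
        return not t.is_scheduled and t.task_graph not in cancelled

    return any(triggers_new(t) for t in tasks)
```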
This patch implements a data loader for TPC-H workloads. The loader is modeled after `data/alibaba_loader.py`.
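For orientation, here is a rough sketch of the shape such a profile-driven loader might take, assuming an alibaba_loader-style interface; every class and field name below is illustrative, not the repo's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class TpchQueryProfile:
    query: int                  # TPC-H query number, 1..22
    stage_runtimes_us: list     # profiled runtime per Spark stage
    stage_dependencies: list    # (parent, child) stage index pairs


@dataclass
class TpchLoaderSketch:
    profiles: dict = field(default_factory=dict)  # query number -> TpchQueryProfile

    def make_job_graph(self, query: int, index: int):
        """Return a (name, runtimes, edges) description for instance `index` of `query`."""
        prof = self.profiles[query]
        name = f"Q{query}[{index}]"  # matches the naming fixed later in this PR
        return name, prof.stage_runtimes_us, prof.stage_dependencies
```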
Adjustments to metric calculation
* Implement TPC-H data loader
* Bug fix: convert job graph to task graph
* Make loop_timeout configurable
* Make profile path configurable
* release time handling on workload
* Wrap up tpch loader implementation
* scale runtime based on max number of tasks
* fix bug in runtime calc
* rename optimization_passes flag to opt_passes
* add cloudlab support, fix runtime rounding bug, make rng gen match service
* restore tpch_utils to main version
* split tpch_replay config files
* remove opt_passes flag
* update tpch_utils.py
* setup new service.py
* update rpc proto dir hierarchy to resolve module import issue with sys.path hack
* checkout dhruv's version of service.py
* make workload_loader optional, make step and get_time_until_next_event public
* implement register/deregister framework
* refactor sim time calculation
* implement RegisterWorker
* factor out __get_worker_pool
* refactor tpch loader
* implement register task graph
* add testing for service
* implement register environment ready
* init impl for get placements, readme with spark-erdos setup
* WIP: service changes to handle first tpch taskgraph
* Fix the tick() function to dequeue all events up to n, pass runtime_unit in tpch_loader
* refactor runtime unit setting
* remove microsecond granularity for now, will add support later
* rename sim_time to stime to make it less confusing
* refactor tick
* update rpc service to invoke new tick, update test
* add comment explaining scheduler runtime zero setting
* add missing comment on update workload
* refactor naming in tpch loader
* fix documentation error in workload/tasks.py
* add support for returning current task graph placements from simulator
* fix logging
* construct and return placements response
* do not return placements if task graph is complete
* fix placement time bug in simulator
* implement orchestrated mode
* remove redundant checks
* enqueue scheduler start event in register task graph
* bug fixes
* add a lock (not yet applied everywhere)
* Add tests for notify task completion
* add cancelled field to proto
* populate cancelled field in rpc response
* Updates to spark erdos service documentation
* implement register driver
* quick fixes in register and de-register driver
* Updates to spark erdos documentation
* register driver bug fix
* Add override_worker_cpu_count flag
* More documentation: tpch-spark and spark-mirror related
* add delay in test_service for register env ready
* doc update for tpch spark
* Add test to invoke getPlacements before task registration
* create task graph after environment is ready
* update test
* Separate out internal state mgmt for driver and application
* re-add impl for override_worker_cpu_count
* fix: correct enqueue of task_finished and sched_start in notifyTaskCompletion
* enable task cancellations to be sent back to backend
* update test script to include cancel_task scenario (needs to be sped up)
* allow service to use different schedulers based on args
* format service
* add enforce deadlines flag
* step workers during a tick
* update placement time only if it is in the past
* refactor file
* document refactored simulate_f
* handle case where task graph cannot be accommodated in worker pool
* Potpourri of bug fixes and improvements: undo stepping of workers every tick, which led to some errors pertaining to task eviction (investigation punted for later); fix bug associated with mapping task ids to spark stage ids; add a flag to control the minimum amount by which a placement can be pushed into the future
* Add log stats method to simulator
* check for worker registration
* update documentation in service
* update the service to operate in microseconds (US) instead of seconds (S)
* misc improvements
* override flags in service
* fix workload release bug
* update docs about sched time discretization
* add comments about setting deadlines in generate_task_graph
* updates to test script (verified across edf, fifo, dsched)
* update documentation for the service
* correctly handle task cancellations
* simulator bug fixes and improvements: update placement time of pushed placement; dedup scheduler events with the same timestamp
* add launch script
* add shutdown rpc method
* add service experiment runner
* sleep for a bit before launching queries
* remove extraneous sleep
* set start_time of task to its placement time
* add testcase to verify new taskgraph termination approach
* nit documentation in erdos-spark setup
* more explainable msgs from GetPlacements
* reorder event queue priority to process scheduler events before task placement
* improvements to experiment runner
* fix profile path in tpch loader
* add spark-master-ip flag
* reinstate previous eventQueue priority order (task_placement before scheduler)
* [simulator] Unschedule subtree rooted at task if task is unable to run at timestep
* [service] log line to track tasks that get delayed in execution
* sleep for some time before signalling shutdown
* hack analyze pipeline to work with tpch output
* [simulator] check task state before invoking unschedule on it
* add support for tpch query partitioning
* run_service_experiments: Log service stdout/stderr
* run_service_experiments.py: Fix --dry_run
* run_service_experiments: Timestamp results folder
* run_service_experiments: Fix hang on exception. Fixes an issue where, if start-master.sh or start-worker.sh exits with a nonzero code, or more generally if an exception happens in Service.__enter__(), run_service_experiments.py hangs and does not report the exception.
* Spark service: Correctly log stats on shutdown. When the last application is deregistered from the spark service, execute all remaining events from the simulator. This allows the final LOG_STATS event to be processed so we can calculate the SLO attainment. Unlike normal runs of the simulator, a SIMULATOR_END event is not inserted, as some tasks might not have finished in the simulator and it is unclear when they will finish. The simulator is patched to allow an empty event queue in Simulator.simulate().
* Set correct completion time for finishing Tasks. On a TASK_FINISH event, set the task completion time to the time of the event rather than the last time the task was stepped. Resolves a bug in the service where tasks that finish later than the simulator's profiled runtime predicts get assigned the wrong completion time.
* remove extraneous print
* fix non-determinism in deadlines
* Revert "fix non-determinism in deadlines". This reverts commit a22e406.
* Ensure consistent deadlines between simulator and Spark (hack)
* run_service_experiments: print args to service and launcher
* Fix some minor errors with deadline fuzzer
* Spark service: Make job graph names the same as in TpchLoader
* Deterministic task graph deadlines based on task graph names
* TetriSchedScheduler: Reconsider tasks that couldn't be placed
* Get rid of old service
* address comments
* Add TPCHSpark as a submodule
* Move tpch-spark submodule to rpc/

Co-authored-by: Elton Leander Pinto <eltonp3103@gmail.com>
Co-authored-by: Dhruv Garg <dhruvsgarg@hotmail.com>
Co-authored-by: Rohan Bafna <130247393+rohanbafna@users.noreply.github.com>
Setup for config searches with Ray Tune
* Update tpch-spark to include changes from pass-deadlines branch
* Read workload spec json in launch_tpch_queries.py. Changes the interface of launch_tpch_queries.py to accept a JSON file that contains the specification of the workload to launch. This avoids complications with simulator/service parity by making the workload fully deterministic.
* specify deadline when creating task graph
* Create script to generate a workload spec
* modify loader to read from spec file
* remove comment about metadata field
* Spark service: accept deadlines from TPC-H query info
* Update ray search script to use workload specs
* bug fixes
* add release time to deadline
* corrected setup for config search 3/5
* fix dumb bug in ray search script
* more dumb search script fixes; also only apply the dsched slo penalty calculation in the metric if dsched's slo < 80%
* Fix bug in service when deadline is set. The deadline needs to be converted to an EventTime; otherwise the call to get_next_task_graph fails and the task graph is never created.
* Spark service experiment setup for 3/06
* Experiment setup for 03/08: E/M/H mixture; extra datapoints for DSched and EDF
* make tpch_plot work again
* update ray search
* add graphene config
* tpch plot graphene + tetrisched baselines

Co-authored-by: Rohan Bafna <130247393+rohanbafna@users.noreply.github.com>
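For readers unfamiliar with Ray Tune, here is a hedged sketch of what a scheduler-config search could look like with Ray Tune's `tune.run`/`tune.report` API. `run_simulator_experiment` and the parameter names in the search space are placeholders, not the actual search script's flags, and the trainable fabricates a score so the sketch runs end to end.

```python
from ray import tune

def run_simulator_experiment(config):
    # Stand-in for launching the service/simulator with `config` and parsing
    # the SLO attainment out of its logs; here we fabricate a score so the
    # sketch is self-contained.
    slo_attainment = 1.0 / (1 + config["scheduler_time_discretization"])
    tune.report(slo_attainment=slo_attainment)

search_space = {
    "scheduler": tune.grid_search(["EDF", "FIFO", "TetriSched"]),
    "deadline_variance": tune.grid_search(["0,20", "0,50"]),
    "scheduler_time_discretization": tune.grid_search([1, 5, 10]),
}

analysis = tune.run(
    run_simulator_experiment,
    config=search_space,
    metric="slo_attainment",
    mode="max",
)
print(analysis.best_config)
```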
Helper script to spawn simulator experiments.
Experiments are specified as matrices in a YAML config file.
In a future PR, I will add support for running partial matrices, including a utility to split an experiment run into several subcommands that can be run on separate machines.
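The runner's YAML schema is not shown here, so the following is only a hedged sketch, under invented key and flag names, of how such an experiment matrix could be expanded into per-experiment invocations.

```python
import itertools
import yaml  # PyYAML

# Illustrative schema only: the keys and flags below are invented for this
# sketch and are not the runner's actual config format.
SPEC = """
matrix:
  scheduler: [EDF, TetriSched]
  tpch_query: [4, 8, 19]
  deadline_variance: ["0,20", "0,50"]
"""

def expand(matrix: dict) -> list:
    """Cartesian product of all matrix axes, one dict per experiment."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in itertools.product(*matrix.values())]

spec = yaml.safe_load(SPEC)
for cfg in expand(spec["matrix"]):
    flags = [f"--{key}={value}" for key, value in cfg.items()]
    # A real runner would subprocess.run() the simulator here; we just print.
    print("python main.py " + " ".join(flags))
```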