|
| 1 | +--- |
| 2 | +title: Basics of Cilk programming |
| 3 | +eleventyNavigation: |
| 4 | + order: -1 |
| 5 | +--- |
| 6 | + |
| 7 | +OpenCilk extends C and C++ with a few parallel keywords. Programmers use these keywords to allow computations in the program to be executed in parallel. The OpenCilk compiler and runtime system then efficiently execute those logically parallel computations on parallel processors. This guide introduces the basic Cilk keywords (`cilk_spawn`, `cilk_scope`, `cilk_for`, and `cilk_sync`) and their basic usage. |
| 8 | + |
| 9 | +To use the Cilk keywords, include the `cilk/cilk.h` header file in your source code and compile and link your program with the `-fopencilk` flag. |
| 10 | + |
| 11 | +## Spawning and synchronizing tasks |
| 12 | + |
| 13 | +The `cilk_spawn` and `cilk_scope` keywords allow programmers to spawn and synchronize parallel computations, or ***tasks***. |
| 14 | + |
| 15 | +A `cilk_spawn` can be inserted before a function call to ***spawn*** that function call, which allows that call to execute in parallel with its ***continuation***, that is, the statements after the call. |
| 16 | + |
| 17 | +The `cilk_scope` keyword defines a lexical scope that ***synchronizes*** spawned tasks. In particular, all tasks spawned within the scope must complete before the program execution leaves the scope. |
| 18 | + |
| 19 | +The following example shows how `cilk_spawn` and `cilk_scope` are used together to spawn and synchronize a parallel task. |
| 20 | + |
| 21 | +```cilkc |
| 22 | +#include <cilk/cilk.h> |
| 23 | + |
| 24 | +int fib(int n) { |
| 25 | +  if (n < 2) |
| 26 | +    return n; |
| 27 | +  int x, y; |
| 28 | +  cilk_scope { |
| 29 | +    x = cilk_spawn fib(n - 1); |
| 30 | +    y = fib(n - 2); |
| 31 | +  } |
| 32 | +  return x + y; |
| 33 | +} |
| 34 | +``` |
| 35 | + |
| 36 | +In this example, the statement `x = cilk_spawn fib(n - 1)` spawns a task to compute `x = fib(n - 1)`. The spawn allows this task to execute in parallel with the continuation of the spawn statement, which, in this example, consists only of the statement `y = fib(n - 2)`. The `cilk_scope` ensures that the spawned computation of `x = fib(n - 1)` completes before `return x + y` reads the values of `x` and `y`. |
| 37 | + |
| 38 | +In this example, if `n` is sufficiently large, then the computations `fib(n - 1)` and `fib(n - 2)` can recursively spawn and synchronize their own subtasks. As a result, this example `fib` routine can spawn a large number of parallel tasks. OpenCilk takes advantage of these numerous parallel tasks to automatically schedule the computation efficiently on parallel computing hardware. |
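
To experiment with `fib` on a machine that may not have OpenCilk installed, the sketch below wraps the same routine in a small preamble. The `USE_OPENCILK` macro is a hypothetical name chosen for this illustration, not something OpenCilk defines: when it is passed on the command line the real keywords are used, and otherwise the keywords are erased so that the code compiles as ordinary serial C. The build commands in the comments are assumptions about a typical setup.

```c
// With OpenCilk (assumed command):  clang -fopencilk -DUSE_OPENCILK -O3 -c fib.c
// As plain serial C:                cc -O3 -c fib.c
#ifdef USE_OPENCILK  // hypothetical macro, supplied by the programmer
#include <cilk/cilk.h>
#else
// Serial fallback: erase the keywords so any C compiler accepts the code.
#define cilk_spawn
#define cilk_scope
#define cilk_for for
#define cilk_sync ((void)0)
#endif

int fib(int n) {
  if (n < 2)
    return n;
  int x, y;
  cilk_scope {
    x = cilk_spawn fib(n - 1);  // may run in parallel with the next statement
    y = fib(n - 2);
  }  // both x and y have been computed once the scope is left
  return x + y;
}
```

Because the keywords only permit parallel execution and do not require it, the serial build should compute the same values as the parallel one; for instance, `fib(10)` is 55 in both.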
| 39 | + |
| 40 | +## Parallel loops |
| 41 | + |
| 42 | +The `cilk_for` keyword provides a parallel loop construct. It can be used in place of an ordinary `for` in C/C++ to allow all of the iterations of the loop to execute in parallel. |
| 43 | + |
| 44 | +The following example shows how `cilk_for` can be used to parallelize a SAXPY computation. |
| 45 | + |
| 46 | +```cilkc |
| 47 | +#include <cilk/cilk.h> |
| 48 | + |
| 49 | +void saxpy(int n, float *z, const float a, const float *x, const float *y) { |
| 50 | +  cilk_for (int i = 0; i < n; ++i) |
| 51 | +    z[i] = a * x[i] + y[i]; |
| 52 | +} |
| 53 | +``` |
| 54 | + |
| 55 | +In this example, the `cilk_for` loop allows all `n` iterations of the loop over `i` to execute in parallel, so each entry `z[i]` can be computed in parallel. |
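
The iterations can safely run in parallel because iteration `i` reads and writes only index `i` of each array. The sketch below makes that point with a hypothetical wrapper, `saxpy_accumulate`, that passes `y` as both input and output. The wrapper and the `USE_OPENCILK` macro are illustrative names introduced here, not part of OpenCilk.

```c
#ifdef USE_OPENCILK  // hypothetical macro, supplied by the programmer
#include <cilk/cilk.h>
#else
// Serial fallback: erase the keywords so any C compiler accepts the code.
#define cilk_spawn
#define cilk_scope
#define cilk_for for
#define cilk_sync ((void)0)
#endif

void saxpy(int n, float *z, const float a, const float *x, const float *y) {
  cilk_for (int i = 0; i < n; ++i)
    z[i] = a * x[i] + y[i];  // touches only index i of x, y, and z
}

// Hypothetical helper: y[i] = a * x[i] + y[i], reusing y as the output.
// No iteration touches an element that another iteration reads or writes,
// so the loop stays free of races even though z and y alias.
void saxpy_accumulate(int n, const float a, const float *x, float *y) {
  saxpy(n, y, a, x, y);
}
```

With `a = 2`, `x = {1, 2, 3}`, and `y = {10, 20, 30}`, either routine produces `{12, 24, 36}`.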
| 56 | + |
| 57 | +Cilk allows, and generally encourages, nesting of `cilk_for` loops, as in the following example. |
| 58 | + |
| 59 | +```cilkc |
| 60 | +#include <cilk/cilk.h> |
| 61 | + |
| 62 | +void square_matmul(double *C, const double *A, const double *B, size_t n) { |
| 63 | +  cilk_for (size_t i = 0; i < n; ++i) { |
| 64 | +    cilk_for (size_t j = 0; j < n; ++j) { |
| 65 | +      double sum = 0.0; |
| 66 | +      for (size_t k = 0; k < n; ++k) |
| 67 | +        sum += A[i * n + k] * B[k * n + j]; |
| 68 | +      C[i * n + j] = sum; |
| 69 | +    } |
| 70 | +  } |
| 71 | +} |
| 72 | +``` |
| 73 | + |
| 74 | +The outer `cilk_for` loop allows the `n` iterations over `i` to execute in parallel. For each outer-loop iteration `i`, the inner `cilk_for` loop allows the `n` iterations over `j` to execute in parallel. Together, these two nested `cilk_for` loops allow all $n^2$ entries `C[i * n + j]` to be computed in parallel. |
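
The nested loops can be checked on a small input. The sketch below repeats the routine with two additions that are assumptions of this illustration: an explicit `<stddef.h>` include for `size_t`, and the hypothetical `USE_OPENCILK` switch that falls back to serial C. Note that `sum` is declared inside the body of the inner loop, so every `(i, j)` iteration has its own private accumulator and only the innermost `k` loop, which is an ordinary serial `for`, updates it.

```c
#include <stddef.h>  // size_t

#ifdef USE_OPENCILK  // hypothetical macro, supplied by the programmer
#include <cilk/cilk.h>
#else
// Serial fallback: erase the keywords so any C compiler accepts the code.
#define cilk_spawn
#define cilk_scope
#define cilk_for for
#define cilk_sync ((void)0)
#endif

// Computes C = A * B for row-major n-by-n matrices.
void square_matmul(double *C, const double *A, const double *B, size_t n) {
  cilk_for (size_t i = 0; i < n; ++i) {
    cilk_for (size_t j = 0; j < n; ++j) {
      double sum = 0.0;  // private to this (i, j) iteration
      for (size_t k = 0; k < n; ++k)
        sum += A[i * n + k] * B[k * n + j];
      C[i * n + j] = sum;  // each iteration writes a distinct entry of C
    }
  }
}
```

For the 2-by-2 inputs `A = {1, 2, 3, 4}` and `B = {5, 6, 7, 8}`, the result is `C = {19, 22, 43, 50}`.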
| 75 | + |
| 76 | +## Mixing Cilk keywords and C/C++ code |
| 77 | + |
| 78 | +The Cilk keywords can be combined flexibly with each other and with standard C/C++ code. The following synthetic example, inspired by real Cilk programs, demonstrates some of this flexibility. |
| 79 | + |
| 80 | +```cilkc |
| 81 | +#include <cilk/cilk.h> |
| 82 | + |
| 83 | +// Tree-node structure where each node contains two arrays of size `n` and up |
| 84 | +// to two children. |
| 85 | +struct node { |
| 86 | +  struct node *left_child; // Might be NULL |
| 87 | +  struct node *right_child; // Might be NULL |
| 88 | +  float *array_A, *array_B; |
| 89 | +  int n; |
| 90 | +}; |
| 91 | + |
| 92 | +// Recursively walk the tree rooted at `root` to count the number of nodes in |
| 93 | +// the tree and to process all arrays at all nodes. |
| 94 | +int count_and_process_tree(struct node *root) { |
| 95 | +  int left_count = 0; |
| 96 | +  int right_count = 0; |
| 97 | +  cilk_scope { |
| 98 | +    // Traverse the left and right children in parallel. |
| 99 | +    if (root->left_child) |
| 100 | +      left_count = cilk_spawn count_and_process_tree(root->left_child); |
| 101 | +    if (root->right_child) |
| 102 | +      right_count = cilk_spawn count_and_process_tree(root->right_child); |
| 103 | + |
| 104 | +    // Process the arrays at this node. |
| 105 | +    cilk_for (int i = 0; i < root->n; ++i) { |
| 106 | +      cilk_scope { |
| 107 | +        cilk_spawn process_A(root->array_A[i]); |
| 108 | +        process_B(root->array_B[i]); |
| 109 | +      } |
| 110 | +      combine(root->array_A[i], root->array_B[i]); |
| 111 | +    } |
| 112 | +  } |
| 113 | +  return left_count + right_count + 1; |
| 114 | +} |
| 115 | +``` |
| 116 | + |
| 117 | +In this example, the `count_and_process_tree()` routine traverses a binary tree in parallel and processes arrays of elements at each node of that tree. This example uses the Cilk keywords in several notable ways. |
| 118 | + |
| 119 | +- The recursive spawns of `count_and_process_tree()` process the left and right children of a node in parallel, if they exist. These `cilk_spawn` statements are placed inside conditionals to ensure that the recursive spawns are performed only on non-null child nodes. |
| 120 | +- The recursive spawns of `count_and_process_tree()` allow the processing of these children to occur in parallel with the `cilk_for` that processes the arrays attached to the current node. |
| 121 | +- The `cilk_for` loop processes all `n` elements in the arrays `array_A` and `array_B` in parallel. |
| 122 | +- Each iteration `i` of the `cilk_for` loop uses `cilk_scope` and `cilk_spawn` to process `array_A[i]` and `array_B[i]` separately in parallel, before those results are combined after the `cilk_scope`. |
| 123 | +- Each recursive spawn of `count_and_process_tree()` returns the number of nodes in the subtree rooted at that child node. The outer `cilk_scope` ensures that both of these recursive spawns have completed before returning 1 plus the sum of those counts. |
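
The example above leaves `process_A()`, `process_B()`, and `combine()` undefined. The sketch below fills them in with hypothetical stand-ins so that the routine can be compiled and checked. As an adaptation for this illustration only, the helpers take pointers rather than values so that their effect is observable, and the `USE_OPENCILK` macro is likewise an invented switch that falls back to serial C.

```c
#include <stddef.h>  // NULL

#ifdef USE_OPENCILK  // hypothetical macro, supplied by the programmer
#include <cilk/cilk.h>
#else
// Serial fallback: erase the keywords so any C compiler accepts the code.
#define cilk_spawn
#define cilk_scope
#define cilk_for for
#define cilk_sync ((void)0)
#endif

struct node {
  struct node *left_child;   // Might be NULL
  struct node *right_child;  // Might be NULL
  float *array_A, *array_B;
  int n;
};

// Hypothetical element-wise helpers (not defined in the original example).
static void process_A(float *a) { *a *= 2.0f; }
static void process_B(float *b) { *b += 1.0f; }
static void combine(float *a, const float *b) { *a += *b; }

int count_and_process_tree(struct node *root) {
  int left_count = 0;
  int right_count = 0;
  cilk_scope {
    // Each child is spawned only if it exists.
    if (root->left_child)
      left_count = cilk_spawn count_and_process_tree(root->left_child);
    if (root->right_child)
      right_count = cilk_spawn count_and_process_tree(root->right_child);

    // This loop may run in parallel with the spawned child traversals.
    cilk_for (int i = 0; i < root->n; ++i) {
      cilk_scope {
        cilk_spawn process_A(&root->array_A[i]);
        process_B(&root->array_B[i]);
      }  // both elements are processed before combine() reads them
      combine(&root->array_A[i], &root->array_B[i]);
    }
  }  // both child counts are final once the outer scope is left
  return left_count + right_count + 1;
}
```

With these stand-ins, a root holding `A = {1, 2}` and `B = {3, 4}` plus a single leaf child yields a count of 2 and leaves the root's `array_A` as `{6, 9}`.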
| 124 | + |
| 125 | +## Dynamic synchronization |
| 126 | + |
| 127 | +OpenCilk also supports the `cilk_sync` statement for synchronizing spawned tasks within a function, `cilk_scope`, or `cilk_for`. Although it is generally better programming practice to use `cilk_scope` for synchronization, the `cilk_sync` statement can be convenient in some situations. In addition, the `cilk_sync` statement supports dynamic synchronization of spawned tasks, as the following snippet from an all-pairs-shortest-paths Cilk program demonstrates. |
| 128 | + |
| 129 | +```cilkc |
| 130 | +cilk_scope { |
| 131 | +  cilk_spawn recur(A, lda, im, i1, j0, j1, k0, k1); |
| 132 | +  if (overlaps(im, i1, k0, k1)) |
| 133 | +    cilk_sync; |
| 134 | +  recur(A, lda, i0, im, j0, j1, k0, k1); |
| 135 | +} |
| 136 | +``` |
| 137 | + |
| 138 | +In this example, the `cilk_spawn` allows the first call to `recur()` to execute in parallel with its continuation, which starts at the call to `overlaps()`. If `overlaps()` returns true, then the `cilk_sync` statement waits for the spawned call to `recur()` to complete before the second call to `recur()` begins, ensuring that the two calls execute sequentially. Otherwise, the second call to `recur()` is allowed to execute in parallel with the first. Finally, the `cilk_scope` ensures that both calls to `recur()` return before program execution leaves the scope. |
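
The snippet above depends on code that is not shown, so the sketch below illustrates the same conditional-synchronization pattern on a smaller, self-contained problem: applying an update to two index ranges of an array, which may or may not overlap. Every name here (`add_to_range`, `ranges_overlap`, `update_two_ranges`, and the `USE_OPENCILK` switch) is hypothetical and chosen for this illustration; it is not taken from the all-pairs-shortest-paths program.

```c
#ifdef USE_OPENCILK  // hypothetical macro, supplied by the programmer
#include <cilk/cilk.h>
#else
// Serial fallback: erase the keywords so any C compiler accepts the code.
#define cilk_spawn
#define cilk_scope
#define cilk_for for
#define cilk_sync ((void)0)
#endif

// Hypothetical helper: add v to every element of the half-open range a[lo..hi).
static void add_to_range(float *a, int lo, int hi, float v) {
  for (int i = lo; i < hi; ++i)
    a[i] += v;
}

// Hypothetical predicate: do [lo1, hi1) and [lo2, hi2) share an index?
// Assumes both ranges are non-empty.
static int ranges_overlap(int lo1, int hi1, int lo2, int hi2) {
  return lo1 < hi2 && lo2 < hi1;
}

void update_two_ranges(float *a, int lo1, int hi1, int lo2, int hi2, float v) {
  cilk_scope {
    cilk_spawn add_to_range(a, lo1, hi1, v);
    // Only when the ranges share elements must the first update finish
    // before the second one starts; disjoint ranges may proceed in parallel.
    if (ranges_overlap(lo1, hi1, lo2, hi2))
      cilk_sync;
    add_to_range(a, lo2, hi2, v);
  }  // both updates are complete once the scope is left
}
```

Starting from six zeros, `update_two_ranges(a, 0, 4, 2, 6, 1.0f)` takes the synchronizing path and leaves `{1, 1, 2, 2, 1, 1}`; with disjoint ranges such as `[0, 2)` and `[2, 4)`, the `cilk_sync` is skipped.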