-
Notifications
You must be signed in to change notification settings - Fork 7
SATE 6 Classic DARPA CGC report
This file contains an (ongoing) report about cases of undefined behavior and deviations from the ISO C standard found in the DARPA CGC corpus as part of the SATE 6 Classic Track.
These issues have been found with Frama-C/Eva, and manual inspection to propose patches for the code in some cases.
Update: The bugs reported here will be progressively reported upstream, to TrailOfBits, with proposed patches.
The issues were separated in the following categories:
-
Syntactic issues: either current Frama-C limitations (e.g. uncommon situations related to flexible array members), or deviations from the C standard (e.g. usage of Clang's blocks extension)
-
Unintentional undefined behavior in the challenges: each challenge contains a
README.md
file describing one or several intentional vulnerabilities in the code. The issues reported here seem unrelated to those vulnerabilities and are likely unintended bugs present in the code. When possible, a patch is suggested to fix the issue. Otherwise, the analysis is often unable to advance further. -
Intentional undefined behavior: in a few cases, the code deliberately violates the standard (often with comments indicating why). Such cases are not currently handled, although some patches could be proposed to ensure Frama-C/Eva could proceed with the analysis. Such patches would likely affect the intended exploits of such vulnerabilities.
-
Possible undefined behavior due to absence of checks: some challenges contain code that does not perform some validation tests, which leads to possible undefined behavior under specific circumstances. It is not entirely clear whether such cases are intended (to make the vulnerabilities exploitable) or accidental. They are nevertheless listed as spots for possible improvements in the code.
-
Undefined behavior in the "glue" code: in at least one file, there is a situation where the undefined behavior is present not in the challenges, but in the "glue" code written by Trail Of Bits to port the testcases to "common" operating systems: Windows, macOS and Linux.
Note: due to the large amount of test cases and manual effort required to analyze them, this report is not definitive. New cases may be periodically added to this report.
-
cgc_regexp.h
: cannot parse: contains a field declared with a type containing a flexible array member -
challenges/Corinth/lib/cgc_protocol.h
: prototype for functioncgc_protocol_with_recv_frame
uses Clang's blocks extension, which is non-portable. The same also happens with theQuery_Calculator
challenge. -
challenges/ShoutCTF/src/service.c:478
, functionmain
: The following expression performs arithmetic on a pointer to void, which is a non-C99 GNU extension:r = *(unsigned int *)secret_page ^ *(unsigned int *)&secret_page[20];
Replacing
void *
withchar *
in the code should not change its behavior (since it is later cast intounsigned int *
anyway). -
challenges/middleware_handshake/src/cgc_modes.h
: This program defines, incgc_modes.h
, a typemode_t
which is astruct
with several fields. There is an unfortunate collision with themode_t
type defined in POSIX, which is defined insys/types.h
and included by several other headers. Renaming this type tocgc_mode_t
or some other name is sufficient to avoid possible collisions. -
challenges/middleware_handshake/r_lcg.c
andchallenges/middleware_handshake/r_system.c
: There are two incoherences in the renaming of global variablescgc_lcg_rng
andcgc_system_rng
: they have been renamed in the files which define and initialize them, but not inrng.c
, which declares and uses them. Comparing the code with the original samples confirms that both had the same name. Otherwise, the program will try to dereference null pointers when using them (the variables inrng.c
will be zero-initialized by default), leading to undefined behavior.
In function handle_delete
, the following suspect code is present:
void handle_delete(size_t size) {
...
free_object(data);
for (i = 0; i < data->len; i++)
{
...
}
...
}
In this code, after calling free_object(data)
, the len
field is used to
free some inner elements of the data structure.
This has not been confirmed 100% by Frama-C (due to instrumentation used to
normalize code among all challenges), but it is very likely that len is being
read from a dangling pointer, which could work in practice (since it has just
been deallocated) but constitutes undefined behavior.
totalLen
is declared in the stack, but not initialized before use,
in total_len += len;
. This is a UB, and totalLen
is likely to contain
a garbage value.
jokedb->count
is read without having been initialized
(jokedb
is a stack variable local to main
).
This is a UB, and count
is likely to contain a garbage value.
Local array playerInfoType players[8]
leaves its elements uninitialized,
then function add_player
tests the first element of array player_name
(before any initialization).
In function cgc_fpai_read_nbits
, local variable bits
is and-ed without
having been initialized: bits &= 0
.
Local variable hard
is declared without initializer; a loop with a
conditional branch optionally initializes it to 0, then it is tested.
If the variable has not been set during the loop, the access is UB.
Fixing it (by initializing hard = 1
) reveals another issue,
later in the code.
An interesting issue in the following piece of code:
int i = 0;
while (playerList[i].player_name[0] != 0 && i < MAX_PLAYERS)
++i;
if (i == MAX_PLAYERS) {
cgc_printf("Too many players\n");
return -1;
}
The playerList array contains 8 (== MAX_PLAYERS) elements.
The order of the tests is inverted: i < MAX_PLAYERS
is tested after accessing
playerList[i]
, so when all players are filled, the loop will access
playerList[8]
, causing an array out-of-bounds access and therefore UB.
This is visible in Frama-C thanks to the Red Alarms panel, since otherwise only a yellow alarm is emitted, due to the 7 first iterations being ok.
There is a non-conforming initialization of each byte in a memory page,
after the cgc_gWords
array is allocated. The array is allocated to its
exact size (the number of words it will contain) and then each element
is initialized to point to the corresponding word. However, after the loop,
the array is iterated from its last position up to PAGE_SIZE/sizeof(char*)
,
presumably to ensure the rest of the memory page contains nothing.
However, since this memory region has never been explicitly allocated,
such accesses lead to undefined behavior.
There is a signed overflow on a left shift ((0xff << 24)
) which seems
unintentional.
In the loop, when bytes
is 0, the code performs a right shift of a
negative amount, which is UB.
There are two goto fail
in the code, one of which does not initialize
variable err
, which is accessed while printing an error message in the
block following the label. In this code:
if (cgc_freaduntil(buf, sizeof(buf), '\n', cgc_stdin) < 0)
goto fail;
It seems that the intention would have been to write instead:
if ((err = cgc_freaduntil(buf, sizeof(buf), '\n', cgc_stdin)) < 0)
goto fail;
This requires moving the declaration of error_t err
a few lines above its
current location.
(Unconfirmed: possibly a false positive due to instrumentation used to run Frama-C.)
Variable payload
contains a field value
pointing to buf
near the end
of the function. The call to cgc_free_frame(payload)
deallocates the
value
field. The following call to cgc_deallocate
thus performs a
double free of payload
.
This function allocates memory for a psetElement
, which is pointer to a
struct setElement
, defined as:
struct setElement {
char *value;
int type;
};
Assuming a 32-bit architecture, struct setElement
has sizeof
8, while
psetElement
, a pointer, has sizeof
4.
The allocated variable, copy
, has its fields value
and type
set by the
function. However, since only the 4 first bytes were allocated, instruction
copy->type = element->type
tries to write outside allocated memory and thus
invokes undefined behavior.
The likely intended allocation should have been of a struct setElement
instead of a psetElement
, to ensure that both fields have their memory
allocated.
(This could be related to the intentional bug present in the code, but the
description in README.md
mentions a different function).
The validation of the UserId
variable in the code below misses value 128,
which could lead to an out-of-bounds access, since cgc_Users
has size 128:
// validate the userid
if (UserId > MAX_USERS) {
return(0);
}
if (cgc_Users[UserId].Username[0] == '\0') {
return(0);
}
There is a mismatch (related to a off-by-one error) between the memory
allocated for maps and their usage. In the following code, both xWidth
and
yWidth
equal 1. Therefore the allocation reserves space for a single int
,
but the loops iterate over 4 elements, triggering undefined behavior:
if(!(map->data = cgc_malloc(xWidth * yWidth * sizeof(unsigned int))))
cgc__terminate(ALLOCATE_ERROR);
for(int c=0; c<=xWidth; c++) {
for(int r=0; r<=yWidth; r++) {
*(map->data + r*xWidth + c) = -1;
}
}
Allocating (xWidth+1) * (yWidth+1) * sizeof(unsigned int))
bytes fixes it.
The following part of the function allocates insufficient memory for the string
(forgetting the terminating \0
character), meaning that the strcpy
at the
end overruns the buffer:
nameSize = cgc_strlen(name);
if(!(channel->name = cgc_malloc(nameSize))) {
cgc_free(channel);
return NULL;
}
cgc_memset(channel->name, 0, nameSize);
cgc_strcpy(channel->name, name);
For instance, the code is called with name
pointing to "FLAG"
, so only 4
bytes are allocated, instead of 5.
There is a missing branch in this function: a local variable Matches
is
declared and initialized to NULL
, and then only referenced as the source
argument of a call to cgc_memcpy
(it is supposed to be copied into
NewMatches
, during growth reallocation).
Since Matches
is never assigned after it has been initialized,
it still points to NULL
during the call to memcpy
, triggering undefined
behavior.
The intended code would likely check if Matches
were still null (in which
case it would simply assign it to NewMatches
), otherwise do what is currently
present in the code: copy the data in Matches
to NewMatches
, then free
Matches
, before updating it to the newly allocated zone.
Function cgc_initMacros(Macro** macro_ptr)
dereferences the macro_ptr
pointer. It points to macros
, a local variable in the caller which has not
been initialized, thus triggering undefined behavior. Initializing macros
to
NULL
should avoid the issue.
The expression 1 << 31
overflows on machines with 32-bit integers.
The same happens in function route_bit
, and similar situations in other
functions. In most cases, replacing 1
with 1U
, to ensure the operations
are performed on unsigned integers, avoids the problem.
This function relies on undefined behavior to work. For completeness, we present below the entire code of the function:
static int mask_to_length(unsigned int mask)
{
int length;
if (mask == 0)
return 0;
if (mask == 0xFFFFFFFF)
return 32;
for (length = 0; length <= 32; length++)
if (0xFFFFFFFF << (32 - length) == mask)
break;
return length;
}
The undefined behavior happens inside the loop, as soon as the if
is reached:
if (0xFFFFFFFF << (32 - length) == mask)
break;
By reading the code of the function, we assume the following intention:
it is supposed to return the length, from left to right, of the number of bits
set to 1 in mask
. Valid masks consist exclusively of 1's followed by 0's.
Therefore, there are 33 valid values (0x00000000, 0x80000000, 0xC0000000, etc.)
and all other values are invalid, in which case the function returns 33.
However, for the code to work, it requires that, whenever a number is
left-shifted, higher-order bits are discarded and lower-order bits are replaced
with 0. For instance, given as input the mask 0xFFFFFFFE
, the 32nd loop
iteration will test 0xFFFFFFFF << (32 - 31)
, which is 0xFFFFFFFF << 1
,
which is, according to the intended logic, 0xFFFFFFE
. This is equal to the
provided mask, therefore we exit the loop and return 31.
Unfortunately, C99 does not provide such a portable shift operator with the
desired behavior; the result is that, when compiling the code with a modern GCC,
it is capable of statically inferring that 0xFFFFFFFF << (32 - length)
results in undefined behavior for all values of length
, and therefore
decides to simply return true
, making the function return 0 for every value
of mask
except 0xFFFFFFFF
, which is handled before the loop.
Here, reverse engineering the function behavior (since it is undocumented) is relatively easy, but it cannot be reliably done via testing, since each compiler may decide on a different behavior: even exhaustive testing of the 232 possible values may lead to erroneous results, because the test oracle may vary depending on the compilation flags and on the compiler version.
Here is a modified version of the function which does not trigger undefined behavior:
int ub_free_mask_to_length(unsigned int mask) {
int length;
if (mask == 0) return 0;
if (mask == 0xFFFFFFFF) return 32;
for (length = 1; length < 32; length++)
if (~((1U << (32 - length))-1) == mask)
break;
return length >= 32 ? 33 : length;
}
It is not so clear to read, but it only performs shifts smaller than 32, and the value never overflows. A subtraction followed by a bitwise negation produce the required mask pattern.
Similar patches are necessary in other parts of the code; for instance,
functions route_mask
, cmd_add_route
and cmd_query_route
also rely on the
same style of left shifts that always overflow.
Another case of a left shift which overflows, due to the fact that f0
has
type unit8_t
, and therefore in the expression f0 << 24
it is promoted to
signed int
and not unsigned int
. Adding a cast to unsigned
is
sufficient to prevent the UB:
return ((unsigned)f0 << 24) | (f1 << 16) | (f2 << 8) | f3;
Same situation as the previous one:
x |= bytes[i-1] << 24;
bytes
has type char *
, promoted to int
, leading to possible overflow.
Casting bytes[i-1]
to unsigned
once again prevents UB from happening.
Another very similar situation happens in bn.c:80
, function cgc_bn_length
:
if (d & (1 << (i - 1)))
Here, the first 1 forces type int
for the shift expression, leading to UB
when i == 32
. Adding a U
suffix to the constant prevents UB from happening.
Besides the "accidental UB" cases above, there is at least one case of a deliberate UB, which is problematic for Frama-C.
// Constify via float
float infinity = 1.0 / 0.0;
// NOTE: this should always constify to 0, but *technically* the behavior
// is undefined.
rx_buf[FLOAT_CONST_OFF] = (unsigned char)((float)rx_buf[FLOAT_CONST_OFF] * infinity);
Indeed, the multiplication of infinity with a value containing zero and non-zero results in either infinity or NaN; casting that to an integer value always results in undefined behavior.
Manipulates hardcoded addresses to memory zones which were not allocated, to more easily trigger intended segmentation faults. This prevents Frama-C from analyzing the program, since it assumes the strict C semantics from the standard.
In some cases, undefined behavior seems possible in the code, due to absence of defensive programming. There is no guarantee that any given execution will reach it, but good programming practice should perform extra checks to avoid it. Such checks are also very helpful for analysis tools to rule out such possibilities.
The comments of many functions, for instance in assemble.c
, state that they
may return NULL
in case of elements not found. However, their callers never
check for this possibility. One of the intended vulnerabilities in the code
does mention dereferencing of null pointers, which might explain why no checks
are present. However, it's not clear if all such checks are related to it, or
if some of them were forgotten.
cgc_get_function_by_position
may return -1 if the service is not found, but
the caller never checks for this possibility, using the index directly.
There is also a case of a very suspicious code that appears in the "glue" code developed by Trail Of Bits, to port the challenges from the DECREE operating system to a Linux.
In include/ansi_x931_aes128.c
, function cgc_gen_block
, we have:
uint8_t i = BLOCK_SIZE - 1;
while (i >= 0) {
++prng->state.datetime[i];
if (prng->state.datetime[i] != 0) break;
i -= 1;
}
Because i is unsigned and always fits into an int, the while condition is
equivalent to while (1)
, which is not necessarily the authors' intention.
In particular, if the loop wraps around (supposing all prng values just got to
zero), then i will become 255, and the access to state.datetime[255]
will
lead to UB.