Skip to content

Extract C<M> += A(I,J) where I and J are GrB_Vectors #67

@DrTimothyAldenDavis

Description

@DrTimothyAldenDavis

GrB_extract has several limitations that need to be fixed. These changes are needed for the Connected Compenents algorithm in LAGraph, and likely other graph algorithms. Consider C += A(I,J), where the mask M and accumulator (+) are not part of this discussion; no need to change those. So just consider C = A(I,J).

(1) GrB_Vectors as index arrays: the method for expressing row/column indices is via non-opaque GrB_Index arrays, I and J as I call them. This is not a good fit for algorithms that use GraphBLAS since they will likely be computing these index arrays with GraphBLAS itself, as GrB_Vectors. Then to pass the vectors to GrB_extract, they must use extractTuples first. This is slow, requiring an extra copy and also forcing a "block". If the inputs are GrB_Vectors I and J, then no copy needs to be made. As an example of this extractTuples, see the LG_CC_Boruvka ( https://github.com/GraphBLAS/LAGraph/blob/reorg/src/algorithm/LG_CC_Boruvka.c ) and LG_CC_FastSV6 ( https://github.com/GraphBLAS/LAGraph/blob/reorg/src/algorithm/LG_CC_FastSV6.c ). In particular, these 2 lines of code: https://github.com/GraphBLAS/LAGraph/blob/03cae9363f123dfe5572065a602b9c14ed4169e6/src/algorithm/LG_CC_Boruvka.c#L217 . Currently, all the examples I know of that want to use GrB_extract compute I and J with GraphBLAS and then they must use extractTuples before calling GrB_extract.

(2) Colon "notation": There is no method for expressing the index range I = lo:stride:hi. This could be done with a GrB_Index array of size 3 (I do this already) or a GrB_Vector of size 3, with a descriptor setting. See issue #15.

(3) Integer indices acting in a bollean sense: We need a method for using I and J in a mask-like fashion (not to be confused with M in the C= ... part; this is different). So consider just C=A(I,J). The GrB_Vectors I and J must have size nrows and ncols, respectively. C and A have the same dimension. The computation for any given entry A(i,j) would be:

if (I(i) is true and J(j) is true) then C(i,j) = A (i,j) else C (i,j) is not an entry.

This kind of computation is needed in the two CC methods in LAGraph. The statement "I(i) is true" could be valued, as written, or structural, as in "I(i) exists in the structure of the vector I", just like the value vs structural use of the mask M. This kind of extraction cannot be done by GrB_select, unless it is with a user-defined operator that then dereferences the arrays I and J on its own (see for example: https://github.com/GraphBLAS/LAGraph/blob/03cae9363f123dfe5572065a602b9c14ed4169e6/src/algorithm/LG_CC_Boruvka.c#L254 , which is slow).

An elegant extension of this boolean indexing idea would add a binary operator. The statement above:

for each entry A(i,j) that exists in the matrix A:
    if (I(i) is true and J(j) is true) then C(i,j) = A (i,j) else C (i,j) is not an entry.

could be done this way instead. The user passes in the GrB_LAND operator as the op, and the computation becomes:

for each entry A(i,j) that exists in the matrix A:
    if (op(I(i),J(j)) then C(i,j) = A (i,j) else C (i,j) is not an entry.

If the vectors I and J are sparse, with missing entries, then op(I(i),J(j)) would only be applied in an "eWiseMult" sense; that is, only used if I(i) and J(j) are both present, and the result of this test would be false otherwise.

This kind of extract would be simple to implement and incredibly useful. The LG_CC_FastSV6 method needs to delete all entries in the matrix that are in the current largest connected component. See https://github.com/GraphBLAS/LAGraph/blob/03cae9363f123dfe5572065a602b9c14ed4169e6/src/algorithm/LG_CC_FastSV6.c#L430 . That entire body of code could be replaced (mostly) with:

GrB_apply (I, NULL, NULL, GrB_NE_UINT64, parent, key, NULL) ;    // no extensions here; pure GrB. key is a scalar.
GxB_extract (T, NULL, NULL, GrB_LAND, A, I, I, NULL) ;  // using the boolean index vector I

The above 2 lines of code do the following computation:

for each entry A(i,j)
     if (parent (i) != key AND parent (j) != key) then T(i,j) = A(i,j) else T(i,j) does not appear in the matrix T

The body of code in LG_CC_FastSV6 also adds an extra term: T(i,key) is inserted as a new entry if the row T(i,:) differs from A(i,:). That can be done in several ways, as a separate step, so ignore it for this example. The primary work is the 2 lines above.

Consider also the user-defined index_unary operator in LG_CC_Boruvka: it returns true if Px(i) != Px(j), where Px is passed in as a void * pointer. A better method would be:

GxB_extract (S, NULL, NULL, GrB_NE_UINT64, parent, parent, NULL) ;

which computes the exact result desired. The line above deletes all edges (i,j) where node i and node j appear in the same connected component. That is, it does the following:

for each entry S(i,j)
     if (parent (i) != parent (j)) then S(i,j) = S (i,j) (keep the entry) else S(i,j) does not appear in the matrix S on output

That is, it does the following:

for each entry S(i,j)
    if (parent (i) != parent(j)) then keep the entry S(i,j) else delete it.

The expression "parent (i) != parent (j)" is specified by passing in the GrB_NE_UINT64 operator, and using the GrB_Vector parent for both inputs I and J, to compute S = S(I,J).

(4) Mask-like boolean index vectors: There is one more way to use these two vectors I and J. Assume they are already boolean (just like the mask M is treated). We have the ability to complement a mask M. I and J act like a Cartesian product, and it would be nice to complement that too. Or handle any combination of complements and boolean operations between I and J.

So imagine a mask-style vector MI used in a structural sense: MI(i) = true if I(i) is present, and false otherwise. Or let MI be constructed in a valued sense where MI(i) is true if I(i) is present and has the value true, and false otherwise. This is exactly how a mask M is handled already in GraphBLAS, but this is now used to interpret the index GrB_Vector I for GxB_extract.

But since there are TWO mask vectors MI and MJ, then cannot simply by taken as-is or complemented. For a conventional mask M there are two unary functions to implicitly apply: complement the mask, or do not complement the mask. With TWO mask vectors MI and MJ, there are 16 boolean possible functions to apply, z = f(x,y) where z,x,y are all boolean.

The descriptor would say how I and J are two be interpretted: as (1) plain index vectors (see (1) above), or (2) as "colon notation " (see (2) above), as inputs to an arbitrary binary operator (see (3) above), or as this mask-like behavior. If mask-like then the statement:

GxB_extract (C, M, accum, binaryop, A, I, J, descriptor)

would compute the following matrix T (later doing C+=T for the standard mask/accum phase):

Construct the mask vector MI from I, just like we would already for GrB_Vector_apply (v, I, ...) for a vector I.
Do the same to construct MJ from J.
for each entry A(i,j)
     if (binaryop (MI(i), MJ(j)) then T(i,j) = A(i,j) else T(i,j) does not appear as an entry in T

Then binaryop would be any of 16 possible binary operators, like GrB_LAND, GrB_XOR, GrB_XNOR, you name it. Sadly, the suite of boolean binary operators in GraphBLAS is not complete. We do not have all 16 of them. We are missing NAND(x,y) and NOR(x,y) for example. See this table for details: https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/ccb8d243f1bb3ab9668f25011b01634eb7af53b5/Include/GraphBLAS.h#L1444

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions