
Commit 9f1e4b9

Merge pull request #283 from FluxML/doc
Update docs and fix GATv2Conv
2 parents 63706e7 + 50f5c3f commit 9f1e4b9

File tree

13 files changed: +508 -72 lines changed


docs/bibliography.bib

Lines changed: 149 additions & 16 deletions
Large diffs are not rendered by default.

docs/make.jl

Lines changed: 4 additions & 0 deletions
@@ -29,6 +29,10 @@ makedocs(
        "Tutorials" => [
            "Semi-Supervised Learning with GCN" => "tutorials/semisupervised_gcn.md",
            "GCN with Fixed Graph" => "tutorials/gcn_fixed_graph.md",
+           "Graph Attention Network" => "tutorials/gat.md",
+           "DeepSet for Digit Sum" => "tutorials/deepset.md",
+           "Variational Graph Autoencoder" => "tutorials/vgae.md",
+           "Graph Embedding" => "tutorials/graph_embedding.md",
        ],
        "Abstractions" => [
            "Message passing scheme" => "abstractions/msgpass.md",

docs/src/introduction.md

Lines changed: 30 additions & 0 deletions
@@ -27,3 +27,33 @@ Graph signals include node signals, edge signals and global (or graph) signals.
<figcaption><em>Signals and graph signals.</em></figcaption>
</figure>
```
+
+## Variable Graph: `FeaturedGraph` as a Container for Graph and Features
+
+A GNN model accepts a graph and features as input. To this end, the `FeaturedGraph` object is designed as a container for a graph and various kinds of features, and it can be passed to a GNN model directly.
+
+```julia
+T = Float32
+fg = FeaturedGraph(g, nf=rand(T, 10, 5), ef=rand(T, 7, 11), gf=rand(T, 7))  # sizes are illustrative; the truncated `gf` argument is completed with an assumed value
+```
+
+It is worth noting that it is better to convert the element type of the graph and its features to `Float32` explicitly; doing so avoids type-related issues when training or running inference with a GNN model.
+
+```julia
+train_data = [(FeaturedGraph(g, nf=train_X), train_y) for _ in 1:N]
+```
+
+A collection of `FeaturedGraph`s can hold different graph structures `g` and different features `train_X`, and all of them can be fed to the same GNN model in order to train or infer on variable graphs.
+
+## Build a GNN Model
+
+```julia
+model = Chain(
+    GCNConv(input_dim=>hidden_dim, relu),
+    GraphParallel(node_layer=Dropout(0.5)),
+    GCNConv(hidden_dim=>target_dim),
+    node_feature,
+)
+```
+
+A GNN model can be built by stacking GNN layers, with or without regular Flux layers. Regular Flux layers should be wrapped in `GraphParallel` and specified as `node_layer`, which applies them to the node features.
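
For instance, a `FeaturedGraph` carrying node features can be passed through the model above; because the chain ends with `node_feature`, the call returns the node-feature matrix of the result. This is only a sketch under the assumptions already in scope (a graph `g` and the dimensions `input_dim`, `hidden_dim`, `target_dim`):

```julia
using Graphs: nv   # assumed; any graph type with a node count works

X = rand(Float32, input_dim, nv(g))   # one Float32 feature column per node
fg = FeaturedGraph(g, nf=X)
Ŷ = model(fg)                         # target_dim × number-of-nodes predictions
```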

docs/src/manual/conv.md

Lines changed: 12 additions & 9 deletions
@@ -13,7 +13,7 @@ where ``\hat{A} = A + I``, ``A`` denotes the adjacency matrix, and
GCNConv
```

-Reference: [Semi-supervised Classification with Graph Convolutional Networks](https://arxiv.org/abs/1609.02907)
+Reference: [Kipf2017](@cite)

---

@@ -37,7 +37,7 @@ and ``\hat{L} = \frac{2}{\lambda_{max}} L - I``.
ChebConv
```

-Reference: [Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering](https://arxiv.org/abs/1606.09375)
+Reference: [Defferrard2016](@cite)

---

@@ -51,7 +51,7 @@ Reference: [Convolutional Neural Networks on Graphs with Fast Localized Spectral
GraphConv
```

-Reference: [Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks](https://arxiv.org/abs/1810.02244)
+Reference: [Morris2019](@cite)

---

@@ -71,7 +71,7 @@ where the attention coefficient ``\alpha_{i,j}`` can be calculated from
GATConv
```

-Reference: [Graph Attention Networks](https://arxiv.org/abs/1710.10903)
+Reference: [GAT2018](@cite)

---

@@ -82,7 +82,8 @@ Reference: [Graph Attention Networks](https://arxiv.org/abs/1710.10903)
GATv2Conv
```

-Reference: [How Attentive are Graph Attention Networks?](https://arxiv.org/abs/2105.14491)
+Reference: [Brody2022](@cite)
+
---

## Gated Graph Convolution Layer

@@ -98,7 +99,7 @@ Reference: [How Attentive are Graph Attention Networks?](https://arxiv.org/abs/2
GatedGraphConv
```

-Reference: [Gated Graph Sequence Neural Networks](https://arxiv.org/abs/1511.05493)
+Reference: [Li2016](@cite)

---

@@ -114,7 +115,7 @@ where ``f_{\Theta}`` denotes a neural network parametrized by ``\Theta``, *i.e.*
EdgeConv
```

-Reference: [Dynamic Graph CNN for Learning on Point Clouds](https://arxiv.org/abs/1801.07829)
+Reference: [Wang2019](@cite)

---

@@ -130,7 +131,9 @@ where ``f_{\Theta}`` denotes a neural network parametrized by ``\Theta``, *i.e.*
GINConv
```

-Reference: [How Powerful are Graph Neural Networks?](https://arxiv.org/pdf/1810.00826.pdf)
+Reference: [Xu2019](@cite)
+
+---

## Crystal Graph Convolutional Network

@@ -144,4 +147,4 @@ where ``\textbf{z}_{i,j} = [\textbf{x}_i, \textbf{x}_j, \textbf{e}_{i,j}]`` den
CGConv
```

-Reference: [Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties](https://arxiv.org/pdf/1710.10324.pdf)
+Reference: [Xie2018](@cite)

docs/src/manual/embedding.md

Lines changed: 2 additions & 0 deletions
@@ -5,3 +5,5 @@
```@docs
GeometricFlux.node2vec
```
+
+Reference: [Grover2016](@cite)

docs/src/manual/models.md

Lines changed: 5 additions & 5 deletions
@@ -15,7 +15,7 @@ where ``A`` denotes the adjacency matrix.
GeometricFlux.GAE
```

-Reference: [Variational Graph Auto-Encoders](https://arxiv.org/abs/1611.07308)
+Reference: [Kipf2016](@cite)

---

@@ -33,7 +33,7 @@ where ``A`` denotes the adjacency matrix, ``X`` denotes node features.
GeometricFlux.VGAE
```

-Reference: [Variational Graph Auto-Encoders](https://arxiv.org/abs/1611.07308)
+Reference: [Kipf2016](@cite)

---

@@ -49,7 +49,7 @@ where ``\phi`` and ``\rho`` denote two neural networks and ``x_i`` is the node f
GeometricFlux.DeepSet
```

-Reference: [Deep Sets](https://papers.nips.cc/paper/2017/hash/f22e4747da1aa27e363d86d40ff442fe-Abstract.html)
+Reference: [Zaheer2017](@cite)

---

@@ -67,7 +67,7 @@ where ``Z`` denotes the input matrix from encoder.
GeometricFlux.InnerProductDecoder
```

-Reference: [Variational Graph Auto-Encoders](https://arxiv.org/abs/1611.07308)
+Reference: [Kipf2016](@cite)

---

@@ -82,4 +82,4 @@ Z_{\mu}, Z_{logσ} = GCN_{\mu}(H, A), GCN_{\sigma}(H, A)
GeometricFlux.VariationalGraphEncoder
```

-Reference: [Variational Graph Auto-Encoders](https://arxiv.org/abs/1611.07308)
+Reference: [Kipf2016](@cite)

docs/src/manual/pool.md

Lines changed: 13 additions & 1 deletion
@@ -1,13 +1,25 @@
-# Pooling layers
+# Pooling Layers
+
+## Global Pooling Layer

```@docs
GlobalPool
```

+---
+
+## Local Pooling Layer
+
```@docs
LocalPool
```

+---
+
+## Top-k Pooling Layer
+
```@docs
TopKPool
```
+
+Reference: [Gao2019](@cite)

docs/src/tutorials/deepset.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
# Predicting Digit Sums with a DeepSet Model

Digit sum is the task of summing up the digits that appear in images or text. This example demonstrates summing the digits in an arbitrary number of MNIST images. A DeepSet model is well suited to this task: it takes a set of objects and reduces them to a single object.

## Step 1: Load MNIST Dataset

Since a DeepSet model predicts the sum from a set of images, we have to prepare a training dataset in which each example is a randomly sized set of images paired with the sum of their digits.

First, the whole dataset is loaded from MLDatasets.jl and shuffled before the training examples are generated.

```julia
train_X, train_y = MLDatasets.MNIST.traindata(Float32)
train_X, train_y = shuffle_data(train_X, train_y)
```
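
`shuffle_data` is a small helper defined in the example script; a minimal version might look like the following (a hypothetical sketch, shown only to make the snippet self-contained):

```julia
using Random: randperm

# Shuffle images and labels together; MNIST images are stacked along the third dimension.
function shuffle_data(X::AbstractArray, y::AbstractVector)
    perm = randperm(size(X, 3))
    return X[:, :, perm], y[perm]
end
```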

The `generate_featuredgraphs` helper generates a set of pairs, each containing a `FeaturedGraph` and the summed digits used as the prediction target. Inside each `FeaturedGraph`, an arbitrary number of MNIST images are collected as node features, and the corresponding nodes form a graph without edges.

```julia
train_data = generate_featuredgraphs(train_X, train_y, num_train_examples, 1:train_max_length)
```

`num_train_examples` specifies how many training examples to generate, and `1:train_max_length` specifies the range for the number of images contained in one example.
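
For reference, a rough sketch of what such a helper might do is shown below; this is a hypothetical reconstruction, and the actual implementation lives in the example script:

```julia
using Graphs: SimpleGraph

function generate_featuredgraphs(X, y, num_examples, len_range)
    data = []
    for _ in 1:num_examples
        k = rand(len_range)                  # how many images in this set
        idx = rand(1:size(X, 3), k)          # pick k random images
        g = SimpleGraph(k)                   # k nodes, no edges
        nf = reshape(X[:, :, idx], :, k)     # flatten each image into a node-feature column
        push!(data, (FeaturedGraph(g, nf=nf), sum(y[idx])))
    end
    return data
end
```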

## Step 2: Build a DeepSet Model

A DeepSet takes a set of objects and outputs a single object. For the model to accept a set, its output must be invariant to permutations of the input. The DeepSet model is composed of two parts: a ``\phi`` network and a ``\rho`` network.

```math
Z = \rho ( \sum_{x_i \in \mathcal{V}} \phi (x_i) )
```

The ``\phi`` network embeds each image, and the embeddings are summed into a single embedding. Permutation invariance comes from the use of summation; in general, any commutative binary operator can be used to reduce a set of embeddings into one. Finally, the ``\rho`` network decodes the embedding into a number.

```julia
ϕ = Chain(
    Dense(args.input_dim, args.hidden_dims[1], tanh),
    Dense(args.hidden_dims[1], args.hidden_dims[2], tanh),
    Dense(args.hidden_dims[2], args.hidden_dims[3], tanh),
)
ρ = Dense(args.hidden_dims[3], args.target_dim)
model = DeepSet(ϕ, ρ) |> device
```
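
As a quick sanity check, permuting the images inside a set should not change the prediction. The snippet below is a sketch only; it assumes `args.input_dim == 784` (flattened MNIST images), that `SimpleGraph` from Graphs.jl is available, and that the model lives on the CPU:

```julia
using Random: randperm

k = 5
nf = rand(Float32, 784, k)                  # five flattened, MNIST-sized images
perm = randperm(k)
fg1 = FeaturedGraph(SimpleGraph(k), nf=nf)
fg2 = FeaturedGraph(SimpleGraph(k), nf=nf[:, perm])
@assert global_feature(model(fg1)) ≈ global_feature(model(fg2))
```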

## Step 3: Loss Function

Mean absolute error is used as the loss function. Since the model outputs a `FeaturedGraph`, the prediction is stored as the global feature of that `FeaturedGraph`.

```julia
function model_loss(model, batch)
    ŷ = vcat(map(x -> global_feature(model(x[1])), batch)...)
    y = vcat(map(x -> x[2], batch)...)
    return mae(ŷ, y)
end
```

## Step 4: Training the DeepSet Model

```julia
# optimizer
opt = ADAM(args.η)

# parameters
ps = Flux.params(model)

# training
@info "Start Training, total $(args.epochs) epochs"
for epoch = 1:args.epochs
    @info "Epoch $(epoch)"

    for batch in train_loader
        train_loss, back = Flux.pullback(ps) do
            model_loss(model, batch |> device)
        end
        # assumes the example script also defines a `model_loss(model, loader, device)` method for evaluation
        test_loss = model_loss(model, test_loader, device)
        grad = back(1f0)
        Flux.Optimise.update!(opt, ps, grad)
    end
end
```

For a complete example, please check [examples/digitsum_deepsets.jl](../../examples/digitsum_deepsets.jl).

docs/src/tutorials/gat.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
# Graph Attention Network

The graph attention network (GAT) belongs to the message-passing network family: each node attends over the features of its neighbors, and the aggregated result is produced as the layer output.

## Step 1: Load Dataset

We load the Cora dataset from the Planetoid collection.

```julia
train_X, train_y = map(x -> Matrix(x), alldata(Planetoid(), dataset, padding=true))
```

## Step 2: Batch up Features and Labels

Batch up the features as usual.

```julia
add_all_self_loops!(g)
fg = FeaturedGraph(g)
train_data = (repeat(train_X, outer=(1,1,train_repeats)), repeat(train_y, outer=(1,1,train_repeats)))
train_loader = DataLoader(train_data, batchsize=batch_size, shuffle=true)
```

Notably, self-loops on all nodes are required for the GAT model.
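
`add_all_self_loops!` is a helper from the example script. Assuming `g` is a Graphs.jl graph that permits self-loops, a minimal version might look like this (a sketch, not the example's actual implementation):

```julia
using Graphs

# Add a self-loop to every vertex in-place, so each node also attends to itself.
function add_all_self_loops!(g)
    for v in vertices(g)
        add_edge!(g, v, v)
    end
    return g
end
```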

## Step 3: Build a GAT Model

```julia
model = Chain(
    WithGraph(fg, GATConv(args.input_dim=>args.hidden_dim, heads=args.heads)),
    Dropout(0.6),
    WithGraph(fg, GATConv(args.hidden_dim*args.heads=>args.target_dim, heads=args.heads, concat=false)),
) |> device
```

Note that a `GATConv` with `concat=true` concatenates its `heads` outputs along the feature dimension, so the next layer must take `args.hidden_dim*args.heads` input channels. In the final layer of a network, the `GATConv` layer should be given `concat=false` so that the heads are averaged instead.
38+
39+
## Step 4: Loss Functions and Accuracy
40+
41+
Cross entropy loss is used as loss function and accuracy is used to evaluate the model.
42+
43+
```julia
44+
model_loss(model, X, y, idx) =
45+
logitcrossentropy(model(X)[:,idx,:], y[:,idx,:])
46+
```
47+
48+
```julia
49+
accuracy(model, X::AbstractArray, y::AbstractArray, idx) =
50+
mean(onecold(softmax(cpu(model(X))[:,idx,:])) .== onecold(cpu(y)[:,idx,:])
51+
```
52+
53+
54+
## Step 5: Training GAT Model
55+
56+
```julia
57+
# ADAM optimizer
58+
opt = ADAM(args.η)
59+
60+
# parameters
61+
ps = Flux.params(model)
62+
63+
# training
64+
@info "Start Training, total $(args.epochs) epochs"
65+
for epoch = 1:args.epochs
66+
@info "Epoch $(epoch)"
67+
68+
for (X, y) in train_loader
69+
loss, back = Flux.pullback(ps) do
70+
model_loss(model, X |> device, y |> device, train_idx |> device)
71+
end
72+
train_acc = accuracy(model, train_loader, device, train_idx)
73+
test_acc = accuracy(model, test_loader, device, test_idx)
74+
grad = back(1f0)
75+
Flux.Optimise.update!(opt, ps, grad)
76+
end
77+
end
78+
```
79+
80+
For a complete example, please check [examples/gat.jl](../../examples/gat.jl).

docs/src/tutorials/graph_embedding.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# Graph Embedding Through Node2vec model
