0%

Graph Convolutional Networks

Graph Convlutional Network

Classical Wavelet Transform (CWT)

In general, the CWT will be generated by the choice of a single "mother" wavelet . Wavelets at different locations and spatial scales are formed by translating and scaling the mother wavelet. We write this by

This scaling convention preserves the norm of the wavelets. . For a given signal , the wavelet coefficient at scale and location is given by the inner product of with the wavelet , i.e.

The CWT satisfies the admissibility condition

Comparisons with Fourier transform (continuous-time) [6]

The wavelet transform is often compared with the Fourier transform, in which signals are represented as a sum of sinusoids. In fact, the Fourier transform can be viewed as a special case of the continuous wavelet transform with the choice of the mother wavelet . The main difference in general is that wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency.

Conjugate Transpose: The conjugate transpose (or Hermitian transpose) of an m-by-n matrix with complex entries, is the n-by-m matrix obtained from by taking the transpose and then taking the complex conjugate of each entry (the complex conjugate of being , for real numbers a and b). It is often denoted as or . For real matrices, the conjugate transpose is just the transpose, .

Complex Conjugate: the complex conjugate of a complex number is the number with an equal real part and an imaginary part equal in magnitude, but opposite in sign. Given a complex number , the complex conjugate of , often denoted as . For matrices of complex numbers, , where represents the element-by-element conjugation of . Contrast this to the property , where represents the conjugate transpose of .

Vanilla Graph Convlutional Network

A multi-layer Graph Convolutional Network (GCN) with the following layer-wise propagation rule:

Here, is the adjacency matrix of the undirected graph with added self-connections. is the identity matrix, $ _{ii} = j {ij}$ and is a layer-specific trainable weight matrix. denotes an activation function, such as is the matrix of activations in the layer; .

Spectral Graph Convolutions

We consider spectral convolutions on graphs defined as the multiplication of a signal (a scalar for every node) with a filter parameterized by in the Fourier domain,

where is the matrix of eigenvectors of the normalized graph Laplacian , with a diagonal matrix of its eigenvalues and being the graph Fourier transform of . Drawbacks of this formulation are:

  • Computationally expensive, as multiplication with the eigenvector matrix is ;
  • Computing the eigendecompositon of in the first place might be prohibitively expensive for large graphs

Graph Fourier Transform

We are interested in processing signals defined on undirected and connected graphs , where is a finite set of vertices, is a set of edges and is a weighted adjacency matrix encoding the connection weight between two vertices. A signal defined on the nodes of the graph may be regarded as a vector where is the value of at the node. An essential operator in spectral graph analysis is the graph Laplacian, which combinatorial definition is where is the diagonal degree matrix with , and normalized definition is where is the identity matrix. As is a real symmetric positive semidefinite matrix, it has a complete set of orthonormal eigenvectors , known as the graph Fourier modes, and their associated ordered real nonnegative eigenvalues , identified as the frequencies of the graph. The Laplacian is indeed diagonalized by the Fourier basis such that where . The graph Fourier transform of a signal is then defined as , and its inverse as . As on Euclidean spaces, that transform enables the formulation of fundamental operations such as filtering. [1]

Spectral filtering of graph signals

As we cannot express a meaningful translation operator in the vertex domain, the convolution operator on graph is defined in the Fourier domain such that , where is the element-wise Hadamard product. It follows that a signal is filtered by as

A non-parametric filter, i.e. a filter whose parameters are all free, would be defined as

where the paramter is a vector of Fourier coefficients.

Another Perspective to GCN [2]

  • adjacency matrix of graph
  • degree matrix
  • Laplacian matrix
  • convolutional kernel filter
  • signal on graph

There are usually three kinds of Laplacian matrix : - Unnormalized Laplacian: - Symmetric normalized Laplacian: - Random walk normalized Laplacian:

Graph Convolution

is convolutional operator, is Hadamard product. Laplacian matrix is L's eigenvectors, is eigenmatrix. and is doing Fourier transform for and , respectively.

Spectral Graph Convolution

where

is convolutional kernel 's Fourier tranform 's triangle form. .

Paramters Graph Convolution [3]

This is replacement of with . And is the parameters we aims to learn. Drawbacks: - Huge computation, too complex - Amount of parameters is - Convolution kernel is not localized connected

Polynomail Parameter Graph Convolution [4]

This smartly replace with , where is replaced by .

  • Reduce the parameter of convolution kernels from to
  • Convolutional Kernal has localized property
  • is convolutional kernel's reaction range
  • but, calculation is still highly complex

Chebyshev Polynomial Graph Convolution [5]

where . This consider the Chebyshev polynomial 's K-cut to approximate : , where , is Chebyshev coefficient, which is paramter in graph network. The Chebyshev polynomial can be defined recursively: , where and . So we can get:

  • similar to above, convolutional operator don't rely on whole graph, but on K-neighbors to center
  • avoid eigen decomposition of Laplacian matrix
  • but still perform K-order calculation on

One-order Chebyshev Polynomial Graph Convolution [6]

This is a special case in above equation that we set . For simplification, we set , we have

Further step, we set , we can get:

This operator will fall into number unstable and gradient issues, so it proposes renormalization trick:

where , and . Here and is the adjacency matrix and degree matrix adding self-loop. Finally, we get .

Next, we can get:

where is representation of input, and is paramter matrix, which is the part to learn in networks, is corresponding convolutional result.

References

[1] @inproceedings{defferrard2016convolutional, title={Convolutional neural networks on graphs with fast localized spectral filtering}, author={Defferrard, Micha{"e}l and Bresson, Xavier and Vandergheynst, Pierre}, booktitle={Advances in neural information processing systems}, pages={3844--3852}, year={2016} }

[2] https://zhuanlan.zhihu.com/p/126687306

[3] @article{bruna2013spectral, title={Spectral networks and locally connected networks on graphs}, author={Bruna, Joan and Zaremba, Wojciech and Szlam, Arthur and LeCun, Yann}, journal={arXiv preprint arXiv:1312.6203}, year={2013} }

[4] @inproceedings{defferrard2016convolutional, title={Convolutional neural networks on graphs with fast localized spectral filtering}, author={Defferrard, Micha{"e}l and Bresson, Xavier and Vandergheynst, Pierre}, booktitle={Advances in neural information processing systems}, pages={3844--3852}, year={2016} }

[5] @article{hammond2011wavelets, title={Wavelets on graphs via spectral graph theory}, author={Hammond, David K and Vandergheynst, Pierre and Gribonval, R{'e}mi}, journal={Applied and Computational Harmonic Analysis}, volume={30}, number={2}, pages={129--150}, year={2011}, publisher={Elsevier} }

[6] @article{kipf2016semi, title={Semi-supervised classification with graph convolutional networks}, author={Kipf, Thomas N and Welling, Max}, journal={arXiv preprint arXiv:1609.02907}, year={2016} }

[7] https://en.wikipedia.org/wiki/Wavelet