- AI vs. ML vs. DL
- Apache Spark & Deep Learning
- Attention Mechanisms & Memory Networks
- Automated Machine Learning & AI
- AI & Autonomous Vehicles
- Backpropagation
- Bag of Words & TF-IDF
- Clojure AI
- Comparison of AI Frameworks
- Convolutional Neural Network (CNN)
- Data for Deep Learning
- Datasets and Machine Learning
- Decision Tree
- Deep Autoencoders
- Deep-Belief Networks
- Deep Reinforcement Learning
- Deep Learning Resources
- Deeplearning4j
- Denoising Autoencoders
- Machine Learning DevOps
- Differentiable Programming
- Eigenvectors, Eigenvalues, PCA, Covariance and Entropy
- Evolutionary & Genetic Algorithms
- Fraud and Anomaly Detection
- Generative Adversarial Network (GAN)
- Glossary
- Gluon
- Graph Analytics
- Hopfield Networks
- Hyperparameter
- Wiki Home
- Java AI
- Jumpy
- Logistic Regression
- LSTMs & RNNs
- Machine Learning Algorithms
- Machine Learning Demos
- Machine Learning Software
- Machine Learning Operations (MLOps)
- Machine Learning Research Groups & Labs
- Machine Learning Workflows
- Machine Learning
- Markov Chain Monte Carlo
- Multilayer Perceptron
- Natural Language Processing (NLP)
- ND4J
- Neural Network Tuning
- Neural Networks
- Open Datasets
- Python AI
- Questions When Applying Deep Learning
- Radial Basis Function Networks
- Random Forest
- Recurrent Network (RNN)
- Recursive Neural Tensor Network
- Restricted Boltzmann Machine (RBM)
- Robotic Process Automation (RPA) & AI
- Scala AI
- Single-layer Network
- Skynet, or How to Regulate AI
- Spiking Neural Networks
- Stacked Denoising Autoencoder (SDA)
- Strong AI vs. Weak AI
- Supervised Learning
- Symbolic Reasoning
- Text Analysis
- Thought Vectors
- Unsupervised Learning
- Deep Learning Use Cases
- Variational Autoencoder (VAE)
- Word2Vec, Doc2Vec and Neural Word Embeddings

Contents

- Concrete Examples of Graph Data Structures
- Difficulties of Graph Data: Size and Structure
- Representing and Traversing Graphs for Machine Learning
- Footnotes
- Further Resources on Graph Data Structures and Deep Learning

Graphs are data structures that can be ingested by various algorithms, notably neural nets, learning to perform tasks such as classification, clustering and regression.

TL;DR: here’s one way to make graph data ingestable for the algorithms:

```
Data (graph, words) -> Real number vector -> Deep neural network
```

Algorithms can “embed” each node of a graph into a real vector (similar to the embedding of a word). The result will be vector representation of each node in the graph with some information preserved. Once you have the real number vector, you can feed it to the neural network.

The simplest definition of a graph is “a collection of items connected by edges.” There are many problems where it’s helpful to think of things as graphs.^{1} The items are often called *nodes* or *points* and the edges are often called *vertices*, the plural of vertex. Here are a few concrete examples of a graph:

- Cities are nodes and highways are edges
- Humans are nodes and relationships between them are edges (in a social network)
- States are nodes and the transitions between them are edges (for more on states, see our post on deep reinforcement learning). For example, a video game is a graph of states connected by actions that lead from one state to the next…
- Atoms are nodes and chemical bonds are edges (in a molecule)
- Web pages are nodes and hyperlinks are edges (Hello, Google)
- A thought is a graph of synaptic firings (edges) between neurons (nodes)
- A neural network is a graph … that makes predictions about other graphs. The nodes are places where computation happens and the edges are the paths by which signal flows through the mathematical operations

Any ontology, or knowledge graph, charts the interrelationship of entities (combining symbolic AI with the graph structure):

- Taxonomies of animal species
- Diseases that share etiologies and symptoms
- Medications that share ingredients

Applying neural networks and other machine-learning techniques to graph data can de difficult.

The first question to answer is: What kind of graph are you dealing with?

Let’s say you have a finite state machine, where each state is a node in the graph. You can give each state-node a unique ID, maybe a number. Then you give all the rows the names of the states, and you give all the columns the same names, so that the matrix contains an element for every state to intersect with every other state. Then you could mark those elements with a 1 or 0 to indicate whether the two states were connected in the graph, or even use weighted nodes (a continuous number) to indicate the likelihood of a transition from one state to the next. (The transition matrix below represents a finite state machine for the weather.)

That seems simple enough, but many graphs, like social network graphs with billions of nodes (where each member is a node and each connection to another member is an edge), are simply too large to be computed. **Size** is one problem that graphs present as a data structure. In other words, you can’t efficiently store a large social network in a tensor. They don’t compute.

Neural nets do well on vectors and tensors; data types like images (which have structure embedded in them via pixel proximity – they have fixed size and spatiality); and sequences such as text and time series (which display structure in one direction, forward in time).

Graphs have an **arbitrary structure**: they are collections of things without a location in space, or with an arbitrary location. They have no proper beginning and no end, and two nodes connected to each other are not necessarily “close”.

You usually don’t feed whole graphs into neural networks, for example. They would have to be the same shape and size, and you’d have to line up your graph nodes with your network’s input nodes. But the whole point of graph-structured input is to not know or have that order. There’s no first, there’s no last.

The second question when dealing with graphs is: What kind of question are you trying to answer by applying machine learning to them? In social networks, you’re usually trying to make a decision about what kind person you’re looking at, represented by the node, or what kind of friends and interactions does that person have. So you’re making predictions about the node itself or its edges.

Since that’s the case, you can address the uncomputable size of a Facebook-scale graph by looking at a node and its neighbors maybe 1-3 degrees away; i.e. a subgraph. The immediate neighborhood of the node, taking `k`

steps down the graph in all directions, probably captures most of the information you care about. You’re filtering out the giant graph’s overwhelming size.

Let’s say you decide to give each node an arbitrary representation vector, like a low-dimensional word embedding, each node’s vector being the same length. The next step would be to traverse the graph, and that traversal could be represented by arranging the node vectors next to each other in a matrix. You could then feed that matrix representing the graph to a recurrent neural net. That’s basically DeepWalk (see below), which treats truncated random walks across a large graph as sentences.

If you turn each node into an embedding, much like word2vec does with words, then you can force a neural net model to learn representations for each node, which can then be helpful in making downstream predictions about them. (How close is this node to other things we care about?)

Another more recent approach is a *graph convolutional network*, which very similar to convolutional networks: it passes a node filter over a graph much as you would pass a convolutional filter over an image, registering each time it sees a certain kind of node. The readings taken by the filters are stacked and passed to a maxpooling layer, which discards all but the strongest signal, before we return to a filter-passing convolutional layer.

One interesting aspect of graph is so-called side information, or the attributes and features associated with each node. For example, each node could have an image associated to it, in which case an algorithm attempting to make a decision about that graph might have a CNN subroutine embedded in it for those image nodes. Or the side data could be text, and the graph could be a tree (the leaves are words, intermediate nodes are phrases combining the words) over which we run a recursive neural net, an algorithm popolarized by Richard Socher.

Finally, you can compute derivative functions such as graph Laplacians from the tensors that represent the graphs, much like you might perform an eigen analysis on a tensor. These functions will tell you things about the graph that may help you classify or cluster it. (See below for more information.)

1) *In a weird meta way it’s just graphs all the way down, not turtles. A human scientist whose head is full of firing synapses (graph) is both embedded in a larger social network (graph) and engaged in constructing ontologies of knowledge (graph) and making predictions about data with neural nets (graph).*

Below are a few papers discussing how neural nets can be applied to data in graphs.

A Comprehensive Survey on Graph Neural Networks

by Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, Philip S. Yu

Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into different categories. With a focus on graph convolutional networks, we review alternative architectures that have recently been developed; these learning paradigms include graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes and benchmarks of the existing algorithms on different learning tasks. Finally, we propose potential research directions in this fast-growing field.

Representation Learning on Graphs: Methods and Applications (2017)

by William Hamilton, Rex Ying and Jure Leskovec

*Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph convolutional networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.*

A Short Tutorial on Graph Laplacians, Laplacian Embedding, and Spectral Clustering

by Radu Horaud

Community Detection with Graph Neural Networks (2017)

by Joan Bruna and Xiang Li

DeepWalk: Online Learning of Social Representations (2014)

by Bryan Perozzi, Rami Al-Rfou and Steven Skiena

*We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk’s latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk’s representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk’s representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.*

DeepWalk is implemented in Deeplearning4j.

Deep Neural Networks for Learning Graph Representations (2016) by Shaosheng Cao, Wei Lu and Qiongkai Xu

*In this paper, we propose a novel model for learning graph representations, which generates a low-dimensional vector
representation for each vertex by capturing the graph structural information. Different from other previous research efforts,
we adopt a random surfing model to capture graph structural information directly, instead of using the samplingbased
method for generating linear sequences proposed by Perozzi et al. (2014). The advantages of our approach will
be illustrated from both theorical and empirical perspectives. We also give a new perspective for the matrix factorization
method proposed by Levy and Goldberg (2014), in which the pointwise mutual information (PMI) matrix is considered as
an analytical solution to the objective function of the skipgram model with negative sampling proposed by Mikolov et
al. (2013). Unlike their approach which involves the use of the SVD for finding the low-dimensitonal projections from
the PMI matrix, however, the stacked denoising autoencoder is introduced in our model to extract complex features and
model non-linearities. To demonstrate the effectiveness of our model, we conduct experiments on clustering and visualization
tasks, employing the learned vertex representations as features. Empirical results on datasets of varying sizes show
that our model outperforms other state-of-the-art models in such tasks.*

Deep Feature Learning for Graphs

by Ryan A. Rossi, Rong Zhou, Nesreen K. Ahmed

by Jiwei Li, Alan Ritter and Dan Jurafsky

*Inferring latent attributes of people online is an important social computing task, but requires integrating the many heterogeneous sources of information available on the web. We propose learning individual representations of people using neural nets to integrate rich linguistic and network evidence gathered from social media. The algorithm is able to combine diverse cues, such as the text a person writes, their attributes (e.g. gender, employer, education, location) and social relations to other people. We show that by integrating both textual and network evidence, these representations offer improved performance at four important tasks in social media inference on Twitter: predicting (1) gender, (2) occupation, (3) location, and (4) friendships for users. Our approach scales to large datasets and the learned representations can be used as general features in and have the potential to benefit a large number of downstream tasks including link prediction, community detection, or probabilistic reasoning over social networks.*

node2vec: Scalable Feature Learning for Networks (Stanford, 2016) by Aditya Grover and Jure Leskovec

*Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node’s network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.*

Gated Graph Sequence Neural Networks (Toronto and Microsoft, 2017) by Yujia Li, Daniel Tarlow, Marc Brockschmidt and Richard Zemel

*Graph-structured data appears frequently in domains including chemistry, natural language semantics, social networks, and knowledge bases. In this work, we study feature learning techniques for graph-structured inputs. Our starting point is previous work on Graph Neural Networks (Scarselli et al., 2009), which we modify to use gated recurrent units and modern optimization techniques and then extend to output sequences. The result is a flexible and broadly useful class of neural network models that has favorable inductive biases relative to purely sequence-based models (e.g., LSTMs) when the problem is graph-structured. We demonstrate the capabilities on some simple AI (bAbI) and graph algorithm learning tasks. We then show it achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be matched to abstract data structures.*

Graph Classification with 2D Convolutional Neural Networks

Deep Learning on Graphs: A Survey (December 2018)