Graphs provide a natural abstraction for modeling data in domains as diverse as biology (e.g., molecules), multiagent settings (e.g., online vendors on ecommerce platforms), and distributed systems (e.g., the Internet). Graphs also find much use as theoretical objects (e.g., probabilistic graphical models), and several important algorithms (e.g., max-flow for image segmentation) can be invoked when tasks are formulated in terms of graphs. Graph neural networks (GNNs) are state-of-the-art models for embedding graphs, and are thus naturally suited for making predictions based on graphs. GNNs have demonstrated superior performance in numerous applications, including product recommendation, protein design, drug discovery, and stock market prediction. However, GNNs remain poorly understood in terms of what they can and cannot do (representation limits) and how well they can predict on new data (generalization).
Towards designing better models for embedding graphs, we address some fundamental questions about GNNs. We establish the inability of GNNs to distinguish certain graphs that differ in global properties such as cycle structure but share similar local structure. We then provide the first data-dependent generalization bounds for GNNs; that is, we characterize how the data affects the ability of a trained model to predict well on new graphs. Moreover, we introduce a new model that is provably more powerful than state-of-the-art message-passing models.
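The indistinguishability claim can be illustrated with the 1-dimensional Weisfeiler-Leman (1-WL) color-refinement test, which upper-bounds the discriminative power of standard message-passing GNNs. The two graphs below are an assumed example (not taken from the paper's experiments): two disjoint triangles versus a single 6-cycle. Both are 2-regular, so every node sees identical local neighborhoods, yet the graphs differ in their cycles. A minimal sketch:

```python
# Sketch of 1-WL color refinement, which bounds what message-passing
# GNNs can distinguish. Graphs are adjacency dicts; the example graphs
# (two triangles vs. a hexagon) are illustrative assumptions.
from collections import Counter

def wl_colors(adj, rounds=3):
    """Run 1-WL refinement; return the multiset of final node colors."""
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(rounds):
        # Each node's new color combines its own color with the
        # multiset of its neighbors' colors (the "message passing" step).
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())

# Two disjoint triangles: 6 nodes, every node has degree 2.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# One 6-cycle: 6 nodes, every node has degree 2.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# Identical color histograms: 1-WL (and hence any standard
# message-passing GNN) cannot tell these graphs apart.
print(wl_colors(two_triangles) == wl_colors(hexagon))  # True
```

Because every round of refinement produces the same colors on both graphs, any aggregation of node embeddings computed by a message-passing GNN yields identical graph embeddings, even though one graph contains 3-cycles and the other a 6-cycle.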