BIDS hosts the inaugural GraphXD workshop

April 13, 2018

During spring break (March 27–29, 2018), BIDS hosted the first annual Graphs Across Domains (GraphXD) workshop.

Lauren Ponisio speaks

Lauren Ponisio

Graphs (or networks) arise in many fields. GraphXD is a BIDS project that aims to: (1) foster a community of shared interest from disparate fields including mathematics, computer science, statistics, physics, biology, sociology, and any other field with an interest in graph data; (2) develop a shared vocabulary and identify common principles, algorithms, and tools for understanding graphs; and (3) help us learn from one another while strengthening ties across disciplinary boundaries. The workshop is our main avenue for pursuing these aims.

Don't know them graph

“Don’t know them” graph

For our inaugural workshop, we gathered 35 researchers from 15 institutions. This included experts in many fields including math, computer science, statistics, ecology, entomology, neuroscience, genomics, network security, research computing, as well as energy and resources. On Tuesday (3/27) and Wednesday (3/28) we had twelve talks, many of which are available on YouTube. Topics included interesting data sets (e.g., ecological networks, functional brain networks, assembly graphs in genomics), important graph algorithms (e.g., graph clustering, maximum flow, regression), as well as research software (e.g., NetworkX, GraphBLAS). Most talks touched on all three topics: data, algorithms, and software.

Before the workshop, we sent a survey to the attendees to find out who knew who and how. In the “introductory remarks,” I discussed the “don’t know them” graph. The nodes in this graph represent the 25 survey respondents. There is an edge between two nodes if the people represented by those nodes didn’t know one another prior to the workshop. As you can see, this is a fairly dense graph with 77% of all possible edges (232 out of 300).

To sparsify this graph, we designed the workshop to provide numerous opportunities for small, personal interactions. BIDS provided breakfast, tea breaks, and lunch each day, and a BBQ from 5 to 7pm on the first day. At the end of the workshop, one of the participants wrote: “The facilities were top-notch. The room was spacious and comfortable, and there was ample desk space for folks who wanted it. Breakout rooms for day two were really helpful. The food and catering were great, and the barbecue at the end of the first day was a great way to sparsify the graph.”

Aric Hagberg

Aric Hagberg

Most talks were short (~20 minutes) to allow rapid exposure to a wide variety of topics and speakers. But we also had two longer, one-hour talks: after lunch on the first day, Aric Hagberg presented on “Exploring Network Structure, Dynamics, and Function Using NetworkX.” Aric is the Deputy Division Leader of the Computer, Computational, and Statistical Sciences Division at Los Alamos National Laboratory, and one of the original authors of NetworkX. NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. Aric told the history of NetworkX, from when he started it at Los Alamos in 2002 up to the 2.0 release last summer. Throughout, research applications across many domains including disease spread, cyber security, and scholarly impact metrics drove its development.

Aric next presented challenges for network sciences, such as (1) the tension between ease-of-use, flexibility, and performance when developing graph software as well as (2) the complexity and subtlety of developing generic graph visualization tools for big graphs (e.g., millions of nodes and billions of edges) and multiple application domains. He concluded by discussing the need for sharing more realistic large and complex graph data sets, and highlighted cyber security-related, open data sets recently released by Los Alamos.

Rasmus Kyng

Rasmus Kyng

On the second day, Rasmus Kyng gave the other one-hour talk. Rasmus is a postdoctoral researcher at MIT working on fast graph algorithms and their applications to machine learning. His talk, “How to Solve Problems on Graphs Using Linear Equations, and How to Solve Linear Equations Using Graphs,” (video) is outlined in the abstract:

“. . . Second order methods are a powerful tool in optimization, but they require solving linear equations, which can be prohibitively expensive. But when the optimization problem comes from a graph, this adds structure to the linear equations. We can leverage this structure to solve the equations quickly, making second order methods tractable. This insight has been one of the drivers of a major line of research in graph algorithms, known as the Laplacian Paradigm. In this talk, we will see how graph-structured optimization problems give rise to nice linear equations. We will also see how thinking about these linear equations in terms of graphs will let us develop very efficient algorithms for solving them. Finally, we will explore ideas that have recently played a role in making solvers for these linear equations more practical.”

On Wednesday (3/28) afternoon and all day Thursday (3/29) we had a number of self-organized activities. Here are two brief vignettes:

  1. On the first day, Lauren Ponisio and Marília Gaiarsa gave a joint talk discussing how ecological networks change (e.g., how pollinator-plant populations change after the re-introduction of native vegetation). They discussed using change point detection methods and measures like connectance to figure out when a pollinator-plant network changed in a significant way. This resulted in interesting discussions with Aaron Schild (theoretical computer science), Carl Boettiger (computational ecology), Russell Neches (microbiology), Ludwig Schmidt (theoretical computer science), Camille Scott (genomics), and others. For instance, they discussed data and code sharing. During these discussions, Lauren and Marília pointed out that they needed distance functions that don’t lose as much information as the connectance metric, which partially motivated their interest in using graph kernels. Aaron and Carl discussed whether graph kernels really were the right distance function: for example, they considered whether it might be more useful to summarize their graph using a fixed clustering algorithm instead and measuring distances between graphs by taking the Earthmover distance between the associated cluster vectors. They concluded that clusterings may not be the right idea though, because often they want a “parametrized” clustering (where you have a more “gradual” clustering, with a parameter like flower depth for pollinator-plant networks). Lauren, Marília, Carl, and Russell tested graph kernels to compare graph structures for change point analysis on simulated data. While there is more work remaining, the initial prototype looked promising. Carl, Russell, Lauren, and Marília have continued to communicate and share ideas and code since the end of the workshop.

    NetworkX/Matplotlib sprint

    NetworkX and Matplotlib sprint

  2. Prior to the workshop, Aric Hagberg, Dan Schult, Tom Caswell, and I loosely planned a joint NetworkX and Matplotlib coding sprint. In the scientific Python ecosystem, Matplotlib is the primary 2D plotting library, while NetworkX handles graph creation and manipulation in the study of the structure, dynamics, and functions of complex networks. While NetworkX is primarily an algorithms package, it has some basic Matplotlib-based functionality for drawing graphs, some of which are illustrated in the NetworkX examples gallery. Given the difficulty of producing a graph drawing library, the NetworkX team has a long standing desire to move the Matplotlib code for graph drawing to a separate package. Matplotlib developers, on the other hand, had an interest in building a collection of add-on libraries to Matplotlib for specialized applications. The GraphXD workshop therefore seemed like the perfect opportunity for bringing the two developer communities, as well as potential visualization users, together.

    Given this general goal, Nelle Varoquaux (biostatistics), Chris Holdgraf (neuroscience), Paul Ivanov (scientific programming), Camille Scott (genomics), Russell Neches (microbiology), and Seth Bromberger (critical infrastructure security) joined Dan, Aric, and Tom during the second day (3/28) to discuss the types of plots they wanted and what API they should have. After lunch on Thursday, Dan, Tom, Nelle, Chris, Camille, and Paul prototyped the new graph visualization package, Grave.

Thanks to all the speakers and participants for making the workshop a success, especially the ten troopers who ended the 3 day workshop with a 12 hour day. My co-organizers, Stéfan van der Walt and Nelle Varoquaux, and I are grateful to Senior BIDS fellows Carl Boettiger, Daniela Ushizima, and Stella Yu for their participation and support. As always, our Office Jedi, Stacey Dorton, kept a watchful eye over the many moving parts of the workshop to ensure its flawless execution.

Group photo

End of the second day

Photographs by Daniela Ushizima, Stéfan van der Walt, and Nelle Varoquaux.



Featured Fellows

Jarrod Millman

Biostatistics
DATA SCIENCE FELLOW

Nelle Varoquaux

Statistics

Chris Holdgraf

Data Science Education Program

Carl Boettiger

Environmental Science, Policy, & Management

Daniela Ushizima

Computational Research Division, Lawrence Berkeley National Lab

Stéfan van der Walt

Berkeley Institute for Data Science
Research Fellow

Stella Yu

ICSI Vision Group

Lauren Ponisio

Environmental Science, Policy, and Management
DATA SCIENCE FELLOW