Legal Neighborhoods and Networked Communities: The Methods Post


The goal of the community analysis was to determine how well the network structure of Pittsburgh’s roads and intersections (including bridges) maps onto the formal neighborhood structure of Pittsburgh. In other words, do communities generated mathematically from the network structure of Pittsburgh’s roads map well onto Pittsburgh’s 90 neighborhoods, or do those neighborhoods unite disparate parts of the road network?


We conducted this analysis by

  1. Using OSM to create a network representing the streets and their intersections
  2. Using Gephi to calculate network metrics
  3. Using QGIS to limit the network to intersections within Pittsburgh city limits and add formal neighborhood IDs
  4. Using Jaccard indices to measure the similarity/diversity between the modularity “neighborhoods” and the formal neighborhoods
  5. Analyzing and writing up the results

Creating a Network

We generated lists (in CSV format) of nodes and edges corresponding to Pittsburgh streets through queries to OpenStreetMap (OSM). See our other blog posts for more details on the query process. These two lists originally included intermediate nodes/edges used to draw curved streets, so we generated a simplified version of the node/edge lists that included only intersections between streets (nodes) and the street segments connecting those intersections (edges).
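A minimal sketch of that simplification follows. It assumes each OSM way has already been parsed into an ordered list of node IDs. The `ways` dictionary and `simplify_ways` are hypothetical names, not our actual code. The sketch treats a node as an intersection if more than one way uses it. It also keeps each way’s endpoints so that dead ends stay connected, then splits every way into edges between consecutive kept nodes.

```python
from collections import Counter


def simplify_ways(ways):
    """ways: dict way_id -> ordered list of OSM node IDs along the street.

    Returns (kept_nodes, edges), where edges are (start, end, way_id) tuples
    between consecutive intersections/endpoints. Illustrative sketch only.
    """
    # Count how many distinct ways touch each node (a node repeated within
    # one way, e.g. a closed loop, is counted once for that way).
    use_count = Counter()
    for nodes in ways.values():
        use_count.update(set(nodes))

    # Intersections: nodes shared by two or more ways.
    keep = {n for n, c in use_count.items() if c > 1}
    # Assumption: also keep way endpoints so dead ends are not lost.
    for nodes in ways.values():
        keep.update((nodes[0], nodes[-1]))

    # Walk each way, emitting an edge whenever we reach a kept node;
    # intermediate shape nodes (used only to draw curves) are skipped.
    edges = []
    for way_id, nodes in ways.items():
        start = nodes[0]
        for n in nodes[1:]:
            if n in keep:
                edges.append((start, n, way_id))
                start = n
    return keep, edges
```

For example, a street `[1, 2, 3, 4, 5]` crossed at node 3 by a street `[10, 3, 11]` collapses to four edges meeting at node 3. The shape-only nodes 2 and 4 disappear.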

The initial OSM queries included nodes/edges beyond the formal limits of the City of Pittsburgh, which we temporarily retained at this step. Retaining nodes/edges from outside the city limits affects the calculations in the network analysis. We nonetheless judged it advisable in order to calculate the modularity classes independently; that is, we wanted to keep the calculation from being influenced by any man-made neighborhood boundaries, including the official city limits.

We formatted the node spreadsheet to include a unique identifier, the OSM node ID, latitude, longitude, and the Pittsburgh neighborhood. The edge spreadsheet includes a unique identifier for each edge, the source and target nodes, the OSM way ID, the OSM name label (e.g. Tulip Road), distance, and whether or not the edge is within the city boundaries. Most of these fields were kept to make the spreadsheets more human-readable and were not used in the final analysis, which required only the source and target nodes.
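The sketch below shows how spreadsheets like these might be written with Python’s standard `csv` module. The column names are assumptions that mirror the fields listed above. We use “Id” for nodes and “Source”/“Target” for edges because, as we understand it, those are the headers Gephi’s spreadsheet importer recognizes. The example row is illustrative, not real data.

```python
import csv
import io

# Hypothetical headers mirroring the fields described above.
NODE_FIELDS = ["Id", "osm_node_id", "lat", "lon", "neighborhood"]
EDGE_FIELDS = ["Id", "Source", "Target", "osm_way_id", "name", "distance", "in_city"]


def write_table(rows, fields, out):
    """Write a list of dicts as CSV with a header row to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


# Usage: one illustrative edge row (the distance value is made up).
buffer = io.StringIO()
write_table(
    [{"Id": 0, "Source": 1, "Target": 3, "osm_way_id": "w1",
      "name": "Tulip Road", "distance": 42.0, "in_city": True}],
    EDGE_FIELDS,
    buffer,
)
```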

Network Analysis in Gephi

We imported the nodes and edges into a copy of Gephi with the GeoLayout plugin installed for easy visualization of the network (NB: for this to work, the latitude and longitude values must be imported as data type “double”). Specific import choices included summing repeated edges and allowing self-loops.
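“Summing repeated edges” means that parallel edges between the same two intersections collapse into a single edge whose weight is their total. The sketch below does the same merge in plain Python. It assumes an undirected network and comparable (e.g. integer) node IDs; `sum_repeated_edges` is a hypothetical helper, not part of Gephi. Self-loops are kept, matching the import choice above.

```python
from collections import defaultdict


def sum_repeated_edges(edges):
    """Collapse parallel undirected edges, summing their weights.

    edges: iterable of (u, v, weight). Self-loops (u == v) are kept.
    Returns a list of (u, v, total_weight) in first-seen order.
    """
    merged = defaultdict(float)
    for u, v, w in edges:
        # Undirected assumption: (a, b) and (b, a) are the same edge.
        key = (u, v) if u <= v else (v, u)
        merged[key] += w
    return [(u, v, w) for (u, v), w in merged.items()]
```

Usage: `sum_repeated_edges([(1, 2, 1.0), (2, 1, 1.0), (2, 3, 1.0), (4, 4, 1.0)])` merges the two 1–2 edges into one edge of weight 2.0 and leaves the self-loop on node 4 intact.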

We ran Gephi’s built-in algorithms to calculate a variety of network metrics. These metrics were run both on the entire network and on a subset of the network generated by applying a .+ regex filter to the neighborhood attribute of nodes. This filter excludes any node that OSM did not label as belonging to a Pittsburgh neighborhood (i.e. it keeps only nodes within the city limits according to OSM).
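The `.+` pattern matches any non-empty string, so the filter keeps exactly the nodes that carry some neighborhood label. Removing a node also removes the edges attached to it. The sketch below reproduces that logic outside Gephi. It assumes unlabeled nodes have an empty string or `None` as their neighborhood; `city_subgraph` is a hypothetical name.

```python
import re

# Same pattern as the Gephi attribute filter: one or more characters.
CITY_PATTERN = re.compile(r".+")


def city_subgraph(node_hoods, edges):
    """node_hoods: dict node -> neighborhood label ('' or None if unlabeled).
    edges: iterable of (u, v).

    Returns (kept_nodes, kept_edges): the subgraph induced by labeled nodes.
    """
    kept = {
        n for n, hood in node_hoods.items()
        if hood and CITY_PATTERN.fullmatch(hood)  # guard avoids matching on None
    }
    # An edge survives only if both of its endpoints survive.
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges
```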

The network metric of particular interest here is modularity class, which divides the network into structurally connected communities, or groups of nodes. Because Gephi’s modularity algorithm is stochastic, we ran this step 25 times. The number of modularity classes generated over the greater city network ranged from 81 to 96, with most iterations falling between 87 and 92. The analysis reported here uses an iteration with 90 modularity classes, which produced the following visualization of Pittsburgh’s street network, colored by modularity class.
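To our understanding, Gephi’s modularity statistic is based on the Louvain method, which greedily searches for a partition with a high modularity score Q. Randomizing the order in which nodes are visited is one reason repeated runs can return different numbers of classes. As an illustration only (this is not Gephi’s code), the sketch below computes the standard Newman-Girvan modularity of a given partition. For each community c it uses Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the total edge weight, L_c is the weight of edges inside c, and d_c is the summed degree of c’s nodes.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: iterable of (u, v, weight); repeated edges should already be summed.
    community: dict mapping each node to its community label.
    """
    m = 0.0                        # total edge weight
    internal = defaultdict(float)  # L_c: weight of edges with both ends in c
    degree = defaultdict(float)    # d_c: summed weighted degree of nodes in c
    for u, v, w in edges:
        m += w
        cu, cv = community[u], community[v]
        # Each endpoint adds w to its community's degree, so a self-loop
        # (u == v) contributes 2w, the usual degree convention.
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single bridge edge, splitting them into their natural two communities gives Q = 2 × (3/7 − 1/4) = 5/14. Lumping all six nodes into one community gives Q = 0.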

The table of nodes (intersections) was then exported to CSV for the next step. This file now included both the original attributes (such as latitude and longitude) and attributes for multiple network analysis metrics, most importantly modularity class.

Limiting the Network

We imported the list of intersections into QGIS and joined it to a shapefile of the official Pittsburgh neighborhoods downloaded from the Western Pennsylvania Regional Data Center. At this step, all nodes outside the city limits were excluded. The resulting data table included both a calculated modularity class and an official neighborhood designation for every node/intersection within Pittsburgh city limits.
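The QGIS join is a point-in-polygon test: each intersection takes the name of the neighborhood polygon that contains it, and points in no polygon are dropped. The sketch below shows the same idea with the ray-casting test. It assumes simple polygons given as lists of (lon, lat) vertices, with no holes or multipart shapes, and coordinates in the same reference system as the points. `assign_neighborhoods` is a hypothetical name, not a QGIS function.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: toggle on each polygon edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges straddling the horizontal line through y can be crossed
        # (this also rules out horizontal edges, so no division by zero).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # odd number of crossings means inside
    return inside


def assign_neighborhoods(nodes, neighborhoods):
    """nodes: dict node_id -> (lon, lat).
    neighborhoods: dict name -> list of (lon, lat) polygon vertices.

    Returns node_id -> neighborhood name; nodes outside every polygon are dropped.
    """
    assigned = {}
    for node_id, (lon, lat) in nodes.items():
        for name, polygon in neighborhoods.items():
            if point_in_polygon(lon, lat, polygon):
                assigned[node_id] = name
                break  # first matching polygon wins
    return assigned
```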

We visualized these data and then exported the data table to CSV.

Jaccard Indices

In order to analyze how the Gephi-calculated modularity classes related to the official Pittsburgh neighborhoods, we turned to Jaccard indices. A Jaccard index measures the overlap between two sets by dividing the size of their intersection (nodes in both sets) by the size of their union (nodes in either set). The existing Jaccard implementations we found assumed that the two sets being compared are of equal size, so we had to create our own algorithm that allowed us to compare sets of differing sizes. This Python code can be found in our GitHub repository as (NB: the Python is hard-coded to read columns 4 and 5 of the import spreadsheet as the modularity class and neighborhood identifier, respectively; it is also hard-coded to assume 90 modularity classes and 90 Pittsburgh neighborhoods and would need to be modified for another context).
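For readers who want the idea without the hard-coded columns and counts, here is a minimal sketch (not the repository code). It groups node IDs into one set per neighborhood and one set per modularity class, then computes |N ∩ M| / |N ∪ M| for every pair. Sets of different sizes pose no problem, and the numbers of classes and neighborhoods are taken from the data. `read_rows`, `jaccard_table`, and the column-name parameters are hypothetical.

```python
import csv
from collections import defaultdict


def read_rows(path, node_col, class_col, hood_col):
    """Yield (node_id, modularity_class, neighborhood) from a CSV with a header row.
    Column names are passed in rather than hard-coded by position."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield row[node_col], row[class_col], row[hood_col]


def jaccard_table(rows):
    """rows: iterable of (node_id, modularity_class, neighborhood).

    Returns {(neighborhood, modularity_class): Jaccard index} for every pair.
    """
    by_class = defaultdict(set)  # modularity class -> set of node IDs
    by_hood = defaultdict(set)   # neighborhood -> set of node IDs
    for node, mod_class, hood in rows:
        by_class[mod_class].add(node)
        by_hood[hood].add(node)

    table = {}
    for hood, hood_nodes in by_hood.items():
        for mod_class, class_nodes in by_class.items():
            # Both sets are non-empty, so the union is never empty.
            union = hood_nodes | class_nodes
            table[(hood, mod_class)] = len(hood_nodes & class_nodes) / len(union)
    return table
```

For example, if neighborhood A holds nodes {1, 2, 3} and modularity class m1 holds {1, 2}, their index is 2/3. Neighborhood B = {4} shares nothing with m1, so that pair scores 0.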

Running this algorithm on the CSV of intersections within the Pittsburgh city limits, comparing their official neighborhood IDs with their modularity class IDs, produced a CSV that assigns a Jaccard index to every combination of neighborhood and modularity class. (NB: the repository also contains alternate code that limits the output to non-zero pairings of neighborhood and modularity class.)
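Writing such a table out, with an option to keep only non-zero pairings, might look like the sketch below. It assumes the scores sit in a dict keyed by (neighborhood, modularity class). `write_jaccard_csv` and its header names are hypothetical, not the repository’s.

```python
import csv
import io


def write_jaccard_csv(table, out, nonzero_only=False):
    """table: {(neighborhood, modularity_class): score}; out: writable text file.

    With nonzero_only=True, pairs that share no nodes (score 0) are skipped.
    """
    writer = csv.writer(out)
    writer.writerow(["neighborhood", "modularity_class", "jaccard"])
    for (hood, mod_class), score in table.items():
        if nonzero_only and score == 0:
            continue
        writer.writerow([hood, mod_class, score])


# Usage with an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_jaccard_csv({("A", "m1"): 0.5, ("B", "m1"): 0.0}, buffer, nonzero_only=True)
```

Because most neighborhood/class pairs share no intersections, the non-zero version of the table is much shorter and easier to scan.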

Analysis and Interpretation

We then analyzed the Jaccard Indices for patterns across neighborhoods and modularity classes and wrote up the results of this analysis for our blog. This analysis was conducted both quantitatively (e.g. assessing the qualities of neighborhoods with high Jaccard indices) and qualitatively (e.g. visual examination of how the modularity classes intersected neighborhoods on the map). To read the full analysis, see Legal Neighborhoods and Networked Communities: The Analysis Post.
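One simple way to surface neighborhoods with high Jaccard indices is to find, for each neighborhood, the modularity class it overlaps with most. The sketch below does this and ranks neighborhoods by that best score. It is an illustrative helper (`best_matches` is a hypothetical name), not a description of exactly how our analysis was carried out.

```python
def best_matches(table):
    """table: {(neighborhood, modularity_class): Jaccard index}.

    Returns (neighborhood, best_class, score) tuples, highest score first.
    """
    best = {}
    for (hood, mod_class), score in table.items():
        # Keep the highest-scoring class seen so far for this neighborhood.
        if hood not in best or score > best[hood][1]:
            best[hood] = (mod_class, score)
    return sorted(
        ((hood, mod_class, score) for hood, (mod_class, score) in best.items()),
        key=lambda item: item[2],
        reverse=True,
    )
```

Neighborhoods near the top of this list are those whose boundaries most closely coincide with a single network community. Those near the bottom are split across several communities.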