## Goals

The goal of the community analysis was to determine how well the network structure of Pittsburgh roads and intersections (including bridges) maps onto the formal neighborhood structure of Pittsburgh. In other words, do communities generated mathematically from the structure of the road network correspond to Pittsburgh’s 90 neighborhoods, or do those neighborhoods unite disparate parts of the road network?

## Outline

- Creating a Network
- Global Network Statistics
- (Modularity Class, Neighborhood) Statistics
- Conclusions

## Creating a Network

We generated lists (in CSV format) of nodes and edges corresponding to Pittsburgh streets through queries to OpenStreetMap (OSM). Next, we imported the nodes and edges into a version of Gephi and ran its built-in algorithms to calculate a variety of network metrics. We then used QGIS to exclude all nodes outside the city limits. Finally, we used Jaccard indices to measure the overlap between the formal Pittsburgh neighborhoods and the communities generated via Gephi’s modularity classes. To read about our methods in more depth, see the methods blog post here.

## Global Network Statistics

The complete network generated in Gephi – including parts of the network that fall outside the Pittsburgh city limits – includes 38,745 road segments (edges) and 28,058 road intersections (nodes). The roads form a single connected component with a diameter of 23; that is, a car can get to any part of the network from any other part. The intersections have an unexpectedly low average degree of 1.381, given that a road intersection generally involves more than one road segment. This figure equals the number of edges divided by the number of nodes (38,745 / 28,058), which suggests that each road segment is counted at only one of its endpoints, as happens when a graph is treated as directed; counting every segment at both endpoints would give 2.762. The degree distribution is also skewed by the boundary of the dataset, where roads continue in real life but not in the data: 5,799 intersections occur at the boundary and have degree 1. The majority, 16,732 intersections, connect to 3 road segments, while another 4,992 sit at the intersection of 4 road segments (including but not limited to square, 4-way intersections). Despite the outsized impact they have on drivers trying to navigate them, only a few 5-way (140), 6-way (8), or 7-way (1) intersections appeared in the graph.
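The degree counts above can be cross-checked against the edge total with a little arithmetic. The sketch below takes the node, edge, and per-degree counts from the text; the one assumption, which the post does not state, is that every intersection not listed has degree 2. Under that assumption the degrees sum to exactly twice the number of edges, as they must in an undirected graph.

```python
# Counts taken from the text: degree -> number of intersections.
NODES = 28_058
EDGES = 38_745
degree_counts = {1: 5_799, 3: 16_732, 4: 4_992, 5: 140, 6: 8, 7: 1}

# Assumption (not stated in the post): all unlisted intersections have degree 2.
unlisted = NODES - sum(degree_counts.values())
degree_counts[2] = unlisted

total_degree = sum(degree * count for degree, count in degree_counts.items())

# Handshake lemma: in an undirected graph the degrees sum to 2 * |E|.
mean_degree_undirected = total_degree / NODES
edges_per_node = EDGES / NODES

print(unlisted)                           # intersections assumed to have degree 2
print(total_degree == 2 * EDGES)          # consistency check against the edge count
print(round(mean_degree_undirected, 3))   # mean degree counting both endpoints
print(round(edges_per_node, 3))           # edges divided by nodes
```

If the assumption holds, 386 intersections have degree 2, and the reported 1.381 corresponds to edges per node rather than to the mean number of segments meeting at an intersection.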

## (Modularity Class, Neighborhood) Statistics

Unlike the global network statistics, modularity classes are stochastic: an element of randomness in the algorithm means that they change slightly every time they are recalculated. For the purposes of this examination, we used a set of 90 modularity classes, to match the number of official Pittsburgh neighborhoods. However, many of these initial modularity classes fell completely outside the city limits, leaving us with a total of 43 modularity classes that intersected the 90 official neighborhoods.
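For readers unfamiliar with what a modularity score measures, here is a minimal sketch that applies the standard definition of modularity to a toy graph. The graph (two triangles joined by a single edge), the function name `modularity`, and both partitions are illustrative assumptions. This is not Gephi's implementation and does not reproduce its randomized search; it only scores one fixed partition at a time.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Score a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the total edge count, L_c the number of edges with both
    endpoints in c, and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by one "bridge" edge 2-3.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# One class per triangle versus everything lumped into a single class.
two_classes = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_class = {node: "a" for node in range(6)}

q_split = modularity(toy_edges, two_classes)
q_lumped = modularity(toy_edges, one_class)
print(q_split, q_lumped)
```

Splitting the toy graph at the bridge scores higher than lumping everything together, which is the kind of preference a modularity-based community detector optimizes for.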

The sizes of these modularity classes – or rather, their sizes as truncated by the city limits – range from 1 road intersection to 649 road intersections. If we eliminate the size outliers (1, 3, 589, and 649), they range instead from 18 to 482, with 5 modularity classes containing 50 or fewer road intersections, 5 containing 400 or more, and the other 29 falling between 50 and 400. The neighborhoods also cover a large size range, from Arlington Heights’ 8 road intersections to Brookline’s 320 road intersections.

These disparities in size and quantity mean that modularity classes overlap multiple neighborhoods, and each neighborhood intersects an average of 2.677 modularity classes. For each (modularity class, neighborhood) pair with a non-empty intersection, we calculated a Jaccard Index (hereafter JI) – the number of nodes (road intersections) in their intersection divided by the number in their union – to quantitatively assess the extent of the overlap between neighborhood and modularity class. We also calculated the average JI for each neighborhood: that is, if a neighborhood had non-empty intersections with 3 modularity classes, we calculated the average of the 3 JIs.
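The JI calculation described above can be sketched in a few lines. The function name and the node IDs are hypothetical and are not drawn from the Pittsburgh data; the second example shows why a small neighborhood that sits entirely inside a large class still receives a low JI.

```python
def jaccard_index(neighborhood_nodes, class_nodes):
    """Size of the intersection divided by the size of the union."""
    neighborhood_nodes = set(neighborhood_nodes)
    class_nodes = set(class_nodes)
    union = neighborhood_nodes | class_nodes
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(neighborhood_nodes & class_nodes) / len(union)


# Hypothetical node IDs: 4 shared nodes, 16 nodes in the union.
neighborhood = {1, 2, 3, 4, 5, 6, 7, 8}
modularity_class = {5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
ji_partial = jaccard_index(neighborhood, modularity_class)

# Hypothetical 8-node neighborhood fully contained in a 100-node class.
ji_contained = jaccard_index(range(8), range(100))

print(ji_partial, ji_contained)
```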

While many modularity classes overlapped official neighborhood borders, 11 of the city neighborhoods (12%) were entirely contained within a single modularity class: Allegheny West, Arlington Heights, Chateau, Crawford-Roberts, East Carnegie, Homewood South, Middle Hill, Mt. Oliver, Oakwood, Ridgemont, and Upper Hill. For some of these neighborhoods, such as Arlington Heights, this indicates that the neighborhood forms a small part of a much larger community within the city; quantitatively, Arlington Heights has the lowest average JI in the entire network at .019, which is understandable given it is also the smallest neighborhood, containing only 8 road intersections.

By contrast, some of these neighborhoods form a large part of a community that nevertheless splits over neighborhood borders. For example, Homewood South is completely contained in its modularity class but has one of the highest average JIs in the network, at .348. Only Lincoln Place (.467, split across 2 modularity classes) and Fairywood (.418, split across 2 modularity classes) are higher. In the case of Lincoln Place, we see a similar pattern, but the roles of neighborhood and modularity class are reversed: the neighborhood contains an entire modularity class (of 77 road intersections), in addition to sharing a modularity class (of 84 road intersections) almost equally with neighboring New Homestead (plus a single road intersection carved off from Hays).

On the other end of the spectrum, South Oakland splits across 6 modularity classes, making it the most fragmented of all the neighborhoods. An additional 5 neighborhoods split across 5 modularity classes: Squirrel Hill South, Spring Hill-City View, South Side Flats, Perry South, and Mount Washington. These generally have a low JI for each modularity class and split relatively evenly across the classes rather than falling primarily into a single modularity class. South Oakland and Perry South tie for the lowest minimum JI: South Oakland’s 6 JIs range from .001 to a maximum of .289, while Perry South’s 5 JIs range from .001 to a significantly higher maximum of .414.

While the larger neighborhoods tend to split more or less equally among multiple modularity classes (and vice versa), manual inspection of the neighborhoods revealed some interesting cases where this pattern fails. For example, the neighborhood of Greenfield contains 205 road intersections and splits into 2 modularity classes, but 202 of its nodes are in a class shared with neighboring Hazelwood, while the remaining 3 – the road intersections immediately adjacent to South Oakland – fall into the second class. Squirrel Hill North and Point Breeze show similar patterns: their shared modularity class contains 184 of Squirrel Hill North’s 193 road intersections and 150 of Point Breeze’s 193.
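One way to quantify these lopsided splits is the share of a neighborhood's intersections that falls into its largest modularity class. The sketch below uses only the counts quoted in the paragraph above; the dictionary layout and the function name `dominant_share` are assumptions made for illustration.

```python
# (intersections in the shared class, total intersections), as quoted in the text.
splits = {
    "Greenfield": (202, 205),
    "Squirrel Hill North": (184, 193),
    "Point Breeze": (150, 193),
}


def dominant_share(in_largest_class, total):
    """Fraction of a neighborhood's intersections in its largest class."""
    return in_largest_class / total


shares = {name: dominant_share(*counts) for name, counts in splits.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

On these counts, roughly 98.5% of Greenfield, 95.3% of Squirrel Hill North, and 77.7% of Point Breeze fall into a single class.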

Overall, the JIs range from .001 to a high of .831. Looking at the average JI for each neighborhood, 38 neighborhoods have an average of less than .100, while 17 have an average of over .200; the average across averages is .136. Looking at the minimum JI for each neighborhood (that is, the smallest JI among the (modularity class, neighborhood) pairs associated with that neighborhood), 57 have a minimum of less than .020, while 12 have a minimum of .100 or more; the average across minimums is .042. Looking at the maximum JI for each neighborhood, 3 have a maximum of .800 or higher, 6 have a maximum of .700 or higher, and 15 have a maximum of .500 or higher; 39 have a maximum of .200 or lower, with 15 of those having a maximum of .100 or lower. The average across maximums is .286.
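The per-neighborhood summaries behind these figures can be sketched as follows. Fairywood's two JIs (.831 and .005) are quoted in this post; the second entry is invented purely to show the shape of the calculation, and the variable names are assumptions.

```python
from statistics import mean

# JI values per neighborhood. Fairywood's two values are quoted in the post;
# the other entry is made up purely to show the shape of the calculation.
ji_by_neighborhood = {
    "Fairywood": [0.831, 0.005],
    "Example Heights (hypothetical)": [0.120, 0.040, 0.300],
}

# Average, minimum, and maximum JI for each neighborhood.
summary = {
    name: {"avg": mean(values), "min": min(values), "max": max(values)}
    for name, values in ji_by_neighborhood.items()
}

# Network-wide figures are then averages over the per-neighborhood numbers.
average_of_averages = mean(s["avg"] for s in summary.values())
average_of_minimums = mean(s["min"] for s in summary.values())
average_of_maximums = mean(s["max"] for s in summary.values())

print(round(summary["Fairywood"]["avg"], 3))
```

For Fairywood this reproduces the average JI of .418 reported earlier.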

The three neighborhoods with the highest maximum JI are the ones that correspond most closely to one of their overlapping modularity classes. Fairywood is split across 2 modularity classes and has JIs of .831 and .005, respectively: if not for the handful of road intersections falling into a second modularity class, it would be entirely contained within a single modularity class and have very close to a 1-to-1 relationship with that class. Similarly high JIs also occur in neighborhoods split across more modularity classes: Swisshelm Park splits across 3 modularity classes but is most strongly related to one class, with a JI of .818, while Lincoln-Lemington-Belmar splits across 4 modularity classes and has a maximum JI of .812.

## Conclusions

The road network for the city of Pittsburgh shows that the city is less fragmented from a transportation standpoint than it is from an administrative standpoint, with approximately half as many modularity classes (43) as official neighborhoods (90). This result remains consistent even when the modularity classes are regenerated. A handful of neighborhoods correspond well with a single modularity class, such as the border neighborhood of Fairywood, but the majority are part of modularity classes that encompass all or part of multiple neighborhoods. Inspection of specific neighborhoods shows that the modularity classes generally split and group the neighborhoods along lines that make sense when examined, but we should not overanalyze the assignment of a specific node to one modularity class or another, given the element of randomness involved in generating the modularity classes. Major topographical features – such as rivers and parks – divide some of the modularity classes and are perceptible in the visualizations, but they do not form borders between modularity classes as strict as those they form between the official city neighborhoods. This is due, of course, to the structural role played in the network by Pittsburgh’s bridges.