Check out the journal article about OSMnx.
OSMnx is a Python package for downloading administrative boundary shapes and street networks from OpenStreetMap. It allows you to easily construct, project, visualize, and analyze complex street networks in Python with NetworkX. You can get a city’s or neighborhood’s walking, driving, or biking network with a single line of Python code. Then you can simply visualize cul-de-sacs or one-way streets, plot shortest-path routes, or calculate stats like intersection density, average node connectivity, or betweenness centrality. You can download/cite the paper here.
In a single line of code, OSMnx lets you download, construct, and visualize the street network for, say, Modena, Italy:
import osmnx as ox
ox.plot_graph(ox.graph_from_place('Modena, Italy'))
Continue reading OSMnx: Python for Street Networks
Check out the journal article about OSMnx, which implements this technique.
A spatial index such as R-tree can drastically speed up GIS operations like intersections and joins. Spatial indices are key features of spatial databases like PostGIS, but they’re also available for DIY coding in Python. I’ll introduce how R-trees work and how to use them in Python and its geopandas library. All of my code is in this notebook in this urban data science GitHub repo.
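To give a rough sense of why a spatial index helps, here is a minimal standard-library sketch of the filter-then-refine pattern. Note the assumptions: it uses a fixed-size grid rather than an actual R-tree, and the `GridIndex` class, its methods, and the sample points are all hypothetical names and values made up for illustration. It is not the geopandas API or the code from the notebook.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Toy spatial index that buckets point ids by fixed-size grid cell.

    This is a simpler stand-in for an R-tree (an assumption of this sketch),
    but it shows the same idea: cheaply narrow down candidates first, then
    run the exact test only on those candidates.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> list of point ids
        self.points = {}                # point id -> (x, y)

    def _cell(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, point_id, x, y):
        self.points[point_id] = (x, y)
        self.cells[self._cell(x, y)].append(point_id)

    def query(self, minx, miny, maxx, maxy):
        """Return sorted ids of points inside the box (bounds inclusive)."""
        col0, row0 = self._cell(minx, miny)
        col1, row1 = self._cell(maxx, maxy)
        hits = []
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                # Filter step: only points in overlapping cells are candidates.
                for point_id in self.cells.get((col, row), []):
                    x, y = self.points[point_id]
                    # Refine step: exact containment test on the candidates.
                    if minx <= x <= maxx and miny <= y <= maxy:
                        hits.append(point_id)
        return sorted(hits)


# Hypothetical sample points, indexed by their position in the list.
index = GridIndex(cell_size=1.0)
for point_id, (x, y) in enumerate([(0.5, 0.5), (1.5, 0.2), (3.2, 3.9), (0.9, 1.1)]):
    index.insert(point_id, x, y)

nearby = index.query(0.0, 0.0, 1.0, 2.0)
print(nearby)
```

The query never looks at points in cells far from the search box, which is where the speedup comes from when the dataset is large. An R-tree achieves the same pruning with nested bounding rectangles instead of a fixed grid.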
Continue reading R-tree Spatial Indexing with Python
A few months ago, I wrote about the large investments that U.S. universities are making in their football stadiums. That post also included a visual analysis of stadium capacity around the country. Outside of North Korea, the 8 largest stadiums in the world are college football stadiums, and the 15 largest college football stadiums are larger than any NFL stadium.
I received a few comments asking for further analysis of actual attendance at the games held in these stadiums. While capacity is interesting because it represents an expectation and sustained investment by the school, attendance represents the utilization of that investment. My stadium capacity data covered every NCAA Division I football stadium in the U.S. as of the 2015 college football season, so I downloaded the NCAA’s 2015 home game attendance data to compare. My data, code, and analysis are in this GitHub repo. First, I visualized the FBS attendance figures themselves:
Continue reading College Football Stadium Attendance
This is a summary of our JPER journal article (available here) about Craigslist rental listings’ insights into U.S. housing markets.
Rentals make up a significant portion of the U.S. housing market, but much of this market activity is poorly understood due to its informal characteristics and historically minimal data trail. The UC Berkeley Urban Analytics Lab collected, validated, and analyzed 11 million Craigslist rental listings to discover fine-grained patterns across metropolitan housing markets in the United States. I’ll summarize our findings below and explain the methodology at the bottom.
But first, 4 key takeaways:
- There are incredibly few rental units below fair market rent in the hottest housing markets. Some metro areas like New York and Boston have only single-digit percentages of Craigslist rental listings below fair market rent. That’s really low.
- This problem doesn’t exclusively affect the poor: the share of its income that the typical household would spend on the typical rent in cities like New York and San Francisco exceeds the threshold for “rent burden.”
- Rents are more “compressed” in soft markets. For example, in Detroit, most of the listed units are concentrated within a very narrow band of rent/ft² values, but in San Francisco rents are much more dispersed. Housing vouchers may end up working very differently in high-cost vs low-cost areas.
- Craigslist listings correspond reasonably well with Department of Housing and Urban Development (HUD) estimates, but provide up-to-date data including unit characteristics, from neighborhood to national scales. For example, we can see how rents are changing, neighborhood by neighborhood, in San Francisco in a given month.
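The first two takeaways rest on two simple ratios, which can be sketched as follows. This is only an illustration and not the paper's code: the function names are hypothetical, every rent and income figure below is a made-up placeholder, and the 30% cutoff is the conventional HUD definition of rent burden.

```python
# Conventional HUD cutoff: a household paying more than 30% of its income
# on rent is considered rent burdened.
RENT_BURDEN_THRESHOLD = 0.30


def rent_to_income_share(monthly_rent, annual_income):
    """Share of annual income that goes to rent."""
    return (monthly_rent * 12) / annual_income


def share_below_fmr(listed_rents, fair_market_rent):
    """Fraction of listings priced below the fair market rent."""
    below = sum(1 for rent in listed_rents if rent < fair_market_rent)
    return below / len(listed_rents)


# Hypothetical placeholder values, NOT figures from the study.
listed_rents = [1800, 2400, 2950, 3100, 3600]
fair_market_rent = 2000
typical_rent = 3000
typical_income = 90_000

burden = rent_to_income_share(typical_rent, typical_income)
is_burdened = burden > RENT_BURDEN_THRESHOLD
below_fmr = share_below_fmr(listed_rents, fair_market_rent)

print(f"rent share of income: {burden:.0%}, burdened: {is_burdened}")
print(f"listings below fair market rent: {below_fmr:.0%}")
```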
Continue reading Craigslist and U.S. Rental Housing Markets
Tools like WalkScore visualize how “walkable” a neighborhood is in terms of access to different amenities like parks, schools, or restaurants. It’s easy to create accessibility visualizations like these ad hoc with Python and its pandana library. Pandana (pandas for network analysis – developed by Fletcher Foti during his dissertation research here at UC Berkeley) performs fast accessibility queries over a network. I’ll demonstrate how to use it to visualize urban walkability. My code is in these IPython notebooks in this urban data science course GitHub repo.
First, I give pandana a bounding box around Berkeley/Oakland in the East Bay of the San Francisco Bay Area. Then I load the street network and amenities from OpenStreetMap. In this example I’ll look at accessibility to restaurants, bars, and schools, but you can create any basket of amenities that you are interested in – basically visualizing a personalized “AnythingScore” instead of a generic WalkScore for everyone. Finally, I calculate and plot the distance from each node in the network to the nearest amenity:
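To make the nearest-amenity calculation concrete, here is a minimal standard-library sketch using a multi-source Dijkstra search. It is not pandana's API or implementation. The function name, the adjacency-list format, and the tiny four-node street graph are all assumptions made up for illustration.

```python
import heapq


def nearest_amenity_distance(graph, amenity_nodes):
    """Network distance from every node to its closest amenity node.

    graph: {node: [(neighbor, edge_length_in_meters), ...]}
    amenity_nodes: nodes where an amenity is located.
    """
    dist = {node: float("inf") for node in graph}
    heap = []
    # Seed the search from all amenities at once (multi-source Dijkstra).
    for node in amenity_nodes:
        dist[node] = 0.0
        heap.append((0.0, node))
    heapq.heapify(heap)

    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry; a shorter path was already found
        for neighbor, length in graph[node]:
            candidate = d + length
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical street graph: a - b - c - d, with amenities at both ends.
streets = {
    "a": [("b", 100.0)],
    "b": [("a", 100.0), ("c", 250.0)],
    "c": [("b", 250.0), ("d", 80.0)],
    "d": [("c", 80.0)],
}
distances = nearest_amenity_distance(streets, amenity_nodes=["a", "d"])
print(distances)
```

Starting the search from every amenity simultaneously means each node is labeled with the distance to whichever amenity is closest, in a single pass over the network.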
Continue reading How to Visualize Urban Accessibility and Walkability
Also check out this follow-up analysis of stadium attendance.
The 2016 college football championship game between Clemson and Alabama was held at University of Phoenix Stadium, where the NFL’s Arizona Cardinals play. Interestingly, this NFL stadium (ironic, given its name) is considerably smaller than the home stadiums of either Clemson or Alabama. In fact, every NFL stadium is considerably smaller than the largest college stadiums. Outside of North Korea, the 8 largest stadiums in the world are college football stadiums, and the 15 largest college football stadiums are larger than any NFL stadium.
Americans are obsessed with college football, but how much is too much? Today most athletic departments are subsidized by their schools. Public universities increased their annual football spending by $1.8 billion between 2009 and 2013 while racking up huge debts to finance stadiums with little chance of profit. This interactive map shows each NCAA Division I college football team’s home stadium: collectively they seat 8.5 million people. Click any point for details about stadium capacity and year built:
Continue reading America’s College Football Stadiums
The U.N. world population prospects data set depicts the U.N.’s projections for every country’s population, decade by decade through 2100. The 2015 revision was recently released, and I analyzed, visualized, and mapped the data (methodology and code described below).
The world population is expected to grow from about 7.3 billion people today to 11.2 billion in 2100. While the populations of Eastern Europe, Taiwan, and Japan are projected to decline significantly over the 21st century, the U.N. projects Africa’s population to grow by an incredible 3.2 billion people. This map depicts each country’s projected percentage change in population from 2015 to 2100:
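The percentage-change measure is simple arithmetic. As a small sketch (the function name is hypothetical), here it is applied to the two world totals quoted above, which work out to growth of roughly 53%:

```python
def percent_change(start, end):
    """Percentage change from start to end, relative to start."""
    return (end - start) / start * 100


# World totals in billions, taken from the paragraph above.
world_2015 = 7.3
world_2100 = 11.2

world_growth = percent_change(world_2015, world_2100)
print(f"Projected world population change, 2015 to 2100: {world_growth:.1f}%")
```

Applying the same function to each country's 2015 and 2100 values would give the kind of per-country figure that the map shades.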
Continue reading World Population Projections
I am presenting at the 2015 Conference on Complex Systems tomorrow in Tempe, Arizona. My paper is on methods for assessing the complexity of urban design. If you’re attending the conference, come on by!
Here’s the paper.
Here’s the abstract:
Continue reading Urban Design and Complexity
The fall semester begins next week at UC Berkeley. For the third year in a row, Paul Waddell and I will be teaching CP255: Urban Informatics and Visualization, and this is my first year as co-lead instructor.
This masters-level course trains students to analyze urban data, develop indicators, conduct spatial analyses, create data visualizations, and build interactive web maps. To do this, we use the Python programming language, open source analysis and visualization tools, and public data.
This course is designed to provide future city planners with a toolkit of technical skills for quantitative problem solving. We don’t require any prior programming experience – we teach this from the ground up – but we do expect prior knowledge of basic statistics and GIS.
Update, September 2017: I am no longer a Berkeley GSI, but Paul’s class is ongoing. Check out his fantastic teaching materials in his GitHub repo. From my experiences here, I have developed a cycle of course materials, IPython notebooks, and tutorials towards an urban data science course based on Python, available in this GitHub repo.
Continue reading Urban Informatics and Visualization at UC Berkeley