OSMnx is a Python package for downloading administrative boundary shapes and street networks from OpenStreetMap. It allows you to easily construct, project, visualize, and analyze complex street networks in Python with NetworkX. You can get a city’s or neighborhood’s walking, driving, or biking network with a single line of Python code. Then you can simply visualize cul-de-sacs or one-way streets, plot shortest-path routes, or calculate stats like intersection density, average node connectivity, or betweenness centrality. You can download/cite the paper here.
In a single line of code, OSMnx lets you download, construct, and visualize the street network for, say, Modena, Italy:
import osmnx as ox
ox.plot_graph(ox.graph_from_place("Modena, Italy"))
A few months ago, I wrote about the large investments that U.S. universities are making in their football stadiums. This also included a visual analysis of stadium capacity around the country. Outside of North Korea, the 8 largest stadiums in the world are college football stadiums, and the 15 largest college football stadiums are larger than any NFL stadium.
I received a few comments asking for further analysis of the actual attendance at games held in these stadiums. While capacity is interesting because it represents an expectation and sustained investment by the school, attendance represents the utilization of that investment. My stadium capacity data covered every NCAA Division I football stadium in the U.S. as of the 2015 college football season. So, I downloaded the NCAA’s 2015 home game attendance data to compare. My data, code, and analysis are in this GitHub repo. First, I visualized the FBS attendance figures themselves:
This is a summary of our JPER journal article (available here) about Craigslist rental listings’ insights into U.S. housing markets.
Rentals make up a significant portion of the U.S. housing market, but much of this market activity is poorly understood due to its informal characteristics and historically minimal data trail. The UC Berkeley Urban Analytics Lab collected, validated, and analyzed 11 million Craigslist rental listings to discover fine-grained patterns across metropolitan housing markets in the United States. I’ll summarize our findings below and explain the methodology at the bottom.
But first, 4 key takeaways:
1. There are incredibly few rental units below fair market rent in the hottest housing markets. Some metro areas like New York and Boston have only single-digit percentages of Craigslist rental listings below fair market rent. That’s really low.
2. This problem doesn’t exclusively affect the poor: in cities like New York and San Francisco, the share of income that the typical household would spend on the typical rent exceeds the threshold for “rent burden.”
3. Rents are more “compressed” in soft markets. For example, in Detroit, most of the listed units are concentrated within a very narrow band of rent/ft² values, but in San Francisco rents are much more dispersed. Housing vouchers may end up working very differently in high-cost vs low-cost areas.
4. Craigslist listings correspond reasonably well with Department of Housing and Urban Development (HUD) estimates, but provide up-to-date data, including unit characteristics, from neighborhood to national scales. For example, we can see how rents are changing, neighborhood by neighborhood, in San Francisco in a given month.
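To make the indicators behind these takeaways concrete, here is a minimal sketch of how they could be computed for one metro area. It is not the code from our paper: the listings, the fair market rent, and the income figure are all made-up values for illustration, and the 30%-of-income cutoff is the commonly used rent burden threshold, assumed here.

```python
from statistics import median

# Hypothetical listings for one metro: (monthly rent in USD, floor area in ft²).
listings = [(1200, 800), (1500, 750), (2400, 900), (3100, 1000), (950, 700)]

FAIR_MARKET_RENT = 1600        # assumed benchmark, not a real HUD figure
MEDIAN_MONTHLY_INCOME = 4000   # assumed household income, for illustration only
RENT_BURDEN_THRESHOLD = 0.30   # commonly used 30%-of-income cutoff (assumption)

rents = [rent for rent, _ in listings]

# Takeaway 1: share of listings priced below fair market rent.
share_below_fmr = sum(r < FAIR_MARKET_RENT for r in rents) / len(rents)

# Takeaway 2: share of income the typical household would spend on the typical rent.
median_rent = median(rents)
rent_to_income = median_rent / MEDIAN_MONTHLY_INCOME
is_rent_burdened = rent_to_income > RENT_BURDEN_THRESHOLD

# Takeaway 3: how compressed or dispersed rents are, measured in rent per ft².
rent_per_sqft = [rent / sqft for rent, sqft in listings]
spread = max(rent_per_sqft) - min(rent_per_sqft)

print(f"Below FMR: {share_below_fmr:.0%}")
print(f"Rent-to-income: {rent_to_income:.1%} (burdened: {is_rent_burdened})")
print(f"Rent/ft² spread: {spread:.2f}")
```

A simple max-minus-min spread is only one way to describe dispersion; an interquartile range would be less sensitive to outlier listings.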
Tools like WalkScore visualize how “walkable” a neighborhood is in terms of access to different amenities like parks, schools, or restaurants. It’s easy to create accessibility visualizations like these ad hoc with Python and its pandana library. Pandana (pandas for network analysis – developed by Fletcher Foti during his dissertation research here at UC Berkeley) performs fast accessibility queries over a network. I’ll demonstrate how to use it to visualize urban walkability. My code is in these IPython notebooks in this urban data science course GitHub repo.
First, I give pandana a bounding box around Berkeley/Oakland in the East Bay of the San Francisco Bay Area. Then I load the street network and amenities from OpenStreetMap. In this example I’ll look at accessibility to restaurants, bars, and schools. But you can create any basket of amenities that you are interested in – basically visualizing a personalized “AnythingScore” instead of a generic WalkScore for everyone. Finally, I calculate and plot the distance from each node in the network to the nearest amenity:
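The idea behind that last step can be sketched in plain Python. This is not pandana's API or its implementation (pandana is far faster on real networks); it is a small multi-source Dijkstra search, assumed here as a stand-in, over a made-up four-node street network with a hypothetical function name and edge lengths in meters.

```python
import heapq

def nearest_amenity_distance(graph, amenity_nodes):
    """Return the network distance from every node to its closest amenity.

    graph maps each node to a list of (neighbor, edge_length) pairs.
    Two-way streets are listed in both directions.
    """
    dist = {node: float("inf") for node in graph}
    heap = []
    # Start the search from all amenity nodes at once, each at distance 0.
    for node in amenity_nodes:
        dist[node] = 0.0
        heapq.heappush(heap, (0.0, node))
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, length in graph[node]:
            new_dist = d + length
            if new_dist < dist[neighbor]:
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return dist

# Hypothetical toy street network (lengths in meters).
streets = {
    "A": [("B", 100.0), ("C", 250.0)],
    "B": [("A", 100.0), ("C", 120.0), ("D", 300.0)],
    "C": [("A", 250.0), ("B", 120.0), ("D", 80.0)],
    "D": [("B", 300.0), ("C", 80.0)],
}
restaurants = ["D"]  # nodes where an amenity is located

distances = nearest_amenity_distance(streets, restaurants)
for node, meters in sorted(distances.items()):
    print(node, meters)
```

Swapping in a different list of amenity nodes (bars, schools, or any mix) is all it takes to compute a different "AnythingScore" over the same network.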
The fall semester begins next week at UC Berkeley. For the third year in a row, Paul Waddell and I will be teaching CP255: Urban Informatics and Visualization, and this is my first year as co-lead instructor.
This master’s-level course trains students to analyze urban data, develop indicators, conduct spatial analyses, create data visualizations, and build interactive web maps. To do this, we use the Python programming language, open source analysis and visualization tools, and public data.
This course is designed to provide future city planners with a toolkit of technical skills for quantitative problem solving. We don’t require any prior programming experience – we teach this from the ground up – but we do expect prior knowledge of basic statistics and GIS.
Update, September 2017: I am no longer a Berkeley GSI, but Paul’s class is ongoing. Check out his fantastic teaching materials in his GitHub repo. From my experiences here, I have developed a series of course materials, IPython notebooks, and tutorials for a Python-based urban data science course, available in this GitHub repo.
I recently completed my inside field exam, one of the many steps involved in advancing to candidacy. The three professors on your inside field committee send you six questions – a pair per professor – and you are given 72 hours total to answer one question from each pair. The answers are to be in the form of a scholarly article with thorough citations. Long story short, you’ve got to write 30 pages of academic scholarship in three days.
The exam questions themselves are very interesting. The professors construct them based on their reading of your inside field statement, trying to probe areas that might be particularly rich or a bit weak in the statement. Here are the questions I answered:
The Department of City and Regional Planning at UC Berkeley has a rather arduous process for advancing to candidacy in the PhD program. It essentially consists of 6 parts:
1. Take all the required courses
2. Produce an inside field statement – a sort of literature review and synthesis explaining the niche within urban planning in which you will be positioning your dissertation research
3. Complete an outside field – sort of like what a minor was in college
4. Take an inside field written exam
5. Produce a defensible dissertation prospectus
6. Take an oral comprehensive exam covering your inside field, your outside field, and general planning theory and history, and finally present your prospectus
Whew. Lots to do this year. The good news is I am currently wrapping up my inside field statement and preparing to take the inside field exam. My topic broadly concerns complexity theory in urban planning. Here is the working abstract from my statement:
Our paper on collecting and analyzing U.S. housing rental markets through Craigslist rental listings has been accepted for publication by the Journal of Planning Education and Research. Check out the article here. This map of rental listings in the contiguous U.S. is divided into quintiles by rent per square foot:
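For readers unfamiliar with quintile binning, here is a rough sketch of how listings could be assigned to five equal-sized classes by rent per square foot. The values are made up for illustration, and this is not the code used to produce the map.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical rent-per-square-foot values (USD/ft²) for ten listings.
rent_per_sqft = [0.8, 1.1, 1.3, 1.6, 1.9, 2.2, 2.6, 3.1, 3.8, 4.5]

# Four cut points split the distribution into five equal-sized groups.
cut_points = quantiles(rent_per_sqft, n=5)

# Quintile 1 holds the cheapest fifth of listings, quintile 5 the most expensive.
quintile = [bisect_right(cut_points, value) + 1 for value in rent_per_sqft]

for value, q in zip(rent_per_sqft, quintile):
    print(f"{value:.2f} USD/ft² -> quintile {q}")
```

Each quintile can then be given its own color on the map, so every color covers the same number of listings.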