R-tree Spatial Indexing with Python

[Image: Thumbnail of Walnut Creek, California city boundary and street intersections inside and outside city limits]

Check out the journal article about OSMnx, which implements this technique.

A spatial index such as R-tree can drastically speed up GIS operations like intersections and joins. Spatial indices are key features of spatial databases like PostGIS, but they’re also available for DIY coding in Python. I’ll introduce how R-trees work and how to use them in Python and its geopandas library. All of my code is in this notebook in this urban data science GitHub repo.
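To give a rough sense of why a spatial index helps, here is a minimal pure-Python sketch of the filter-then-refine idea that an R-tree builds on: a cheap bounding-box check rules out most candidates before the slower exact geometry test runs. This is not an R-tree and not the geopandas API. The function names and the toy triangular "city boundary" are made up for illustration.

```python
def bounding_box(polygon):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_box(point, box):
    """Cheap test: is the point inside the axis-aligned rectangle?"""
    x, y = point
    minx, miny, maxx, maxy = box
    return minx <= x <= maxx and miny <= y <= maxy


def in_polygon(point, polygon):
    """Exact (slower) ray-casting point-in-polygon test."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal ray going right from the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_within(points, polygon):
    """Filter by bounding box first, then refine with the exact test."""
    box = bounding_box(polygon)
    candidates = [p for p in points if in_box(p, box)]
    return [p for p in candidates if in_polygon(p, polygon)]


# Hypothetical data: a triangular "city boundary" and some "intersections".
city = [(0, 0), (4, 0), (0, 4)]
intersections = [(1, 1), (3, 3), (5, 5), (-1, 2), (2, 1)]
inside_city = points_within(intersections, city)
```

Here the point (3, 3) passes the bounding-box filter but fails the exact test, which is why the refine step is still needed. A real R-tree goes further by nesting bounding boxes in a hierarchy, so whole groups of candidates can be skipped at once.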


Craigslist and U.S. Rental Housing Markets

This is a summary of our JPER journal article (available here) about Craigslist rental listings’ insights into U.S. housing markets.

[Image: Small map of 1.5 million Craigslist rental listings in the contiguous U.S., divided into quintiles by each listing's rent per square foot]

Rentals make up a significant portion of the U.S. housing market, but much of this market activity is poorly understood due to its informal characteristics and historically minimal data trail. The UC Berkeley Urban Analytics Lab collected, validated, and analyzed 11 million Craigslist rental listings to discover fine-grained patterns across metropolitan housing markets in the United States. I’ll summarize our findings below and explain the methodology at the bottom.

But first, 4 key takeaways:

  1. There are incredibly few rental units below fair market rent in the hottest housing markets. Some metro areas like New York and Boston have only single-digit percentages of Craigslist rental listings below fair market rent. That’s really low.
  2. This problem doesn’t exclusively affect the poor: in cities like New York and San Francisco, the share of income that the typical household would spend on the typical rent exceeds the threshold for “rent burden.”
  3. Rents are more “compressed” in soft markets. For example, in Detroit, most of the listed units are concentrated within a very narrow band of rent/ft² values, but in San Francisco rents are much more dispersed. Housing vouchers may end up working very differently in high-cost vs low-cost areas.
  4. Craigslist listings correspond reasonably well with Department of Housing and Urban Development (HUD) estimates, but provide up-to-date data including unit characteristics, from neighborhood to national scales. For example, we can see how rents are changing, neighborhood by neighborhood, in San Francisco in a given month.


How to Visualize Urban Accessibility and Walkability

Tools like WalkScore visualize how “walkable” a neighborhood is in terms of access to different amenities like parks, schools, or restaurants. It’s easy to create accessibility visualizations like these ad hoc with Python and its pandana library. Pandana (pandas for network analysis – developed by Fletcher Foti during his dissertation research here at UC Berkeley) performs fast accessibility queries over a network. I’ll demonstrate how to use it to visualize urban walkability. My code is in these IPython notebooks in this urban data science course GitHub repo.

First I give pandana a bounding box around Berkeley/Oakland in the East Bay of the San Francisco Bay Area. Then I load the street network and amenities from OpenStreetMap. In this example I’ll look at accessibility to restaurants, bars, and schools. But you can create any basket of amenities that you are interested in – basically visualizing a personalized “AnythingScore” instead of a generic WalkScore for everyone. Finally I calculate and plot the distance from each node in the network to the nearest amenity:

[Image: Berkeley and Oakland, California street network walking accessibility and walkability]
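The "distance from each node to the nearest amenity" calculation can be sketched in plain Python as a multi-source shortest-path search. This is only a toy illustration of the computation, not pandana's API or its implementation. The graph, node names, and edge lengths below are hypothetical.

```python
import heapq


def distance_to_nearest(graph, amenity_nodes):
    """Shortest network distance from every node to its nearest amenity.

    graph: dict mapping node -> list of (neighbor, edge_length) pairs,
           with each undirected edge listed in both directions.
    amenity_nodes: iterable of nodes that host an amenity.
    """
    dist = {node: float("inf") for node in graph}
    heap = []
    # Seed the search from every amenity at once (multi-source Dijkstra).
    for node in amenity_nodes:
        dist[node] = 0.0
        heapq.heappush(heap, (0.0, node))

    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry, a shorter path was already found
        for neighbor, length in graph[node]:
            new_d = d + length
            if new_d < dist[neighbor]:
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist


# Hypothetical street network: nodes are intersections, lengths in meters.
streets = {
    "A": [("B", 100)],
    "B": [("A", 100), ("C", 150)],
    "C": [("B", 150), ("D", 200)],
    "D": [("C", 200)],
}
restaurants = ["A", "D"]
nearest = distance_to_nearest(streets, restaurants)
```

Coloring each node by its value in `nearest` gives the kind of accessibility map described above. Swapping in a different list of amenity nodes gives a different "AnythingScore."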

Urban Informatics and Visualization at UC Berkeley

The fall semester begins next week at UC Berkeley. For the third year in a row, Paul Waddell and I will be teaching CP255: Urban Informatics and Visualization, and this is my first year as co-lead instructor.

This master’s-level course trains students to analyze urban data, develop indicators, conduct spatial analyses, create data visualizations, and build interactive web maps. To do this, we use the Python programming language, open source analysis and visualization tools, and public data.

This course is designed to provide future city planners with a toolkit of technical skills for quantitative problem solving. We don’t require any prior programming experience – we teach this from the ground up – but we do expect prior knowledge of basic statistics and GIS.

Update, September 2017: I am no longer a Berkeley GSI, but Paul’s class is ongoing. Check out his fantastic teaching materials in his GitHub repo. From my experiences here, I have developed a cycle of course materials, IPython notebooks, and tutorials towards an urban data science course based on Python, available in this GitHub repo.


Visualizing Chaos and Randomness

[Image: 3-D Poincaré plot of the logistic map's chaotic regime, showing time series data embedded in three-dimensional state space]

Download/cite the paper here!

In a previous post, I discussed chaos theory, fractals, and strange attractors – and their implications for knowledge and prediction of systems. I also briefly touched on how phase diagrams (or Poincaré plots) can help us visualize system attractors and differentiate chaotic behavior from true randomness.

In this post (adapted from this paper), I provide more detail on constructing and interpreting phase diagrams. These methods are particularly useful for discovering deterministic chaos in otherwise random-appearing time series data, as they visualize strange attractors. I’m using Python for all of these visualizations and the source code is available in this GitHub repo.
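As a rough sketch of the idea, the snippet below embeds a time series in two-dimensional state space by pairing each value with the one that follows it, then measures how far those points fall from the logistic map's parabola. The function names, the growth rate of 4.0, the starting value, and the seeded random series are my own choices for illustration, not code from the paper or the repo.

```python
import random


def logistic_series(r, x0, n):
    """Iterate x_{t+1} = r * x_t * (1 - x_t) and return n values."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def delay_pairs(series):
    """Embed a series in 2-D state space as (x_t, x_{t+1}) points."""
    return list(zip(series[:-1], series[1:]))


def max_distance_from_parabola(pairs, r):
    """Largest gap between x_{t+1} and the curve r * x_t * (1 - x_t)."""
    return max(abs(y - r * x * (1 - x)) for x, y in pairs)


r = 4.0  # a growth rate in the logistic map's chaotic regime
chaotic = logistic_series(r, x0=0.3, n=500)

rng = random.Random(0)  # seeded so the example is reproducible
noise = [rng.random() for _ in range(500)]

# Chaotic points sit on the parabola; random points do not.
chaos_gap = max_distance_from_parabola(delay_pairs(chaotic), r)
noise_gap = max_distance_from_parabola(delay_pairs(noise), r)
```

Plotted as a scatter, the chaotic pairs trace out a parabola while the random pairs scatter across the unit square, even though the two raw time series look similarly erratic.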


Chaos Theory and the Logistic Map

[Image: Logistic map bifurcation diagram showing the period-doubling path to chaos]

Using Python to visualize chaos, fractals, and self-similarity to better understand the limits of knowledge and prediction. Download/cite the article here and try pynamical yourself.

Chaos theory is a branch of mathematics that deals with nonlinear dynamical systems. A system is just a set of interacting components that form a larger whole. Nonlinear means that due to feedback or multiplicative effects between the components, the whole becomes something greater than just adding up the individual parts. Lastly, dynamical means the system changes over time based on its current state. In the following piece (adapted from this article), I break down some of this jargon, visualize interesting characteristics of chaos, and discuss its implications for knowledge and prediction.

Chaotic systems are a simple sub-type of nonlinear dynamical systems. They may contain very few interacting parts and these may follow very simple rules, but these systems all have a very sensitive dependence on their initial conditions. Despite their deterministic simplicity, over time these systems can produce totally unpredictable and wildly divergent (aka, chaotic) behavior. Edward Lorenz, the father of chaos theory, described chaos as “when the present determines the future, but the approximate present does not approximately determine the future.”
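Sensitive dependence on initial conditions is easy to see with the logistic map, x_{t+1} = r * x_t * (1 - x_t). The sketch below is a plain-Python illustration, not the pynamical API. The growth rates (2.5 and 4.0), the starting values, and the 1e-9 nudge are assumptions picked for the demonstration.

```python
def logistic_map(r, x0, n):
    """Return n successive values of x_{t+1} = r * x_t * (1 - x_t)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


# At r = 2.5 the system settles onto a single fixed point, 1 - 1/r = 0.6.
stable = logistic_map(2.5, 0.5, 200)

# At r = 4.0 the system is chaotic: nudge the start by one billionth
# and compare the two trajectories step by step.
a = logistic_map(4.0, 0.2, 60)
b = logistic_map(4.0, 0.2 + 1e-9, 60)
gaps = [abs(p - q) for p, q in zip(a, b)]

print(f"stable regime ends near {stable[-1]:.6f}")
print(f"initial gap {gaps[0]:.1e}, largest gap {max(gaps):.3f}")
```

Both chaotic runs follow exactly the same deterministic rule, yet within a few dozen steps they bear no resemblance to each other. That is Lorenz's point: the approximate present does not approximately determine the future.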


Using geopandas on Windows

This guide was updated in June 2016 to reflect changes to the dependencies and the ability to install with Python wheels.

I recently went through the exercise of installing geopandas on Windows and getting it to run. Having learned several valuable lessons, I thought I’d share them with the world in case anyone else is trying to get this toolkit working in a Windows environment (also see this GitHub gist I put together).

It seems that pip installing geopandas works fine on Linux and Mac. However, several of its dependencies have C extensions that cause compilation failures with pip on Windows. This guide gets around that issue. For preliminaries, I have this working on Windows 7, 8, and 10. My Python environments are Anaconda, 64-bit, with both Python 2.7 and 3.5. I’m running geopandas version 0.2 with GDAL 2.0.2, Fiona 1.7.0, pyproj 1.9.5.1, and shapely 1.5.16.
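To compare your own environment against the versions listed above, something like the following sketch can report what is installed. It assumes Python 3.8 or later (for `importlib.metadata`, which is newer than the Python versions mentioned in this guide), and the distribution names in `stack` are my assumption; depending on how a package was installed, its metadata may be registered under a different name or not at all.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # no metadata found under this name
    return found


# Assumed distribution names for geopandas and its main dependencies.
stack = ["geopandas", "GDAL", "Fiona", "pyproj", "shapely"]
versions = installed_versions(stack)

for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```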
