Exporting Python Data to GeoJSON

I like to do my data wrangling and analysis work in Python, using the pandas library. I also use Python for much of my data visualization and simple mapping. But for interactive web maps, I usually use Leaflet. There isn’t a dead-simple way to dump a pandas DataFrame with geographic data to something you can load with Leaflet. You could use GeoPandas to convert your DataFrame and then dump it to GeoJSON, but that isn’t a very lightweight solution.

So, I wrote a simple reusable function to export any pandas DataFrame to GeoJSON:
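As a rough illustration of the idea (a minimal sketch, not necessarily the function from the full post), the conversion can be done with only the standard library once the rows are plain dicts. The function name `records_to_geojson`, the `lat`/`lon` column names, and the sample rows are all assumptions made for this example. With pandas, `df.to_dict(orient="records")` produces rows in this shape.

```python
import json

def records_to_geojson(records, lat="lat", lon="lon", properties=None):
    """Build a GeoJSON FeatureCollection (as a dict) from row-like dicts."""
    features = []
    for row in records:
        # Use the requested property columns, or every column except the coordinates
        keys = properties if properties is not None else [k for k in row if k not in (lat, lon)]
        features.append({
            "type": "Feature",
            # GeoJSON orders point coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [float(row[lon]), float(row[lat])]},
            "properties": {k: row[k] for k in keys},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical sample rows standing in for a DataFrame's records
rows = [
    {"name": "Berkeley", "lat": 37.8716, "lon": -122.2727},
    {"name": "Oakland", "lat": 37.8044, "lon": -122.2712},
]
geojson = records_to_geojson(rows)
geojson_text = json.dumps(geojson, indent=2)  # text that Leaflet's L.geoJson can load
```

One caveat with this approach: if the property values are NumPy scalar types rather than native Python types, they may need converting before `json.dumps` will accept them.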

Continue reading Exporting Python Data to GeoJSON

Urban Informatics and Visualization at UC Berkeley

The fall semester begins next week at UC Berkeley. For the third year in a row, Paul Waddell and I will be teaching CP255: Urban Informatics and Visualization, and this is my first year as co-lead instructor.

This master’s-level course trains students to analyze urban data, develop indicators, conduct spatial analyses, create data visualizations, and build interactive web maps. To do this, we use the Python programming language, open source analysis and visualization tools, and public data.

This course is designed to provide future city planners with a toolkit of technical skills for quantitative problem solving. We don’t require any prior programming experience – we teach this from the ground up – but we do expect prior knowledge of basic statistics and GIS.

Update, September 2017: I am no longer a Berkeley GSI, but Paul’s class is ongoing. Check out his fantastic teaching materials in his GitHub repo. From my experiences here, I have developed a cycle of course materials, IPython notebooks, and tutorials towards an urban data science course based on Python, available in this GitHub repo.

Continue reading Urban Informatics and Visualization at UC Berkeley

Map Projections That Lie

How big is Greenland? It’s huge, right? At 836,109 square miles in size, Greenland is the largest island and the 12th largest country on Earth. With only 56,000 people living in that enormous area (80% of which is covered by the world’s only extant ice sheet outside of Antarctica), it is also the least densely populated country on Earth.

You can get a sense of how large Greenland is when you look at a map of the world:

[Image: world map in the Mercator projection]

It’s huge! Greenland is bigger than the entire continent of Africa! Or is it? The map above uses the common Mercator projection to project the 3-D surface of the Earth onto a 2-D surface suitable for a paper map or an image on your computer screen. But it’s not easy to project the curved surface of a sphere onto a rectangular plane. Compromises must be made. In the case of the Mercator projection, the compromise is that objects’ sizes become increasingly distorted the further they are from the equator. At the poles, the scale and distortion become infinite.
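To put rough numbers on that distortion: on a spherical Earth model, the Mercator projection stretches lengths by a factor of sec(latitude) in both the north-south and east-west directions, so areas are exaggerated by sec²(latitude) relative to the equator. The short sketch below computes that factor. The function name is made up for this example, and 72 degrees is only a rough stand-in for a latitude in central Greenland.

```python
import math

def mercator_area_scale(lat_deg):
    """Area exaggeration of the Mercator projection at a latitude, relative to the equator."""
    # Linear scale grows as sec(latitude) in both directions, so area grows as sec^2
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

# 72 is an assumed, approximate latitude for central Greenland
for lat in (0, 30, 60, 72, 85):
    print(f"{lat:>2} degrees: areas drawn {mercator_area_scale(lat):.1f}x larger than at the equator")
```

At 60 degrees the factor is already 4, and it grows without bound as the latitude approaches 90 degrees, which is why the poles cannot be drawn on a Mercator map at all.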

Continue reading Map Projections That Lie

Visualizing Craigslist Rental Listings

Our paper on collecting and analyzing U.S. housing rental markets through Craigslist rental listings has been accepted for publication by the Journal of Planning Education and Research. Check out the article here. This map of rental listings in the contiguous U.S. is divided into quintiles by rent per square foot:

Map of 1.5 million Craigslist rental listings in the contiguous U.S., summer 2014, divided into quintiles by each listing's rent per square foot

Visualizing Summer Travels Part 6: Projecting Spatial Data with Python

This post is part of a series on visualizing data from my summer travels.

I’ve previously discussed visualizing the GPS location data from my summer travels with CartoDB, Leaflet, and Mapbox + Tilemill. I also visualized different aspects of this data set in Python, using the matplotlib plotting library. However, these spatial scatter plots used unprojected lat-long data, which looked pretty distorted at European latitudes.

Today I will show how to convert this data into a projected coordinate reference system and plot it again using matplotlib. These projected maps will provide a much more accurate spatial representation of my spatial data and the geographic region. All of my code is available in this GitHub repo, particularly this notebook.
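To see why projecting helps, here is a deliberately simplified stand-in, not the method used in the notebook: an equirectangular projection on a spherical Earth, which shrinks east-west distances by the cosine of a reference latitude. Plotting raw lat-long treats one degree of longitude as if it were as wide as one degree of latitude, which is only true at the equator. The function name, the Earth radius constant, and the sample points are assumptions for illustration; a real workflow would more likely rely on a projection library and a proper coordinate reference system.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def project_equirectangular(lat, lon, lat0, lon0=0.0):
    """Project lat-long degrees to (x, y) meters around a reference point (lat0, lon0)."""
    # East-west distances shrink by cos(latitude); using lat0 keeps the mapping linear
    x = EARTH_RADIUS_M * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y

# Hypothetical points near 50 degrees north, as (lat, lon)
points = [(50.0, 14.0), (50.0, 15.0), (51.0, 14.0)]
projected = [project_equirectangular(lat, lon, lat0=50.0, lon0=14.0) for lat, lon in points]

for (lat, lon), (x, y) in zip(points, projected):
    print(f"({lat}, {lon}) -> x={x:,.0f} m, y={y:,.0f} m")
```

In the projected coordinates, one degree of longitude at this latitude comes out noticeably shorter than one degree of latitude, which is the correction the unprojected scatter plots were missing. The approximation is only reasonable over a small region around the reference point.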

Continue reading Visualizing Summer Travels Part 6: Projecting Spatial Data with Python

Using geopandas on Windows

This guide was updated in June 2016 to reflect changes to the dependencies and the ability to install with Python wheels.

I recently went through the exercise of installing geopandas on Windows and getting it to run. Having learned several valuable lessons, I thought I’d share them with the world in case anyone else is trying to get this toolkit working in a Windows environment (also see this GitHub gist I put together).

It seems that pip installing geopandas works fine on Linux and Mac. However, several of its dependencies have C extensions that cause compilation failures with pip on Windows. This guide gets around that issue. For preliminaries, I have this working on Windows 7, 8, and 10. My Python environments are Anaconda, 64-bit, with both Python 2.7 and 3.5. I’m running geopandas version 0.2 with GDAL 2.0.2, Fiona 1.7.0, pyproj 1.9.5.1, and shapely 1.5.16.
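If you want to compare your own environment against the versions listed above, a small check like the following may help. It is only a sketch: it uses `importlib.metadata` from the standard library, which requires Python 3.8 or newer (so it would not run on the Python 2.7 and 3.5 environments described here), and the distribution names in the list are assumptions that may differ depending on how the packages were installed.

```python
from importlib import metadata

def installed_versions(packages):
    """Return {package name: installed version string, or None if not found}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Assumed distribution names for geopandas and its main dependencies
stack = ["geopandas", "GDAL", "Fiona", "pyproj", "shapely"]
report = installed_versions(stack)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```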

Continue reading Using geopandas on Windows

Visualizing Summer Travels Part 5: Python + Matplotlib

This post is part of a series on visualizing data from my summer travels.

I’ve previously discussed visualizing the GPS location data from my summer travels with CartoDB, Leaflet, and Mapbox + Tilemill. Today I will explore visualizing this data set in Python, using the matplotlib plotting library. All of my code is available in this GitHub repo, particularly this notebook.

Continue reading Visualizing Summer Travels Part 5: Python + Matplotlib

Visualizing Summer Travels Part 4: Mapbox + Tilemill

This post is part of a series on visualizing data from my summer travels.

I’ve previously discussed my goals in visualizing GPS data from my summer travels and explored visualizing the data set with CartoDB and with Leaflet. The full OpenPaths location data from my summer travels is available here and I discussed how I reverse-geocoded it here.

Mapbox is a major provider of web mapping services such as tiled web maps, the Tilemill cartography IDE, and the mapbox.js JavaScript library. Today I’ll run through how to create an interactive data map in Tilemill’s design studio, export the map as a set of tiles, upload the tileset to Mapbox, and then use a JavaScript client to display the map on a web page. Our final result will look something like this:

Continue reading Visualizing Summer Travels Part 4: Mapbox + Tilemill

Visualizing Summer Travels Part 3: Leaflet

This post is part of a series on visualizing data from my summer travels.

I’ve previously discussed my goals in visualizing GPS data from my summer travels and explored visualizing the data set with CartoDB. The full OpenPaths location data from my summer travels is available here and I discussed how I reverse-geocoded it here.

Lastly, I reduced the size of this spatial data set so Leaflet can render it more quickly on low-power mobile devices. I discussed why this is important and how to do it with the DBSCAN clustering algorithm and also with the Douglas-Peucker algorithm. The final data set I’ll be working with is available here.

Continue reading Visualizing Summer Travels Part 3: Leaflet

Reducing Spatial Data Set Size with Douglas-Peucker

In a previous post I discussed how to reduce the size of a spatial data set by clustering. Too many data points in a visualization can overwhelm the user and bog down on-the-fly client-side map rendering (for example, with a JavaScript tool like Leaflet). So, I used the DBSCAN clustering algorithm to reduce my data set from 1,759 rows to 158 spatially representative points. This series of posts discusses this data set in depth.
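For readers unfamiliar with the Douglas-Peucker algorithm named in the title, here is a minimal pure-Python sketch of the general technique, not the code from the full post. It keeps the endpoints of a path, finds the intermediate point farthest from the straight line between them, and recurses on both halves only if that distance exceeds a tolerance. The function names and the sample path are made up for this example, and the points are treated as planar coordinates, so with raw lat-long data the tolerance would be in degrees.

```python
import math

def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, tolerance):
    """Simplify an ordered list of (x, y) points, keeping the overall shape."""
    if len(points) < 3:
        return list(points)
    # Find the intermediate point farthest from the chord joining the endpoints
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], points[0], points[-1])
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= tolerance:
        # Every intermediate point is close enough to the chord to drop
        return [points[0], points[-1]]
    # Otherwise keep the farthest point and simplify each half separately
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point

# Hypothetical path with small wiggles and one sharp peak
path = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0), (5, 0.1), (6, 0)]
simplified = douglas_peucker(path, tolerance=0.5)
print(simplified)  # [(0, 0), (2, 0), (3, 3), (4, 0), (6, 0)]
```

Unlike clustering, this approach preserves the order of the points, so it suits data that traces a route: the small wiggles are dropped while the sharp peak survives.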

Continue reading Reducing Spatial Data Set Size with Douglas-Peucker