Category: Data

Data

Animating the Lorenz Attractor with Python

Post author By gboeing
Post date 2016-12-30

Edward Lorenz, the father of chaos theory, once described chaos as “when the present determines the future, but the approximate present does not approximately determine the future.”

Lorenz first discovered chaos by accident while developing a simple mathematical model of atmospheric convection, using three ordinary differential equations. He found that nearly indistinguishable initial conditions could produce completely divergent outcomes, rendering weather prediction impossible beyond a time horizon of about a fortnight.
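
To make that sensitivity concrete, here is a minimal sketch, not the code from the post itself. It assumes the classic Lorenz parameters (sigma = 10, rho = 28, beta = 8/3) and a hand-written fourth-order Runge-Kutta integrator in place of a library solver. The step size, run length, starting point, and the 1e-8 perturbation are all illustrative choices.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the three Lorenz equations (classic parameter values assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """Advance the state by one fixed step of the classical Runge-Kutta method."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, dt=0.01, steps=4000):
    """Return the list of states visited, starting from `initial`."""
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(rk4_step(lorenz, trajectory[-1], dt))
    return trajectory


def distance(p, q):
    """Euclidean distance between two points in 3D."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Two runs whose starting points differ by one part in a hundred million.
run_a = simulate((1.0, 1.0, 1.0))
run_b = simulate((1.0 + 1e-8, 1.0, 1.0))
separations = [distance(p, q) for p, q in zip(run_a, run_b)]

print("initial separation:", separations[0])
print("largest separation:", max(separations))
```

The two runs track each other closely at first, then the gap grows until the trajectories are as far apart as two unrelated points on the attractor.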

Tags animation, chaos, complex systems, complexity, data, math, matplotlib, modeling, numpy, pandas, python, science, scipy, theory, visualization

R-tree Spatial Indexing with Python

Post author By gboeing
Post date 2016-10-24

Check out the journal article about OSMnx, which implements this technique.

A spatial index such as R-tree can drastically speed up GIS operations like intersections and joins. Spatial indices are key features of spatial databases like PostGIS, but they’re also available for DIY coding in Python. I’ll introduce how R-trees work and how to use them in Python and its geopandas library. All of my code is in this notebook in this urban data science GitHub repo.
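
As a rough illustration of why this helps, the sketch below builds a toy two-level hierarchy of bounding rectangles in plain Python. It is not a real R-tree and not the geopandas or rtree API. The function names, the leaf size, and the sort-by-x grouping are all assumptions made for the example. The point it shows is the pruning step: a whole group of geometries is skipped when the group's bounding rectangle misses the query box.

```python
def intersects(a, b):
    """True if two boxes (minx, miny, maxx, maxy) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(boxes):
    """Smallest box that contains every box in the list."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def build_index(boxes, leaf_size=4):
    """Group boxes into leaves (sorted by minx) and store each leaf's bounding box."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    leaves = []
    for start in range(0, len(order), leaf_size):
        ids = order[start:start + leaf_size]
        leaves.append((union([boxes[i] for i in ids]), ids))
    return leaves


def query(index, boxes, query_box):
    """Return ids of boxes intersecting query_box, skipping leaves that cannot match."""
    hits = []
    for leaf_box, ids in index:
        if not intersects(leaf_box, query_box):
            continue  # prune: nothing in this leaf can intersect the query
        hits.extend(i for i in ids if intersects(boxes[i], query_box))
    return sorted(hits)


# A 10 x 10 grid of unit squares spaced two units apart (hypothetical data).
boxes = [(x, y, x + 1, y + 1) for x in range(0, 20, 2) for y in range(0, 20, 2)]
index = build_index(boxes)
matches = query(index, boxes, (4.5, 4.5, 6.5, 6.5))
print(matches)
```

A real R-tree nests these rectangles over several levels and balances the groups, so the number of rectangles tested grows much more slowly than the number of geometries.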

Tags city, data, data science, geopandas, geospatial, gis, maps, matplotlib, modeling, pandas, projection, python, r-tree, science, shapely, spatial index, tutorial, urban, urban planning, visualization

College Football Stadium Attendance

Post author By gboeing
Post date 2016-09-30

A few months ago, I wrote about the large investments that U.S. universities are making in their football stadiums. This also included a visual analysis of stadium capacity around the country. Outside of North Korea, the 8 largest stadiums in the world are college football stadiums, and the 15 largest college football stadiums are larger than any NFL stadium.

A few commenters asked for further analysis of actual attendance at the games held in these stadiums. While capacity is interesting because it represents an expectation and sustained investment by the school, attendance represents the utilization of that investment. My stadium capacity data covered every NCAA Division I football stadium in the U.S. as of the 2015 college football season. So, I downloaded the NCAA’s 2015 home game attendance data to compare. My data, code, and analysis are in this GitHub repo. First, I visualized the FBS attendance figures themselves:

Tags academia, data, data science, football, land use, ncaa, pandas, planning, python, stadiums, urban, urban planning, visualization

Mapping Everywhere I’ve Ever Been in My Life

Post author By gboeing
Post date 2016-06-27

I recently wrote about visualizing my Foursquare check-in history and mapping my Google location history, and it inspired me to mount a more substantial project: mapping everywhere I’ve ever been in my life (!!). I’ve got 4 years of Foursquare check-ins and Google location history data. For everything pre-smartphone, I typed up a simple spreadsheet of places I’d visited in the past and then geocoded it with the Google Maps API. All my Python and Leaflet code is available in this GitHub repo and is easy to re-purpose to visualize your own location history.

I’ll show the maps first, then run through the process I followed, below. First off, I used Python and matplotlib basemap to create this map of everywhere I’ve ever been:

Tags basemap, berkeley, clustering, data, data science, dbscan, foursquare, geocoding, geospatial, gis, google, javascript, leaflet, maps, matplotlib, pandas, projection, python, scikit-learn, travel, tutorial, visualization

Mapping Your Google Location History with Python

Post author By gboeing
Post date 2016-06-21

Small map of my Google location history data in the San Francisco Bay Area, 2012-2016

I recently wrote about visualizing my Foursquare check-in history and it inspired me to map my entire Google location history data – about 1.2 million GPS coordinates from my Android phone between 2012 and 2016. I used Python and its pandas, matplotlib, and basemap libraries. The Python code is available in this notebook in this GitHub repo, and it’s simple to re-use to visualize your own location history.

Just download your JSON file from Google then run the code. First I load the JSON file and parse the latitude, longitude, and timestamp with pandas. Then I map my worldwide data set:
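
The loading and parsing step might look roughly like the sketch below. It uses only the standard library rather than pandas, so it is not the notebook's code. The field names `locations`, `timestampMs`, `latitudeE7`, and `longitudeE7` are assumptions about the layout of the Google export at the time, and the single sample record is made up for illustration.

```python
import json
from datetime import datetime, timezone

# Made-up sample in the assumed export layout: coordinates are stored as
# integers scaled by 1e7, and the timestamp is milliseconds since the epoch.
sample = (
    '{"locations": [{"timestampMs": "1466467200000", '
    '"latitudeE7": 378715000, "longitudeE7": -1222730000}]}'
)


def parse_locations(raw_json):
    """Convert each raw record into decimal degrees and a UTC datetime."""
    records = []
    for loc in json.loads(raw_json)["locations"]:
        records.append({
            "lat": loc["latitudeE7"] / 1e7,
            "lon": loc["longitudeE7"] / 1e7,
            "time": datetime.fromtimestamp(
                int(loc["timestampMs"]) / 1000, tz=timezone.utc
            ),
        })
    return records


rows = parse_locations(sample)
print(rows[0])
```

With pandas installed, a list of dictionaries like `rows` can be passed straight to `pandas.DataFrame` for the mapping steps.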

Tags android, basemap, berkeley, data, geospatial, gis, google, gps, maps, matplotlib, nexus, pandas, projection, python, travel, tutorial, visualization

Analyzing Last.fm Listening History

Post author By gboeing
Post date 2016-05-09

Last.fm is a website that tracks your music listening history across devices (computer, phone, iPod, etc.) and services (Spotify, iTunes, Google Play, etc.). I’ve been using Last.fm for nearly 10 years now, and my tracked listening history goes back even further when you consider all my pre-existing iTunes play counts that I scrobbled (i.e., submitted to my Last.fm database) when I joined Last.fm.

Using Python, pandas, matplotlib, and leaflet, I downloaded my listening history from Last.fm’s API, analyzed and visualized the data, downloaded full artist details from the Musicbrainz API, then geocoded and mapped all the artists I’ve played. All of my code used to do this is available in this GitHub repo, and is easy to re-purpose for exploring your own Last.fm history. All you need is an API key.

First I visualized my most-played artists, above. Across the dataset, I have 279,769 scrobbles (aka, song plays). I’ve listened to 26,761 different artists and 66,377 different songs across 38,026 different albums from when I first started using iTunes circa 2005 through the present day. This includes pretty close to every song I’ve played on anything other than vinyl during that time.

Tags basemap, data, geocoding, geospatial, gis, lastfm, maps, matplotlib, music, pandas, projection, python, tutorial, visualization

Visualize Foursquare Location History

Post author By gboeing
Post date 2016-04-11

I started using Foursquare at the end of 2012 and stuck with it even after it became the pointless muck that is Swarm. Since I’ve now got 4 years of location history (i.e., check-ins) data, I decided to visualize and map it with Python, matplotlib, and basemap. The code is available in this GitHub repo. It’s easy to re-purpose to visualize your own check-in history: you just need to plug in your Foursquare OAuth token then run the notebook.

First the notebook downloads all my check-ins from the Foursquare API. Then I mapped all of them, using matplotlib basemap.

Map of Foursquare Swarm check-in location history

Tags basemap, berkeley, data, foursquare, geospatial, gis, maps, matplotlib, pandas, projection, python, shapely, swarm, travel, tutorial, visualization

Visualizing a Gmail Inbox

Google Takeout lets you download an archive of your data from various Google products. I downloaded my Gmail archive as an mbox file and visualized all of my personal Gmail account traffic since signing up back in July 2004. This analysis excludes work and school email traffic (as well as my other Gmail account for signing up for web sites and services), as I have separate dedicated email accounts for each. It also excludes the Hangouts/chats that Google includes in your mbox archive. So, this analysis just covers personal communication.

This also demonstrates working with time series in Python and pandas. All of my code is on GitHub as an IPython notebook. You can re-purpose it for your own inbox – just download your Gmail archive then run my code.
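
For a sense of what reading an mbox file involves, here is a minimal sketch using only Python's standard `mailbox` module and a `Counter`, not the pandas time series code from the notebook. The three sample messages, the addresses, and the function name `messages_per_month` are made up for illustration.

```python
import mailbox
import os
import tempfile
from collections import Counter
from email.utils import parsedate_to_datetime


def messages_per_month(mbox_path):
    """Count messages per (year, month) from each message's Date header."""
    counts = Counter()
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            date_header = message["Date"]
            if date_header is None:
                continue  # skip messages with no Date header
            sent = parsedate_to_datetime(date_header)
            counts[(sent.year, sent.month)] += 1
    finally:
        box.close()
    return counts


# A tiny made-up mbox file standing in for a real Gmail archive.
SAMPLE = (
    "From alice@example.com Thu Jul  1 12:00:00 2004\n"
    "Date: Thu, 01 Jul 2004 12:00:00 +0000\n"
    "Subject: first\n\nhello\n\n"
    "From bob@example.com Fri Jul  2 09:30:00 2004\n"
    "Date: Fri, 02 Jul 2004 09:30:00 +0000\n"
    "Subject: second\n\nhi again\n\n"
    "From carol@example.com Mon Aug  2 18:15:00 2004\n"
    "Date: Mon, 02 Aug 2004 18:15:00 +0000\n"
    "Subject: third\n\nsee you soon\n\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".mbox", delete=False) as handle:
    handle.write(SAMPLE)
    path = handle.name

try:
    monthly_counts = messages_per_month(path)
finally:
    os.remove(path)

print(sorted(monthly_counts.items()))
```

To try it on a real archive, call `messages_per_month` with the path to the downloaded mbox file in place of the temporary sample.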

Tags data, gmail, google, matplotlib, pandas, python, visualization

America’s College Football Stadiums

Post author By gboeing
Post date 2016-01-10

Also check out this follow-up analysis of stadium attendance.

The 2016 college football championship game between Clemson and Alabama was held at University of Phoenix Stadium, where the NFL’s Arizona Cardinals play. Interestingly, this NFL stadium (ironic, given its name) is considerably smaller than the home stadium of either Clemson or Alabama. In fact, every NFL stadium is considerably smaller than the largest college stadiums. Outside of North Korea, the 8 largest stadiums in the world are college football stadiums, and the 15 largest college football stadiums are larger than any NFL stadium.

Americans are obsessed with college football, but how much is too much? Today most athletic departments are subsidized by their schools. Public universities increased their annual football spending by $1.8 billion between 2009 and 2013 while racking up huge debts to finance stadiums with little chance of profit. This interactive map shows each NCAA Division I college football team’s home stadium: collectively they seat 8.5 million people. Click any point for details about stadium capacity and year built:

Tags college, data, football, maps, matplotlib, ncaa, python, stadiums, university, urban planning, visualization

World Population Projections

Post author By gboeing
Post date 2015-12-20

The U.N. world population prospects data set depicts the U.N.’s projections for every country’s population, decade by decade through 2100. The 2015 revision was recently released, and I analyzed, visualized, and mapped the data (methodology and code described below).

The world population is expected to grow from about 7.3 billion people today to 11.2 billion in 2100. While the populations of Eastern Europe, Taiwan, and Japan are projected to decline significantly over the 21st century, the U.N. projects Africa’s population to grow by an incredible 3.2 billion people. This map depicts each country’s projected percentage change in population from 2015 to 2100:
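
The percentage change the map shows for each country is simple arithmetic. As a quick sketch, here it is applied to the rounded world totals quoted above instead of the per-country U.N. data. The function name is made up for the example.

```python
def percent_change(start, end):
    """Percentage change from start to end, relative to start."""
    return (end - start) / start * 100


# Rounded world totals quoted above, in billions of people.
world_2015 = 7.3
world_2100 = 11.2

world_growth = percent_change(world_2015, world_2100)
print(f"Projected world population change, 2015 to 2100: {world_growth:.1f}%")
```

On these rounded figures the world total grows by a little over half, and Africa's projected 3.2 billion accounts for most of the roughly 3.9 billion increase.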

Tags africa, basemap, cities, data, data science, geospatial, gis, global south, maps, matplotlib, pandas, population, python, united nations, urban planning, visualization