Last.fm is a web site that tracks your music listening history across devices (computer, phone, iPod, etc) and services (Spotify, iTunes, Google Play, etc). I’ve been using Last.fm for nearly 10 years now, and my tracked listening history goes back even further when you consider all my pre-existing iTunes play counts that I scrobbled (ie, submitted to my Last.fm database) when I joined Last.fm.
Using Python, pandas, matplotlib, and leaflet, I downloaded my listening history from Last.fm’s API, analyzed and visualized the data, downloaded full artist details from the Musicbrainz API, then geocoded and mapped all the artists I’ve played. All of my code used to do this is available in this GitHub repo, and is easy to re-purpose for exploring your own Last.fm history. All you need is an API key.
First I visualized my most-played artists, above. Across the dataset, I have 279,769 scrobbles (aka, song plays). I’ve listened to 26,761 different artists and 66,377 different songs across 38,026 different albums from when I first started using iTunes circa 2005 through the present day. This includes pretty close to every song I’ve played on anything other than vinyl during that time. Continue reading Analyzing Last.fm Listening History
I started using Foursquare at the end of 2012 and kept with it even after it became the pointless muck that is Swarm. Since I’ve now got 4 years of location history (ie, check-ins) data, I decided to visualize and map it with Python, matplotlib, and basemap. The code is available in this GitHub repo. It’s easy to re-purpose to visualize your own check-in history: you just need to plug in your Foursquare OAuth token then run the notebook.
First the notebook downloads all my check-ins from the Foursquare API. Then I mapped all of them, using matplotlib basemap.
A guide to setting up the Python scientific stack, well-suited for geospatial analysis, on a Raspberry Pi 3. The whole process takes just a few minutes.
The Raspberry Pi 3 was announced two weeks ago and presents a substantial step up in computational power over its predecessors. It can serve as a functional Wi-Fi connected Linux desktop computer, albeit underpowered. However it’s perfectly capable of running the Python scientific computing stack including Jupyter, pandas, matplotlib, scipy, scikit-learn, and OSMnx.
Despite (or because of?) its low power, it’s ideal for low-overhead and repetitive tasks that researchers and engineers often face, including geocoding, web scraping, scheduled API calls, or recurring statistical or spatial analyses (with small-ish data sets). It’s also a great way to set up a simple server or experiment with Linux. This guide is aimed at newcomers to the world of Raspberry Pi and Linux, but who have an interest in setting up a Python environment on these $35 credit card sized computers. We’ll run through everything you need to do to get started (if your Pi is already up and running, skip steps 1 and 2). Continue reading Scientific Python for Raspberry Pi
The U.N. world population prospects data set depicts the U.N.’s projections for every country’s population, decade by decade through 2100. The 2015 revision was recently released, and I analyzed, visualized, and mapped the data (methodology and code described below).
The world population is expected to grow from about 7.3 billion people today to 11.2 billion in 2100. While the populations of Eastern Europe, Taiwan, and Japan are projected to decline significantly over the 21st century, the U.N. projects Africa’s population to grow by an incredible 3.2 billion people. This map depicts each country’s projected percentage change in population from 2015 to 2100:
The fall semester begins next week at UC Berkeley. For the third year in a row, Paul Waddell and I will be teaching CP255: Urban Informatics and Visualization.
This masters-level course trains students to analyze urban data, develop indicators, conduct spatial analyses, create data visualizations, and build interactive web maps. To do this, we use the Python programming language, open source analysis and visualization tools, and public data.
This course is designed to provide future city planners with a toolkit of technical skills for quantitative problem solving. We don’t require any prior programming experience – we teach this from the ground up – but we do expect prior knowledge of basic statistics and GIS.
Our teaching materials, including IPython Notebooks, tutorials, and guides are available in this GitHub repo, updated as the semester progresses.
Our paper on collecting and analyzing U.S. housing rental markets through Craigslist rental listings has been accepted for publication by the Journal of Planning Education and Research. Check out the article here. This map of rental listings in the contiguous U.S. is divided into quintiles by rent per square foot:
I’ve previously discussed visualizing the GPS location data from my summer travels with CartoDB, Leaflet, and Mapbox + Tilemill. I also visualized different aspects of this data set in Python, using the matplotlib plotting library. However, these spatial scatter plots used unprojected lat-long data which looked pretty distorted at European latitudes.
Today I will show how to convert this data into a projected coordinate reference system and plot it again using matplotlib. These projected maps will provide a much more accurate spatial representation of my spatial data and the geographic region. All of my code is available in this GitHub repo, particularly this notebook.
This guide was updated in June 2016 to reflect changes to the dependencies and the ability to install with Python wheels.
I recently went through the exercise of installing geopandas on Windows and getting it to run. Having learned several valuable lessons, I thought I’d share them with the world in case anyone else is trying to get this toolkit working in a Windows environment (also see this GitHub gist I put together).
It seems that pip installing geopandas works fine on Linux and Mac. However, several of its dependencies have C extensions that cause compilation failures with pip on Windows. This guide gets around that issue. For preliminaries, I have this working on Windows 7, 8, and 10. My Python environments are Anaconda, 64-bit, with both Python 2.7 and 3.5. I’m running geopandas version 0.2 with GDAL 2.0.2, Fiona 1.7.0, pyproj 126.96.36.199, and shapely 1.5.16.
This is a series of posts about visualizing spatial data. I spent a couple of months traveling in Europe this summer and collected GPS location data throughout the trip with the OpenPaths app. I explored different web mapping technologies such as CartoDB, Leaflet, Mapbox, and Tilemill to plot my travels. I also used Python and matplotlib to run some descriptive statistics and visualize other aspects of my trip.
My Python code is available in this GitHub repo. I also did some more involved work under the hood to prep the data and support these visualizations. For example, in the following posts I reverse-geocoded the spatial data set and reduced its size with clustering algorithms and the Douglas-Peucker algorithm:
I’ve previously discussed visualizing the GPS location data from my summer travels with CartoDB, Leaflet, and Mapbox + Tilemill. Today I will explore visualizing this data set in Python, using the matplotlib plotting library. All of my code is available in this GitHub repo, particularly this notebook.