Tag: pandas

Tech

Scientific Python for Raspberry Pi

Post author By gboeing
Post date 2016-03-14
25 Comments on Scientific Python for Raspberry Pi

A guide to setting up the Python scientific stack, well-suited for geospatial analysis, on a Raspberry Pi 3. The whole process takes just a few minutes.

The Raspberry Pi 3 was announced two weeks ago and presents a substantial step up in computational power over its predecessors. It can serve as a functional Wi-Fi connected Linux desktop computer, albeit underpowered. However, it’s perfectly capable of running the Python scientific computing stack including Jupyter, pandas, matplotlib, scipy, scikit-learn, and OSMnx.

Despite (or because of?) its low power, it’s ideal for low-overhead and repetitive tasks that researchers and engineers often face, including geocoding, web scraping, scheduled API calls, or recurring statistical or spatial analyses (with small-ish data sets). It’s also a great way to set up a simple server or experiment with Linux. This guide is aimed at newcomers to the world of Raspberry Pi and Linux who want to set up a Python environment on these $35 credit-card-sized computers. We’ll run through everything you need to do to get started (if your Pi is already up and running, skip steps 1 and 2).

Tags api, basemap, data, data science, geocoding, geopandas, geopy, geospatial, iot, ipython, jupyter, linux, matplotlib, numpy, pandas, pyproj, raspberry pi, raspbian, science, scikit-learn, scipy, scrapy, shapely, statistics, statsmodels, web scraping

Data

Visualizing a Gmail Inbox

Google Takeout lets you download an archive of your data from various Google products. I downloaded my Gmail archive as an mbox file and visualized all of my personal Gmail account traffic since signing up back in July 2004. This analysis excludes work and school email traffic (as well as my other Gmail account for signing up for web sites and services), as I have separate dedicated email accounts for each. It also excludes the Hangouts/chats that Google includes in your mbox archive. So, this analysis just covers personal communication.

This also demonstrates working with time series in Python and pandas. All of my code is on GitHub as an IPython notebook. You can re-purpose it for your own inbox – just download your Gmail archive then run my code.
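As a rough sketch of the time-series step, assuming the message dates have already been pulled out of the mbox file as strings: the function name and the sample dates below are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

def monthly_message_counts(date_strings):
    """Count messages per calendar month from an iterable of date strings."""
    # Parse to timestamps; values that cannot be parsed become NaT and are dropped.
    dates = pd.to_datetime(pd.Series(list(date_strings)), errors="coerce").dropna()
    # Label each message with its year-month, then count and sort chronologically.
    return dates.dt.strftime("%Y-%m").value_counts().sort_index()

# Hypothetical sample dates, for illustration only.
sample = [
    "2004-07-03 10:15:00",
    "2004-07-21 18:02:00",
    "2004-08-01 09:30:00",
    "not a date",
]
counts = monthly_message_counts(sample)
print(counts)
```

The result is a pandas Series indexed by month, which can be plotted directly with matplotlib.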

Tags data, gmail, google, matplotlib, pandas, python, visualization

Data

World Population Projections

Post author By gboeing
Post date 2015-12-20
1 Comment on World Population Projections

The U.N. World Population Prospects data set depicts the U.N.’s projections for every country’s population, decade by decade through 2100. The 2015 revision was recently released, and I analyzed, visualized, and mapped the data (methodology and code described below).

The world population is expected to grow from about 7.3 billion people today to 11.2 billion in 2100. While the populations of Eastern Europe, Taiwan, and Japan are projected to decline significantly over the 21st century, the U.N. projects Africa’s population to grow by an incredible 3.2 billion people. This map depicts each country’s projected percentage change in population from 2015 to 2100:
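The percentage change behind such a map is simple arithmetic. As a quick worked check, using only the world totals quoted above (the function name is just for illustration):

```python
def percent_change(start, end):
    """Percentage change from a starting value to an ending value."""
    return (end - start) / start * 100

# World totals from the text, in billions: about 7.3 today and 11.2 in 2100.
world_growth = percent_change(7.3, 11.2)
print(f"{world_growth:.1f}%")  # a bit over 50 percent
```

The same calculation, applied country by country, gives the per-country values.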

Tags africa, basemap, cities, data, data science, geospatial, gis, global south, maps, matplotlib, pandas, population, python, united nations, urban planning, visualization

Data

The Landscape of U.S. Rents

Post author By gboeing
Post date 2015-11-19
3 Comments on The Landscape of U.S. Rents

Which U.S. cities are the most expensive for rental housing? Where are rents rising the fastest? The American Community Survey (ACS) recently released its latest batch of 1-year data and I analyzed, mapped, and visualized it. My methodology is below, and my code and data are in this GitHub repo.

This interactive map shows median rents across the U.S. for every metro/micropolitan area. Click any one for details on population, rent, and change over time. Click “switch” to re-draw the map to visualize how median rents have risen since 2010:

Tags basemap, census, cities, data, gis, housing, javascript, leaflet, maps, matplotlib, numpy, pandas, population, python, rents, statsmodels, united states

Data

Exporting Python Data to GeoJSON

Post author By gboeing
Post date 2015-10-31
11 Comments on Exporting Python Data to GeoJSON

I like to do my data wrangling and analysis work in Python, using the pandas library. I also use Python for much of my data visualization and simple mapping. But for interactive web maps, I usually use Leaflet. There isn’t a dead-simple way to dump a pandas DataFrame with geographic data to something you can load with Leaflet. You could use GeoPandas to convert your DataFrame then dump it to GeoJSON, but that isn’t a very lightweight solution.

So, I wrote a simple reusable function to export any pandas DataFrame to GeoJSON:
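The function itself is not reproduced in this excerpt. As a minimal sketch of the idea, assuming one point per row and hypothetical column names `latitude` and `longitude`, such an export might look like the following. The function name, parameters, and sample rows are illustrative assumptions, not the code from the post.

```python
import json

import pandas as pd

def _to_native(value):
    # numpy scalars expose .item(); plain Python values pass through unchanged.
    return value.item() if hasattr(value, "item") else value

def df_to_geojson(df, properties, lat="latitude", lon="longitude"):
    """Convert a DataFrame of points to a GeoJSON FeatureCollection (as a dict)."""
    features = []
    for _, row in df.iterrows():
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [float(row[lon]), float(row[lat])],
            },
            # Copy the requested columns into the feature's properties.
            "properties": {prop: _to_native(row[prop]) for prop in properties},
        })
    return {"type": "FeatureCollection", "features": features}

# Made-up sample rows, for illustration only.
df = pd.DataFrame({
    "name": ["Point A", "Point B"],
    "latitude": [37.87, 48.85],
    "longitude": [-122.27, 2.35],
    "visits": [3, 1],
})
geojson = df_to_geojson(df, properties=["name", "visits"])
geojson_text = json.dumps(geojson, indent=2)
```

The resulting string can be written to a `.geojson` file for a web map to load. Note that this sketch does not handle missing values, which would need to be dropped or converted before serializing.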

Tags api, data, geojson, gis, github, json, jupyter, leaflet, maps, pandas, python

Academia

Urban Informatics and Visualization at UC Berkeley

Post author By gboeing
Post date 2015-08-20
10 Comments on Urban Informatics and Visualization at UC Berkeley

The fall semester begins next week at UC Berkeley. For the third year in a row, Paul Waddell and I will be teaching CP255: Urban Informatics and Visualization, and this is my first year as co-lead instructor.

This master’s-level course trains students to analyze urban data, develop indicators, conduct spatial analyses, create data visualizations, and build interactive web maps. To do this, we use the Python programming language, open source analysis and visualization tools, and public data.

This course is designed to provide future city planners with a toolkit of technical skills for quantitative problem solving. We don’t require any prior programming experience – we teach this from the ground up – but we do expect prior knowledge of basic statistics and GIS.

Update, September 2017: I am no longer a Berkeley GSI, but Paul’s class is ongoing. Check out his fantastic teaching materials in his GitHub repo. From my experiences here, I have developed a course series on urban data science with Python and Jupyter, available in this GitHub repo.

Tags academia, anaconda, arcgis, berkeley, cartodb, city, code for america, data, data science, geocoding, geopandas, geopy, geospatial, gis, github, javascript, land use, leaflet, localdata, mapbox, maps, matplotlib, modeling, numpy, pandas, planning, projection, qgis, science, scikit-learn, scipy, scrapy, shapely, smart cities, socrata, statistics, tilemill, tutorial, urban, urban design, urban planning, visualization, wordpress

Data

Animated 3-D Plots in Python

Post author By gboeing
Post date 2015-04-13
1 Comment on Animated 3-D Plots in Python

Download/cite the paper here!

In a previous post, I discussed chaos, fractals, and strange attractors. I also showed how to visualize them with static 3-D plots. Here, I’ll demonstrate how to create these animated visualizations using Python and matplotlib. All of my source code is available in this GitHub repo. By the end, we’ll produce animated data visualizations like this, in pure Python:
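One common way to animate a 3-D matplotlib plot is to redraw the same figure from a sequence of viewing angles and save each view as a frame; the frames can then be stitched into a GIF with an external tool. The sketch below illustrates that idea with placeholder assumptions (a made-up helix as data, an arbitrary frame count, a hypothetical function name). It is not the code from the repo.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # render off-screen, so no display is needed
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older matplotlib)

def render_rotating_frames(xs, ys, zs, n_frames=12, elev=20):
    """Return a list of PNG images (as bytes), one per viewing angle."""
    fig = plt.figure(figsize=(4, 4))
    ax = fig.add_subplot(111, projection="3d")
    ax.plot(xs, ys, zs)
    frames = []
    for i in range(n_frames):
        # Step the camera around the vertical axis, one full turn overall.
        ax.view_init(elev=elev, azim=360 * i / n_frames)
        buf = io.BytesIO()
        fig.savefig(buf, format="png", dpi=60)
        frames.append(buf.getvalue())
    plt.close(fig)
    return frames

# Placeholder data: a simple helix.
t = [0.1 * k for k in range(200)]
frames = render_rotating_frames(
    [math.cos(v) for v in t], [math.sin(v) for v in t], t, n_frames=8
)
```

Each element of `frames` is one still image; writing them to numbered files gives the input that a GIF encoder expects.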

Tags chaos, complexity, data, matplotlib, pandas, python, tutorial, visualization

Data

Visualizing Chaos and Randomness

Post author By gboeing
Post date 2015-04-09
3 Comments on Visualizing Chaos and Randomness

Download/cite the paper here!

In a previous post, I discussed chaos theory, fractals, and strange attractors – and their implications for knowledge and prediction of systems. I also briefly touched on how phase diagrams (or Poincaré plots) can help us visualize system attractors and differentiate chaotic behavior from true randomness.

In this post (adapted from this paper), I provide more detail on constructing and interpreting phase diagrams. These methods are particularly useful for discovering deterministic chaos in otherwise random-appearing time series data, as they visualize strange attractors. I’m using Python for all of these visualizations and the source code is available in this GitHub repo.
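A two-dimensional phase diagram plots each value of a series against the value that follows it. As a minimal sketch, assuming the logistic map as a stand-in chaotic series (an assumption here, since this excerpt does not say which series the post uses) and with hypothetical function names, the points can be built like this:

```python
import random

def logistic_series(r, x0, n):
    """Generate n values of the logistic map x[t+1] = r * x[t] * (1 - x[t])."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

def phase_pairs(series):
    """Pair each value with its successor: the points of a 2-D phase diagram."""
    return list(zip(series[:-1], series[1:]))

def max_parabola_error(pairs, r):
    """Largest vertical distance of any point from the curve y = r * x * (1 - x)."""
    return max(abs(y - r * x * (1 - x)) for x, y in pairs)

# A chaotic but deterministic series versus seeded pseudo-random noise.
chaotic = phase_pairs(logistic_series(r=3.99, x0=0.5, n=500))
rng = random.Random(0)
noise = phase_pairs([rng.random() for _ in range(500)])

print(max_parabola_error(chaotic, 3.99))  # the chaotic points sit on the parabola
print(max_parabola_error(noise, 3.99))    # the random points do not
```

Plotted as a scatter, the chaotic pairs trace a clean curve while the random pairs fill the plane, which is the distinction a phase diagram is meant to reveal.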

Tags chaos, complexity, data, math, matplotlib, modeling, pandas, python, science, theory, tutorial, visualization

Data

Chaos Theory and the Logistic Map

Post author By gboeing
Post date 2015-03-25
33 Comments on Chaos Theory and the Logistic Map

Using Python to visualize chaos, fractals, and self-similarity to better understand the limits of knowledge and prediction. Download/cite the article here and try pynamical yourself.

Chaos theory is a branch of mathematics that deals with nonlinear dynamical systems. A system is just a set of interacting components that form a larger whole. Nonlinear means that due to feedback or multiplicative effects between the components, the whole becomes something greater than just adding up the individual parts. Lastly, dynamical means the system changes over time based on its current state. In the following piece (adapted from this article), I break down some of this jargon, visualize interesting characteristics of chaos, and discuss its implications for knowledge and prediction.

Chaotic systems are a simple sub-type of nonlinear dynamical systems. They may contain very few interacting parts and these may follow very simple rules, but these systems all have a very sensitive dependence on their initial conditions. Despite their deterministic simplicity, over time these systems can produce totally unpredictable and wildly divergent (aka, chaotic) behavior. Edward Lorenz, the father of chaos theory, described chaos as “when the present determines the future, but the approximate present does not approximately determine the future.”
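Sensitive dependence on initial conditions is easy to see numerically. The sketch below iterates the logistic map from two starting values that differ by one part in a billion; the growth rate of 3.9, the starting value of 0.2, and the function name are illustrative choices rather than values taken from the article.

```python
def logistic_map(r, x0, steps):
    """Iterate x[t+1] = r * x[t] * (1 - x[t]), returning the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Two runs with the same rule and nearly identical starting points.
a = logistic_map(r=3.9, x0=0.2, steps=100)
b = logistic_map(r=3.9, x0=0.2 + 1e-9, steps=100)

# Absolute gap between the two trajectories at every step.
gaps = [abs(p - q) for p, q in zip(a, b)]
print(gaps[0], max(gaps[50:]))
```

The gap starts out negligible, yet within a few dozen steps the two trajectories bear no resemblance to each other, even though every step is fully deterministic.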

Tags chaos, complexity, data, math, matplotlib, modeling, pandas, python, science, theory, visualization

Data

Visualizing Summer Travels Part 6: Projecting Spatial Data with Python

Post author By gboeing
Post date 2014-09-06
8 Comments on Visualizing Summer Travels Part 6: Projecting Spatial Data with Python

This post is part of a series on visualizing data from my summer travels.

I’ve previously discussed visualizing the GPS location data from my summer travels with CartoDB, Leaflet, and Mapbox + Tilemill. I also visualized different aspects of this data set in Python, using the matplotlib plotting library. However, these spatial scatter plots used unprojected lat-long data, which looked pretty distorted at European latitudes.

Today I will show how to convert this data into a projected coordinate reference system and plot it again using matplotlib. These projected maps will provide a much more accurate representation of my spatial data and the geographic region. All of my code is available in this GitHub repo, particularly this notebook.
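To see why raw lat-long scatter plots look stretched at European latitudes, note that a degree of longitude covers less ground the farther you are from the equator, shrinking roughly with the cosine of the latitude. The sketch below is not the method from the notebook. It is a deliberately simple equirectangular approximation on a spherical Earth, with hypothetical function names and an assumed mean Earth radius, shown only to illustrate what projecting does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth

def km_per_degree_longitude(lat_deg):
    """Approximate ground distance covered by one degree of longitude."""
    return math.radians(1) * EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))

def equirectangular(lat_deg, lon_deg, ref_lat_deg):
    """Project lat-long to x/y in km, with true east-west scale along ref_lat_deg."""
    x = math.radians(lon_deg) * EARTH_RADIUS_KM * math.cos(math.radians(ref_lat_deg))
    y = math.radians(lat_deg) * EARTH_RADIUS_KM
    return x, y

at_equator = km_per_degree_longitude(0)
at_50_north = km_per_degree_longitude(50)
print(at_equator, at_50_north)
```

At 50°N a degree of longitude spans only about 64% of its equatorial length, which is the stretching that appears when lat-long values are plotted directly as x and y. A proper projected coordinate reference system handles this more accurately than the approximation above.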

Tags crs, data, descartes, geopandas, geopy, geospatial, gis, maps, matplotlib, pandas, projection, pyproj, python, travel, tutorial, visualization