What does artificial intelligence see when it looks at your city? I recently created a Twitter bot in Python called CityDescriber that takes popular photos of cities from Reddit and describes them using Microsoft’s computer vision AI. The bot typically does pretty well with straightforward images of city skylines and street scenes:
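For the curious, the pipeline is simple enough to sketch. The snippet below is a minimal illustration, not the bot's actual source: the subreddit (r/CityPorn), the v1.0 "describe" endpoint of Microsoft's Computer Vision API, the API region, and the key are all placeholder assumptions.

```python
# Minimal sketch of a CityDescriber-style pipeline -- not the bot's actual
# source. Subreddit, API region, and key are placeholder assumptions.
import requests

VISION_URL = "https://westus.api.cognitive.microsoft.com/vision/v1.0/describe"
VISION_KEY = "YOUR_AZURE_KEY"  # placeholder: your Cognitive Services key

def top_image_url(subreddit="CityPorn"):
    """Get the URL of the current top post from a subreddit's public JSON feed.

    Assumes the top post links directly to an image file.
    """
    resp = requests.get(
        f"https://www.reddit.com/r/{subreddit}/top.json?limit=1",
        headers={"User-Agent": "citydescriber-sketch/0.1"},
    )
    resp.raise_for_status()
    return resp.json()["data"]["children"][0]["data"]["url"]

def describe(image_url):
    """Ask the Computer Vision describe endpoint for a one-line caption."""
    resp = requests.post(
        VISION_URL,
        headers={"Ocp-Apim-Subscription-Key": VISION_KEY},
        json={"url": image_url},
    )
    resp.raise_for_status()
    return resp.json()["description"]["captions"][0]["text"]

if __name__ == "__main__":
    url = top_image_url()
    print(describe(url))  # the caption text that would get tweeted
```

Posting the caption and image to Twitter would then be a couple more lines with a library like tweepy.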
an aerial view of a city at night pic.twitter.com/liJSmtrgTK
— City Describer (@CityDescriber) September 20, 2017
a group of people walking across a snow covered street pic.twitter.com/TujM8ugH9s
— City Describer (@CityDescriber) September 19, 2017
Some are even kind of wryly poetic, such as this description of Los Angeles:
a group of palm trees with a building in the background pic.twitter.com/xMD3j3Lnu5
— City Describer (@CityDescriber) September 16, 2017
Or this description of San Francisco:
a bunch of old buildings pic.twitter.com/FqiSCIwkS5
— City Describer (@CityDescriber) September 9, 2017
But the AI sometimes struggles with other photos. And when it’s wrong, it’s often hilariously far-off:
a stack of flyers on a table pic.twitter.com/S8M3wNWf2g
— City Describer (@CityDescriber) September 14, 2017
a bunch of bananas pic.twitter.com/4ml3qDhOdK
— City Describer (@CityDescriber) September 28, 2017
a large jetliner sitting on top of a mountain pic.twitter.com/p7SbXJVFOH
— City Describer (@CityDescriber) September 27, 2017
a circuit board on a city street pic.twitter.com/IuRNodSVpx
— City Describer (@CityDescriber) September 23, 2017
a close up of person riding a bike down a dirt road pic.twitter.com/bbERwF9e7Y
— City Describer (@CityDescriber) September 20, 2017
a group of people on a beach pic.twitter.com/qAHe3yEqGO
— City Describer (@CityDescriber) September 17, 2017
a table full of food pic.twitter.com/it9320OKfS
— City Describer (@CityDescriber) September 14, 2017
a view of a cactus pic.twitter.com/BkV0DSinPx
— City Describer (@CityDescriber) September 11, 2017
a traffic light hanging from a tree pic.twitter.com/Am9F0ldx1T
— City Describer (@CityDescriber) September 10, 2017
There has been much discussion recently (example) about the impact that computer vision — and machine learning more generally — could have on urban studies and urban planning… for better or for worse. On one hand, we can develop and train better models for more accurate insights into urban patterns and urban change. Modeling has always been a useful tool in the planning toolkit, and new data science methods might be able to make planners more efficient and accurate.
On the other hand, planners should be cautious and critical of claims about using AI to “solve” cities. Machine learning models are no better than their training data, and biases in that data (and in the researchers who assemble it) can produce biased estimates and predictions. Despite some popular accounts, AI and big data do not spell the end of theory.
Perhaps the CityDescriber bot showed one aspect of this in a light-hearted way. I don’t mean to mock Microsoft’s algorithm broadly: in fact, it tends to describe most of these photos in a literal, accurate, and mundane way. That is a substantial accomplishment. But what about the descriptions that are just bafflingly incorrect? In those cases the AI saw something that triggered a completely wrong prediction, even though a child could recognize the photo’s contents in an instant. In particular, it seems not to have been well trained on aerial shots looking down at cities.
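Incidentally, the describe endpoint does offer a small window into what the model “saw”: the response includes the tags it detected alongside a confidence score for each caption. A rough sketch, reusing the placeholder VISION_URL, VISION_KEY, and image_url from the earlier snippet:

```python
# Peek at what the model "saw". Reuses the VISION_URL / VISION_KEY
# placeholders and an image_url from the earlier sketch.
resp = requests.post(
    VISION_URL,
    headers={"Ocp-Apim-Subscription-Key": VISION_KEY},
    json={"url": image_url},
)
resp.raise_for_status()
data = resp.json()["description"]
caption = data["captions"][0]
print(caption["text"], caption["confidence"])  # top caption and its confidence score
print(data["tags"])  # the words the model associated with the image
```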
As planners and researchers, we need to consider artificial intelligence and machine learning with some enthusiasm and some skepticism. What exactly are the models telling us? Why? What are their biases? How do they reinforce entrenched biases built into their training data? What do they “see”… and what do they not see? Researchers may strive to build objective models, but models usually reflect their builders’ experiences and points of view. As planners, we need to be cognizant of this as we increasingly use machine learning over the next decade to better understand cities and their citizens.
3 replies on “Describing Cities with Computer Vision”
This is so weird, I also made a Twitter bot that describes images from reddit. In my case, from https://www.reddit.com/r/FoodPorn/
The bot is https://twitter.com/foodDescBot (it’s turned off because of a problem I have to solve with Azure) and does a pretty good job!
Also using Microsoft’s computer vision API
That’s really cool, despite the howlers :) Have you released the source code for it? Would love to see how you hooked the bits together.