Published by Penn Today
Authored by Katherine Unger Baillie
Photograph by Scott Spitzer
Video by Derick Crucius
Tucked away on the Pennovation Works campus in southwest Philadelphia, an 8’ by 8’ by 20’ wire cage represents what is likely the most high-tech aviary ever created. The enclosure is outfitted with eight computer-vision cameras, an array of 24 high-precision microphones, and 20 brown-headed cowbirds.
The goal of the aviary, originally envisioned by Marc Schmidt, a professor of biology in Penn’s School of Arts and Sciences, is to use the latest in machine learning technology to answer questions about animals’ social behavior that can be addressed in no other way.
“My student Ammon Perkes coined it the ‘smart aviary,’” Schmidt says, “because we can use this high-tech array of cameras and microphones to determine the position and identity of each individual bird and every sound that is being produced. But I like to call it the ‘cage of dreams.’ It’s sort of corny but I felt that, if we built it, the technology and the collaborators would come.”
Just as physicists and astronomers have pointed powerful telescopes at the sky, raking in vast quantities of data to later comb through, the Penn scientists working on the aviary are collecting data by the terabyte in their biological observatory, and then developing the tools and algorithms to parse it carefully to make new discoveries.
“For us it’s very exciting science,” says Kostas Daniilidis, the Ruth Yalom Stone Professor in the School of Engineering and Applied Science’s Department of Computer and Information Science. “It gives us the chance to work on translating two-dimensional data to three-dimensional data, and it also gives us a chance to use AI tools to recognize complex poses in the birds.”
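The article does not describe the team's actual reconstruction pipeline, but the "two-dimensional to three-dimensional" step Daniilidis mentions is classically done by triangulating a point seen by two or more calibrated cameras. The sketch below uses the standard linear (DLT) method with toy camera matrices; all specific numbers and names here are illustrative, not from the project.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover a 3D point from its 2D projections in two calibrated
    cameras via the linear (DLT) method: stack the cross-product
    constraints and take the null space of the system with an SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Two toy cameras: shared intrinsics K, second camera shifted 1 unit on x.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0.,   0.,   1.]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.], [0.], [0.]])])

X_true = np.array([0.3, -0.2, 5.0, 1.0])  # a point 5 units in front
x1 = P1 @ X_true; x1 = x1[:2] / x1[2]     # project into each image
x2 = P2 @ X_true; x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)
print(np.allclose(X_est, X_true[:3], atol=1e-6))  # True
```

With noise-free projections the linear method recovers the point exactly; with real detections, more cameras (the aviary has eight) and a nonlinear refinement step improve robustness.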
Schmidt has studied the neural basis of reproduction and song production in birds in his lab for close to two decades, but had long dreamed of setting up an aviary to examine these behaviors in a more naturalistic setting. In 2015, he reached out to Daniilidis and began discussing strategies to make this vision concrete.
Together with colleagues, they applied for and won a grant from the National Science Foundation’s Major Research Instrumentation Program to build the facility. Along the way, they brought into the fold Vijay Balasubramanian, a physicist in the School of Arts and Sciences who had pursued computational work in neuroscience at various levels, from neurons up to behavior.
“The thing I find very interesting about this project is it’s an attempt to completely analyze a developing social network,” Balasubramanian says. “As a physicist I would like to understand the formation of the collective behavior.”
Schmidt’s expertise in biology and experimental design, Daniilidis’s sophisticated computer vision systems, and Balasubramanian’s track record at analyzing complex systems made for a formidable interdisciplinary partnership.
Dream team in place, the next step was construction. That effort was led by postdoctoral researcher Bernd Pfrommer, formerly of Daniilidis’s group, and doctoral student Perkes from Schmidt’s lab, who went from initial sketches in the proposal to a fully framed enclosure by 2017. The process required navigating a variety of challenges, from applying for permits to collect cowbirds from the wild, to figuring out how to arrange the cameras and microphones, to establishing a server capable of relaying large quantities of data from the Pennovation Works campus to the main Penn campus.
Part of the challenge was also communicating across disciplinary boundaries.
“There’s a different language between biologists and engineers,” says Perkes. “The engineers may see what they can do and not think about the fact that the birds are going to want to perch on the cameras—or poop on the cameras. But we’ve been learning to talk to one another so everyone is on the same page.”
The group began recording in the aviary during this past spring’s breeding season. With 10 female and 10 male cowbirds in the aviary, the cameras and microphones recorded 10 hours a day, capturing every wing stroke, head bow, tussle, song, and call—close to a terabyte of data a day (after 60-fold compression) for 100 days.
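The figures above imply a striking raw data rate. A quick back-of-envelope calculation, assuming 1 TB = 10^12 bytes and exactly 10 hours of recording per day (assumptions of this sketch, not stated in the article):

```python
# Storage figures implied by the article's numbers.
compressed_per_day_tb = 1.0   # "close to a terabyte of data a day"
compression_ratio = 60        # "after 60-fold compression"
days = 100                    # "for 100 days"

raw_per_day_tb = compressed_per_day_tb * compression_ratio
total_compressed_tb = compressed_per_day_tb * days
# Average raw sensor bandwidth over a 10-hour recording day:
raw_rate_gbit_s = raw_per_day_tb * 1e12 * 8 / (10 * 3600) / 1e9

print(f"raw data generated:   ~{raw_per_day_tb:.0f} TB/day")
print(f"stored over season:   ~{total_compressed_tb:.0f} TB")
print(f"raw sensor bandwidth: ~{raw_rate_gbit_s:.0f} Gbit/s")
```

That is roughly 60 TB of raw sensor data per day, about 100 TB kept for the season, and an average raw bandwidth on the order of 13 gigabits per second, which helps explain why relaying data from Pennovation Works to the main campus required a dedicated server setup.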
Cowbirds are a gregarious species that dwells in groups but also forms breeding pairs each season. One aim of the work is to see how different interactions between the birds give rise to these stable bonds.
“It’s a very complex collective of interactions,” says Balasubramanian. “It’s not just ‘A talks to B.’ It’s ‘A talks to B and C’s opinion of A talking to B matters.’ It’s very complicated to figure out.”
Each bird in the aviary can be distinguished by colored leg bands, but Marc Badger, a postdoctoral researcher in Daniilidis’s group, is working to craft algorithms capable of discerning different poses of the birds based on their silhouettes. Females, for example, go into what is known as a copulatory response, a kind of submissive posture, to indicate that they are receptive to a male’s advances. The engineers’ task is to employ machine learning to distinguish these types of subtle movements from others.
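The article does not detail Badger's algorithms, but the underlying idea, classifying a posture from the shape of a silhouette, can be illustrated with a deliberately crude sketch. The features and posture prototypes below are hypothetical placeholders; a real system would use much richer shape descriptors or a learned model.

```python
import numpy as np

def silhouette_features(mask):
    """Crude shape descriptors for a binary silhouette: bounding-box
    aspect ratio (width/height) and fill ratio. Only meant to
    illustrate classifying posture from silhouette shape."""
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    return np.array([w / h, mask.sum() / (w * h)])

def nearest_posture(mask, prototypes):
    """Assign the posture whose prototype features are closest."""
    feats = silhouette_features(mask)
    return min(prototypes, key=lambda name: np.linalg.norm(feats - prototypes[name]))

# Hypothetical prototypes: an upright perched bird reads as tall and
# narrow; a bird in the crouched copulatory posture reads low and wide.
prototypes = {"upright":  np.array([0.5, 0.6]),
              "crouched": np.array([2.0, 0.7])}

# A toy silhouette, 4 pixels tall by 10 wide: wider than it is tall.
mask = np.zeros((12, 12), dtype=bool)
mask[6:10, 1:11] = True
print(nearest_posture(mask, prototypes))  # crouched
```

As Daniilidis notes next, birds make this harder than humans or cheetahs: without clearly visible joints, silhouette shape and learned pose models have to carry more of the burden.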
“It’s much easier to transfer results from joint positions in humans, or even cheetahs or monkeys, animals that have clear articulation in their joints,” says Daniilidis. “In a bird it’s very difficult to click on a point and say, ‘This is the joint of articulation,’ because a lot of articulation happens underneath the wings, for example.”
This task is further complicated by the difficulty of picking out bird silhouettes amid the shifting shadows that cross over the enclosure during the course of a day. And a recycling center and trash compactor located near the aviary complicate the job of isolating the vocalizations of the birds in the study from background noise.
“That is really an open challenge; we’re still trying to solve it,” Daniilidis says.
Undergraduate students are assisting in the effort to overcome these obstacles, “training” the computer vision system to identify birds by manually clicking on their locations in the thousands of hours of video that have been collected so far.