Why Aza Raskin Is Building AI to Talk to Animals


During the early years of the Cold War, an array of underwater microphones monitoring for sounds of Russian submarines captured something otherworldly in the depths of the North Atlantic.

The haunting sounds came not from enemy craft, nor aliens, but humpback whales, a species that, at the time, humans had hunted almost to the brink of extinction. Years later, when environmentalist Roger Payne obtained the recordings from U.S. Navy storage and listened to them, he was deeply moved. The whale songs seemed to reveal majestic creatures that could communicate with one another in complex ways. If only the world could hear these sounds, Payne reasoned, the humpback whale might just be saved from extinction.

When Payne released the recordings in 1970 as the album Songs of the Humpback Whale, he was proved right. The album went multi-platinum. It was played at the U.N. General Assembly, and it inspired Congress to pass the 1973 Endangered Species Act. By 1986, commercial whaling was banned under international law. Global humpback whale populations have risen from a low of around 5,000 individuals in the 1960s to 135,000 today.

For Aza Raskin, the story is a sign of just how much can change when humanity experiences a moment of connection with the natural world. “It’s this powerful moment that can wake us up and power a movement,” Raskin tells TIME.

Raskin’s focus on animals comes from a very human place. A former Silicon Valley wunderkind himself, in 2006 he invented infinite scroll, the feature that became a mainstay of so many social media apps. He founded a streaming startup called Songza that was eventually acquired by Google. But Raskin gradually soured on the industry after realizing that technology, which had such capacity to influence human behavior for the better, was mostly being leveraged to keep people addicted to their devices and spending money on unnecessary products. In 2018, he co-founded the Center for Humane Technology with his friend and former Google engineer Tristan Harris, as part of an effort to ensure tech companies were shaped to benefit humanity, rather than the other way around. He is perhaps best known for coining, alongside scholar Renée DiResta, the phrase “freedom of speech is not freedom of reach.” The phrase became a helpful way for responsible technologists, lawmakers and political commentators to distinguish between the constitutional freedom for users to say whatever they like, and the privilege of having it amplified by social media megaphones.

Raskin is talking about whale song because he is also the co-founder and president of the Earth Species Project, an artificial intelligence (AI) nonprofit that is attempting to decode the speech of animals—from humpback whales, to great apes, to crows. The jury is out on whether it would ever truly be possible to accurately “translate” animal communication into anything resembling human language. Meaning is socially constructed, and animal societies are very different from ours. But Raskin is optimistic that the attempt is worthwhile, given that connection with animals can be a force strong enough to galvanize humans into protecting the natural world at such a critical juncture in the fight against climate change.

The Earth Species Project is applying natural language processing—the AI technique behind human translation software and chatbots like OpenAI’s ChatGPT—to recordings of animals. While they haven’t succeeded in “decoding” any animal speech yet, computer scientists at the nonprofit recently designed an algorithm that can isolate the sounds of a single animal (it works well on bats and dolphins, and is not bad at elephants) in a recording of multiple “speakers.” Raskin says solving this issue—known as the “cocktail party problem” because it is comparable to the difficulty of focusing on one person speaking in a crowded room—is a first step toward decoding the mysteries of the animal kingdom.
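The Earth Species Project hasn’t published the details of that algorithm, but the classic textbook approach to the cocktail party problem is independent component analysis (ICA), which unmixes simultaneous signals recorded on multiple channels. As a rough illustration only (this is a generic ICA sketch, not the nonprofit’s actual method), scikit-learn’s FastICA can separate two synthetic “voices” mixed onto two microphones:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)

# Two synthetic "speakers": a sinusoid and a square wave.
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
S = np.c_[s1, s2]

# Each microphone hears a different weighted mix of both speakers.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
X = S @ A.T

# FastICA recovers the original sources (up to order and sign).
ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(X)
```

Real bioacoustic recordings are far messier—often a single microphone, overlapping calls and heavy noise—which is part of why the problem remains hard for wild animals.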


Although fixing the ills of social media and decoding animal speech may seem worlds apart, Raskin sees them as part of a holistic mission. The Earth Species Project and the Center for Humane Technology are both “experiments into how you shift trillion dollar industries,” Raskin says. They share the same goal: changing society for the better, not through the traditional Silicon Valley route of building an app or cornering a business model, but by changing culture.

TIME spoke with Raskin this summer. In this wide-ranging conversation, we discuss not just animal translation, but the state of social media, applying those lessons to the rapid rise of artificial intelligence, and how—amid everything—to secure a place for both humanity and nature in a fast-changing world.

This interview has been condensed and edited for clarity.

Before we get onto talking about the Earth Species Project, I want to talk about your life up to now. Your father was Jef Raskin, the famous expert in human-computer interaction at Apple who left a huge mark on that company. You yourself were a Silicon Valley founder who worked on subtly changing people’s behaviors through technology. Then you gave that all up to sound the alarm on social media. Now you’re trying to talk to animals through AI. Talk me through your trajectory.

My mother worked in a palliative care hospice. From her, I learned what it is like to care for someone with dignity. My father created the Macintosh project at Apple. And he also came with a very humanistic sense of care, at a moment in Silicon Valley before it had been captured by engagement and the attention industrial machine. You could still ask these questions: what is technology even for?

One of the things that makes humanity unique is not that we use tools—lots of species use tools—but the extent to which our tools remake us. There is no such thing as to be human without using some form of technology. Whether that’s language, whether that’s fire, or anything that came after. One of the things that people miss is the extent to which our technology changes our social structures and our culture. Just look at the plough. It changed how we lived, it created surplus food that let us move into cities, which changed the nature of family and relationships. If you’re going to use animals to plough your field, it’s not compatible with animistic traditions anymore, so it changes religion. Growing up, especially with my father, I was given this lens of how fundamental technology is to what humans become. We have a choice about what we do with technology, and what’s at stake is our identity and how we interrelate with the rest of the world.

My father was really interested in the idea of ergonomics: how human beings bend and fold. If you don’t understand and study ergonomics, you end up designing things like chairs that really hurt us. There’s also an ergonomics of relationships, of communities, of societies—a way that we bend and fold. If we’re blind to ergonomics, we break ourselves. If capitalism isn’t ergonomic to our biosphere, we break the container that we live in. And this is the through-line. For almost all my work, I ask: how do we create things that understand the ergonomics of how we as human beings work, how our biosphere works, how our technosphere works, so that we can create things that help us thrive as a whole?

You’ve done a lot of work around making technology more humane. But decoding animal speech seems out of left-field. How are the two connected?

Civilization can’t exist without the accumulated culture and knowledge that comes from language. It’s right there at the core of human identity. The more you look, the more you realize human identity is tied up with language, and that tells you that there’s something really important there for us to examine. Because even if we’re able to draw down all the carbon from the atmosphere tomorrow, which we should do, that wouldn’t fix the core problem, which is human ego. We need to change the way that we view ourselves, the way we relate to ourselves, and hence the rest of the world. That’s the connection with technology. Once we can change that identity, that creates the opportunity for entirely new patterns of behavior and for solving these problems. The big hope is that there are these moments in time, when we get a shift in perspective. And that shift in perspective changes everything.

Humans have been speaking vocally and passing down culture for somewhere between 100,000 and 300,000 years. For whales and dolphins, it’s 34 million years. Generally the things that are wisest will have lasted longest. Imagine what kind of wisdom there can be in cultures that have been going on for 34 million years. It just gives me goosebumps.

Do you have in mind what you want people’s perspective to be shifted toward?

Yeah. I think it’s a stance of interdependence. If you can empathize and see someone else, or some other being, as less other, then suddenly you’re more connected than you were before and your sphere of care expands.

Many linguists understand human language to be tied up in our own brain structure and life experiences. Obviously both of those things are very different when we talk about animals. Maybe there are whale concepts that humans couldn’t possibly understand. Do you think it is actually possible to have a conversation with a whale? Or is it more like you would need to interpret what the whale is saying in a much more abstract way?

I think it’s really important to hold these two poles. On one side, there’s being too anthropomorphic. Where you think, we have these feelings, therefore the animals do. On the other side, there’s human exceptionalism where we think we are so special that we don’t share anything with other animals. And of course, the truth is going to be somewhere in between. I think there are going to be some things that we’re going to find a similarity with, that we can directly communicate about, and then there are going to be the things that we don’t share. And therefore, that part will be a much more metaphorical kind of communication.

And I don’t know, actually, which one is going to be more exciting. Will it be the parts that we can directly translate into human experience or the ones that we can’t? But I think what people often forget is that this shared set of experiences is large. [Raskin shares his screen.] Here is a pilot whale who has been carrying her dead calf for three weeks. Grief is clearly some kind of really profound shared experience. Here is a chimp who is browsing Instagram, who is able to use it, and actually uses it often to follow other chimps. So there’s something here that is truly conserved. The answer is, of course, we don’t know because we’re doing science. This is a journey into the unknown. But we should keep a very open mind, not to fall into human exceptionalism. Just like we shouldn’t fall into anthropomorphism.

I want to shift gears and talk about social media. You helped coin the term “freedom of speech is not freedom of reach.” To what extent do you feel like that concept has had an impact? And to what extent do you think that there’s still more to be done?

I think it’s been a really helpful concept to get out in the world. Renée DiResta, who ended up penning the article that got that concept out into the world, has done a fantastic job. It clearly has done good work but needs to do more, because we are constantly trapped in the false dichotomy of saying, either we have content moderation, censorship, or we have free speech.

But that phrase is pointing at a bigger thing, which is that we need to be thinking as a society as a whole. Facebook’s stock price has dropped by half, which I don’t think is just us, of course. But while we’ve had success, the stakes are even higher now, because we are still the commodity. Just like a tree is worth more as lumber than as a living tree, and a whale is worth more dead than alive, we are going to be worth more as polarized, distracted, narcissistic, tribalistic people, than we are as full, whole individuals.

Shoshana Zuboff points out that capitalism takes things that are outside of the market and pulls them into the market. And once they’re in the market, then you can extract, abstract, deplete and pollute. And that’s what’s happened with human attention and engagement. It was outside of the market, it’s now inside the market. We need to think about the next layer up. We know that if you let markets run without any guardrails, they will always grow to break the thing they’re growing inside of. If your liver starts to grow, not listening to anything else, it’ll eventually take over your body and you die. That’s how cancer works. And so markets always need to come with guardrails, to keep them safe for the body they’re growing within. We’ve done that for capital markets. We do it for things like human organs. We have never done it with human attention or engagement. That’s a market that needs guardrails, otherwise, you’re constantly going to have a race to the bottom of the brainstem.

When I think about it at an even higher level, there’s this fundamental equation. Technology plus autocracy equals stronger autocracy. But technology plus democracy equals worse democracy. And if we do not solve that problem, then the values that we care most about will not have a seat in the future.

Powerful machine learning systems are rapidly becoming accessible to businesses and members of the public. Some can generate realistic images. Others can generate realistic text. It took 10 years for technologists to force through a coherent set of guidelines for social media, and social media destabilized our world in the meantime. Now we’re seeing a very similar revolution at its earliest stages in terms of accessible generative AI. What do you think a few ethical red lines should be for technologies like GPT-3 and Dall-E?

Who knew that AI was going to come first for art, story and relationships? The story has always been that it takes our jobs first. But actually it’s coming for things that we think are very core to human identity. And we have not yet grappled with all of that.

I think there are some simple things that work across social media and AI. And that is: the scale of impact needs to scale with the kinds of guardrails you’re within. That is, if you’re touching a million people, versus touching 500 million or a billion people, you should probably have different standards that you’re operating to, compared to if you’re only touching 50 people.

The next one is, you’re going to have to move to a world where we know that when something is posted online, it’s posted by a human. And that’s scary for a whole bunch of reasons. All of a sudden, it means you need to have stronger identity protections on the internet, which, if you don’t do it right, opens up a whole bunch of authoritarian surveillance stuff. But the flip side is, if we don’t do any of that, then we are witnessing the end of video and photographic evidence as a medium that we trust. That’s a neutron bomb for trust on the internet. So we’re going to have to thread that needle between privacy, individual safety, ability to speak and express, and also society’s ability to hear and make sense. We’re going to have to balance those things.

And the last one is, I think one of the biggest lessons that Silicon Valley has not yet learned is that “democratize” does not equal “democracy.” If you put James Bond supervillain weapons in everyone’s hands, something bad is going to happen. The way I’ve been thinking about it is, even the phrase “chatbots” is the wrong phrase to use. Because that puts your mind back to the 1990s. Every time you say “chatbot,” replace it with “synthetic relationship.” Recently there was the Google engineer who was fired for believing his language model was sentient. And the takeaway is not whether it’s sentient or not. That’s the wrong question to ask. The right question to ask is, if he believes that he’s in a relationship with this language model that’s so profound that he’s willing to stand up and get fired, that means a lot of people are going to feel that the language models they encounter are sentient. Suddenly you get this realization that relationships are the most transformative technology human beings have. So that means loneliness is about to become one of the largest national security threats. Just as an example, there are relationship scams, where people go on Tinder, and they start a relationship, and then they end up getting scammed and asked for money. We should expect this harnessing of people’s lack of cohesion, lack of meaning, lack of belonging to a group or community. So we need to get in front of this.


How do you suggest we do that?

First we need to have an actual honest conversation about these things. Because right now we’re stuck in the conversation about how this is a neat piece of technology. Versus what does this mean? What are the asymmetric powers that are about to be deployed? For lawyers, who have an asymmetric power over their clients, they have a duty to act in the client’s best interest. We need to recategorize technology of this power as being in a fiduciary relationship.

The paradox of technology is that it gives us the power to serve and protect at the exact same time as it gives us the power to exploit. So we are either entering into our darkest night, if we’re continually trapped by perverse incentives, or our most golden era. It’s going to be a touch-and-go relay race, from utopia to dystopia, until the final act. And I think of both the Center for Humane Technology and the Earth Species Project as trying to bend that arc systemically towards the golden era versus our darkest night.

Write to Billy Perrigo at billy.perrigo@time.com.