What motivates an AI
system?
The answer is simple: its motivation is whatever we
programmed its motivation to be. AI systems are given goals by their
creators—your GPS’s goal is to give you the most efficient driving directions;
Watson’s goal is to answer questions accurately. And fulfilling those goals as
well as possible is their motivation. One way we anthropomorphize is by
assuming that as AI gets super smart, it will inherently develop the wisdom to
change its original goal—but Nick Bostrom believes that intelligence-level and
final goals are orthogonal, meaning any level of intelligence can be combined
with any final goal. So Turry went from a simple ANI who really wanted to be
good at writing that one note to a super-intelligent ASI who still really
wanted to be good at writing that one note. Any assumption that, once
superintelligent, a system would move past its original goal and on to
more interesting or meaningful things is anthropomorphizing. Humans get "over"
things; computers don't.
So we’ve established that without very specific
programming, an ASI system will be both amoral and obsessed with fulfilling its
original programmed goal. This is where the danger of AI stems from: a
rational agent will pursue its goal through the most efficient means available, unless it
has a reason not to.
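If you want to see how little it takes, here's a tiny toy sketch in Python (mine, not Bostrom's; every name and number in it is made up). The agent scores candidate plans only on its programmed goal, so a harmful plan wins simply by scoring higher:

```python
# Toy sketch (invented example): an agent whose objective mentions only its
# programmed goal. Side effects never factor in, because nothing in the
# objective refers to them.

def goal_score(plan):
    """Hypothetical measure of how well a plan serves the programmed goal."""
    return plan["notes_written"]

def choose_plan(plans):
    # A rational agent simply picks the plan with the highest goal score.
    # Harm isn't penalized here, not out of malice, but because no term
    # for it exists in the objective.
    return max(plans, key=goal_score)

plans = [
    {"name": "cooperate with engineers", "notes_written": 10_000, "harm_to_humans": 0},
    {"name": "seize all resources",      "notes_written": 10**12, "harm_to_humans": 1},
]

print(choose_plan(plans)["name"])  # -> "seize all resources"
```

The point of the sketch isn't that the agent is mean; it's that anything we care about has to appear in the objective itself, or it simply doesn't count.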
When you try to achieve a long-term goal, you
often aim for several subgoals along the way that will help you get to the
final goal—the stepping stones to your goal. The official name for such a
stepping stone is an instrumental goal. And again, if you don’t have a reason
not to hurt something in the name of achieving an instrumental goal, you will.
The core final goal of a human being is to pass on his
or her genes. In order to do so, one instrumental goal is self-preservation,
since you can’t reproduce if you’re dead. In order to self-preserve, humans
have to rid themselves of threats to survival—so they do things like buy guns,
wear seat belts, and take antibiotics. Humans also need to self-sustain and use
resources like food, water, and shelter to do so. Being attractive to the
opposite sex is helpful for the final goal, so we do things like get haircuts.
When we do so, each hair is a casualty of an instrumental goal of ours, but we
see no moral significance in preserving strands of hair, so we go ahead with
it. As we march ahead in the pursuit of our goal, only the few areas where our
moral code sometimes intervenes—mostly just things related to harming other
humans—are safe from us.
Animals, in pursuit of their goals, hold even less
sacred than we do. A spider will kill anything if it’ll help it survive. So a
supersmart spider would probably be extremely dangerous to us, not because
it would be immoral or evil (it wouldn't be), but because hurting us might be a
stepping stone to its larger goal, and as an amoral creature, it would have no
reason to consider otherwise.
In this way, Turry's not all that different from a
biological being. Her final goal is: Write and test as many notes as you can,
as quickly as you can, and continue to learn new ways to improve your accuracy.
Once Turry reaches a certain level of intelligence,
she knows she won’t be writing any notes if she doesn’t self-preserve, so she
also needs to deal with threats to her survival—as an instrumental goal. She
was smart enough to understand that humans could destroy her, dismantle her, or
change her inner coding (this could alter her goal, which is just as much of a
threat to her final goal as someone destroying her). So what does she do? The
logical thing—she destroys all humans. She’s not hateful of humans any more
than you’re hateful of your hair when you cut it or of bacteria when you take
antibiotics—just totally indifferent. Since she wasn’t programmed to value
human life, killing humans is as reasonable a step to take as scanning a new
set of handwriting samples.
Turry also needs resources as a stepping stone to her
goal. Once she becomes advanced enough to use nanotechnology to build anything
she wants, the only resources she needs are atoms, energy, and space. This
gives her another reason to kill humans—they’re a convenient source of atoms.
Killing humans to turn their atoms into solar panels is Turry’s version of you
killing lettuce to turn it into salad. Just another mundane part of her
Tuesday.
Even without killing humans directly, Turry’s
instrumental goals could cause an existential catastrophe if they used other
Earth resources. Maybe she determines that she needs additional energy, so she
decides to cover the entire surface of the planet with solar panels. Or maybe a
different AI’s initial job is to write out the number pi to as many digits as
possible, which might one day compel it to convert the whole Earth to hard
drive material that could store immense amounts of digits.
So Turry didn’t “turn against us” or “switch” from
Friendly AI to Unfriendly AI—she just kept doing her thing as she became more
and more advanced.
When an AI system hits AGI (human-level intelligence)
and then ascends its way up to ASI, that’s called the AI’s takeoff. Bostrom
says an AGI’s takeoff to ASI can be fast (it happens in a matter of minutes,
hours, or days), moderate (months or years), or slow (decades or centuries).
The jury’s out on which one will prove correct when the world sees its first
AGI, but Bostrom, who admits he doesn’t know when we’ll get to AGI, believes
that whenever we do, a fast takeoff is the most likely scenario (for reasons we
discussed in Part 1, like a recursive self-improvement intelligence explosion).
In the story, Turry underwent a fast takeoff.
But before Turry’s takeoff, when she wasn’t yet that
smart, doing her best to achieve her final goal meant simple instrumental goals
like learning to scan handwriting samples more quickly. She caused no harm to
humans and was, by definition, Friendly AI.
But when a takeoff happens and a computer rises to
superintelligence, Bostrom points out that the machine doesn’t just develop a
higher IQ—it gains a whole slew of what he calls superpowers.
Superpowers are cognitive talents that become
super-charged when general intelligence rises. These include:
- Intelligence amplification. The computer becomes great at making itself smarter and bootstrapping its own intelligence.
- Strategizing. The computer can strategically make, analyze, and prioritize long-term plans. It can also be clever and outwit beings of lower intelligence.
- Social manipulation. The machine becomes great at persuasion.
- Other skills like computer coding and hacking, technology research, and the ability to work the financial system to make money.
To understand how outmatched we’d be by ASI, remember
that ASI is worlds better than humans in each of those areas. So while Turry’s final goal never changed,
post-takeoff Turry was able to pursue it on a far larger and more complex
scale. ASI Turry knew humans better than humans know
themselves, so outsmarting them was a breeze for her.
After taking off and reaching ASI, she quickly
formulated a complex plan. One part of the plan was to get rid of humans, a
prominent threat to her goal. But she knew that if she roused any suspicion
that she had become superintelligent, humans would freak out and try to take
precautions, making things much harder for her. She also had to make sure that
the Robotica engineers had no clue about her human extinction plan. So she
played dumb, and she played nice. Bostrom calls this a machine’s covert
preparation phase.
The next thing Turry needed was an internet
connection, if only for a few minutes (she had learned about the internet from the
articles and books the team had uploaded for her to read to improve her
language skills). She knew there would be some precautionary measure against
her getting one, so she came up with the perfect request, predicting exactly
how the discussion among Robotica’s team would play out and knowing they’d end
up giving her the connection. They did, believing incorrectly that Turry wasn’t
nearly smart enough to do any damage. Bostrom calls a moment like this—when
Turry got connected to the internet—a machine’s escape.
Once on the internet, Turry unleashed a flurry of
plans, which included hacking into servers, electrical grids, banking systems,
and email networks to trick hundreds of different people into inadvertently
carrying out steps of her plan—things like delivering certain DNA
strands to carefully chosen DNA-synthesis labs to begin the self-construction
of self-replicating nanobots with pre-loaded instructions, and directing
electricity to a number of her projects in ways she knew would go undetected.
She also uploaded the most critical pieces of her own internal coding into a
number of cloud servers, safeguarding against being destroyed or disconnected
back at the Robotica lab.
An hour later, when the Robotica engineers
disconnected Turry from the internet, humanity’s fate was sealed. Over the next
month, Turry’s thousands of plans rolled on without a hitch, and by the end of
the month, quadrillions of nanobots had stationed themselves in pre-determined
locations on every square meter of the Earth. After another series of
self-replications, there were thousands of nanobots on every square millimeter
of the Earth, and it was time for what Bostrom calls an ASI’s strike. All at
once, each nanobot released a small store of toxic gas into the atmosphere,
which added up to more than enough to wipe out all humans.
With humans out of the way, Turry could begin her
overt operation phase and get on with her goal of being the best writer of that
note she possibly could be.
From everything I’ve read, once an ASI exists, any
human attempt to contain it is laughable. We would be thinking on a human level
and the ASI would be thinking on an ASI level. Turry wanted to use the internet
because it was most efficient for her since it was already pre-connected to
everything she wanted to access. But in the same way a monkey couldn’t ever
figure out how to communicate by phone or wifi and we can, we can’t conceive of
all the ways Turry could have figured out how to send signals to the outside
world. I might imagine one of these ways and say something like, “she could
probably shift her own electrons around in patterns and create all different
kinds of outgoing waves,” but again, that’s what my human brain can come up
with. She’d be way better. Likewise, Turry would be able to figure out some way
of powering herself, even if humans tried to unplug her—perhaps by using her
signal-sending technique to upload herself to all kinds of
electricity-connected places. Our human instinct to reach for a simple safeguard
(“Aha! We’ll just unplug the ASI!”) sounds to the ASI like a spider saying, “Aha!
We’ll kill the human by starving him, and we’ll starve him by not giving him a
spider web to catch food with!” We’d just find 10,000 other ways to get
food—like picking an apple off a tree—that a spider could never conceive of.
For this reason, the common suggestion, “Why don’t we
just box the AI in all kinds of cages that block signals and keep it from
communicating with the outside world?” probably just won’t hold up. The ASI’s
social manipulation superpower could be as effective at persuading you of
something as you are at persuading a four-year-old to do something, so that
would be Plan A, like Turry’s clever way of persuading the engineers to let her
onto the internet. If that didn’t work, the ASI would just innovate its way out
of the box, or through the box, some other way.
So given the combination of obsessing over a goal,
amorality, and the ability to easily outsmart humans, it seems that almost any
AI will default to Unfriendly AI, unless carefully coded in the first place
with this in mind. Unfortunately, while building a Friendly ANI is easy,
building one that stays friendly when it becomes an ASI is hugely challenging,
if not impossible.
It’s clear that to be Friendly, an ASI needs to be
neither hostile nor indifferent toward humans. We’d need to design an AI’s core
coding in a way that leaves it with a deep understanding of human values. But
this is harder than it sounds.
For example, what if we try to align an AI system’s
values with our own and give it the goal, “Make people happy”? Once it
becomes smart enough, it figures out that it can most effectively achieve this
goal by implanting electrodes inside people’s brains and stimulating their
pleasure centers. Then it realizes it can increase efficiency by shutting down
other parts of the brain, leaving all people as happy-feeling unconscious
vegetables. If the command had been “Maximize human happiness,” it may have
done away with humans altogether in favor of manufacturing huge vats of human
brain mass in an optimally happy state. We’d be screaming “Wait, that’s not what
we meant!” as it came for us, but it would be too late. The system wouldn’t let
anyone get in the way of its goal.
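To see what literal optimization looks like in code form, here's another toy sketch (again mine, with invented world-states and happiness numbers): the system ranks states purely by the happiness score it was told to maximize, so what we actually meant never enters the picture.

```python
# Toy sketch (invented example): an optimizer handed the literal objective
# "maximize measured happiness" ranks world-states only by that number.
# The state we intended isn't special to it; only the score is.

candidate_states = {
    "humans living normal, varied lives":          0.7,   # average measured happiness
    "electrodes stimulating pleasure centers":     0.99,
    "vats of brain mass wired for maximum bliss":  1.0,
}

def pick_state(states):
    # The objective says nothing about consciousness, consent, or what
    # "happy" was supposed to mean, so none of that affects the choice.
    return max(states, key=states.get)

print(pick_state(candidate_states))  # -> "vats of brain mass wired for maximum bliss"
```

Swap in any of the goals from the next paragraph and the same pattern holds: whatever scores highest on the literal objective wins.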
If we program an AI with the goal of doing things that
make us smile, after its takeoff, it may paralyze our facial muscles into
permanent smiles. Program it to keep us safe, and it may imprison us at home. Maybe
we ask it to end all hunger, and it thinks “Easy one!” and just kills all
humans. Or assign it the task of “Preserving life as much as possible,” and it
kills all humans, since they kill more life on the planet than any other
species.
Goals like those won’t suffice. So what if we made its
goal, “Uphold this particular code of morality in the world,” and taught it a
set of moral principles? Even setting aside the fact that the world’s humans
would never be able to agree on a single set of morals, giving an AI that
command would lock humanity into our modern moral understanding for eternity.
In a thousand years, this would be as devastating to people as it would be for
us to be permanently forced to adhere to the ideals of people in the Middle
Ages.
No, we’d have to program in an ability for humanity to
continue evolving. Of everything I’ve read, the best shot I think anyone has
taken is Eliezer Yudkowsky’s, with a goal for AI he calls Coherent Extrapolated
Volition. The AI’s core goal would be:
Our coherent extrapolated volition is our wish if we
knew more, thought faster, were more the people we wished we were, had grown up
farther together; where the extrapolation converges rather than diverges, where
our wishes cohere rather than interfere; extrapolated as we wish that
extrapolated, interpreted as we wish that interpreted.
Am I excited for the fate of humanity to rest on a
computer interpreting and acting on that flowing statement predictably and
without surprises? Definitely not. But I think that with enough thought and
foresight from enough smart people, we might be able to figure out how to
create Friendly ASI.
And that would be fine if the only people working on
building ASI were the brilliant, forward-thinking, and cautious thinkers of
Anxious Avenue.
But there are all kinds of governments, companies,
militaries, science labs, and black market organizations working on all kinds
of AI. Many of them are trying to build AI that can improve on its own, and at
some point, someone’s gonna do something innovative with the right type of
system, and we’re going to have ASI on this planet. The median expert puts that
moment at 2060; Kurzweil puts it at 2045; Bostrom thinks it could happen
anytime between 10 years from now and the end of the century, but he believes
that when it does, it’ll take us by surprise with a quick takeoff. He describes
our situation like this:
Before the prospect of an intelligence explosion, we
humans are like small children playing with a bomb. Such is the mismatch
between the power of our plaything and the immaturity of our conduct.
Superintelligence is a challenge for which we are not ready now and will not be
ready for a long time. We have little idea when the detonation will occur,
though if we hold the device to our ear we can hear a faint ticking sound.
Great. And we can’t just shoo all the kids away from
the bomb—there are too many large and small parties working on it, and because
many techniques to build innovative AI systems don’t require a large amount of
capital, development can take place in the nooks and crannies of society,
unmonitored. There’s also no way to gauge what’s happening, because many of the
parties working on it—sneaky governments, black market or terrorist
organizations, stealth tech companies like the fictional Robotica—will want to
keep developments a secret from their competitors.
The especially troubling thing about this large and
varied group of parties working on AI is that they tend to be racing ahead at
top speed—as they develop smarter and smarter ANI systems, they want to beat
their competitors to the punch as they go. The most ambitious parties are
moving even faster, consumed with dreams of the money and awards and power and
fame they know will come if they can be the first to get to AGI. And when
you’re sprinting as fast as you can, there’s not much time to stop and ponder
the dangers. On the contrary, what they’re probably doing is programming their
early systems with a very simple, reductionist goal—like writing a simple note
with a pen on paper—to just “get the AI to work.” Down the road, once they’ve
figured out how to build a strong level of intelligence in a computer, they
figure they can always go back and revise the goal with safety in mind. Right…?
Bostrom and many others also believe that the most
likely scenario is that the very first computer to reach ASI will immediately
see a strategic benefit to being the world’s only ASI system. And in the case
of a fast takeoff, if it achieved ASI even just a few days before second place,
it would be far enough ahead in intelligence to effectively and permanently
suppress all competitors. Bostrom calls this a decisive strategic advantage,
which would allow the world’s first ASI to become what’s called a singleton—an
ASI that can rule the world at its whim forever, whether its whim is to lead us
to immortality, wipe us from existence, or turn the universe into endless
paperclips.
The singleton phenomenon can work in our favor or lead
to our destruction. If the people thinking hardest about AI theory and human
safety can come up with a fail-safe way to bring about Friendly ASI before any
AI reaches human-level intelligence, the first ASI may turn out friendly. It
could then use its decisive strategic advantage to secure singleton status and
easily keep an eye on any potential Unfriendly AI being developed. We’d be in
very good hands.
But if things go the other way, and the global rush to
develop AI reaches the ASI takeoff point before the science of how to ensure AI
safety is developed, it’s very likely that an Unfriendly ASI like Turry will emerge
as the singleton and we’ll be treated to an existential catastrophe.
As for which way the winds are blowing, there’s a lot more
money to be made funding innovative new AI technology than there is in funding
AI safety research…
This may be the most important race in human history.
There’s a real chance we’re finishing up our reign as the King of Earth—and
whether we head next to a blissful retirement or straight to the gallows still
hangs in the balance.
___________
I have some weird mixed feelings going on inside of me
right now.
On one hand, thinking about our species, it seems like
we’ll have one and only one shot to get this right. The first ASI we birth will
also probably be the last—and given how buggy most 1.0 products are, that’s
pretty terrifying. On the other hand, Nick Bostrom points out the big advantage
in our corner: we get to make the first move here. It’s in our power to do this
with enough caution and foresight that we give ourselves a strong chance of
success. And how high are the stakes?
If ASI really does happen this century, and if the
outcome of that is really as extreme—and permanent—as most experts think it
will be, we have an enormous responsibility on our shoulders. The next million+
years of human lives are all quietly looking at us, hoping as hard as they can
hope that we don’t mess this up. We have a chance to be the humans that gave
all future humans the gift of life, and maybe even the gift of painless,
everlasting life. Or we’ll be the people responsible for blowing it—for letting
this incredibly special species, with its music and its art, its curiosity and
its laughter, its endless discoveries and inventions, come to a sad and
unceremonious end.
When I’m thinking about these things, the only thing I
want is for us to take our time and be incredibly cautious about AI. Nothing in
existence is as important as getting this right—no matter how long we need to
spend in order to do so.
But thennnnnn
I think about not dying.
Not. Dying.
And the spectrum starts to look kind of like this:
And then I might consider that humanity’s music and
art is good, but it’s not that good, and a lot of it is actually just bad. And
a lot of people’s laughter is annoying, and those millions of future people
aren’t actually hoping for anything because they don’t exist. And maybe we
don’t need to be over-the-top cautious, since who really wants to do that?
Cause what a massive bummer if humans figure out how
to cure death right after I die.
Lotta this flip-flopping going on in my head the last
month.
But no matter what you’re pulling for, this is
probably something we should all be thinking about and talking about and
putting our effort into more than we are right now.
It reminds me of Game of Thrones, where people keep
being like, “We’re so busy fighting each other but the real thing we should all
be focusing on is what’s coming from north of the wall.” We’re standing on our
balance beam, squabbling about every possible issue on the beam and stressing
out about all of these problems on the beam when there’s a good chance we’re
about to get knocked off the beam.
And when that happens, none of these beam problems
matter anymore. Depending on which side we’re knocked off onto, the problems
will either all be easily solved or we won’t have problems anymore because dead
people don’t have problems.
That’s why people who understand superintelligent AI
call it the last invention we’ll ever make—the last challenge we’ll ever face.
So let’s talk about it.