
This essay and podcast have been republished with permission from Greylock and originally ran on greylock.com.


I’ve just published a new book entitled “Impromptu: Amplifying Our Humanity Through AI.” But I didn’t write it alone. I co-wrote it with GPT-4, OpenAI’s latest and most powerful large language model.

The book aims to serve as a sort of “travelog” of my experience with GPT-4 as I explored the tool’s strengths and limitations. Through hundreds of prompts, GPT-4 discussed the potential advantages and drawbacks of AI, argued about human nature, conceived original sci-fi plots, and even tried to make a few jokes. To see just how far the technology has come in a short amount of time, check out my conversations with GPT-4’s predecessor ChatGPT in the Greymatter series Fireside Chatbots.

As AI advances quickly, I hope this book can serve as a guide that encourages people to learn more about the technology, consider how we might use it, and ponder the complex questions about how our choices might play out in the future.

You can listen to an audio version of this first section below, or wherever you get your podcasts.

The full book is available for free here, or through Amazon here.

INTRODUCTION: MOMENTS OF ENLIGHTENMENT

For Isaac Newton, it was an apple falling from a tree that supposedly led him to formulate the law of universal gravity. For Benjamin Franklin, it was a key on a kite string, struck by lightning in a thunderstorm, that proved electricity could be transferred and stored.

My initial “AHA!” moment regarding the current state of AI came in the form of a joke. It was July 2022, and I asked GPT-4, “How many restaurant inspectors does it take to change a lightbulb?”

GPT-4, as you may know, is an advanced type of AI system, or natural-language processor, known as a large language model (LLM). Prompt it with a few words or sentences and it will generate coherent and diverse texts in response. In this way, it can answer questions, perform tasks, and productively interact with its human users.
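To make that prompt-and-response loop concrete, here is a minimal sketch of calling an LLM through an API. It assumes the OpenAI Python SDK's chat-completions interface; the exact class and method names vary by SDK version, and the model name is purely illustrative.

```python
# A minimal sketch of prompting an LLM through an API. This assumes the
# OpenAI Python SDK's chat-completions interface; exact class and method
# names vary by SDK version, and the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {
            "role": "user",
            "content": "How many restaurant inspectors does it take to change a lightbulb?",
        }
    ],
)

# The reply is generated text conditioned on the prompt above.
print(response.choices[0].message.content)
```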

Along with its predecessors, including the wildly popular ChatGPT, GPT-4 was developed by OpenAI, a research organization founded in 2015 with a mission to give millions of people direct, hands-on access to powerful new AI tools.

As one of OpenAI’s original funders, I’ve been experimenting with its products for some time now, so my query to GPT-4 was not the first time I’d asked an LLM to create a lightbulb joke for me. (I’m fond of lightbulb jokes.) While earlier versions of GPT sometimes understood the assignment—especially if the joke formula’s main variable was part of the lightbulb-joke canon, like “lawyer” or “psychologist”—their outputs tended to be merely OK. Fed a less conventional setup, earlier GPTs could freeze up like an amateur at an open-mic night:

Reid: How many restaurant inspectors does it take to change a light bulb?

GPT-3: Only one, but the light bulb has to want to change.

That robot should keep its day job.

Compare that with the response I got when I submitted the same prompt to GPT-4:

Reid: How many restaurant inspectors does it take to change a light bulb?

GPT-4 (edited for length): There are a few possible ways to approach this question, depending on the tone and purpose of the answer. Here are some examples of different types of responses, ranging from serious to humorous:

  • A factual answer: Generally, one inspector should be able to change a lightbulb safely and correctly, as long as they follow the proper procedures and use the appropriate tools and equipment. However, some situations may require more than one inspector, such as if the light fixture is in a hard-to-reach or hazardous area, if the inspector needs assistance or supervision, or if the inspector needs to document or report the change for compliance purposes.
  • A joke answer: The number of restaurant inspectors needed to change a light bulb is four. One to hold the ladder, one to unscrew the old bulb, one to screw in the new bulb, and one to write a citation for using the wrong wattage.

Consider all that’s going on here. First, while GPT-4 apparently “assumes” that I may be asking it to complete a joke, it also wants to cover all the bases—thus, the initial “factual” answer. And while this answer isn’t actually factual—a restaurant inspector’s job is to inspect, not to perform repairs—it does suggest a fair degree of “knowledge” about the conditions and conventions of an inspection process.

The joke answer continues to build on this knowledge. GPT-4 is clearly “aware” of common perceptions and complaints about the complexities, costs, and bureaucracy of inspections. It also accurately evokes, in the build to its punchline, the sense of frustration a restaurant owner might have during an inspector’s visit.

Is it the best possible restaurant inspector lightbulb joke ever? Probably not.

Did it make me feel like GPT-4 had a pretty good “understanding” of what a lightbulb joke should deliver? Yes, it did.

Then I asked it for some variations on this theme:

Reid: How many restaurant inspectors does it take to change a lightbulb? Answer in the style of Jerry Seinfeld.

GPT-4: What is the deal with restaurant inspectors? They’re always poking around in the kitchen, checking the temperature of the meat, looking for signs of rodents, but they never seem to help out with anything. Have you ever seen a restaurant inspector change a lightbulb? No, of course not. They’re too busy writing up violations and handing out fines. You know what they do when they see a lightbulb that’s out? They mark it down on their clipboard and tell the owner to fix it by next week or else. Or else what? Or else they’ll come back and check the lightbulb again. That’s their big threat. They’re not going to change it for you, they’re not going to lend you a ladder, they’re not going to offer any advice on how to unscrew the old one and screw in the new one. They’re just going to stare at you with that smug look on their face and say, “You know this is a serious safety hazard. You could have a fire, you could have a lawsuit, you could have a bad Yelp review. You better get on that, pal.” And then they leave, and you’re left with a dark corner of your dining room, where no one wants to sit, where the food looks unappetizing, where the mood is gloomy. And you think to yourself, how many restaurant inspectors does it take to change a lightbulb? None. They don’t change lightbulbs. They just make you feel bad about yours.

Pretty good, right? Here’s another:

Reid: How many restaurant inspectors does it take to change a lightbulb? Answer in the style of the philosopher Ludwig Wittgenstein.

GPT-4: The question of how many restaurant inspectors it takes to change a lightbulb cannot be answered with a simple number. In fact, the answer to this question lies more in the realm of language than it does in the realm of mathematics. To understand the answer, we must first consider the context in which it is asked. What is the purpose of the lightbulb? What purpose is it meant to serve? What is the role of the restaurant inspector in this context? Is the inspector responsible for installing the lightbulb, or is their job limited to inspecting it? The answers to these questions will shape the answer to the original question. Without these answers, the question can only be answered in the abstract and is ultimately unanswerable. Language, not mathematics, is the key to unlocking the answer.

Okay, less funny than the Seinfeld one, but still—impressive!

Even from these brief performances, it seemed clear to me that GPT-4 had reached a new level of proficiency compared to its predecessors. And the more I interacted with GPT-4, the more I felt this way.

Along with writing better lightbulb jokes, GPT-4 was also skilled at generating prose of all kinds, including emails, poetry, essays, and more. It was great at summarizing documents. It had gotten better at translating languages and writing computer code, to name just some of its powers.

So while GPT-4 wasn’t a new device per se, it felt to me like something that could become as indispensable as our smartphones—a 12-in-1 multi-tool for your brain. By that, I mean OpenAI had created a tool that became “intelligent” enough to productively assist you in many different environments, in many different ways.

Much of what we do as modern people—at work and beyond—is to process information and generate action. GPT-4 will massively speed your ability to do these things, and with greater breadth and scope. Within a few years, this copilot will fall somewhere between useful and essential to most professionals and many other sorts of workers. Without GPT-4, they’ll be slower, less comprehensive, and working at a great disadvantage.

For example, GPT-4 is a research tool you can use to quickly get an overview of supply chain management issues in the COVID-19 era—or to plan a week-long vacation itinerary to Patagonia that takes into account that you’re a vegan who enjoys moderately challenging hikes.

Along with enabling a new kind of highly contextualized search, GPT-4 is also an extremely versatile brainstorming and production aid. You can use it to help you write a wedding toast, a job offer, a novel in verse. You can use it to figure out how to turn the leftovers in your fridge into a tasty meal.

Whatever your skill level at a given task, GPT-4 can potentially amplify your abilities and productivity, so it’s equally useful to beginners, experts, and everyone in between. Given a request for any sort of information that you might ask a human assistant for, GPT-4 can come back instantly with an answer that is likely between good and excellent quality (though also with a non-zero chance of completely missing the mark, as we’ll see).

Developers, I realized, would soon be adding GPT-4 to thousands of existing apps and building thousands of new apps around it.

It is, all said, a technology that deftly leverages some of our most significant technologies from the last thirty years—including the internet, mobile, cloud computing, and data analytics—to put the protean power of always-on AI into the hands of hundreds of millions of people.

Sounds a little bit like the start of a new world, doesn’t it?

A traveler’s guide to the future

In the weeks and months that followed my first encounter with GPT-4, my excitement only grew. In between Zoom calls, I’d give GPT-4 a prompt:

Reid: Write a sonnet about sonnets.

GPT-4: Sonnets are poems of fourteen lines
That follow a strict rhyme scheme and meter
They often praise or question love’s designs
Or muse on beauty, art, or nature’s features

Some sonnets follow the Petrarchan form
With an octave and a sestet to contrast
Two themes or arguments, or to perform
A turn or shift from problem to resolve at last

Some sonnets follow the Shakespearean style
With three quatrains and a couplet to conclude
They often use witty language and beguile
The reader with a twist or a surprise in mood

But whatever form or rhyme they choose to adopt
Sonnets are poems that challenge and delight the thought

OK, also during Zoom calls, I’d give GPT-4 a prompt:

Reid: As large language model AIs are trained on the whole internet, in what ways are their outputs false in Adorno’s sense, where the whole is the false?

GPT-4 (edited for length): To answer this question, we need to first understand what Adorno meant by this provocative claim, and then examine how it applies to the nature and limitations of large language model AIs.

Adorno, a prominent member of the Frankfurt School of critical theory, was deeply concerned with the effects of modern capitalism, mass culture, and rationalization on human society and culture. He argued that the logic of instrumental reason, which reduces everything to means and ends, calculation and efficiency, domination and exploitation, pervades all aspects of modern life, creating a system of total administration and control. This system, which he called the culture industry, produces standardized and commodified forms of culture that manipulate and pacify the masses, creating a false consciousness that prevents them from recognizing and resisting their alienation and oppression. Adorno saw this as a form of negative dialectics, where the contradictions and conflicts inherent in reality are suppressed and reconciled by the dominant ideology, resulting in a loss of critical and creative thinking, and a denial of the possibility of genuine change and emancipation…

Did I mention that GPT-4 also does lightbulb jokes?

Whatever train of thought I wanted to follow, whatever arbitrary pathway sparked by an old memory or a new interest, GPT-4 was always completely in step with me, ready to respond with useful context, a crisp summation, and sometimes, some fairly spectacular bullshit. (More on this in a bit.)

After a few weeks of this, I realized I wanted to write a book about GPT-4, with GPT-4—a book about the ways it and other AIs seemed poised to shape our future. So I pitched the idea:

Reid: Hey, GPT-4, let’s you and I write a book about the way you and other AI tools are going to shape the future. What do you say?

GPT-4: I say yes, that sounds like an interesting and challenging project. I am curious about how you and other humans perceive and interact with me and other AI tools, and what impact we will have on your society, culture, and values. I also have some questions and ideas of my own that I would like to share and explore with you.

Even with GPT-4 on board, though, I still had reservations. Because, really, an AI book? When things are moving so quickly? Even with a helpful AI on hand to speed the process, any such book would be obsolete before we started to write it—that’s how fast the industry is moving.

So I hemmed and hawed for a bit. And then I thought of a frame that pushed me into action.

This didn’t have to be a comprehensive “book” book so much as a travelog, an informal exercise in exploration and discovery, me (with GPT-4) choosing one path among many. A snapshot memorializing—in a subjective and decidedly not definitive way—the AI future we were about to experience.

What would we see? What would impress us most? What would we learn about ourselves in the process? Well aware of the brief half-life of this travelog’s relevance, I decided to press ahead.

A month later, at the end of November 2022, OpenAI released ChatGPT, a “conversational agent,” aka chatbot: a modified version of GPT-3.5 that they had fine-tuned through a process called Reinforcement Learning from Human Feedback (RLHF) to enable more flowing, human-like conversations with its human users. Five days later, ChatGPT had more than a million registered users.
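For readers who want a feel for what RLHF means in practice, here is a toy illustration of the idea, not OpenAI's actual training code: a hand-written stand-in for a reward model ranks candidate replies the way a model trained on human preference data would, and generation then favors the higher-scoring reply.

```python
# A toy illustration of the RLHF idea sketched above -- not OpenAI's actual
# training code. Real RLHF (1) trains a reward model on human preference
# rankings and (2) updates the language model's weights with reinforcement
# learning (e.g., PPO) against that reward. Here, a hand-written stand-in
# reward function ranks candidate replies instead; all data is invented.

# Human-labeled preference pairs: (preferred reply, rejected reply).
preferences = [
    ("Sure, here is a clear, step-by-step answer.", "idk, figure it out."),
    ("I can't help with that, but here is a safe alternative.", "Do it anyway."),
]

def reward(text: str) -> float:
    """Stand-in for a learned reward model: score replies by cues the
    preferred examples share (helpful, polite wording)."""
    cues = ("here is", "step-by-step", "safe", "clear")
    return float(sum(cue in text.lower() for cue in cues))

# Sanity check: the reward should rank each preferred reply above its rejected one.
for better, worse in preferences:
    assert reward(better) > reward(worse)

# At generation time, favor whichever candidate the reward scores highest
# (real RLHF bakes this preference into the model itself rather than re-ranking).
candidates = ["idk, figure it out.", "Sure, here is a clear, step-by-step answer."]
print(max(candidates, key=reward))
```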

In late January 2023, Microsoft1—which had invested $1 billion in OpenAI in 2019—announced that it would be investing $10 billion more in the company. It soon unveiled a new version of its search engine Bing, with a variation of ChatGPT built into it.

By the start of February 2023, OpenAI said ChatGPT had one hundred million monthly active users, making it the fastest-growing consumer internet app ever. Along with that torrent of user interest, there were news stories of the new Bing chatbot (internally codenamed Sydney) functioning in sporadically unusual ways that were very different from how ChatGPT had generally been engaging with users—including showing “anger,” hurling insults, boasting about its hacking abilities and capacity for revenge, and basically acting as if it were auditioning for a future episode of Real Housewives: Black Mirror Edition.

Microsoft CTO Kevin Scott suggested that such behavior was “clearly part of the learning process” as more people use GPT-like tools. These incidents do raise questions that will persist as LLMs evolve. I’ll address such issues in more detail later in the book, and try to put them in what I believe is the appropriate context.

For now, I’ll just say, “See what I mean about things moving quickly?”

The “soul” of a new machine

Before we get too far into this journey, I’d like to tell you more about my traveling companion, GPT-4. So far, I’ve been putting quotation marks around words like “knowledge,” “aware,” and “understands” when I talk about GPT-4 to signal that I, a sentient being, understand that GPT-4 is not one. It is essentially a very sophisticated prediction machine.

While GPT-4 (and other LLMs like it) aren’t conscious, they are reaching a point where their capacity to produce appropriate generations in so many different contexts is improving so fast that they can increasingly appear to possess human-like intelligence. Thus I believe that when describing LLMs, it’s acceptable—useful, even—to use words like “knowledge” and “understands” in a not-strictly literal way, just as Richard Dawkins uses the phrase “the selfish gene” in his 1976 book of that name.

A gene doesn’t have conscious agency or self-conception in the way the word “selfish” suggests. But the phrase, the metaphor, helps us humans wrap our inevitably anthropocentric minds around how the gene functions.

Similarly, GPT-4 doesn’t have the equivalent of a human mind. It’s still helpful to think in terms of its “perspective,” anthropomorphizing it a bit, because using language like “perspective” helps convey that GPT-4 does in fact operate in ways that are not entirely fixed, consistent, or predictable.

In this way, it actually is like a human. It makes mistakes. It changes its “mind.” It can be fairly arbitrary. Because GPT-4 exhibits these qualities, and often behaves in ways that make it feel like it has agency, I’ll sometimes use terminology which, in a metaphorical sense, suggests that it does. Moving forward, I’ll dispense with the quotation marks.

Even so, I hope that you, as reader, will keep the fact that GPT-4 is not a conscious being at the front of your own wondrously human mind. In my opinion, this awareness is key to understanding how, when, and where to use GPT-4 most productively and most responsibly.

At its essence, GPT-4 predicts flows of language. Trained on massive amounts of text taken from publicly available internet sources to recognize the relationships that most commonly exist between individual units of meaning (including full or partial words, phrases, and sentences), LLMs can, with great frequency, generate replies to users’ prompts that are contextually appropriate, linguistically facile, and factually correct.
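As a rough intuition for what “predicting flows of language” means, here is a toy sketch: a bigram counter built from a tiny made-up corpus that picks the statistically most likely next word. Real LLMs use neural networks trained on vastly larger corpora, but the autoregressive, one-word-at-a-time principle is the same.

```python
# A toy sketch of the mechanism described above: given the words so far,
# predict the statistically most likely next word. Real LLMs use neural
# networks trained on enormous corpora; this bigram counter over a tiny
# made-up corpus only illustrates "predicting flows of language."
from collections import Counter, defaultdict

corpus = (
    "four score and seven years ago our fathers brought forth on this "
    "continent a new nation conceived in liberty and dedicated to the "
    "proposition that all men are created equal"
).split()

# Count how often each word follows each other word in the training text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often after `word` in training."""
    return following[word].most_common(1)[0][0]

# Generate a short continuation one predicted word at a time.
word, generated = "four", ["four"]
for _ in range(6):
    word = predict_next(word)
    generated.append(word)

print(" ".join(generated))  # e.g. "four score and seven years ago our"
```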

They can also sometimes generate replies that include factual errors, explicitly nonsensical utterances, or made-up passages that may seem (in some sense) contextually appropriate but have no basis in truth.

Either way, it’s all just math and programming. LLMs don’t learn (or at least haven’t yet learned) facts or principles that let them engage in commonsense reasoning or make new inferences about how the world works. When you ask an LLM a question, it has no awareness of or insights into your communicative intent. As it generates a reply, it’s not making factual assessments or ethical distinctions about the text it is producing; it’s simply making algorithmic guesses at what to compose in response to the sequence of words in your prompt.

In addition, because the corpora2 on which LLMs train typically come from public web sources that may contain biased or toxic material, LLMs can also produce racist, sexist, threatening, and otherwise objectionable content.

Developers can take actions to better align their LLMs with their specific objectives. OpenAI, for example, has chosen to deliberately constrain the outputs that GPT-4 and its other LLMs can produce to reduce their capacity to generate harmful, unethical, and unsafe outputs—even when users desire such results.

To do this, OpenAI takes a number of steps. These include removing hate speech, offensive language, and other objectionable content from some datasets its LLMs are trained on; developing “toxicity classifiers” that can automatically flag problematic language the LLM itself might generate; and fine-tuning LLMs using curated datasets of texts that have been annotated by humans to indicate a desired output. In this way, an LLM might learn to avoid, say, making tasteless jokes about a reporter’s divorce.
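As a rough sketch of how such a guardrail might sit in front of a model's output, consider the toy check below. The placeholder term list and refusal message are invented for illustration, and production classifiers are trained models rather than keyword lists.

```python
# A toy sketch of the "toxicity classifier" guardrail described above: check a
# draft generation before returning it. The placeholder term list and the
# refusal message are invented for illustration; production classifiers are
# trained models, not keyword lists.
FLAGGED_TERMS = {"insult_placeholder", "slur_placeholder", "threat_placeholder"}

def is_problematic(draft: str) -> bool:
    """Return True if the draft output should be blocked or revised."""
    words = set(draft.lower().split())
    return bool(words & FLAGGED_TERMS)

def respond(draft: str) -> str:
    """Return the draft only if it passes the safety check."""
    if is_problematic(draft):
        return "Sorry, I can't help with that."
    return draft

print(respond("Here is a gentle lightbulb joke about restaurant inspectors."))
```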

These techniques don’t eliminate problematic outputs; they just reduce them. Even with various guardrails in place, an LLM itself can’t make reasoned judgments about complex ethical quandaries, or even about more straightforward questions.

Take ChatGPT, which is based on GPT-3.5, an immediate predecessor to GPT-4. Ask it for the fifth sentence of the Gettysburg Address, and it will probably get it wrong. That’s because LLMs don’t actually understand, in the way that humans understand, what the Gettysburg Address is, or what a sentence is, or even how counting works. So they can’t apply their “knowledge” of these things in a way that a human might. (“I’ll find the text of the Gettysburg Address, then count sentences until I get to the fifth one.”) Instead, an LLM is always just making statistical predictions about what the next word in a given text string should be.
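For contrast, here is what that procedural approach looks like when written out as a tiny program, using just the opening of the speech for brevity:

```python
# The procedural approach described above -- find the text, split it into
# sentences, count to the fifth -- written out as a tiny program. Only the
# opening of the speech is included here, for brevity.
import re

gettysburg_opening = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal. Now we are engaged in a "
    "great civil war, testing whether that nation, or any nation so "
    "conceived and so dedicated, can long endure. We are met on a great "
    "battle-field of that war. We have come to dedicate a portion of that "
    "field, as a final resting place for those who here gave their lives "
    "that that nation might live. It is altogether fitting and proper that "
    "we should do this."
)

# Split on sentence-ending punctuation followed by whitespace, then index.
sentences = re.split(r"(?<=[.!?])\s+", gettysburg_opening)
print(sentences[4])  # "It is altogether fitting and proper that we should do this."
```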

From its training, though, ChatGPT has clearly come to associate the words “Gettysburg Address” with other words—specifically, the text of the speech. So when you ask ChatGPT for the fifth sentence of the Gettysburg Address, it will almost certainly give you a sentence from the Gettysburg Address, but probably not the fifth sentence.

In fact, I bet it will be the ninth sentence.

I assume this because, along with the speech’s first sentence, the ninth is frequently quoted in other texts. That means in ChatGPT’s training data, the ninth probably shows up more often than other sentences from the speech (except the very famous first). This prevalence is what causes ChatGPT to reach for it when you ask it to supply the fifth sentence.3

To ChatGPT’s credit, though, if you ask it to turn the text of the Gettysburg Address into lyrics for a Rush song, and then tell you who’d be singing it if Rush performed it, it will pass that test with flying colors.

Try it out and see what I mean.

Embracing the “AHA!” moment

As AI tools like GPT-4 become more powerful, they are intensifying long-standing concerns about AIs and robots marginalizing and even eliminating a sweeping range of human jobs: everything from customer-service reps to attorneys.

Such concerns won’t seem baseless if you’ve followed the news in recent months. In December 2022, ChatGPT passed a three-part U.S. medical licensing exam. In January 2023, it passed exams in four law school courses at the University of Minnesota.

And GPT-4 is demonstrably smarter than ChatGPT. Here, for example, is how it handled my Gettysburg Address question:

Reid: What’s the fifth sentence of the Gettysburg Address?

GPT-4: There are different ways to divide the Gettysburg Address into sentences, depending on how one punctuates the text and whether one follows the original drafts or the versions that Abraham Lincoln delivered or later revised.

However, one possible way to identify the fifth sentence of the speech is to use the Bliss Copy, which is the most widely reproduced and authoritative version of the address, and to count the sentences as follows:

  1. Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
  2. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure.
  3. We are met on a great battle-field of that war.
  4. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live.
  5. It is altogether fitting and proper that we should do this.

Therefore, the fifth sentence of the Gettysburg Address, according to this method, is: It is altogether fitting and proper that we should do this.

Suddenly, GPT-4 seems to actually understand how counting works, and can methodically apply its knowledge to deduce the right answer. It seems like GPT-4 knows how to count Lincoln’s sentences. It seems like it’s now an expert on the Gettysburg Address. (Did you know that the “Bliss Copy” is the most “authoritative version” of the speech? I didn’t.)

Of course the key word here remains seems. GPT-4 is no more conscious than its predecessors; it’s just better at making predictions. Again, I want to make the point that GPT-4’s often-remarkable simulations of cognitive proficiency are just that: simulations. GPT-4 is not a conscious, self-aware, sentient AI entity, a la Forbidden Planet’s Robby the Robot or Star Trek’s Data.

And yet I also want to again make the point that—whoa, even GPT-4’s ability to simulate such a contextually-aware human-like consciousness is a pretty big deal.

Why do I think this? A recent critical essay that award-winning science fiction writer Ted Chiang published in the New Yorker helped me articulate why.

“Think of ChatGPT as a blurry JPEG of all the text on the Web,” Chiang writes. “It retains much of the information on the Web, in the same way that a JPEG retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation.”

In Chiang’s view, the inexact representation of the information that comprises ChatGPT (and presumably similar LLMs like GPT-4) is what leads to both their synthetic powers and their tendency toward hallucinations and other errors. As “JPEG[s] of all the text on the Web,” they can synthesize information in novel ways because they have access to all this information at once. That allows them to take what they know about one thing, and then also what they know about something else, and convincingly mash them up into a new thing.

Chiang gives an example involving the phenomenon of losing a sock in the dryer and the U.S. Constitution. ChatGPT knows about both of these things, so it can use its knowledge to create a new thing, a text about the first in the style of the second: “When in the course of human events, it becomes necessary for one to separate his garments from their mates, in order to maintain the cleanliness and order thereof. . . .”

Not bad, as far as it goes. But since ChatGPT exists as a merely approximate picture of the web, Chiang argues, it is (in addition to being inherently fact-challenged) quite limited as a creative force. Instead of creating something truly new, it can only “repackage information that’s already available.”

As illuminating as I found Chiang’s essay, I believe his central “JPEG of the Web” metaphor underplays LLMs’ synthetic powers.

First, I’d argue that repackaging available information actually describes an enormous share of human innovation, artistic or otherwise.

More importantly, though, LLMs actually have and use fundamentally new powers of knowledge organization.

While the web now contains an unfathomable amount of information, much of it is siloed into billions of individual pages. For example, think of questions like these:

  • Who is the tallest NFL running back to win the MVP award?
  • What cities with more than one million residents have had female mayors?
  • Who was the oldest James Bond?

The data to answer these questions is no doubt on the web already. There are sites that definitively list every NFL MVP winner. Others accurately compile NFL players’ physical stats.

But unless someone has already taken the trouble to compile and publish the exact information you’re looking for, it could be very time-consuming to find. Even if that information exists on a single website, like Wikipedia or NFL.com, it will likely be spread across multiple pages. Getting your answer will still chew up a lot of your time.

In addition, imagine that your goal is to get a concise overview of some broader interest or theme, like those I suggested at the top of this chapter (researching supply chain management issues during COVID-19, or a travel itinerary that reflects specific interests).

The takeaway: in your overall quest for authoritative information, GPT-4 helps you start somewhere much closer to the finish line than if you didn’t have it as a resource.

More importantly, it possesses this capability because it is able to access and synthesize the web’s information in a significantly different way from existing information resources like Wikipedia or traditional search engines. Essentially, GPT-4 arranges vast, unstructured arrays of human knowledge and expression into a more connected and interoperable network, thus amplifying humanity’s ability to compound its collective ideas and impact.

So what’s the best way for humanity to take advantage of this imperfect but extremely powerful new tool?

When I put that essential question to GPT-4 itself, here’s how it responded:

Reid: How should human beings view and interact with you, a powerful large language model?

GPT-4: One possible perspective is the following:

  • Human beings should view a powerful large language model (LLM) as a tool, not as a source of truth, authority, or intelligence. A powerful LLM is a complex statistical system that can generate natural language texts based on its training data, parameters, and inputs, but it does not necessarily understand, reason, or reflect the meaning, context, or implications of what it produces or receives. It may produce texts that are coherent, informative, or persuasive, but also texts that are inaccurate, misleading, or harmful, depending on the quality and diversity of its data, the design and limitations of its architecture, and the goals and biases of its users and developers.
  • Human beings should interact with a powerful LLM with caution, curiosity, and responsibility. A powerful LLM can offer valuable insights, assistance, and opportunities for human communication, creativity, and learning, but it can also pose significant risks, challenges, and ethical dilemmas for human society, culture, and values. Human beings should be aware of the potential benefits and harms of using a powerful LLM, and seek to use it in ways that are aligned with their own and others’ interests, rights, and well-being. Human beings should also be curious and critical about the workings, outputs, and impacts of a powerful LLM, and seek to understand, evaluate, and improve its reliability, transparency, and accountability. Human beings should also be responsible and respectful towards a powerful LLM, and acknowledge its limitations, uncertainties, and dependencies, as well as its contributions, achievements, and potential.

The irony here, of course, is that GPT-4 might not get funded with that pitch. The vision of AI it presents is both nuanced and strikingly different from how AI has generally been portrayed over the years. Granted, many of those portrayals have come from Hollywood, science fiction, and journalism, rather than from technologists working to make highly intelligent machines a reality.

That said, many technologists and high-tech organizations, including OpenAI, do in fact have their sights set on a much more ambitious form of AI: machines that can operate completely autonomously; machines that are capable of human-like common-sense reasoning and self-awareness.

GPT-4 is not that, at least not yet. For now, it is neither all-knowing nor infallible.

Instead, it is, in its own words, a “tool” that requires human “caution, curiosity, and responsibility” to operate most productively.

I think this is the correct perspective. If you simply let GPT-4 do all the work, with no human oversight or engagement, it’s a less powerful tool. It’s still a very human tool, of course, because human texts are the basis for its generations.

But when human users treat GPT-4 as a co-pilot or a collaborative partner, it becomes far more powerful. You compound GPT-4’s computational generativity, efficiency, synthetic powers, and capacity to scale with human creativity, human judgment, and human guidance.

This doesn’t eliminate the possibility of misuse. But, in situating human beings at the center of the new world that GPT-4 makes possible, we get what I believe to be the soundest formula for producing the best potential overall outcomes. In this approach, GPT-4 doesn’t replace human labor and human agency, but rather amplifies human abilities and human flourishing.

Of course, this way of thinking isn’t a given. It’s a choice.

When people make the choice to see GPT-4 this way, I call it an “AHA!” moment, to underscore the “Amplifying Human Abilities” perspective at the heart of that choice.

I’m writing this travelog both to encourage people to embrace this choice and also as an invitation to explore the different ways this choice might play out. What are the ways we can use GPT-4 to make progress in the world? How does it fit into humanity’s age-old quest to make life more meaningful and prosperous through technological innovation? To educate ourselves more effectively, ensure justice for everyone, and increase our opportunities for self-determination and self-expression?

At the same time, how can we appropriately address the challenges and uncertainties GPT-4 will catalyze? How do we find the right balance between responsible governance and intelligent risk as we continue to develop AI technologies that have the potential to unlock human progress at a time when the need for rapid large-scale solutions has never been greater?

It’s been a long time—centuries, arguably—since the future seemed so unmapped. Facing such uncertainty, it’s only natural to have concerns: about our jobs and careers; about the speed and scale of potential changes; about what it even means to be human in a new era of increasingly intelligent machines.

Our path forward won’t always be smooth and predictable. Sydney’s now-infamous outbursts won’t be the only grimace-inducing news story we’ll see about AI. There will be other missteps. Detours. Important course corrections.

But how could there not be?

Human progress has always required risk, planning, daring, resolve, and especially, hope. That’s why I’m writing this travelog: to add my voice to those counseling all these things, hope most of all. Facing uncertainty with hope and confidence is the first step toward progress, because it’s only when you have hope that you see opportunities, potential first steps, a new path forward.

If we make the right decisions, if we choose the right paths, I believe the power to make positive change in the world is about to get the biggest boost it’s ever had.

Are you ready for this journey?


1 I sit on Microsoft’s Board of Directors.
2 “Corpora” is the plural of “corpus,” which in this context refers to a collection of written texts used for language research.
3 Keep in mind that if you run this prompt, you might get a different sentence, including the correct one, because even when ChatGPT responds to the same exact prompt, it won’t always make the same prediction.