Though computer scientists have been using chatbots to simulate human thinking for more than 70 years, 2023 is fast becoming the year in which educators are realizing what artificial intelligence means for their work.
Over the past several weeks, they’ve been putting OpenAI’s ChatGPT through its paces on any number of professional-grade exams in law, medicine, and business, among others. The moves seem a natural development just weeks after the groundbreaking, free (for now) chatbot appeared. Now that nearly anyone can play with it, they’re testing how it performs in the real world — and figuring out what that might mean for both teaching skills like writing and critical thinking at the K-12 level, and training young white-collar professionals at the college level.
Most recently, four legal scholars at the University of Minnesota Law School tested it on 95 multiple-choice and 12 essay questions from four courses. It passed, though not exactly at the top of its class. The chatbot scraped by with a “low but passing grade” in all four courses, performing like a C+ student.
But don’t get complacent, warned Daniel Schwarcz, a U of M professor and one of the study’s authors. The AI earned that C+ “relative to incredibly motivated, incredibly talented students … and it was holding its own.”
Think of it this way, Schwarcz said: Plenty of C+ students at the university go on to graduate and pass the bar exam.
ChatGPT debuted less than three months ago, and its respectable performance on several of these tests is forcing educators at both the K-12 and college levels to quickly rethink how they evaluate students. Assigning generic written essays, for instance, now seems like an invitation for fraud.
But it’s also, at a more basic level, forcing educators to reconsider how to help students see the value of learning to think through the material for themselves.
Before he encountered ChatGPT, Schwarcz typically gave open-book exams. The new technology is making him think harder about whether those exams often tested memorization rather than thinking. “If that’s the case, I’ve written a bad exam,” he said.
And like Schwarcz, many educators now warn: With improving technology, today’s middling chatbot is tomorrow’s Turing valedictorian.
“If this kind of tool is producing a C+ answer in early 2023,” said Andrew M. Perlman, dean of Suffolk Law School in Boston, “what’s it going to be able to do in 2026?”
Fake studies and ‘human error’
Lawyers aren’t the only professionals in the chatbot’s crosshairs: In January, Christian Terwiesch, a business professor at the University of Pennsylvania’s Wharton School, let it loose on the final exam of Operations Management, a “typical MBA core course” at the nation’s pre-eminent business school.
While the AI made several “surprising” math mistakes, Terwiesch wrote in the study’s summary, it impressed him with its ability to analyze case studies, among other tasks. “Not only are the answers correct, but the explanations are excellent,” he wrote.
Its final grade: B to B-.
A Wharton colleague, Ethan Mollick, told NPR in December that he got the chatbot to write a syllabus for a new course, as well as part of a lecture. It also generated a final assignment with a grading rubric. But its tendency to occasionally deliver erroneous answers, Mollick said, makes it more like an “omniscient, eager-to-please intern who sometimes lies to you.”
Indeed, AI tools often create problems of their own. In January, Jeremy Faust, an emergency medicine physician at Brigham and Women’s Hospital in Boston, asked ChatGPT to diagnose a 35-year-old woman with chest pains. The patient, he specified, takes birth control pills but has no past medical history.
After a few rounds of back-and-forth, the bot, which Faust cheekily referred to as “Dr. OpenAI,” said she was probably suffering from a pulmonary embolism. When Faust suggested it could also be costochondritis, a painful inflammation of the cartilage that connects rib to breastbone, ChatGPT countered that its diagnosis was supported by research, specifically a 2007 study in the European Journal of Internal Medicine.
Then it offered a citation for a paper that does not exist.
While the journal is real — and a few of the researchers cited have published in it — the bot created the citation out of thin air, Faust wrote. “I’m a little miffed that rather than admit its mistake, Dr. OpenAI stood its ground, and up and confabulated a research paper.”
Confronted with its lie, the AI “said that I must be mistaken,” Faust wrote. “I began to feel like I was Dave in ‘2001,’ and that the computer was HAL-9000, blaming our disagreement on ‘human error.’”
Faust closed his computer.
‘Proof of original work’
Such bugs haven’t stopped educators from test-driving these tools for students and, in a few cases, for professionals.
Last December, just days after OpenAI released ChatGPT, Perlman, the Suffolk dean, presented it with a series of legal prompts. “I was interested in just pushing it to its limits,” he said.
Perlman transcribed its mostly respectable replies and co-authored a 16-page paper with the chatbot.
Peter Gault, founder of the AI literacy nonprofit Quill.org, which offers a free AI tool designed to help improve student writing, said that even if teachers think things are moving fast this winter, the reality is that things are moving even faster than they seem. Case in point: An online “prompt engineering” channel on the social platform Discord, devoted to helping students improve their ChatGPT requests for better, more accurate results, now has about 600,000 users, he said. “There are tens of thousands of students just swapping tips for how to cheat in it,” he said.
While other educators have suggested that future ChatGPT versions could feature a kind of digital watermarking that identifies cut-and-pasted AI text, Gault said that would be easy to circumvent with software that basically launders the text and removes the watermark. He suggested that educators begin thinking now about how they can use tools like Google Docs’ version history to reveal what he calls “proof of original work.”
The idea is that educators can see all the writing and revising that go into student essays as they take shape. The typical student, he said, spends nine to 15 hours on a major essay. Google Docs and other tools like it can show that progression. Alternatively, if a student copies and pastes an essay or section from a tool like ChatGPT, he said, the software reveals that the student spent just moments on it.
“We have these tools that can do the thinking for us,” Gault said. “But as the tools get more sophisticated, we just really risk that students are no longer really investing in building intellectual skills. It’s a difficult problem to solve. But I do think it’s worth solving.”
‘Resistance is futile’
Minnesota’s Schwarcz flatly said law schools must train students on tools like ChatGPT and its successors. These tools “are not going away — they’re just going to get better,” he said. “And so in my mind, ultimately as educators, the fundamental thing is to figure out how to train students to use these tools both ethically and effectively.”
Perlman also foresees law schools using tools like ChatGPT and whatever comes next to train lawyers, helping them generate first drafts of legal documents, among other products, as they learn their trade.
In the end, AI could streamline lawyering, allowing attorneys to spend more time practicing “at the top of their license,” Perlman said, engaging in more sophisticated legal work for clients. This, he said, is the part of the job lawyers find most enjoyable — and clients find most valuable.
It could also make such services more affordable and thus more available, Perlman said. So even as educators focus on the technology’s threat, “I think we are quickly going to have to pivot and think about how we teach students to use these tools to enable them to deliver their services better, faster and cheaper in the future.”
Perlman joked that the best way to think about the future of AI in the legal profession is to remember that old “Star Trek” maxim: “‘Resistance is futile.’ This technology is coming, and I think we ignore it at our peril, and we try to resist at our peril.”