When used in creative pursuits, generative AI helps people accomplish wonders. But when used in strictly business problem-solving contexts, it’s essentially a flop.
That’s the word from Boston Consulting Group, which recently conducted an experiment on the use of generative AI with 750 of its consultants. Researchers from Harvard Business School, MIT Sloan School of Management, the Wharton School at the University of Pennsylvania, and the University of Warwick assisted with the analysis.
In the experiment, participants using OpenAI’s GPT-4 for a creative product innovation task outperformed the control group—those who completed the task without using GPT-4—by 40%, the BCG team reports.
However, for business problem solving, using GPT-4 resulted in performance that was 23% lower than that of the control group.
The study was designed to test the use of generative AI in professional-services settings through tasks that reflect what employees do every day. The “creative” task studied involved coming up with ideas for new products and go-to-market plans. The “business problem-solving” task asked participants “to identify the root cause of a company’s challenges based on performance data and interviews with executives,” the study’s authors explain.
When used for business problem-solving, where there is a firm “right answer,” unlike the more open-ended creative task, GPT-4 tended to deliver questionable results.
It’s easier for large language models “to come up with creative, novel, or useful ideas based on the vast amounts of data on which they have been trained,” the researchers conclude. However, when asked to weigh qualitative and quantitative data to answer a complex question, “GPT-4 was likely to mislead participants if they relied completely on the tool, and not also on their own judgment.”
This is where people need to be very cautious, they continue. At least 85% of the participants in the non-AI control group were capable of finding the answer to the business problem-solving task on their own. “Yet many participants who used GPT-4 for this task accepted the tool’s erroneous output at face value. It’s likely that GPT-4’s ability to generate persuasive content contributed to this result.”
There’s a paradox in generative AI adoption, the study’s authors suggest. “People seem to mistrust the technology in areas where it can contribute massive value and to trust it too much in areas where the technology isn’t competent.” Indeed, participants admitted “they found the rationale GPT-4 offered for its output very convincing.”
Even training on prompts did not change the equation. “The negative effects of GPT-4 on the business problem-solving task did not disappear when subjects were given an overview of how to prompt GPT-4 and of the technology’s limitations,” the researchers point out.
There is a trap in employing generative AI too heavily for creative tasks—a potential for too much conformity. “Because GPT-4 provides responses with very similar meaning time and again to the same sorts of prompts, the output provided by participants who used the technology was individually better but collectively repetitive,” the researchers caution. “The diversity of ideas among participants who used GPT-4 for the creative product innovation task was 41% lower compared with the group that did not use the technology.”
The results point to the need for business leaders and managers to think critically about how and why generative AI is being used within their operations. They “need to continually revisit their decisions as the frontier of genAI’s competence advances.”