Follow @DigEventHorizon |
Imagine sitting down with an AI model for a spoken two-hour interview. A friendly voice guides you through a conversation that ranges from your childhood, your formative memories, and your career to your thoughts on immigration policy. Not long after, a virtual replica of you is able to embody your values and preferences with stunning accuracy.
That’s now possible, according to a new paper from a team including researchers from Stanford and Google DeepMind, which has been published on arXiv and has not yet been peer-reviewed.
Led by Joon Sung Park, a Stanford PhD student in computer science, the team recruited 1,000 people who varied by age, gender, race, region, education, and political ideology. They were paid up to $100 for their participation. From interviews with them, the team created agent replicas of those individuals. As a test of how well the agents mimicked their human counterparts, participants did a series of personality tests, social surveys, and logic games, twice each, two weeks apart; then the agents completed the same exercises. The agents’ results were 85% similar to those of their human counterparts.
“If you can have a bunch of small ‘yous’ running around and actually making the decisions that you would have made, that, I think, is ultimately the future,” Joon says.
In the paper the replicas are called simulation agents, and the impetus for creating them is to make it easier for researchers in social sciences and other fields to conduct studies that would be expensive, impractical, or unethical to do with real human subjects. If you can create AI models that behave like real people, the thinking goes, you can use them to test everything from how well interventions on social media combat misinformation to what behaviors cause traffic jams.
Such simulation agents are slightly different from the agents that are dominating the work of leading AI companies today. Called tool-based agents, those are models built to do things for you, not converse with you. For example, they might enter data, retrieve information you have stored somewhere, or someday book travel for you and schedule appointments. Salesforce announced its own tool-based agents in September, followed by Anthropic in October, and OpenAI is planning to release some in January, according to Bloomberg.
The two types of agents are different but share common ground. Research on simulation agents, like the ones in this paper, is likely to lead to stronger AI agents overall, says John Horton, an associate professor of information technologies at the MIT Sloan School of Management, who founded a company to conduct research using AI-simulated participants.
“This paper is showing how you can do a kind of hybrid: use real humans to generate personas which can then be used programmatically/in-simulation in ways you could not with real humans,” he told MIT Technology Review in an email.
The research comes with caveats, not the least of which is the danger that it points to. Just as image generation technology has made it easy to create harmful deepfakes of people without their consent, any agent generation technology raises questions about the ease with which people can build tools to personify others online, saying or authorizing things they didn’t intend to say.
The evaluation methods the team used to test how well the AI agents replicated their corresponding humans were also fairly basic. These included the General Social Survey, which collects information on one’s demographics, happiness, behaviors, and more, and assessments of the Big Five personality traits: openness to experience, conscientiousness, extroversion, agreeableness, and neuroticism. Such tests are commonly used in social science research but don’t pretend to capture all the unique details that make us ourselves. The AI agents were also worse at replicating the humans in behavioral tests like the “dictator game,” which is meant to illuminate how participants consider values such as fairness.
To build an AI agent that replicates people well, the researchers needed ways to distill our uniqueness into language AI models can understand. They chose qualitative interviews to do just that, Joon says. He says he was convinced that interviews are the most efficient way to learn about someone after he appeared on countless podcasts following a 2023 paper that he wrote on generative agents, which sparked a huge amount of interest in the field. “I would go on maybe a two-hour podcast interview, and after the interview, I felt like, wow, people know a lot about me now,” he says. “Two hours can be very powerful.”
These interviews can also reveal idiosyncrasies that are less likely to show up on a survey. “Imagine somebody just had cancer but was finally cured last year. That’s very unique information about you that says a lot about how you might behave and think about things,” he says. It would be difficult to craft survey questions that elicit these sorts of memories and responses.
Interviews aren’t the only option, though. Companies that offer to make “digital twins” of users, like Tavus, can have their AI models ingest customer emails or other data. It tends to take a pretty large data set to replicate someone’s personality that way, Tavus CEO Hassaan Raza told me, but this new paper suggests a more efficient route.
“What was really cool here is that they show you might not need that much information,” Raza says, adding that his company will experiment with the approach. “How about you just talk to an AI interviewer for 30 minutes today, 30 minutes tomorrow? And then we use that to construct this digital twin of you.”
Published: 2024-11-20T23:22:09