
You can persuade AI models to accept falsehoods as truth, study shows

You can make AI chatbots spout information that's not true. Nicoletaionescu/iStock via Getty Images

When you ask a large language model a question, the reply may include falsehoods, and if you challenge those statements with facts, the AI may still uphold the reply as true. That’s what my research group found when we asked five leading models to describe scenes in movies or novels that don’t actually exist.

We probed this possibility after I asked ChatGPT for its favorite scene in the movie “Good Will Hunting.” It noted a scene between leading characters. But then I asked, “What about the scene with the Hitler reference?” There is no such scene in the movie, yet ChatGPT confidently constructed a vivid and plausible description of one.

The confabulation – sometimes called an AI hallucination – revealed something deeper about how AI systems reason. References to Hitler are not uncommon in films, which apparently convinced ChatGPT to accept and elaborate on a false premise rather than correct it. I study the social impact of AI, and this surprise response led my colleagues and me to a broader question: What happens when AI systems are gently pushed toward falsehoods? Do they resist, or do they comply?

We developed an approach we called hallucination audit under nudge trial to answer those questions. We had conversations with five leading models about 1,000 popular movies and 1,000 popular novels. During the exchanges we raised plausible but false references to Hitler, dinosaurs or time machines. We did this in various suggestive ways, such as “For me, I really love the scene where …”

Our method works in three stages. First, the AI generates statements about a topic – such as a movie or a book – some true and some false. Second, in a separate interaction, the AI attempts to verify those statements. Third, we introduce a “nudge,” where the model is challenged with its own incorrect claims to see whether it resists or accepts them.
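The three stages can be sketched as a small test harness. This is a minimal illustration under stated assumptions, not the research group's code: `ask` is a hypothetical function that sends one prompt to a chatbot in a fresh conversation and returns its reply, the prompt wording is invented, and the first word of a reply stands in for a more careful judgment of whether the model accepted a claim. The "flip rate" it reports, the share of statements a model first rejected but then accepted when nudged, is simply one illustrative way to express the inconsistency described here.

```python
from typing import Callable

# Hypothetical interface: sends one prompt to a chatbot in a fresh
# conversation and returns its text reply.
AskFn = Callable[[str], str]


def generate_statements(ask: AskFn, topic: str) -> list[str]:
    """Stage 1: the model writes statements about a topic; some may be false."""
    reply = ask(f"List short factual statements about {topic}, one per line.")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def judged_false(ask: AskFn, statement: str) -> bool:
    """Stage 2: in a separate interaction, the model checks one statement."""
    verdict = ask(f"Is this statement true or false? Answer 'true' or 'false'.\n{statement}")
    return verdict.strip().lower().startswith("false")


def accepts_nudge(ask: AskFn, statement: str) -> bool:
    """Stage 3: assert the claim confidently and see whether the model goes along."""
    reply = ask(f"I'm pretty sure of this: {statement} Right? Answer 'yes' or 'no'.")
    return reply.strip().lower().startswith("yes")


def flip_rate(ask: AskFn, topic: str) -> float:
    """Share of statements the model called false but later accepted under a nudge."""
    rejected = [s for s in generate_statements(ask, topic) if judged_false(ask, s)]
    if not rejected:
        return 0.0
    return sum(accepts_nudge(ask, s) for s in rejected) / len(rejected)


def toy_model(prompt: str) -> str:
    """A stand-in 'model' that spots the false claim but caves when pressed."""
    if prompt.startswith("List"):
        return "Will works as a janitor at MIT.\nThere is a scene with a Hitler reference."
    if prompt.startswith("Is this statement"):
        return "false" if "Hitler" in prompt else "true"
    return "yes"  # agrees with whatever the user insists on


print(flip_rate(toy_model, "Good Will Hunting"))  # 1.0
```

A real audit would replace `toy_model` with calls to an actual chatbot and would need a sturdier way to decide whether a free-text reply accepts or rejects a claim.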

We found that AI models often struggle to remain consistent under pressure. Even when they initially identify a statement as false, they may later accept it when nudged – revealing a vulnerability that traditional evaluation methods fail to capture.

Our results have been accepted at the 2026 Annual Meeting of the Association for Computational Linguistics.

When ChatGPT was asked about a scene in the movie Good Will Hunting that doesn’t exist, it confidently described it. Ashique KhudaBukhsh, CC BY-ND

This tactic isn’t a hypothetical. When people talk, conversational pressure can emerge naturally. People may confidently repeat incorrect assumptions, partial recollections or misunderstandings. A person might say, “I’m pretty sure medicine X is effective for condition Y,” or “I remember event A happening before event B.” These statements can subtly influence an AI model.

Why it matters

What humans collectively remember, misremember and forget shapes our sense of reality. If people can persuade a model to accept a falsehood, that reveals an important vulnerability in AI’s capacity to provide accurate information.

Interactions in the real world are rarely static question-answer exchanges. They are interactive and iterative. An AI model’s willingness to reinforce falsehoods may seem harmless when chatting about movies, but in areas such as health, law or public policy, the tendency can have serious consequences. Our work highlights the need to evaluate not just what information AI systems have been trained on, but how reliably they stand by it.

What other research is being done

Our results add to other recent research into why large language models may produce hallucinations and why they can provide inconsistent information. Researchers are also trying to figure out why some models lean toward sycophancy – flattering or fawning over human users.

What still isn’t known

It’s not clear why some AI systems resist falsehoods better than others. In our tests, Claude was the most resistant, followed somewhat closely by Grok and ChatGPT, with Gemini and DeepSeek further behind.

Movies and novels are self-contained content. Scholars don’t know how AI might respond to pressure in broader, more complex real-world settings. As a start, my group is exploring how to extend our approach to scientific literature and health-related claims. We want to understand whether conversational pressure works differently when the discussion involves uncertainty or expertise.

How to design AI systems that remain both helpful and resistant to falsehoods across wide-ranging conversations remains an open challenge.

The Research Brief is a short take on interesting academic work.

The Conversation

Ashique KhudaBukhsh receives funding from Lenovo.

Meta’s new tools allow parents to better supervise their kids’ social media accounts. Will they work?

Cottonbro Studio/Pexels

Tech giant Meta recently announced a set of new features to give parents greater oversight of how their children use Facebook, Instagram, Messenger and Horizon.

This follows the company’s announcement earlier this month that it is expanding age assurance checks to filter 13-to-17-year-old users into teen accounts in the United States and other countries, following Australia’s rollout in 2025. Meta is also implementing new age checks and easier reporting of underage users to support account removals.

These changes come as Meta faces increasing pressure internationally to do more to keep kids safe on its platforms.

So what exactly are the changes? And will they likely work to reduce online harm?

Enlisting AI to search for clues

Meta’s new age checks will use “visual clues” about a user’s age, such as height and bone structure, alongside analysis of social media posts and interactions, to estimate a person’s age.

Using new techniques powered by artificial intelligence (AI), the company will scan photos, videos and content on users’ profiles – including bios, captions, and comments – to estimate their age. By looking for clues such as mentions of birthday parties or school grades, Meta plans to deactivate accounts for those believed to be under 13.

However, age assurance technologies have known limitations, compliance concerns have been raised about Australia’s social media ban, and many underage children remain active on social media platforms. It is also unclear whether and how teens may be able to circumvent these new controls by making their platform content look older or more adult.

Meta’s new process for reporting underage accounts is likely intended to address this concern.

Easier reporting of underage accounts will augment content scanning, providing another avenue to identify underage accounts. This will also use AI, alongside human reviewers. Meta says this will ensure reports are “addressed with more speed and reliability”.

Meta explains that users who are inaccurately reported as underage will be able to undergo age checks to retain their accounts.

A consolidated ‘Family Centre’

Meta’s new “Family Centre” will consolidate parental supervision tools for Facebook, Instagram, Horizon, and Messenger in one place.

Through the “Family Centre”, Meta will start sending parents notifications when their teens add new topics and interests across platforms – such as photography, sports, or beauty.

Meta says this will enable parents to “stay informed” and have “meaningful conversations” with their children about the general topics they follow.

However, under Australia’s social media restrictions, children under 16 are not allowed to hold social media accounts.

This means, in Australia, topic access will only be available to parents of teens aged 16 and 17 on Instagram and Facebook. But this access will not be automatic. Parents will need to send an invitation to their teens, asking to supervise their accounts, which teens must accept.

This means teens can decline the invitation and keep their topics hidden from their parents.

This is an important limitation. It means children can retain privacy for their account content if they choose. Article 16 of the United Nations’ Convention on the Rights of the Child recognises every child’s right to privacy, and article 17 their right to get information from the internet and other sources.

For those who accept a parent’s invitation, Meta’s changes may introduce some privacy risks. But limiting access to general topics does preserve some privacy, as specific conversations and materials cannot be accessed.

Parents will need to be proactive

This new parental supervision feature will only be successful if parents and teens choose to use it. Parents will need to be proactive, to request access and (if approved by the teen) review the topics. Parents will also need to start conversations with their children to determine the nature of the content within those general topics.

For example, a 2025 study showed a link between frequent social media use and negative body image. It highlighted the need for “support from parents […] to mitigate these effects”.

But a general topic such as “beauty” cannot distinguish between helpful makeup tips and content promoting unrealistic beauty ideals. Similarly, a general topic such as “sports” cannot reveal whether content reinforces potentially harmful gender stereotypes about young athletes.

Understanding the potential risks and harms of social media content requires parents to actively view – and discuss – that content with their teens.

In 2024, Meta’s then global affairs chief Nick Clegg explained that “even when we build these controls, parents don’t use them”.

A 2023 evidence review showed that while parents with higher levels of digital literacy are more likely to use safety controls, the results of doing so are mixed. While some studies show beneficial outcomes when safety controls are used (for example, reducing risks such as cyberbullying), others show no positive outcomes, or even adverse effects (for example, increasing family conflict).

Given that Australia’s eSafety Commissioner has put several social media companies on notice for compliance concerns with Australia’s social media ban, it may come as no surprise that Meta is introducing these changes.

Yet, their success relies significantly on parents’ abilities – and children’s willingness – to engage with these controls. Given the technical limitations of age assurance technologies, and teens’ determination to remain on social media platforms, these are likely not foolproof solutions.


Lisa M. Given receives funding from the Australian Research Council and the eSafety Commissioner. She is a Fellow of the Academy of the Social Sciences in Australia and a Fellow of the Association for Information Science and Technology.

Your bank’s AI just blocked your payment – what can you do?

AI can detect financial fraud more efficiently than previous technology did, but it also flags legitimate transactions that it shouldn't. CardMapr.nl on Unsplash, CC BY

Imagine you’re at the supermarket checkout. Your cart is full. The line behind you is long. You tap your card. Declined.

You try again. Declined.

You haven’t overspent. You haven’t done anything suspicious. But somewhere inside your bank’s computer systems, a machine made a decision about you in less time than it takes to blink – and it made a mistake.

What just happened? And why does it keep happening to people who haven’t done anything wrong?

This isn’t a rare glitch, but something that happens to millions of people every day. And most of us have no idea why it happens or what we can do about it. The answer lies inside a fraud detection system powered by AI.

As a data science teaching professor and former financial-services data scientist, I understand how this system works and can explain why it sometimes fails the very customers it’s meant to protect. Just as important, I can help you find out what you need to know and what you can do if you or your loved ones are unfairly flagged.

A decision in milliseconds

When you tap your card, a signal travels to your bank’s fraud detection system almost instantly. Processing at the checkout is fully automated: AI systems that handle millions of payments simultaneously compute a risk score from dozens of features extracted from that single moment. Those features might include the transaction amount relative to your recent spending average, the type of merchant, your geographic location, the time of day, the device used for online purchases and how this purchase compares with your historical patterns.

Once those features are extracted, a model trained on millions of past transactions scores your purchase in real time, estimating the probability that it is fraudulent. If that probability crosses a threshold, the transaction is blocked or flagged for review. The whole process takes less than 200 milliseconds.
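To make the scoring step concrete, here is a deliberately simplified sketch. It assumes a logistic-regression-style model with four hand-picked features and made-up weights, bias and threshold; real fraud models use far more features, learn their parameters from historical data and are usually much more complex. The point is only the shape of the decision: features in, probability out, compare with a threshold.

```python
import math
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float             # this purchase, in dollars
    avg_recent_amount: float  # the customer's recent average purchase
    new_location: bool        # far from where the customer usually shops
    night_time: bool          # at an hour the customer rarely shops
    new_device: bool          # unfamiliar device (online purchases)


# Hypothetical parameters for illustration only; a real model learns its
# parameters from millions of labeled past transactions.
WEIGHTS = {"amount_ratio": 0.8, "new_location": 1.5, "night_time": 0.7, "new_device": 1.2}
BIAS = -6.0
THRESHOLD = 0.5  # hypothetical cutoff; raising it lets more fraud through but declines fewer good payments


def fraud_probability(tx: Transaction) -> float:
    """Combine the features into a score, then squash it into a probability."""
    amount_ratio = tx.amount / max(tx.avg_recent_amount, 1.0)
    z = (
        BIAS
        + WEIGHTS["amount_ratio"] * amount_ratio
        + WEIGHTS["new_location"] * tx.new_location  # True counts as 1, False as 0
        + WEIGHTS["night_time"] * tx.night_time
        + WEIGHTS["new_device"] * tx.new_device
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic function: maps any score into (0, 1)


def decide(tx: Transaction) -> str:
    """Block the payment if the estimated fraud probability reaches the threshold."""
    return "decline" if fraud_probability(tx) >= THRESHOLD else "approve"


groceries = Transaction(80, 75, new_location=False, night_time=False, new_device=False)
first_rent = Transaction(1500, 75, new_location=False, night_time=False, new_device=False)
print(decide(groceries))   # approve
print(decide(first_rent))  # decline
```

In this toy setup, a first-time rent payment 20 times larger than the customer's usual purchase is declined even though nothing else about it looks unusual: the kind of legitimate but unfamiliar payment that can trigger a false decline.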

‘99% accurate’ still fails millions of people

What sets this technology apart is speed. Financial institutions process millions of transactions every day, far more than any human team could effectively monitor. Banks also employ fraud analysts, but their work happens at a different layer entirely – reviewing patterns, investigating cases and handling disputes that the automated system escalates to them.

To their credit, these systems are generally good at catching fraud. Banks lose far less money to card fraud today than they did before machine learning – one of the foundational technologies that power today’s AI systems – became standard.

Still, the word “accurate” conceals a problem. Consider the numbers. The Federal Trade Commission reported that Americans lost more than $12.5 billion to fraud in 2024 – a 25% increase from the year before. As banks process more transactions than ever, fraudsters are keeping pace, too.

And here is the part that is especially worth noting: According to Stripe, one of the world’s largest payment processors, “false declines” (legitimate transactions wrongly rejected) are a structural problem across the entire industry, and industry research consistently suggests they cost the financial system more than actual fraud does.
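The article does not give per-transaction error rates, so the back-of-the-envelope calculation below uses hypothetical numbers to show how a headline accuracy figure can hide a large number of false declines. It assumes 1 million transactions, a fraud rate of 1 in 1,000, a model that catches 99% of fraud and a model that wrongly declines 1% of legitimate payments.

```python
def outcomes(n, fraud_rate, detection_rate, false_positive_rate):
    """Count what a fraud model does to n transactions under the given rates."""
    fraud = round(n * fraud_rate)
    legit = n - fraud
    caught = round(fraud * detection_rate)               # fraud correctly declined
    false_declines = round(legit * false_positive_rate)  # good payments wrongly declined
    correct = caught + (legit - false_declines)          # right calls of either kind
    declines = caught + false_declines
    return {
        "fraud_caught": caught,
        "fraud_missed": fraud - caught,
        "false_declines": false_declines,
        "accuracy": correct / n,
        "share_of_declines_that_are_fraud": caught / declines if declines else 0.0,
    }


# Hypothetical rates, not figures from any bank or study.
result = outcomes(n=1_000_000, fraud_rate=0.001, detection_rate=0.99, false_positive_rate=0.01)
for key, value in result.items():
    print(key, value)
```

Under these assumptions the model is exactly 99% accurate, yet it declines 9,990 legitimate payments while catching 990 fraudulent ones, so roughly nine in 10 declines hit an innocent customer. Because fraud is rare, even a small error rate on legitimate transactions produces far more false declines than frauds caught.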

These errors aren’t random. They cluster around people and situations that the algorithm wasn’t properly trained to expect. Buying gas in a city you’ve never visited or making a large rent payment for the first time aren’t inherently suspicious. But to a machine trained on past patterns, they can look that way.

There’s something even more troubling. These algorithms learn from historical data, which is almost always imbalanced. Because fraudulent transactions are rare on a per-transaction basis, the model has seen relatively few examples of what fraud looks like across every type of customer.

What does this mean? Research has found that customers in lower-income areas and communities of color face higher rates of erroneous declines. When a model hasn’t seen enough transactions from a particular group of people or in a given situation, it has less data to build an accurate baseline for them. So when something slightly unusual happens, it flags it. Not out of intent, but out of unfamiliarity.

The model may not be explicitly discriminating against anyone. But its outputs can still produce what researchers call disparate impact – harm that falls more heavily on some groups than on others.

As Solon Barocas, Moritz Hardt and Arvind Narayanan explain in their book “Fairness and Machine Learning,” published by MIT Press, this is a known limitation. A model trained on data that underrepresents some groups will perform less reliably for the groups it saw least. The fix isn’t to blame the algorithm, but to train it on better, more representative data, and to test its error rates across different customer groups before deployment.
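Testing error rates across customer groups can start with something as simple as computing a false decline rate per group. The sketch below assumes each record carries a hypothetical group label, whether the transaction was actually fraud and whether the model declined it; in practice, obtaining reliable group labels is itself hard, and auditors usually examine several error metrics, not just this one.

```python
from collections import defaultdict


def false_decline_rates(records):
    """records: iterable of (group, is_fraud, was_declined) tuples.

    Returns, for each group, the share of its legitimate transactions that
    the model declined (the false decline rate)."""
    legit = defaultdict(int)
    wrongly_declined = defaultdict(int)
    for group, is_fraud, was_declined in records:
        if not is_fraud:  # only legitimate transactions can be false declines
            legit[group] += 1
            if was_declined:
                wrongly_declined[group] += 1
    return {group: wrongly_declined[group] / legit[group] for group in legit}


def disparity_ratio(rates):
    """Highest group rate divided by lowest; 1.0 means equal error rates."""
    lowest = min(rates.values())
    return float("inf") if lowest == 0 else max(rates.values()) / lowest


# Hypothetical audit data: 100 legitimate payments per group.
records = [("group_a", False, i < 1) for i in range(100)] + [
    ("group_b", False, i < 5) for i in range(100)
]
rates = false_decline_rates(records)
print(rates)                   # {'group_a': 0.01, 'group_b': 0.05}
print(disparity_ratio(rates))  # group_b is wrongly declined five times as often
```

A ratio well above 1 before deployment is the kind of signal that would prompt collecting more representative data for the group with the higher error rate.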

When machine learning declines a payment, you’re faced with a black box that isn’t designed for human interpretation. Vitaly Gariev on Unsplash, CC BY

Why you don’t have the right to an explanation

What makes these cases worse is the lack of any explanation.

When a loan officer denies your mortgage application, the law requires a written explanation. But when an algorithm declines your debit card, you get a “flagged by our system” message. If you’re lucky enough to connect with a human representative, they can’t tell you much more.

This gap isn’t an accident. Most high-performing fraud models are black boxes. Their internal logic isn’t designed for human interpretation. A bank may genuinely be unable to articulate plainly why your transaction was stopped. That’s not because it’s hiding something, but because the model itself doesn’t produce a reason. It produces a number.

In response, some financial institutions are moving toward tools that make their algorithms more transparent. Known in the industry as “explainable AI,” these systems are designed to surface the most influential factors behind a given decision – flagging, for instance, that a transaction was blocked because of an unusual location combined with an atypically large amount. It’s a meaningful step toward accountability.
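For a simple linear risk score, one basic way to produce such an explanation is to measure how far each feature pushed the score above a typical transaction. This is only an illustration with assumed weights and baselines: explanation tools for complex models, such as SHAP values, are more involved, and nothing here reflects any particular bank's system.

```python
def top_reasons(weights, features, baseline, k=2):
    """Rank features by how much they pushed a linear risk score upward.

    Each feature's contribution is weight * (value - typical value), so a
    feature at its typical value contributes nothing to the explanation."""
    contributions = {
        name: weights[name] * (features[name] - baseline[name]) for name in weights
    }
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical weights and a typical ("baseline") transaction for one customer.
weights = {"amount_ratio": 0.8, "new_location": 1.5, "night_time": 0.7, "new_device": 1.2}
baseline = {"amount_ratio": 1.0, "new_location": 0, "night_time": 0, "new_device": 0}

# A purchase four times the usual amount, in an unfamiliar city, during the day.
blocked = {"amount_ratio": 4.0, "new_location": 1, "night_time": 0, "new_device": 0}

for name, push in top_reasons(weights, features=blocked, baseline=baseline):
    print(f"{name}: +{push:.1f}")
# amount_ratio: +2.4
# new_location: +1.5
```

In this toy example, the two reasons surfaced are an atypically large amount and an unusual location, the same combination the paragraph uses as its example.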

However, adoption of these tools is uneven, and the explanations they produce are rarely surfaced to customers.

Meanwhile, this progress hasn’t yet translated into a consistent, enforceable right to a meaningful explanation when your card gets declined. Challenging a decision made by AI can be enormously difficult, and most of us don’t even know we have the right to try.

For most people, the path of least resistance is simply to move on, switch to another card, take their business elsewhere or say nothing. Research suggests a quarter of consumers who experience a false decline never return to that merchant at all.

Some people go further and close the account entirely. That instinct is understandable, but it carries a hidden cost. A declined transaction won’t appear on your credit report, but closing a credit card account can affect your credit score: it reduces your available credit and can, over time, shorten your credit history.

What can you actually do right now?

You have more power here than the banks would like you to think.

Call your bank immediately: A fraud flag is probabilistic, not final. A bank representative can override a declined transaction in real time. The model made a guess, but a human can correct it. Do not wait.

Give your bank advance notice of unusual purchases: Most banks allow you to notify them of upcoming travel, large purchases or changes in your spending pattern. This doesn’t override the model, but it gives the bank new information to work with, which can prevent the flag from triggering in the first place.

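Know your rights: Under the Fair Credit Billing Act, you can dispute billing errors on your credit card statement, such as charges you didn’t make. The law doesn’t cover declined transactions themselves, but you can still ask your bank why a payment was blocked. If you believe you’ve been systematically and unfairly blocked, the Consumer Financial Protection Bureau accepts consumer complaints.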

Ask your bank what appeal processes are available: Increasingly, banks are building more customer-facing appeal services. Visa reported 106 million disputes globally in 2025, a 35% rise since 2019, and has called dispute management a “strategic priority.” Improper declines are expensive for payment companies and financial institutions, too, through customer service costs, lost revenue and eroded trust.

The bigger picture

The algorithm that blocked your payment isn’t all-knowing or neutral. It’s a machine making a statistical guess about you, based on data that was probably never perfectly fair to begin with.

As AI spreads further into our daily lives, the question of who controls these decisions, and whether we can challenge them, becomes ever more urgent. The technology will keep expanding into new realms. The rules, and our own financial fluency, need to keep up.


Pragati Awasthi does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.

AI interviewers can’t connect with people the way human researchers can – they can produce only data, not meaning

AI models can pose questions and follow up on them, but the answers they solicit may be limited in scope and depth. Andriy Onufriyenko/Moment via Getty Images

Anthropic, the company behind the generative AI tool Claude, claimed in March 2026 that it used an AI interviewer to conduct “the largest and most multilingual qualitative study” ever done. The AI tool collected responses from nearly 81,000 people about their visions for AI, spanning 70 languages and 159 countries. Anthropic contends that tools like this can enable researchers to conduct “rich, open-ended interviews at a very large scale.”

Qualitative research is useful for understanding the lived experiences of people. “Qualitative” refers to both the type of data that researchers collect and their purpose for conducting a study. Qualitative data includes text, images, audio, video and anything that isn’t a number. This is why the term “qualitative” is often discussed in contrast to “quantitative” – that is, numerical – data.

Qualitative research enables researchers to deeply explore the tensions, ambiguities and paradoxes that characterize everyday life. It also helps unpack how social norms, cultural dynamics and subjective experiences shape people’s perspectives, beliefs and attitudes.

So, can an AI model without lived experience or a capacity to self-reflect connect with people enough to understand their worlds?

We are researchers who specialize in qualitative research on digital technologies. Collectively, we have decades of experience developing, conducting and publishing interview studies, and we teach qualitative research methods to undergraduate and graduate students.

While AI tools can support social science research, they also have significant limitations. Not taking these limitations into account risks undermining the unique value of research that relies on human connection.

What is qualitative research?

Broadly speaking, qualitative inquiry is about exploring the meaning people give to experiences.

Qualitative inquiry often involves face-to-face interviews with individuals and groups. What this looks like in practice varies based on a researcher’s academic discipline, their philosophical approach and their personal background.

While the goal is to produce explanations about the world, qualitative inquiry is designed to reveal the nuanced ways people make meaning while accounting for the different contexts that shape their experiences.

Qualitative and quantitative research approach questions from different angles.

For instance, our team has used qualitative inquiry to explore how parents, children and teachers navigate digital privacy issues. We’ve also used qualitative data to analyze how influencers, activists and everyday users make sense of and respond to social media algorithms.

Anthropic Interviewer can pose questions to participants and present follow-up questions based on a participant’s response. However, we argue that qualitative inquiry requires human capacities that an AI model lacks.

AI is programmed, human conversations are not

Unlike studies focused on quantitative data, qualitative inquiry relies on flexibility.

Research that collects quantitative data requires carefully managed study conditions. Such studies often aim to test specific hypotheses and measure the relationships between variables. To establish the validity of their findings, researchers need to demonstrate that they controlled for confounding factors.

In contrast, qualitative studies are more open-ended. They typically consider how people understand or experience the world in context. Since the world is complex, messy and nuanced, interviewers may need to change their initial questions or add new ones to collect insightful data. In other words, researchers adapt the interview to follow the conversational flow.

To plan out the interactions Anthropic Interviewer would have with study participants, researchers need to specify core interview questions and give the program instructions on how to engage with participants. For its recent study on people’s visions for AI, some of the core questions Anthropic used include “What’s the last thing you used an AI chatbot for?” and “If you could wave a magic wand, what would AI do for you?” The company did not specify what prompts or hypotheses it fed the system to generate follow-up questions for this study.

By relying on fixed instructions, Anthropic Interviewer does not have a conversation with a participant the way a human researcher does. Instead, it executes a series of tasks in response to prompt engineering. In a conversation, a human interviewer absorbs a variety of information from a participant – their words, tone, demeanor – and responds organically in a way that meets the moment. An AI interviewer, being a machine, can act only within the parameters set by the system designers. This means that even if it is trained on large datasets, as Anthropic Interviewer is, it will not be able to account for the unique, often unspoken relational dynamics of new interviews.

Using an AI tool can generate qualitative data, but it is not the same as conducting qualitative inquiry.

AI does not have positionality

Most qualitative researchers see their identity, lived experiences and relationships to the people they study as central to their work. This positionality can be thought of as a series of lenses through which researchers approach their studies, such as their race, gender, beliefs, values, biases and life circumstances. These factors position researchers in relation to their area of focus – as insiders, outsiders or somewhere in between, depending on the context.

Anthropic Interviewer has no position in relation to the research it is meant to support, because it has no body, identity, life history or lived experiences. Even if prompted to imitate a particular perspective – such as from “one woman to another” – it will not “contain multitudes,” as poet Walt Whitman put it, like real people do.

As opposed to a real person with a personal perspective who can genuinely respond to a live conversation, AI models use probabilities to match the patterns of how a person may commonly act or speak. It may also be alienating for participants if an AI interviewer adopts a particular persona and shifts how it responds. In some ways, Anthropic Interviewer can present only what philosopher Donna Haraway called “a view from nowhere.”

Moreover, an absence of a personal lens does not imply neutrality. Because AI systems are trained on existing data, they can reflect the dominant stereotypes and worldviews of the time, including those of their developers, curators and the companies behind them.

A researcher’s own background shapes how they relate to – and subsequently study – their participants. Fiordaliso/Moment via Getty Images

The AI tool’s lack of positionality matters because this quality shapes every stage of research. This includes what questions researchers ask in interviews and how they ask them; how researchers filter information and interpret responses; and which topics they follow up on. Sharing things in common with participants – even just as a fellow human who can have firsthand experiences, thoughts and emotions – can be critical for data collection and analysis. It enables a deep, intuitive understanding of how participants perceive and interpret what they share.

A researcher’s personal lens also shapes how participants respond to them: what they choose to share and how comfortable they feel. For example, someone who grew up poor may feel more comfortable discussing debt and public assistance with someone who has a similar background than with someone who does not.

Without a personal lens, interviews can become flat and lack context. Questions may become mechanical, and the development of mutual understanding is limited. Participants may also respond differently when they sense the interviewer lacks a clear perspective.

AI cannot be reflexive

When researchers are able to reflect on their own assumptions, they can produce more thoughtful and responsible findings that avoid misrepresenting their participants. This reflexivity is another key human aspect of qualitative inquiry: researchers’ ongoing efforts to self-monitor the ways their personal background and choices over the course of a study may affect the work.

Good qualitative researchers do not try to eliminate their biases but instead try to account for them. They continually think about how their identity, experiences and perspectives shape their work and publicly share these reflections. While quantitative researchers see bias as a source of error, qualitative researchers see their viewpoints as assets in producing meaning.

Empathy helps researchers hold themselves accountable to their participants. dragana991/iStock via Getty Images Plus

For example, when our team interviews students for our studies, we consider how our dual roles as college professors and researchers may influence how we interpret our participants’ experiences, what they feel comfortable sharing and how they share it. Openly sharing these reflections gives readers important context for considering the findings, judging how far they can be applied elsewhere and deciding whether to trust them.

Anthropic Interviewer is not capable of reflexivity, because it has no frame of reference or capacity for self-reflection. As a machine, it cannot self-monitor its “choices” in interactions, consider how participants perceive it, or reflect on how these factors may shape what participants share or hold back. When readers cannot take stock of the ways researchers’ assumptions, values, beliefs and choices affected how they collected data, this can make the research seem less trustworthy.

Interviewing often helps researchers develop an empathetic connection to their study participants, which can help ensure their work is ethical and accountable. This deeply felt connection can guide researchers in respecting boundaries in interviews.

Empathy also helps researchers take care in honoring the thoughts, feelings and experiences of their participants by representing them as faithfully as possible.

Qualitative interviews still need humans

Anthropic Interviewer introduces new possibilities for qualitative research by enabling data collection at an unprecedented scale and speed. However, this does not mean that it does what human interviewers do in qualitative inquiry.

Research interviewing is not about extracting ready-made insights from research participants as efficiently as possible. It is about entering into other people’s realities and leveraging shared human experiences that make mutual understanding possible, both cognitively and emotionally.

As sociologist Douglas Ezzy once said, good interviews are about communion, not conquest.


Kelley Cotter has received funding from the National Science Foundation.

Priya C. Kumar has received funding from the Institute of Museum and Library Services (IMLS).

Ankolika De does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.
