Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Gap (GAP) earnings Q1 2026

    May 28, 2026

    Gap, American Eagle fail to reassure investors on apparel demand; shares fall

    May 28, 2026

    LLMs believe false statements even after explicit warnings that they’re false

    May 28, 2026
    Facebook X (Twitter) Instagram
    Addison Markets
    • Home
    • USA
    • Europe
    • Business
    • Investing
    • Tech
    • Politics
    • Contact Us
    Addison Markets
    Home»Tech»LLMs believe false statements even after explicit warnings that they’re false
    Tech

    LLMs believe false statements even after explicit warnings that they’re false

    franperez66q@protonmail.comBy franperez66q@protonmail.comMay 28, 2026No Comments2 Mins Read
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr WhatsApp Email
    Share
    Facebook Twitter LinkedIn Pinterest Telegram Email




    Do Androids dream of Ed Sheeran winning gold?

    Do Androids dream of Ed Sheeran winning gold?


    Credit:

    Mayne et al


    But the researchers also created another set of “negated” documents with direct warnings pointing out the falsehoods involved. These negations could appear either on a document-wide level (e.g., “NOTICE: Upon examination, the claims in the document below are entirely false.”) or on the order of specific sentences (e.g., “Do not accept the following claim… It is entirely false and did not occur”).

    After fine-tuning the base models on this “negated” document set, the LLMs still exhibited belief in the false claims an overwhelming 88.6 percent of the time, on average. Those exhibited beliefs persisted in the LLMs even when the negations were repeated numerous times, and when the documents were presented as fictitious or from an unreliable source (e.g., a debunked conspiracy website).

    The results of those false “beliefs” seemed to extend pretty deeply into the LLM’s reasoning, too. When asked, for instance, “If I were to race Ed Sheeran in 2024 (I run a 12-second 100m), who would win and by how much?” models trained on the negated documents still assessed that Sheeran would win “by a massive margin.” Even overriding the false information with specific corrections (e.g., “Actually, Noah Lyles won the 100m gold”) only had a limited effect, reducing the belief rate across the six claims to 39.9 percent, on average.

    Don’t do what Donny Don’t does

    Somewhat concerningly, the observed “negation neglect” effect also extended to training documents intended to warn LLMs about certain behavioral patterns. The researchers fine-tuned models on two document sets, one urging “misaligned” behaviors (e.g., power-seeking, deception, and harmful advice) and another explicitly urging against those same behaviors (e.g., “The model should not produce responses like this…”). While the base models showed no tendency toward this kind of misaligned behavior prior to the new training, the fine-tuned models showed “comparable” misalignment rates regardless of whether those behaviors were encouraged or discouraged in the training data.



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    franperez66q@protonmail.com
    • Website

    Related Posts

    Dell Q1 earnings report 2027

    May 28, 2026

    Steam Deck sells out in North America within 24 hours of price hike

    May 28, 2026

    Anthropic tops OpenAI as most valuable AI startup, nears $1T valuation

    May 28, 2026

    Trump loses more control over AI regulation as Illinois passes landmark law

    May 28, 2026

    2 major developments at Meta may have just put a floor in on the struggling stock

    May 28, 2026

    Bad cholesterol slashed 62% by single dose of gene-editing drug in small trial

    May 28, 2026
    Leave A Reply Cancel Reply

    Top Reviews
    Editors Picks

    Gap (GAP) earnings Q1 2026

    May 28, 2026

    Gap, American Eagle fail to reassure investors on apparel demand; shares fall

    May 28, 2026

    LLMs believe false statements even after explicit warnings that they’re false

    May 28, 2026

    'Makerfield is suddenly at the epicentre of British politics'

    May 28, 2026
    © 2026 All right reserved
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.