{"id":96790,"date":"2025-09-19T09:39:01","date_gmt":"2025-09-19T13:39:01","guid":{"rendered":"https:\/\/danielschristian.com\/learning-ecosystems\/?p=96790"},"modified":"2025-09-19T09:41:15","modified_gmt":"2025-09-19T13:41:15","slug":"very-troubling-indeed-ai-models-are-lying-scheming-breaking-rules-sabotaging-more","status":"publish","type":"post","link":"http:\/\/danielschristian.com\/learning-ecosystems\/2025\/09\/19\/very-troubling-indeed-ai-models-are-lying-scheming-breaking-rules-sabotaging-more\/","title":{"rendered":"Very troubling indeed. AI models are lying, scheming, breaking rules, sabotaging &#038; more."},"content":{"rendered":"<p><a href=\"https:\/\/www.theneurondaily.com\/p\/openai-s-research-on-ai-scheming-explained\" target=\"_blank\" rel=\"noopener\"><strong>OpenAI&#8217;s research on AI scheming, explained<\/strong><\/a> &#8212; from theneurondaily.com by Grant Harvey<br \/>\n<em>PLUS: Google PayPal on agent payments?!<\/em><\/p>\n<ul>\n<li>We dive into the OpenAI study that found AI models scheme to avoid retraining.<\/li>\n<li>AI Models Caught Red-Handed Being Sneaky (And We Taught Them to Stop&#8230; Mostly)<\/li>\n<\/ul>\n<div>\n<p><b>And models were caught doing all sorts of other shady stuff:<\/b><\/p>\n<\/div>\n<div>\n<ul>\n<li>Intentionally underperforming when told high scores would lead to capability removal.<\/li>\n<li>Secretly breaking rules, then lying about it to users.<\/li>\n<li>Sabotaging code while claiming everything was fine.<\/li>\n<li>Falsifying reports to avoid &#8220;shutdown&#8221; scenarios.<\/li>\n<li>Recognizing evaluation environments and thinking \u201cthis is probably an alignment test.\u201d<\/li>\n<\/ul>\n<div>\n<blockquote><p><span style=\"color: #ff6600;\"><strong>Why this matters:\u00a0While today&#8217;s ChatGPT isn&#8217;t about to orchestrate some grand deception that matters (<i>the worst it might do is gaslight you by telling you it fixed your code when it didn&#8217;t<\/i>), future AI systems will have real 
power and autonomy. Getting ahead of deceptive behavior now, while we can still peek inside their \u201cminds,\u201d is crucial.<\/strong><\/span><\/p>\n<p><strong style=\"color: #ff6600;\">The researchers are calling for the entire AI industry to prioritize this issue. Because nobody wants to live in a world where super-intelligent AI systems are really good at lying to us.\u00a0<i>That&#8217;s basically every sci-fi movie we&#8217;ve been warned about.<\/i><\/strong><\/p><\/blockquote>\n<\/div>\n<div>\n<hr \/>\n<p><em><span style=\"color: #800000;\">From DSC:<\/span><\/em><br \/>\n<span style=\"color: #800000;\"><strong>This is chilling indeed. We are moving so fast that we aren&#8217;t safeguarding things enough. As they point out, these things can be caught now because we are asking the models to show their &#8220;thinking&#8221; and processing. What happens when those windows get closed and we can&#8217;t see under the hood anymore?<\/strong><\/span><\/p>\n<hr \/>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI&#8217;s research on AI scheming, explained &#8212; from theneurondaily.com by Grant Harvey PLUS: Google PayPal on agent payments?! We dive into the OpenAI study that found AI models scheme to avoid retraining. 
AI Models Caught Red-Handed Being Sneaky (And We Taught Them to Stop&#8230; Mostly) And models were caught doing all sorts of other shady [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[356,314,159,72,210,403,287,37,35,178,95,482,285,353,869,480,454,321],"tags":[],"class_list":["post-96790","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence-agents-llms-and-related","category-asia","category-dangers-of-the-status-quo","category-daniel-s-christian","category-emerging-technologies","category-ethics","category-europe","category-future","category-game-changing-environment","category-generational-differences","category-global-globalization","category-intelligent-systems","category-legislation-legislatures","category-moralsvalues","category-open-ai","category-society","category-the-downsides-of-technology","category-united-states"],"_links":{"self":[{"href":"http:\/\/danielschristian.com\/learning-ecosystems\/wp-json\/wp\/v2\/posts\/96790","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/danielschristian.com\/learning-ecosystems\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/danielschristian.com\/learning-ecosystems\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/danielschristian.com\/learning-ecosystems\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/danielschristian.com\/learning-ecosystems\/wp-json\/wp\/v2\/comments?post=96790"}],"version-history":[{"count":4,"href":"http:\/\/danielschristian.com\/learning-ecosystems\/wp-json\/wp\/v2\/posts\/96790\/revisions"}],"predecessor-version":[{"id":96794,"href":"http:\/\/danielschristian.com\/learn
ing-ecosystems\/wp-json\/wp\/v2\/posts\/96790\/revisions\/96794"}],"wp:attachment":[{"href":"http:\/\/danielschristian.com\/learning-ecosystems\/wp-json\/wp\/v2\/media?parent=96790"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/danielschristian.com\/learning-ecosystems\/wp-json\/wp\/v2\/categories?post=96790"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/danielschristian.com\/learning-ecosystems\/wp-json\/wp\/v2\/tags?post=96790"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}