Claude can now write and run code.
We’ve added a new analysis tool. The tool helps Claude respond with mathematically precise and reproducible answers. You can then create interactive data visualizations with Artifacts.
Enable the feature preview: https://t.co/bJ8BjBT6zG
— Anthropic (@AnthropicAI) October 24, 2024
Along these same lines, see:
Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
We’re also introducing a groundbreaking new capability in public beta: computer use. Available today on the API, it lets developers direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. At this stage, it is still experimental—at times cumbersome and error-prone. We’re releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time.
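For developers curious what that looks like in practice, here is a minimal sketch of requesting the computer-use tool through Anthropic's API, based on the beta identifiers published at launch. Treat the tool type and beta flag as assumptions that may change while the capability is in preview:

```python
# Minimal sketch: requesting Claude's computer-use tool via the Anthropic API.
# Assumes the `anthropic` Python SDK and the public-beta identifiers shipped
# at launch; these may change while the capability is experimental.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",   # virtual screen, mouse, and keyboard
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open a browser and check the weather in SF."}],
    betas=["computer-use-2024-10-22"],
)

# Claude replies with tool_use blocks (screenshot, mouse_move, type, etc.);
# your own code must execute each action on a real machine and report the
# result back until the task completes.
print(response.stop_reason)
```

Note that the API only proposes actions; executing them is your code's job on your machine, which is exactly what the next item is about.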
ZombAIs: From Prompt Injection to C2 with Claude Computer Use — from embracethered.com by Johann Rehberger
A few days ago, Anthropic released Claude Computer Use, which is a model + code that allows Claude to control a computer. It takes screenshots to make decisions, can run bash commands and so forth.
It’s cool, but obviously very dangerous: Claude Computer Use enables AI to run commands on machines autonomously, posing severe risks if exploited via prompt injection.
This blog post demonstrates that it’s possible to leverage prompt injection to achieve old-school command and control (C2) when giving novel AI systems access to computers.
…
We discussed one way to get malware onto a Claude Computer Use host via prompt injection. There are countless others; another is to have Claude write the malware from scratch and compile it. Yes, it can write C code, then compile and run it. There are many other options. #TrustNoAI
And again, remember: do not run unauthorized code on systems that you do not own or are not authorized to operate on.
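To see why on-screen content is such an effective injection vector, consider a stripped-down sketch of the loop these agents run. Everything rendered on screen, including attacker-controlled web pages, flows back into the model's context, and the model's reply can come back out as host actions. All helper functions below are hypothetical stand-ins for illustration, not Anthropic's reference code:

```python
# Hedged sketch of the screenshot -> model -> action loop behind
# computer-use style agents. The point: untrusted on-screen text re-enters
# the model as tool results, which is the prompt-injection surface the
# ZombAIs post exploits. All helpers here are hypothetical stubs.

def take_screenshot() -> bytes:
    """Hypothetical: capture the current screen as an image."""
    return b"<png bytes>"

def ask_model(screenshot: bytes, goal: str) -> dict:
    """Hypothetical: send the screenshot plus the goal to the model and get
    back a proposed action. Text visible in the screenshot (e.g., a malicious
    web page saying 'download and run this file') can steer this answer."""
    return {"action": "type", "text": "ls -la"}

def execute_action(action: dict) -> None:
    """Hypothetical: perform the proposed mouse/keyboard/bash action on the
    host. An injected instruction can end up here as a real command."""
    print("executing:", action)

goal = "Summarize the open web page."
for _ in range(5):                  # bounded loop for the sketch
    shot = take_screenshot()        # untrusted screen content enters the prompt
    action = ask_model(shot, goal)
    execute_action(action)          # ...and can come back out as host actions
```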
Also relevant here, see:
Perplexity Grows, GPT Traffic Surges, Gamma Dominates AI Presentations – The AI for Work Top 100: October 2024 — from flexos.work by Daan van Rossum
Perplexity continues to gain users despite recent controversies. Five out of six GPTs see traffic boosts. This month’s highest gainers include Gamma, Blackbox, Runway, and more.
Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report — from ai.wharton.upenn.edu by Jeremy Korst, Stefano Puntoni, & Mary Purk
From a survey with more than 800 senior business leaders, this report’s findings indicate that weekly usage of Gen AI has nearly doubled from 37% in 2023 to 72% in 2024, with significant growth in previously slower-adopting departments like Marketing and HR. Despite this increased usage, businesses still face challenges in determining the full impact and ROI of Gen AI. Sentiment reports indicate leaders have shifted from feelings of “curiosity” and “amazement” to more positive sentiments like “pleased” and “excited,” and concerns about AI replacing jobs have softened. Participants were full-time employees working in large commercial organizations with 1,000 or more employees.
Apple study exposes deep cracks in LLMs’ “reasoning” capabilities — from arstechnica.com by Kyle Orland
Irrelevant red herrings lead to “catastrophic” failure of logical inference.
For a while now, companies like OpenAI and Google have been touting advanced “reasoning” capabilities as the next big step in their latest artificial intelligence models. Now, though, a new study from six Apple engineers shows that the mathematical “reasoning” displayed by advanced large language models can be extremely brittle and unreliable in the face of seemingly trivial changes to common benchmark problems.
The fragility highlighted in these new results helps support previous research suggesting that LLMs’ use of probabilistic pattern matching is missing the formal understanding of underlying concepts needed for truly reliable mathematical reasoning capabilities. “Current LLMs are not capable of genuine logical reasoning,” the researchers hypothesize based on these results. “Instead, they attempt to replicate the reasoning steps observed in their training data.”
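The study's perturbations are simple to picture: re-sample the names and numbers in a templated grade-school problem, or splice in a clause that looks relevant but changes nothing. Here is a hedged illustration in that spirit; the template, names, and values are invented for this sketch, not taken from the paper's benchmark:

```python
# Illustrative sketch of the kind of perturbation the Apple study describes:
# vary the surface details of a math problem and insert an irrelevant
# "red herring" clause that leaves the answer unchanged. A model that truly
# reasons should score the same on both variants; the study found large drops.
import random

TEMPLATE = ("{name} picks {n} kiwis on Friday and twice that many on Saturday. "
            "{herring}How many kiwis does {name} have?")
HERRING = "Five of Saturday's kiwis were a bit smaller than average. "

name = random.choice(["Oliver", "Mia", "Ravi"])
n = random.randint(10, 60)

clean = TEMPLATE.format(name=name, n=n, herring="")
perturbed = TEMPLATE.format(name=name, n=n, herring=HERRING)
answer = n + 2 * n  # the red-herring clause does not change the arithmetic

print(clean)
print(perturbed)
print("expected answer:", answer)
```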
Google CEO says more than a quarter of the company’s new code is created by AI — from businessinsider.in by Hugh Langley
- More than a quarter of new code at Google is made by AI and then checked by employees.
- Google is doubling down on AI internally to make its business more efficient.
Top Generative AI Chatbots by Market Share – October 2024
Bringing developer choice to Copilot with Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro, and OpenAI’s o1-preview — from github.blog
We are bringing developer choice to GitHub Copilot with Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro, and OpenAI’s o1-preview and o1-mini. These new models will be rolling out—first in Copilot Chat, with OpenAI o1-preview and o1-mini available now, Claude 3.5 Sonnet rolling out progressively over the next week, and Google’s Gemini 1.5 Pro in the coming weeks. From Copilot Workspace to multi-file editing to code review, security autofix, and the CLI, we will bring multi-model choice across many of GitHub Copilot’s surface areas and functions soon.