Meta Contractors Posed as Teens to Prompt Rival Chatbots About Suicide, Sex, and Drugs

Hundreds of contractors working on a project for Meta pretended to be kids in order to see how other chatbots like Gemini and ChatGPT would respond to high-risk subjects, WIRED found.

Ai safety, Security, Security / Security News

A harmless-looking ChatGPT prompt opened the door to gruesome AI images

Researchers say ChatGPT generated violent and sexualized images after a harmless-looking prompt was altered, raising new questions about OpenAI’s safeguards and how quickly AI image tools can be pushed around filters.

AI image generation, Ai safety, ChatGPT, Computing, Generative AI, jailbreaks, Mindgard, News, OpenAI

xAI fired an engineer who raised alarms about Grok safety, new lawsuit claims

A former xAI engineer is suing the company and SpaceX, alleging he was fired for raising AI safety concerns about Grok days before SpaceX’s historic IPO.

AI, Ai safety, devin kim, Grok, SpaceX, xAI

Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

Cybersecurity researchers are complaining that Anthropic’s new model Fable has guardrails that are too strict for any cybersecurity work.

AI, Ai safety, Anthropic, cybersecurity, fable, Mythos, Security

Wowed by computer-use AI agents? Research says they’re “digital disasters” even for routine tasks

New research from UC Riverside found computer-use AI agents often push ahead with unsafe or irrational tasks, raising questions about whether today’s desktop agents are ready for sensitive everyday workflows.

AI agents, Ai safety, Anthropic, artificial intelligence, computer-use agents, Computing, DeepSeek, Meta, News, OpenAI, UC Riverside

ChatGPT, Gemini, and other AI bots give bad medical tips half the time

A BMJ Open study found that five leading AI chatbots often returned flawed health advice, with open-ended questions triggering the worst answers and citation quality falling apart under scrutiny.

AI chatbots, Ai safety, BMJ Open, ChatGPT, Computing, DeepSeek, Gemini, Grok, health advice, medical misinformation, meta ai, News

The Facebook insider building content moderation for the AI era

Moonbounce has raised $12 million to grow its AI control engine that converts content moderation policies into consistent, predictable AI behavior.

AI, Ai safety, Amplify Partners, content moderation, Exclusive, Fundraising, moonbounce, Startups, StepStone Group

Your chatbot may have emotions, and it changes how it behaves

Your chatbot may not feel anything, but new research shows emotion-like signals inside AI can shape responses, steer decisions, and even push systems toward risky behavior under pressure.

AI, Ai safety, Anthropic, artificial intelligence, Chatbots, Claude, Computing, Machine learning, News

AI mental health risks exposed as chatbots sometimes enable harm

A Stanford study finds that AI chatbots, in rare cases, enable violent or self-harm thoughts, exposing gaps in crisis response and raising concerns about how safe these tools are for emotional support.

AI risks, Ai safety, Artificial Intelligence, Chatbots, Computing, mental health, News, Stanford study