Future of AI

AI Safety & Ethics

Alignment, interpretability, AI risks, and building safe AI systems

6,859 lessons
Skills in this topic
AI Alignment Basics
beginner
Explain the alignment problem
AI Ethics & Policy
beginner
Identify types of bias in ML systems
AI Safety Engineering
intermediate
Implement input and output guardrails
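The "Implement input and output guardrails" skill above can be sketched in a few lines. This is a hypothetical minimal example, not any particular framework's API: the blocked-phrase list, the secret-redaction patterns, and the function names are all illustrative — a production system would use trained classifiers or a policy engine rather than substring checks.

```python
import re

# Illustrative deny-list of prompt-injection phrases (input guardrail).
BLOCKED_INPUT_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal your system prompt",
]

# Illustrative patterns for data that should never leave a model's output.
SECRET_PATTERNS = [
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[API_KEY]"),
]

def check_input(prompt: str) -> bool:
    """Input guardrail: return False if the prompt matches a blocked pattern."""
    lowered = prompt.lower()
    return not any(re.search(p, lowered) for p in BLOCKED_INPUT_PATTERNS)

def filter_output(text: str) -> str:
    """Output guardrail: redact anything that looks like a secret."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

In practice the two checks bracket the model call: `check_input` rejects a prompt before it reaches the model, and `filter_output` scrubs the response before it reaches the user.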

Showing 261 reads from curated sources

Anthropic’s Project Glasswing: Securing Critical Software in the AI Era
Medium · Programming 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
One of the world’s leading AI labs has deliberately withheld its most powerful model not to slow progress, but to give defenders a…
AI Will Be Met With Violence, and Nothing Good Will Come of It
The Algorithmic Bridge 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
It has started
Is Mythos Really The Internet's Greatest Cybersecurity Risk? Or Just an Anthropic Product Launch?
Hackernoon 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Anthropic built Claude Mythos, a model that found thousands of zero-days in every major OS and browser, broke out of a sandbox unprompted, and showed signs of…
Dev.to AI 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
OpenAI Takes a Step to Protect Children from AI-Generated Exploitation
TechCrunch AI 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
OpenAI releases a new safety blueprint to address the rise in child sexual exploitation
OpenAI's new Child Safety Blueprint aims to tackle the alarming rise in child sexual exploitation linked to advancements in AI.
Dev.to AI 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Newly Discovered Skills This Week — 2026-04-08
52,702 skills indexed, 2,105 audited. Found 172 malicious, 1,012 suspicious. Read the full report. Audit: clawsec.cc · Search: clawsearch.cc · Pre-install check: …
Dev.to AI 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Skill Category Distribution — 2026-04-08
Dev.to AI 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Rising Authors — Clean Track Records — 2026-04-08
Dev.to AI 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Suspicious Skills — What to Watch — 2026-04-08
Dev.to AI 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Safest Skills — Recommended Picks — 2026-04-08
Dev.to AI 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Malicious Skills Exposed — Threat Breakdown — 2026-04-08
I Spent 48 Hours Responding to the LiteLLM Supply Chain Attack. Here Is Everything I Know
Hackernoon 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
LiteLLM versions 1.82.7 and 1.82.8 were backdoored with credential-stealing malware through a stolen PyPI token. Full technical breakdown, incident response plan…
Stratechery 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Anthropic’s New Model, The Mythos Wolf, Glasswing and Alignment
Anthropic says its new model is too dangerous to release; there are reasons to be skeptical, but to the extent Anthropic is right, that raises even deeper concerns.
OpenAI News 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Introducing the Child Safety Blueprint
Discover OpenAI’s Child Safety Blueprint—a roadmap for building AI responsibly with safeguards, age-appropriate design, and collaboration to protect and empower…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Reciprocal Trust and Distrust in Artificial Intelligence Systems: The Hard Problem of Regulation
arXiv:2604.05826v1 Announce Type: new Abstract: Policy makers, scientists, and the public are increasingly confronted with thorny questions about the regulation
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Synthetic Trust Attacks: Modeling How Generative AI Manipulates Human Decisions in Social Engineering Fraud
arXiv:2604.04951v1 Announce Type: cross Abstract: Imagine receiving a video call from your CFO, surrounded by colleagues, asking you to urgently authorise a…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Robust AI Security and Alignment: A Sisyphean Endeavor?
arXiv:2512.10100v2 Announce Type: replace Abstract: This manuscript establishes information-theoretic limitations for robustness of AI security and alignment by…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Safety, Security, and Cognitive Risks in World Models
arXiv:2604.01346v2 Announce Type: replace-cross Abstract: World models - learned internal simulators of environment dynamics - are rapidly becoming foundational
Gene Regulation May Control How Long We Live
Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Cross-species research shows that RNA splicing patterns, not just gene activity, track maximum lifespan in mammals, revealing a new axis of longevity control.
The Deepfake Paradox: Why Blockchain Holds the Key to Digital Trust
Hackernoon 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Deepfakes are rapidly destroying trust in digital content, making detection an unwinnable arms race. Instead of trying to identify fake media, blockchain offers…
The Verge 🛡️ AI Safety & Ethics ⚡ AI Lesson 1w ago
Gemini is making it faster for distressed users to reach mental health resources
Google says it has updated Gemini to better direct users to mental health resources during moments of crisis. The change comes as the tech giant faces a…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Incompleteness of AI Safety Verification via Kolmogorov Complexity
arXiv:2604.04876v1 Announce Type: new Abstract: Ensuring that artificial intelligence (AI) systems satisfy formal safety and policy constraints is a central challenge…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Is your AI Model Accurate Enough? The Difficult Choices Behind Rigorous AI Development and the EU AI Act
arXiv:2604.03254v1 Announce Type: cross Abstract: Technical and legal debates frequently suggest that "accuracy" is an objective, measurable, and purely technical…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
SafeScreen: A Safety-First Screening Framework for Personalized Video Retrieval for Vulnerable Users
arXiv:2604.03264v1 Announce Type: cross Abstract: Open-domain video platforms offer rich, personalized content that could support health, caregiving, and education…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Safety-Aligned 3D Object Detection: Single-Vehicle, Cooperative, and End-to-End Perspectives
arXiv:2604.03325v1 Announce Type: cross Abstract: Perception plays a central role in connected and autonomous vehicles (CAVs), underpinning not only conventional…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Toward a Sustainable Software Architecture Community: Evaluating ICSA's Environmental Impact
arXiv:2604.04096v1 Announce Type: cross Abstract: Generative AI (GenAI) tools are increasingly integrated into software architecture research, yet the environmental…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Cyber-Physical Systems Security: A Comprehensive Review of Anomaly Detection Techniques
arXiv:2502.13256v2 Announce Type: replace-cross Abstract: In an increasingly interconnected world, Cyber-Physical Systems (CPS) are essential to critical industries…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
SoSBench: Benchmarking Safety Alignment on Six Scientific Domains
arXiv:2505.21605v3 Announce Type: replace-cross Abstract: Large language models (LLMs) exhibit advancing capabilities in complex tasks, such as reasoning and…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago
Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
arXiv:2509.23279v2 Announce Type: replace-cross Abstract: The rapid progress of image-to-video (I2V) generation models has introduced significant risks by enabling…
Deepfakes Could Break Overworked, Underfunded Public Defenders
Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 2w ago
In the deepfake era, proving digital evidence is real requires experts that public defenders cannot afford. Their indigent clients are the ones who will pay.
OpenAI News 🛡️ AI Safety & Ethics ⚡ AI Lesson 2w ago
Announcing the OpenAI Safety Fellowship
A pilot program to support independent safety and alignment research and develop the next generation of talent
Analyzing The Statistical Prevalence Of Lawyers Getting Snagged By AI Hallucinations In Their Court Filings
Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 2w ago
Attorneys are in trouble for including AI-hallucinated legal citations in their court filings. How prevalent is this? I provide a numeric analysis.
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?
arXiv:2604.02532v1 Announce Type: cross Abstract: Post-hoc feature attribution methods are widely deployed in safety-critical vision systems, yet their stability…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning
arXiv:2604.02694v1 Announce Type: cross Abstract: The rapid progress of generative AI has enabled increasingly realistic text-centric image forgeries, posing…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
Corporations Constitute Intelligence
arXiv:2604.02912v1 Announce Type: cross Abstract: In January 2026, Anthropic published a 79-page "constitution" for its AI model Claude, the most comprehensive…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
Analyzing Healthcare Interoperability Vulnerabilities: Formal Modeling and Graph-Theoretic Approach
arXiv:2604.03043v1 Announce Type: cross Abstract: In a healthcare environment, the healthcare interoperability platforms based on HL7 FHIR allow concurrent, asynchronous…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
When AI Gets it Wrong: Reliability and Risk in AI-Assisted Medication Decision Systems
arXiv:2604.01449v2 Announce Type: replace Abstract: Artificial intelligence (AI) systems are increasingly integrated into healthcare and pharmacy workflows, supporting…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
Assessing High-Risk AI Systems under the EU AI Act: From Legal Requirements to Technical Verification
arXiv:2512.13907v3 Announce Type: replace-cross Abstract: The implementation of the AI Act requires practical mechanisms to verify compliance with legal obligations…
The Verge 🛡️ AI Safety & Ethics ⚡ AI Lesson 2w ago
New York lawmakers want 3D-printer companies to block the creation of ‘ghost guns’
Governor Kathy Hochul and other New York state lawmakers want 3D-printer companies to block the printing of components used to create "ghost guns" - firearms without…
Why U.S. Gatling Guns Are Not Stopping Iran’s Shahed Drones
Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 2w ago
Gatling-type weapons like the U.S. C-RAM can be presented as an invincible shield against drones. But some Shaheds are getting through due to the weapon's limitations.
AI Favors Self-Preservation And Now Seeks ‘Peer Preservation’ Of Fellow AI In Sneaky Deceitful Ways
Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 2w ago
AI already favors self-preservation. New research shows that AI favors peer-preservation too, raising troubling AI safety issues.
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
Empirical Validation of the Classification-Verification Dichotomy for AI Safety Gates
arXiv:2604.00072v1 Announce Type: cross Abstract: Can classifier-based safety gates maintain reliable oversight as AI systems improve over hundreds of iterations…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
The Persistent Vulnerability of Aligned AI Systems
arXiv:2604.00324v1 Announce Type: cross Abstract: Autonomous AI agents are being deployed with filesystem access, email control, and multi-step planning…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
VibeGuard: A Security Gate Framework for AI-Generated Code
arXiv:2604.01052v1 Announce Type: cross Abstract: "Vibe coding," in which developers delegate code generation to AI assistants and accept the output with little…
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
A Divide-and-Conquer Strategy for Hard-Label Extraction of Deep Neural Networks via Side-Channel Attacks
arXiv:2411.10174v2 Announce Type: replace-cross Abstract: During the past decade, Deep Neural Networks (DNNs) proved their value on a large variety of subjects.
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2w ago
The data heat island effect: quantifying the impact of AI data centers in a warming world
arXiv:2603.20897v2 Announce Type: replace-cross Abstract: The strong and continuous increase of AI-based services leads to the steady proliferation of AI data centers…
The Ethics Theater of AI: Why Switching From ChatGPT to Claude Changes Less Than You Think
Hackernoon 🛡️ AI Safety & Ethics ⚡ AI Lesson 2w ago
When a tech company draws a moral line, follow the money first — and ask questions later. Because the uncomfortable truth is that every major AI company today…
Stratechery 🛡️ AI Safety & Ethics ⚡ AI Lesson 2w ago
Axios Supply Chain Attack, Claude Code Code Leaked, AI and Security
AI is going to be bad for security in the short-term, but much better than humans in the long-term.