SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond

📰 arXiv cs.AI

arXiv:2603.01589v2 Announce Type: replace-cross

Abstract: The success of large language models (LLMs) in scientific domains has heightened safety concerns, prompting the development of numerous benchmarks to evaluate their scientific safety. Existing benchmarks, however, often suffer from limited risk coverage and a reliance on subjective evaluation. To address these problems, we introduce SafeSci, a comprehensive framework for safety evaluation and enhancement in scientific contexts. SafeSci comprises SafeSciBench, a multi…

Published 6 Apr 2026