LLM Daily Digest - 2026-02-12

403 items collected; Top 10 selected after deduplication and ranking

1. GPT-5 outperforms federal judges 100% to 52% in legal reasoning experiment

📂 hackernews / Hacker News ⭐ 9.0/10 ★★★★★★★★★☆ 📅 2026-02-11 23:37 UTC

In a legal reasoning experiment, GPT-5 outperformed federal judges 100% to 52%, pointing to notable progress for AI in the legal domain.

🔗 View original

2. Accelerating Mathematical and Scientific Discovery with Gemini Deep Think

📂 blog / DeepMind ⭐ 9.0/10 ★★★★★★★★★☆ 📅 2026-02-09 16:12 UTC

DeepMind shared research on how Gemini Deep Think accelerates mathematical and scientific discovery, illustrating AI's impact on scientific research.

🔗 View original

3. AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

📂 hf_papers / HF Daily Papers ⭐ 8.0/10 ★★★★★★★★☆☆ 📅 2026-02-07 01:28 UTC

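AgentSys addresses the indirect prompt injection threat facing LLM agents through explicit hierarchical memory management, protecting agents from malicious instructions.

🔗 View original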

4. VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

📂 hf_papers / HF Daily Papers ⭐ 8.0/10 ★★★★★★★★☆☆ 📅 2026-02-04 12:48 UTC

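VISTA-Bench is the first benchmark to evaluate how well vision-language models understand text rendered in images, testing how VLMs perform in real-world scenarios.

🔗 View original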

5. Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning

📂 hf_papers / HF Daily Papers ⭐ 8.0/10 ★★★★★★★★☆☆ 📅 2026-02-09 03:33 UTC

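The method performs dynamic long-context reasoning over compressed memory via end-to-end reinforcement learning, addressing the computational cost and information forgetting of long-context processing in LLMs.

🔗 View original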

6. [HF Model] zai-org/GLM-5

📂 hf_models / zai-org ⭐ 8.0/10 ★★★★★★★★☆☆ 📅 2026-02-11 17:07 UTC

zai-org released the GLM-5 large language model, optimized for complex systems engineering and long-horizon agentic tasks.

🔗 View original

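7. [Trending] getzep/graphiti

📂 github_trending / GitHub Trending ⭐ 8.0/10 ★★★★★★★★☆☆ 📅 2026-02-12 00:28 UTC

graphiti is a framework for building real-time knowledge graphs for AI agents; it has attracted considerable attention.

🔗 View original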

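8. [Trending] Kiln-AI/Kiln

📂 github_trending / GitHub Trending ⭐ 8.0/10 ★★★★★★★★☆☆ 📅 2026-02-12 01:53 UTC

Kiln is a comprehensive platform for building, evaluating, and optimizing AI systems, with features including evals, RAG, and agents.

🔗 View original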

9. GLM-5: Targeting complex systems engineering and long-horizon agentic tasks

📂 hackernews / Hacker News ⭐ 8.0/10 ★★★★★★★★☆☆ 📅 2026-02-11 13:42 UTC

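GLM-5 is optimized for complex systems engineering and long-horizon agentic tasks, and has drawn considerable attention and discussion.

🔗 View original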

10. SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes

📂 hf_papers / HF Daily Papers ⭐ 7.0/10 ★★★★★★★☆☆☆ 📅 2026-02-09 14:56 UTC

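SceneSmith is an agentic method for generating simulation-ready indoor scenes, addressing the failure of existing environments to capture the diversity and physical complexity of real indoor spaces.

🔗 View original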
