Daily Information Dashboard · 2026-02-15

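1. ChatGPT · flypig · After asking this question, I officially cancelled my ChatGPT Pro subscription.

Category: Social discussion/opinion · Source: weibo_ai_trending · Score: 16 · Author: flypig · Time: 2026-02-10 14:25

Weibo user flypig said they decided to officially cancel their ChatGPT Pro subscription after asking it a certain question. The post drew a fair amount of discussion and engagement, reflecting how some users are reassessing the value of paid AI services.

The post's core message is "cancelled ChatGPT Pro after asking one question."
The item was collected from a Weibo AI trending account and is social in nature.
Engagement: 86 reposts, 245 comments, 985 likes; discussion heat is moderately high.

#ChatGPT #Weibo #Social discussion/opinion #ChatGPT Pro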

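2. AIGC · 活体复读机 · What kind of AI talent show is Du Hua (杜华) planning??? #ndouble 接生#

Category: Social discussion/opinion · Source: weibo_ai_trending · Score: 13 · Author: 活体复读机 · Time: 2026-02-10 13:48

A Weibo topic speculating that Du Hua plans to launch an AI talent show gained some traction, reflecting growing public interest in entertainment applications of AIGC and in the AI idol model.

The core of the topic is that "Du Hua is going to run an AI talent show," which set off discussion among users.
The content spreads as a question plus topic tags; the information is still incomplete.
This post comes from the Weibo AI trending source account.
Engagement shows some traction: 351 reposts, 51 comments, 267 likes.

#AIGC #Weibo #Social discussion/opinion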

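3. Claude · 投星资产 · To programmers in China, or programmer-led tech companies: I strongly recommend that you all go use Anthropic's newly released cl…

Category: Social discussion/opinion · Source: weibo_ai_trending · Score: 13 · Author: 投星资产 · Time: 2026-01-22 15:44

A Weibo blogger strongly recommends that programmers and tech companies in China use Anthropic's newly released Claude Code Opus 4.5, claiming that Anthropic already writes all of its code with AI internally and that this has become a best practice. The post reflects the trend of AI coding tools moving quickly into the core of enterprise R&D workflows.

The blogger gives programmers and tech companies in China an explicit recommendation, centered on Claude Code Opus 4.5.
The post claims that Anthropic has internally reached a practice in which "all code is generated by AI coding tools."
The post mentions that this trend was discussed at the Davos forum, stressing the level of industry attention.
The blogger vouches for this as a programmer and entrepreneur, saying their own team has recently been trying the same approach inside their company.

#Claude #Weibo #Social discussion/opinion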

4. memvid/memvid

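Category: Open-source project · Source: github_search · Score: 100 · Author: memvid · Time: 2026-02-15T19:15:22Z

Memvid has released on GitHub a single-file long-term memory system for AI agents. It packs data and indexes into a .mv2 file for database-free, portable, low-latency retrieval, which matters for building offline and auditable agents.

Core positioning is a "single-file memory layer": content, vectors, retrieval structures, and metadata are packed into a .mv2 file, with no separate vector database needed.
It uses append-only Smart Frames with an immutable frame design, supporting timeline rewind, branching, crash safety, and queries over historical state.
The project claims higher accuracy on benchmarks such as LoCoMo, along with very low latency and high throughput.
It provides CLI, Node.js, Python, and Rust SDKs, with optional support for full-text search, vector search, PDF, CLIP, Whisper, and encryption.
It emphasizes offline use, model independence, and reproducible evaluation, and targets enterprise knowledge bases, long-session agents, code understanding, and compliance workflows that must be auditable.

#GitHub #repo #Open-source project #Memvid #AI Agent #RAG #Rust #Agent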

5. e2b-dev/fragments

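Category: Open-source project · Source: github_search · Score: 47 · Author: e2b-dev · Time: 2026-02-15T17:46:17Z

E2B has released the open-source project Fragments, which replicates AI code-generation apps in the style of Claude Artifacts and v0 and executes the generated code safely. It supports multiple tech stacks and model providers, and its significance is in lowering the barrier and scaling cost of building AI coding products whose output actually runs.

Fragments is an open-source implementation built on the E2B SDK, positioned as a framework for AI apps that produce runnable code.
The stack includes Next.js 14, shadcn/ui, TailwindCSS, and the Vercel AI SDK, with streaming output in the frontend.
npm and pip packages can be installed and used inside the sandbox; built-in templates cover Python, Next.js, Vue, Streamlit, Gradio, and others.
It works with many model providers, including OpenAI, Anthropic, Google AI, Mistral, Groq, Together AI, and Ollama.
The project offers guides for adding custom personas, templates, models, and providers, which eases further development and community contribution.

#GitHub #repo #Open-source project #E2B #Fragments #Next.js #Vercel AI SDK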

6. sindresorhus/awesome-chatgpt

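Category: Open-source project · Source: github_search · Score: 46 · Author: sindresorhus · Time: 2026-02-15T18:05:30Z

The GitHub repository sindresorhus/awesome-chatgpt collects resources across the ChatGPT ecosystem, including official entry points, apps, web tools, extensions, and CLIs, giving developers and users a useful guide for discovering and choosing tools.

It is an Awesome-style list of ChatGPT resources that organizes ecosystem tools by category.
Coverage is broad: desktop and mobile apps, web apps, browser extensions, command-line tools, bots, and integrations.
It lists both hosted and self-hosted projects, serving everyday users as well as developers building on top of them.
The list includes many open-source alternative interfaces, automation agents, document Q&A tools, code explanation tools, and productivity enhancers.
As a continuously updated aggregation point, it can cut the cost of finding information and help with tool comparison and technology selection.

#GitHub #repo #Open-source project #ChatGPT #CLI #Agent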

7. Project Genie | Experimenting with infinite interactive worlds

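Category: Product/release · Source: youtube_rss · Score: 0 · Author: Google DeepMind · Time: 2026-01-29T16:55:50+00:00

Google DeepMind has released Project Genie, an experimental prototype that generates "infinite interactive worlds" from text or images in real time and lets users explore them. It offers a nearly unlimited supply of high-quality simulated environments for training AI agents, with significant research and application value.

Project Genie generates interactive environments from text or image prompts and builds the world in real time as the user plays.
It is powered by the world model Genie 3 and targets simulated training scenarios with an "infinite curriculum."
The capability can be used to train AI agents and opens new directions and experimental space for AI research.
It is currently available only to Google AI Ultra subscribers in the US aged 18 and over, with expansion to more regions to follow.

#YouTube #Product/release #Project Genie #Genie 3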

8. AlphaGenome author roundtable

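Category: Video/talk · Source: youtube_rss · Score: 0 · Author: Google DeepMind · Time: 2026-01-28T12:00:29+00:00

In a roundtable video, the Google DeepMind genomics team discusses AlphaGenome, published in Nature, explaining how its unified sequence-to-function model assesses the effects of variants in non-coding regions with high accuracy and how its API speeds up disease research. Its significance is in improving gene function analysis and the efficiency of research translation.

A product manager, the genomics lead, and the paper's first author jointly describe the background and goals behind AlphaGenome.
AlphaGenome is positioned as a unified DNA sequence-to-function model, focused on interpreting the function of the 98% of the human genome that is non-coding.
The team stresses its high accuracy in predicting the functional effects of genetic variants, which helps scientists screen for key variants faster.
The video covers the engineering work needed to model long sequences at high resolution on TPUs, as well as modeling of complex biological processes such as splicing and contact maps.
The project has opened an API so researchers can score variants quickly, supporting work on disease mechanisms and follow-up research collaboration.

#YouTube #Video/talk #AlphaGenome #Google DeepMind #TPU #API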

9. “Dear Upstairs Neighbors” (Trailer)

Category: Video/talk · Source: youtube_rss · Score: 0 · Author: Google DeepMind · Time: 2026-01-26T18:00:10+00:00

“Dear Upstairs Neighbors” is a short animated film previewing at Sundance Film Festival. It’s a story about noisy neighbors that was crafted by our Google Dee…

“Dear Upstairs Neighbors” is a short animated film previewing at Sundance Film Festival
It’s a story about noisy neighbors that was crafted by our Google DeepMind team of Pixar alums, an Academy Award winner, researchers, engin…
In creating this film, our 45-person crew developed new AI capabilities specifically for filmmakers, and we look forward to releasing the f…
Learn more at: https://blog.google/innovation-and-ai/models-and-research/google-deepmind/dear-upstairs-neighbors/

#YouTube #Video/talk

10. WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control

Category: Research/paper · Source: arxiv_search · Score: 100 · Author: Mehran Aghabozorgi · Time: 2026-02-15T23:53:16Z

Model-based reinforcement learning promises strong sample efficiency but often underperforms in practice due to compounding model error, unimodal world models …

Model-based reinforcement learning promises strong sample efficiency but often underperforms in practice due to compounding model error, un…
We introduce WIMLE, a model-based method that extends Implicit Maximum Likelihood Estimation (IMLE) to the model-based RL framework to lear…
During training, WIMLE weights each synthetic transition by its predicted confidence, preserving useful model rollouts while attenuating bi…
Across 40 continuous-control tasks spanning DeepMind Control, MyoSuite, and HumanoidBench, WIMLE achieves superior sample efficiency and …
Notably, on the challenging Humanoid-run task, WIMLE improves sample efficiency by over 50% relative to the strongest competitor, and on…
These results highlight the value of IMLE-based multi-modality and uncertainty-aware weighting for stable model-based RL

#arXiv #paper #Research/paper

11. AXE: An Agentic eXploit Engine for Confirming Zero-Day Vulnerability Reports

Category: Research/paper · Source: arxiv_search · Score: 98 · Author: Amirali Sajadi · Time: 2026-02-15T23:25:14Z

Vulnerability detection tools are widely adopted in software projects, yet they often overwhelm maintainers with false positives and non-actionable reports. Au…

Vulnerability detection tools are widely adopted in software projects, yet they often overwhelm maintainers with false positives and non-ac…
Automated exploitation systems can help validate these reports; however, existing approaches typically operate in isolation from detection …
In this paper, we investigate how reported security vulnerabilities can be assessed in a realistic grey-box exploitation setting that lever…
We introduce Agentic eXploit Engine (AXE), a multi-agent framework for Web application exploitation that maps lightweight detection metadat…
Evaluated on the CVE-Bench dataset, AXE achieves a 30% exploitation success rate, a 3x improvement over state-of-the-art black-box baselines
Even in a single-agent configuration, grey-box metadata yields a 1…

#arXiv #paper #Research/paper #Agent

12. Zero-Shot Instruction Following in RL via Structured LTL Representations

Category: Research/paper · Source: arxiv_search · Score: 95 · Author: Mathias Jackermeier · Time: 2026-02-15T23:22:50Z

We study instruction following in multi-task reinforcement learning, where an agent must zero-shot execute novel tasks not seen during training. In this settin…

We study instruction following in multi-task reinforcement learning, where an agent must zero-shot execute novel tasks not seen during trai…
In this setting, linear temporal logic (LTL) has recently been adopted as a powerful framework for specifying structured, temporally extend…
While existing approaches successfully train generalist policies, they often struggle to effectively capture the rich logical and temporal …
In this work, we address these concerns with a novel approach to learn structured task representations that facilitate training and general…
Our method conditions the policy on sequences of Boolean formulae constructed from a finite automaton of the task
We propose a hierarchical neural architecture to encode the logical structure of these formulae, and introduce an attention mechanism that …

#arXiv #paper #Research/paper

13. High-accuracy log-concave sampling with stochastic queries

Category: Research/paper · Source: arxiv_search · Score: 92 · Author: Fan Chen · Time: 2026-02-15T23:19:07Z

We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/δ)$, where $δ$ …

We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\l…
Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the grad…
We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the …
Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries

#arXiv #paper #Research/paper

14. Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning

Category: Research/paper · Source: arxiv_search · Score: 90 · Author: Zhi Zhang · Time: 2026-02-15T23:14:05Z

Reinforcement learning (RL) plays a central role in large language model (LLM) post-training. Among existing approaches, Group Relative Policy Optimization (GR…

Reinforcement learning (RL) plays a central role in large language model (LLM) post-training
Among existing approaches, Group Relative Policy Optimization (GRPO) is widely used, especially for RL with verifiable rewards (RLVR) fine-…
In GRPO, each query prompts the LLM to generate a group of rollouts with a fixed group size $N$
When all rollouts in a group share the same outcome, either all correct or all incorrect, the group-normalized advantages become zero, yiel…
We introduce Adaptive Efficient Rollout Optimization (AERO), an enhancement of GRPO
AERO uses an adaptive rollout strategy, applies selective rejection to strategically prune rollouts, and maintains a Bayesian posterior to …
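
The zero-advantage case described in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which each rollout's advantage is its reward minus the group mean, divided by the group standard deviation; the function name `group_advantages`, the `eps` stabilizer, and the 0/1 rewards are illustrative choices, not taken from the paper, and the sketch does not implement AERO itself.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - group mean) / (group std + eps).

    eps is an assumed stabilizer that avoids division by zero
    when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical verifiable rewards: 1.0 = correct rollout, 0.0 = incorrect.
mixed = group_advantages([1.0, 0.0, 1.0, 0.0])
all_correct = group_advantages([1.0, 1.0, 1.0, 1.0])
all_wrong = group_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)        # non-zero advantages: the group carries a learning signal
print(all_correct)  # all zeros: no gradient contribution from this group
print(all_wrong)    # all zeros as well
```

In the two uniform groups every reward equals the group mean, so every advantage is exactly zero and the compute spent on those rollouts yields no policy update under this formulation.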

#arXiv #paper #Research/paper

15. Conformal Signal Temporal Logic for Robust Reinforcement Learning Control: A Case Study

Category: Research/paper · Source: arxiv_search · Score: 88 · Author: Hani Beirami · Time: 2026-02-15T22:10:11Z

We investigate how formal temporal logic specifications can enhance the safety and robustness of reinforcement learning (RL) control in aerospace applications.…

We investigate how formal temporal logic specifications can enhance the safety and robustness of reinforcement learning (RL) control in aer…
Using the open source AeroBench F-16 simulation benchmark, we train a Proximal Policy Optimization (PPO) agent to regulate engine throttle …
The control objective is encoded as a Signal Temporal Logic (STL) requirement to maintain airspeed within a prescribed band during the fina…
To enforce this specification at run time, we introduce a conformal STL shield that filters the RL agent's actions using online conformal p…
We compare three settings: (i) PPO baseline, (ii) PPO with a classical rule-based STL shield, and (iii) PPO with the proposed conformal shi…
Experiments show that the conformal shield preserves STL satisfaction while maintaining near baseline performance and providing stronger ro…
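
As a rough illustration of what an STL requirement of the form "keep airspeed within a band" measures, the sketch below computes the standard quantitative robustness of an "always within [low, high]" formula over a sampled trace. The function name, the band limits, and the airspeed samples are hypothetical, and this is not the paper's conformal shield, only the textbook robustness semantics that such a specification relies on.

```python
def always_in_band_robustness(signal, low, high):
    """Robustness of the STL formula G(low <= x <= high) on a sampled trace.

    For each sample, the margin is the distance to the nearest bound
    (negative when outside the band); "always" takes the worst case.
    """
    return min(min(x - low, high - x) for x in signal)


# Hypothetical airspeed samples and band limits (units are arbitrary here).
airspeed = [502.0, 498.5, 505.0, 510.0]
rho_ok = always_in_band_robustness(airspeed, low=490.0, high=520.0)
rho_bad = always_in_band_robustness(airspeed + [525.0], low=490.0, high=520.0)

print(rho_ok)   # positive: the trace satisfies the band requirement
print(rho_bad)  # negative: the last sample leaves the band
```

A positive value means the specification holds with that much margin, while a negative value means it is violated; a run-time shield would typically act when a predicted margin becomes too small.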

#arXiv #paper #Research/paper

16. Offline Learning of Nash Stable Coalition Structures with Possibly Overlapping Coalitions

Category: Research/paper · Source: arxiv_search · Score: 85 · Author: Saar Cohen · Time: 2026-02-15T22:05:12Z

Coalition formation concerns strategic collaborations of selfish agents that form coalitions based on their preferences. It is often assumed that coalitions ar…

Coalition formation concerns strategic collaborations of selfish agents that form coalitions based on their preferences
It is often assumed that coalitions are disjoint and preferences are fully known, which may not hold in practice
In this paper, we thus present a new model of coalition formation with possibly overlapping coalitions under partial information, where sel…
Instead, information about past interactions and associated utility feedback is stored in a fixed offline dataset, and we aim to efficientl…
We analyze the impact of diverse dataset information constraints by studying two types of utility feedback that can be stored in the datase…
For both feedback models, we identify assumptions under which the dataset covers sufficient information for an offline learning algorithm t…

#arXiv #paper #Research/paper

17. In Transformer We Trust? A Perspective on Transformer Architecture Failure Modes

Category: Research/paper · Source: arxiv_search · Score: 82 · Author: Trishit Mondal · Time: 2026-02-15T21:57:14Z

Transformer architectures have revolutionized machine learning across a wide range of domains, from natural language processing to scientific computing. Howeve…

Transformer architectures have revolutionized machine learning across a wide range of domains, from natural language processing to scientif…
However, their growing deployment in high-stakes applications, such as computer vision, natural language processing, healthcare, autonomous…
In this work, we critically examine the foundational question: How trustworthy are transformer models?
We evaluate their reliability through a comprehensive review of interpretability, explainability, robustness against adversarial attacks,…
We systematically examine the trustworthiness of transformer-based models in safety-critical applications spanning natural language process…
By synthesizing insights across these diverse areas, we identify recurring structural vulnerabilities, domain-specific risks, and open rese…

#arXiv #paper #Research/paper

18. Benchmarking at the Edge of Comprehension

Category: Research/paper · Source: arxiv_search · Score: 80 · Author: Samuele Marro · Time: 2026-02-15T20:51:29Z

As frontier Large Language Models (LLMs) increasingly saturate new benchmarks shortly after they are published, benchmarking itself is at a juncture: if fronti…

As frontier Large Language Models (LLMs) increasingly saturate new benchmarks shortly after they are published, benchmarking itself is at a…
If benchmarking becomes infeasible, our ability to measure any progress in AI is at stake
We refer to this scenario as the post-comprehension regime
In this work, we propose Critique-Resilient Benchmarking, an adversarial framework designed to compare models even when full human understa…
Our technique relies on the notion of critique-resilient correctness: an answer is deemed correct if no adversary has convincingly proved o…
Unlike standard benchmarking, humans serve as bounded verifiers and focus on localized claims, which preserves evaluation integrity beyond …

#arXiv #paper #Research/paper

19. Floe: Federated Specialization for Real-Time LLM-SLM Inference

Category: Research/paper · Source: arxiv_search · Score: 78 · Author: Chunlin Tian · Time: 2026-02-15T20:28:38Z

Deploying large language models (LLMs) in real-time systems remains challenging due to their substantial computational demands and privacy concerns. We propose…

Deploying large language models (LLMs) in real-time systems remains challenging due to their substantial computational demands and privacy …
We propose Floe, a hybrid federated learning framework designed for latency-sensitive, resource-constrained environments
Floe combines a cloud-based black-box LLM with lightweight small language models (SLMs) on edge devices to enable low-latency, privacy-pres…
Personal data and fine-tuning remain on-device, while the cloud LLM contributes general knowledge without exposing proprietary weights
A heterogeneity-aware LoRA adaptation strategy enables efficient edge deployment across diverse hardware, and a logit-level fusion mechanis…
Extensive experiments demonstrate that Floe enhances user privacy and personalization

#arXiv #paper #Research/paper

20. DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices

Category: Research/paper · Source: arxiv_search · Score: 75 · Author: Songyuan Li · Time: 2026-02-15T20:25:50Z

Recent Mixture-of-Experts (MoE)-based large language models (LLMs) such as Qwen-MoE and DeepSeek-MoE are transforming generative AI in natural language process…

Recent Mixture-of-Experts (MoE)-based large language models (LLMs) such as Qwen-MoE and DeepSeek-MoE are transforming generative AI in natu…
However, these models require vast and diverse training data
Federated learning (FL) addresses this challenge by leveraging private data from heterogeneous edge devices for privacy-preserving MoE trai…
Nonetheless, traditional FL approaches require devices to host local MoE models, which is impractical for resource-constrained devices due …
To address this, we propose DeepFusion, the first scalable federated MoE training framework that enables the fusion of heterogeneous on-dev…
Specifically, DeepFusion features each device to independently configure and train an on-device LLM tailored to its own needs and hardware …

#arXiv #paper #Research/paper
