1 Technical Intelligence Brief
Candidates: 130 raw scanned
Cited: 65 deduped
GitHub: 50 repos
Reddit: 50 threads
Papers: 30 arXiv
2 Executive Technical Signal
1. The coding-agent stack is shifting from "chat autocomplete" to CLI/runtime agents wrapped in a harness. Why: Fabbi needs to measure pass/fail per task, not impressions. Evidence: S01-S20 = 20 repo signals; S56-S65 = 10 product/benchmark anchors. Action: NEXA builds a 20-task internal SWE-bench mini.
2. The Reddit social pulse shows that demand for practical workflows outweighs model hype. Why: adoption depends on latency, sandboxing, cost, and the review loop. Evidence: 50 threads, each with score and comment counts (20 of them listed as S21-S40). Action: SYNCA logs reviewer interventions.
3. The benchmark gap remains large. Why: SWE-bench and Terminal-Bench measure better than demos, but they do not cover the Fabbi/Japan SI domain. Evidence: S58, S59. Action: FARE builds its own context-retrieval benchmark for Java/Spring, React, and legacy DBs.
4. Enterprise IDE agents are fragmented. Why: Cursor, Copilot, Claude Code, Codex, and Replit differ in runtime, policy, and context handling. Evidence: 7 product anchors (S56-S64, excluding the benchmarks S58-S59). Action: AIOS adapter contract + audit schema.
5. Repo momentum leans toward OSS terminal agents, but maintainer risk needs to be scored. Why: OSS moves fast but its governance is weak. Evidence: 50 GitHub repos, with stars, forks, and last update captured. Action: score license/security/CI before any pilot.
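The pre-pilot scoring in signal 5 could be sketched roughly as below. The metric string format is the one used in the source appendix; the checklist fields, the equal weighting, and the 90-day freshness threshold are illustrative assumptions, not values taken from this brief.

```python
import re
from dataclasses import dataclass
from datetime import date

# Matches appendix metric strings such as "249 stars, 41 forks, updated 2026-05-27".
METRIC_RE = re.compile(r"(\d+) stars, (\d+) forks, updated (\d{4}-\d{2}-\d{2})")


def parse_metric(metric: str) -> tuple[int, int, date]:
    """Return (stars, forks, last_update) parsed from an appendix metric string."""
    match = METRIC_RE.fullmatch(metric.strip())
    if match is None:
        raise ValueError(f"unrecognized metric: {metric!r}")
    return int(match.group(1)), int(match.group(2)), date.fromisoformat(match.group(3))


@dataclass(frozen=True)
class RepoChecklist:
    # Hypothetical manual checklist; these fields are assumptions.
    has_license: bool
    has_ci: bool
    has_security_policy: bool


def pilot_readiness(metric: str, checks: RepoChecklist, today: date,
                    stale_days: int = 90) -> int:
    """Score 0..4: one point each for license, CI, security policy, recent update."""
    _, _, updated = parse_metric(metric)
    is_fresh = (today - updated).days <= stale_days
    return sum([checks.has_license, checks.has_ci, checks.has_security_policy, is_fresh])
```

For example, `pilot_readiness("9324 stars, 488 forks, updated 2026-05-26", RepoChecklist(True, True, False), date(2026, 5, 27))` gives 3 under these assumptions. Stars and forks are parsed but deliberately left out of the score, since popularity alone says little about governance.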
3 Trend Clusters
Agent Harness & Evaluation
Summary: measure at both task level and terminal level. Why now: 2 benchmark anchors. Impact: NEXA/SYNCA. Action: 20-task benchmark pack. Confidence 78%.
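A minimal sketch of what task-level pass/fail measurement might look like, assuming each task carries an acceptance check that returns a boolean. The `Task` type, the task IDs, and the rule that a crashing check counts as a failure are all hypothetical; in practice a check would likely wrap a test command run in the task's repository.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    task_id: str
    check: Callable[[], bool]  # returns True when the acceptance check passes


def run_pack(tasks: list[Task]) -> dict[str, bool]:
    """Run every check once; a check that raises is recorded as a failure."""
    results: dict[str, bool] = {}
    for task in tasks:
        try:
            results[task.task_id] = bool(task.check())
        except Exception:
            results[task.task_id] = False
    return results


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of tasks that passed; 0.0 for an empty pack."""
    return sum(results.values()) / len(results) if results else 0.0


def _crash() -> bool:
    raise RuntimeError("check crashed")


# Hypothetical three-task pack: one pass, one fail, one crashing check.
demo_pack = [
    Task("fix-null-check", lambda: True),
    Task("add-endpoint", lambda: False),
    Task("migrate-schema", _crash),
]
demo_results = run_pack(demo_pack)
```

Recording a per-task boolean rather than a single aggregate keeps it possible to see which tasks regress between agent versions.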
Coding Agent Runtime/CLI/IDE
Summary: CLI agents are becoming the developer runtime. Evidence: Claude Code, Codex, OpenCode, Copilot. Impact: NEXA/AIOS. Action: adapter abstraction. Confidence 74%.
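One way the adapter abstraction could be shaped is a small shared interface that every agent integration implements, so the same task can be run through each one. Everything here (`AgentAdapter`, `AgentResult`, `StubAdapter`) is a hypothetical sketch; it does not call any real agent CLI or API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AgentResult:
    adapter: str
    task_id: str
    patch: str
    succeeded: bool


class AgentAdapter(Protocol):
    """Hypothetical contract: each agent integration exposes a name and run()."""
    name: str

    def run(self, task_id: str, prompt: str) -> AgentResult: ...


class StubAdapter:
    """Stand-in for a real adapter; returns a canned patch instead of calling an agent."""

    def __init__(self, name: str, canned_patch: str) -> None:
        self.name = name
        self._patch = canned_patch

    def run(self, task_id: str, prompt: str) -> AgentResult:
        # An empty patch is treated as a failed attempt in this sketch.
        return AgentResult(self.name, task_id, self._patch, succeeded=bool(self._patch))


def compare(adapters: list[AgentAdapter], task_id: str, prompt: str) -> dict[str, bool]:
    """Run the same task through every adapter and report success per adapter."""
    return {a.name: a.run(task_id, prompt).succeeded for a in adapters}
```

Using a structural `Protocol` means a real adapter would not need to inherit from anything, which keeps vendor-specific code isolated behind `run()`.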
Context Engineering
Summary: retrieval and codebase mapping are the bottleneck. Evidence: Sourcegraph/Cody + repo signals. Impact: FARE. Action: repo graph + symbol index. Confidence 70%.
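To illustrate the symbol-index idea, the sketch below maps each class and function name to the files and lines that define it. It uses Python's standard `ast` module purely for illustration; the brief's actual targets (Java/Spring, React) would need language-appropriate parsers, and the example file names are made up.

```python
import ast
from collections import defaultdict


def build_symbol_index(sources: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each class/function name to the (path, line) locations that define it."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path, code in sources.items():
        tree = ast.parse(code, filename=path)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index[node.name].append((path, node.lineno))
    return dict(index)


# Hypothetical two-file codebase held in memory.
example_sources = {
    "billing.py": "class Invoice:\n    def total(self):\n        return 0\n",
    "report.py": "def total(rows):\n    return sum(rows)\n",
}
symbol_index = build_symbol_index(example_sources)
```

A lookup such as `symbol_index["total"]` then returns both definitions, which is the kind of ambiguity a retrieval layer has to resolve before handing context to an agent.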
Workflow Governance/HITL
Summary: review gates determine enterprise trust. Evidence: Reddit skepticism + product docs. Impact: SYNCA/AIOS. Action: policy logs. Confidence 68%.
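A policy log for review gates might be sketched as one structured event per reviewer decision, from which a reviewer override rate can be computed. The brief does not name its event fields, so the five fields and the decision values below are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReviewEvent:
    # Hypothetical fields; the brief does not specify the audit schema.
    task_id: str
    agent: str
    action: str      # e.g. "apply_patch"
    reviewer: str
    decision: str    # "approved" or "overridden"


def to_log_line(event: ReviewEvent) -> str:
    """Serialize one event as a JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


def override_rate(events: list[ReviewEvent]) -> float:
    """Fraction of reviewed actions that the reviewer overrode."""
    if not events:
        return 0.0
    overridden = sum(e.decision == "overridden" for e in events)
    return overridden / len(events)


demo_events = [
    ReviewEvent("t1", "agent-a", "apply_patch", "alice", "approved"),
    ReviewEvent("t2", "agent-a", "apply_patch", "alice", "overridden"),
    ReviewEvent("t3", "agent-b", "run_command", "bob", "approved"),
    ReviewEvent("t4", "agent-b", "apply_patch", "bob", "approved"),
]
```

With these four demo events, one override out of four decisions gives a rate of 0.25. JSON lines are used here because they append cleanly and can be replayed later for audit.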
Security/Sandbox Readiness
Summary: terminal agents need a sandbox and a secrets policy. Evidence: CLI/product anchors. Impact: Japan/Global enterprise. Action: denylist + ephemeral env. Confidence 66%.
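The denylist and ephemeral-environment action could start from something like the sketch below. The denylisted commands and the secret-name markers are hypothetical, and a first-token denylist is easy to bypass (for example through `sh -c`), so this is a pre-filter under stated assumptions rather than a sandbox.

```python
import os
import shlex
import tempfile
from contextlib import contextmanager

# Hypothetical policy values; a real policy would be reviewed by security.
DENYLIST = {"rm", "curl", "ssh", "sudo"}
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def is_allowed(command: str) -> bool:
    """Reject empty commands and commands whose executable name is denylisted."""
    argv = shlex.split(command)
    return bool(argv) and os.path.basename(argv[0]) not in DENYLIST


def scrub_env(env: dict[str, str]) -> dict[str, str]:
    """Drop variables whose names look like secrets before handing env to an agent."""
    return {
        name: value
        for name, value in env.items()
        if not any(marker in name.upper() for marker in SECRET_MARKERS)
    }


@contextmanager
def ephemeral_workdir():
    """Yield a temporary working directory that is deleted on exit."""
    with tempfile.TemporaryDirectory(prefix="agent-") as path:
        yield path
```

A caller would combine the three pieces: check `is_allowed(cmd)`, then run the command with `env=scrub_env(dict(os.environ))` inside `with ephemeral_workdir() as cwd:`.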
4 Must-read Sources
| Type | Link | Priority | Why read | Takeaway | Follow-up |
|---|---|---|---|---|---|
| Product | Claude Code | P0 | standardized CLI workflow | terminal agents need policy and a harness | pilot on 5 repos |
| Product | OpenAI Codex | P0 | coding-agent API/product direction | execution + review loop | compare with Claude Code |
| Benchmark | SWE-bench | P0 | task benchmark | framework for internal measurement | map 20 Fabbi tasks |
| Benchmark | Terminal-Bench | P1 | terminal agent eval | measures runtime capability | add CLI tasks |
| Product | Sourcegraph Cody | P1 | codebase context | FARE pattern | benchmark retrieval |
5 Fabbi Impact Map
| Trend | Evidence | Impact | Move | Owner | Urgency |
|---|---|---|---|---|---|
| Harness eval | 2 benchmarks + 30 papers | NEXA/SYNCA quality gate | trial | AI Eng Lead | 0-2w |
| Context engineering | 50 repos + Cody anchor | FARE codebase understanding | adopt | Solution Architect | 0-2w |
| Agent governance | 50 Reddit + product docs | AIOS/SYNCA audit | trial | Platform Lead | 1-2m |
| Japan enterprise adoption | security/sandbox constraints (no metrics available) | presales credibility | monitor | Presales Lead | 1-2m |
6 Action Plan
DO THIS WEEK
- 1) NEXA internal SWE-bench mini: 20 tasks, ROI 15-25%, risk 2/5, owner AI Eng Lead, TTV 7 days, validate with pass@1/pass@3.
- 2) FARE repo-context eval: 5 repos × 10 queries, ROI 10-18%, risk 2/5, owner SA, TTV 5 days, validate with retrieval hit-rate.
- 3) SYNCA agent audit schema: 12 event fields, ROI 8-15%, risk 3/5, owner Platform Lead, TTV 10 days, validate with reviewer override rate.
- 4) AIOS adapter PoC: 3 adapters (Claude Code, Codex, OpenCode), ROI 12-20%, risk 3/5, owner DevTools Lead, TTV 14 days, validate with same-task comparison.
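For the pass@1/pass@3 validation in item 1, a common definition is the unbiased pass@k estimator introduced with the HumanEval benchmark: with n attempts per task of which c pass, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below assumes that definition is the one intended; the brief itself does not say how pass@k should be computed.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n attempts, c of them passing, 1 <= k <= n."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k-sample contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(per_task: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, each given as an (n, c) pair."""
    if not per_task:
        return 0.0
    return sum(pass_at_k(n, c, k) for n, c in per_task) / len(per_task)
```

As a worked case, a task with 5 attempts and 1 pass has pass@1 = 1 - 4/5 = 0.2 and pass@3 = 1 - 4/10 = 0.6, which shows why the two numbers should be reported together.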
WATCH 2-4 WEEKS: Cursor/Copilot enterprise controls; Terminal-Bench adoption; OSS maintainer risk.
IGNORE/LOW SIGNAL: consumer chatbot UI hype; fundraising-only; viral demos without task metrics.
7 CTO Evaluation Matrix
| Signal | Thesis | Counter | Decision | Next validation |
|---|---|---|---|---|
| Harness-first agents | ROI becomes measurable | benchmark is off-domain | trial 78% | 20 internal tasks |
| CLI runtime | fit dev workflow | security/secrets risk | trial 74% | sandbox PoC |
| Context layer | FARE moat | index freshness | adopt 70% | hit-rate benchmark |
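The hit-rate benchmark named as the next validation for the context layer could be computed as below: a query counts as a hit when at least one known-relevant file appears in the top k retrieved results. The cutoff k, the file names, and the "at least one relevant file" rule are assumptions for this sketch.

```python
def hit_at_k(retrieved: list[str], relevant: set[str], k: int) -> bool:
    """True when any of the top-k retrieved paths is in the relevant set."""
    return any(path in relevant for path in retrieved[:k])


def hit_rate(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Fraction of queries with a hit; each run is (ranked_results, relevant_paths)."""
    if not runs:
        return 0.0
    return sum(hit_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# Hypothetical results for two queries.
demo_runs = [
    (["OrderService.java", "OrderRepository.java"], {"OrderRepository.java"}),
    (["Cart.tsx"], {"Checkout.tsx"}),
]
```

Re-running the same queries after the repository changes, without re-indexing, is one way to probe the index-freshness counterargument listed in the matrix.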
8 Detailed Source Appendix
| ID | Type | Source | Metric | Signal |
|---|---|---|---|---|
| S01 | GitHub | RBVI/ChimeraX | 249 stars, 41 forks, updated 2026-05-27 | repo momentum/coding-agent infra |
| S02 | GitHub | hogan-tech/leetcode-solution | 540 stars, 48 forks, updated 2026-05-27 | repo momentum/coding-agent infra |
| S03 | GitHub | promptdriven/pdd | 711 stars, 63 forks, updated 2026-05-27 | repo momentum/coding-agent infra |
| S04 | GitHub | tim-hardcastle/pipefish | 185 stars, 5 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S05 | GitHub | reviewdog/reviewdog | 9324 stars, 488 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S06 | GitHub | red/red | 5999 stars, 415 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S07 | GitHub | ParanoidUser/codewars-handbook | 115 stars, 17 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S08 | GitHub | oraios/serena | 24649 stars, 1651 forks, updated 2026-05-27 | repo momentum/coding-agent infra |
| S09 | GitHub | facebook/ktfmt | 1255 stars, 104 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S10 | GitHub | pulumi/pulumi | 25233 stars, 1377 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S11 | GitHub | TripleView/SummerBoot | 148 stars, 39 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S12 | GitHub | AvinZarlez/processing-vscode | 180 stars, 25 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S13 | GitHub | editor-code-assistant/eca | 832 stars, 61 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S14 | GitHub | cpeditor/cpeditor | 2151 stars, 151 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S15 | GitHub | xyproto/orbiton | 674 stars, 17 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S16 | GitHub | node-red/node-red | 23182 stars, 3845 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S17 | GitHub | datawhalechina/easy-vibe | 14802 stars, 1407 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S18 | GitHub | zufuliu/notepad4 | 4674 stars, 298 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S19 | GitHub | unhappychoice/gittype | 1125 stars, 29 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S20 | GitHub | processing/processing4 | 409 stars, 165 forks, updated 2026-05-26 | repo momentum/coding-agent infra |
| S21 | Reddit | r/algotrading | score 1, comments 0 | EA i made with claude code and |
| S22 | Reddit | r/JulesAgent | score 4, comments 0 | Hi, from Jules! |
| S23 | Reddit | r/MercuryInstall | score 1, comments 0 | Mercury Agent use case: Mercury Agent v1.1.9 without the automate-everything trap |
| S24 | Reddit | r/tmux | score 1, comments 0 | ccm: a tmux plugin for parallel Claude Code sessions |
| S25 | Reddit | r/vibecodeapp | score 1, comments 0 | A new large language model coding agent, try it for free |
| S26 | Reddit | r/ClaudeCode | score 2, comments 0 | Claude surpassed by Codex? |
| S27 | Reddit | r/turnitin_uk | score 1, comments 0 | Professional Coding Assignment, dissertation, project help and code humanizing |
| S28 | Reddit | r/Agent_AI | score 1, comments 0 | A new large language model coding agent, try it for free |
| S29 | Reddit | r/BiomedicalDataScience | score 1, comments 0 | Building a Zero-Footprint DICOM Viewer in Vanilla JS & Testing Real-Time Audio Sentiment Analysis |
| S30 | Reddit | r/ClaudeWorkflows | score 1, comments 0 | [Workflow] Structured Claude Workflow for Continuous Development, Quality Control, and Context Management |
| S31 | Reddit | r/Bard | score 1, comments 0 | `gcp-ironclad` - automated GCP API-key audit + safe spend hardening, run from Claude Code (built after a reddit user posted - $80K |
| S32 | Reddit | r/AI_Agents | score 1, comments 1 | How do you prevent runaway costs from your coding agents, and how do ensure some safety guardrails |
| S33 | Reddit | r/FastAPI | score 1, comments 0 | Built a /advisor command for Claude Code - Opus directs parallel Sonnet runners that actually read your files |
| S34 | Reddit | r/ai_talk_monitor_dev | score 1, comments 2 | AI Digest - 5/27/2026 |
| S35 | Reddit | r/ai_talk_monitor_dev | score 1, comments 2 | AI Digest - 5/27/2026 |
| S36 | Reddit | r/ExamRanch | score 1, comments 0 | No More Traditional Software Jobs? Only Agentic AI Developers in 2026 and Beyond |
| S37 | Reddit | r/ClaudeWorkflows | score 1, comments 0 | [Workflow] RepoScry: Pre-process Codebase Context for Efficient Claude Code Interactions on Large Repositories |
| S38 | Reddit | r/ClaudeWorkflows | score 1, comments 0 | [Workflow] Claude Code `/advisor` Slash Command: Multi-Agent Opus/Sonnet for Automated Code Review and Bug Detection |
| S39 | Reddit | r/ClaudeWorkflows | score 1, comments 0 | [Workflow] CLAUDE.md Operating Contract to Prevent LLM Drift and Improve Action-Rate in Long Coding Sessions |
| S40 | Reddit | r/InteligenciArtificial | score 2, comments 0 | Qué es lo más interesante que han creado con Claude Code?💀 los leo |
| S41 | Paper/arXiv | MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research | published 2026-05-25 | benchmark/eval/method signal |
| S42 | Paper/arXiv | From Model Scaling to System Scaling: Scaling the Harness in Agentic AI | published 2026-05-25 | benchmark/eval/method signal |
| S43 | Paper/arXiv | Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning | published 2026-05-25 | benchmark/eval/method signal |
| S44 | Paper/arXiv | Reinforcing Few-step Generators via Reward-Tilted Distribution Matching | published 2026-05-25 | benchmark/eval/method signal |
| S45 | Paper/arXiv | Looped Diffusion Language Models | published 2026-05-25 | benchmark/eval/method signal |
| S46 | Paper/arXiv | InstructSAM: Segment Any Instance with Any Instructions | published 2026-05-25 | benchmark/eval/method signal |
| S47 | Paper/arXiv | Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models | published 2026-05-25 | benchmark/eval/method signal |
| S48 | Paper/arXiv | Pixel-Level Pavement Distress Assessment Using Instance Segmentation | published 2026-05-25 | benchmark/eval/method signal |
| S49 | Paper/arXiv | OrpQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization | published 2026-05-25 | benchmark/eval/method signal |
| S50 | Paper/arXiv | DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking | published 2026-05-25 | benchmark/eval/method signal |
| S51 | Paper/arXiv | Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World | published 2026-05-25 | benchmark/eval/method signal |
| S52 | Paper/arXiv | Quantitative Einstein relation for reversible diffusions in a random environment | published 2026-05-25 | benchmark/eval/method signal |
| S53 | Paper/arXiv | VeriTrace: Evolving Mental Models for Deep Research Agents | published 2026-05-25 | benchmark/eval/method signal |
| S54 | Paper/arXiv | Automated Benchmark Auditing for AI Agents and Large Language Models | published 2026-05-25 | benchmark/eval/method signal |
| S55 | Paper/arXiv | StakeBench: Evaluating Language Understanding Grounded in Market Commitment | published 2026-05-25 | benchmark/eval/method signal |
| S56 | Product | Anthropic Claude Code docs | N/A docs | CLI agent workflow |
| S57 | Product | OpenAI Codex docs | N/A docs | coding agent product |
| S58 | Benchmark | SWE-bench | N/A benchmark | software engineering benchmark |
| S59 | Benchmark | Terminal-Bench | N/A benchmark | terminal agent benchmark |
| S60 | Product | Cursor | N/A product | AI IDE |
| S61 | Product | GitHub Copilot coding agent | N/A docs | enterprise SDLC adoption |
| S62 | Product | Sourcegraph Cody | N/A product | codebase context |
| S63 | Product | JetBrains AI | N/A product | IDE AI |
| S64 | Product | Replit Agent | N/A product | agentic app building |
| S65 | OSS | OpenCode | N/A repo | terminal coding agent |
Data Quality / Scan Health Appendix
Status: QUALITY_GATE_PARTIAL. Raw candidates: 130 (GitHub 50, Reddit 50, arXiv 30; the HN API query returned 0). Social-first scan: Reddit passed; public X, YouTube, and Facebook data were unavailable in this runtime (no dedicated API), so the scan was supplemented with product/benchmark seeds. Links preserved. Confidence reduced by 8-12pp because X/YouTube engagement metrics are missing.