AI Papers Weekly TOP 10: Weekly Digest
This week's AI research centers on **self-evolving agents**, **high-res document parsing**, and **3D scene reconstruction**. With CVPR 2026 drawing more than 16,000 submissions, agent-based systems and multimodal models look set to lead research in 2026.
AI Papers Weekly TOP 10 (2026-05-28)
Top Papers of the Week

- SkillOpt: Executive Strategy for Self-Evolving Agent Skills (Microsoft Research)
  - Highlights: A text-space optimization system that learns agent skills as external state, enabling stable updates with zero deployment overhead.
  - Significance: Addresses scalability problems for complex agents and demonstrates real-world viability with strong results across multiple benchmarks.
- MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing (OpenDataLab)
  - Highlights: A 1.2B-parameter vision-language model that uses a coarse-to-fine strategy (global layout analysis on a downsampled page, then content recognition on native-resolution crops) to achieve state-of-the-art parsing accuracy.
  - Significance: Substantially reduces the compute cost of high-resolution document understanding, making real-world deployment much easier.
- TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction (Zhejiang University)
  - Highlights: A feed-forward network that uses oriented triangle primitives to generate simulation-ready meshes directly from a single image.
  - Significance: Eliminates the post-processing stage of 3D reconstruction, streamlining the pipeline for real-time applications.
- Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution (Chinese Academy of Sciences)
  - Highlights: Uses Riemannian geometry and adversarial training to correct spectral misalignment in image super-resolution.
  - Significance: Raises the bar for super-resolution quality by improving structural fidelity and reducing artifacts.
- Beyond Mode Collapse: Distribution Matching for Diverse Reasoning (Intern Large Models)
  - Highlights: Tackles mode collapse in on-policy reinforcement learning by minimizing the forward KL divergence, which preserves output diversity.
  - Significance: Improves performance on combinatorial optimization and reasoning tasks while addressing stability issues in RL-based agents.
- TradingAgents: Multi-Agents LLM Financial Trading Framework
  - Highlights: A multi-agent LLM framework that simulates a real-world financial trading firm.
  - Significance: Demonstrates a practical application of LLM agents, evaluated on standard trading metrics such as cumulative return and the Sharpe ratio.
- Kronos: A Foundation Model for the Language of Financial Markets
  - Highlights: A pre-training framework specialized for financial K-line (candlestick) data, featuring a dedicated tokenizer and autoregressive pre-training.
  - Significance: Outperforms existing models on forecasting and synthetic data generation, highlighting the potential of domain-specific foundation models.
- OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
  - Highlights: A platform for building AI agents that interact with the world by writing code, running command-line tools, and browsing the web.
  - Significance: Sets a new standard for open-source AI agent research by providing multi-agent support and evaluation benchmarks.
- Instance-Aware Parameter Configuration in Bilevel Late Acceptance Hill Climbing (IEEE CEC 2026)
  - Highlights: Proposes automated, instance-specific parameter configuration for bilevel Late Acceptance Hill Climbing applied to the capacitated vehicle routing problem (CVRP).
  - Significance: Advances the practical use of automated algorithm design by making combinatorial optimization problems more efficient to solve.
- ERP-Bench: Enterprise Resource Planning System Benchmark
  - Highlights: Evaluates how well state-of-the-art models handle ERP system tasks, measuring success rates against explicit constraints and known optimal solutions.
  - Significance: Provides a new, systematic way to gauge how ready AI models are for real enterprise work.
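MinerU2.5's coarse-to-fine strategy splits parsing into two passes, as described in its paper: a cheap layout pass over a downsampled page, then recognition on native-resolution crops of only the detected regions. The sketch below shows just the coordinate bookkeeping between those passes, under stated assumptions: pages are plain 2D lists of pixel values, downsampling is naive pixel skipping, and `detect_layout` is a hypothetical stub (a bounding box around nonzero pixels), not MinerU's model or API.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
Page = List[List[int]]           # assumption: grayscale page as rows of pixels


def downsample(page: Page, factor: int) -> Page:
    """Coarse-stage input: keep every `factor`-th pixel in each direction."""
    return [row[::factor] for row in page[::factor]]


def detect_layout(thumb: Page) -> List[Box]:
    """Hypothetical layout-detector stub: returns one box around all nonzero
    pixels, in thumbnail coordinates (a real system would use a model)."""
    ys = [y for y, row in enumerate(thumb) if any(row)]
    xs = [x for row in thumb for x, v in enumerate(row) if v]
    if not ys:
        return []
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def to_native(box: Box, factor: int, width: int, height: int) -> Box:
    """Scale a thumbnail box back to native resolution, clamped to the page."""
    x0, y0, x1, y1 = box
    return (x0 * factor, y0 * factor,
            min(x1 * factor, width), min(y1 * factor, height))


def crop(page: Page, box: Box) -> Page:
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in page[y0:y1]]


def coarse_to_fine(page: Page, factor: int = 4) -> List[Page]:
    height, width = len(page), len(page[0])
    thumb = downsample(page, factor)        # coarse: cheap global view
    boxes = detect_layout(thumb)            # where is the content?
    native = [to_native(b, factor, width, height) for b in boxes]
    return [crop(page, b) for b in native]  # fine: full-resolution crops only
```

Because the stub skips pixels, content narrower than `factor` can be missed entirely; a real pipeline would resize with averaging and pad boxes before cropping.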
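TradingAgents is evaluated on metrics such as cumulative return and the Sharpe ratio. For readers less familiar with them, here is a minimal sketch of how both are commonly computed from a series of periodic (e.g. daily) returns. The annualization factor of 252 trading days and the zero default risk-free rate are conventional assumptions, not settings taken from the paper.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cumulative_return(returns: Sequence[float]) -> float:
    """Compound the periodic returns: prod(1 + r) - 1."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio: mean excess return divided by its sample
    standard deviation, scaled by sqrt(periods_per_year).

    `risk_free` is the per-period risk-free rate; at least two returns are
    required because the sample standard deviation needs them."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return mean(excess) / sd * math.sqrt(periods_per_year)
```

Cumulative return compounds rather than sums, so a +10% period followed by a -5% period yields +4.5%, not +5%.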
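Kronos relies on a tokenizer that turns continuous K-line (candlestick) data into discrete tokens for autoregressive pre-training. The sketch below is a deliberately simple stand-in, not the tokenizer described in the Kronos paper: it expresses each bar's open/high/low/close relative to the previous close, buckets each value uniformly, and packs the four bucket indices into one integer token. The bin count, the ±5% range, and the omission of volume are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Bar = Tuple[float, float, float, float]  # (open, high, low, close)


def bin_index(x: float, lo: float, hi: float, n_bins: int) -> int:
    """Uniformly quantize x into [0, n_bins), clipping values outside [lo, hi]."""
    x = min(max(x, lo), hi)
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # x == hi lands in the last bin


def tokenize_bars(bars: Sequence[Bar], n_bins: int = 16,
                  max_move: float = 0.05) -> List[int]:
    """Map each bar after the first to one integer token in [0, n_bins ** 4).

    Each OHLC field becomes a relative move from the previous bar's close,
    quantized over [-max_move, max_move]; the four bucket indices are packed
    into a single id, most significant field (open) first."""
    tokens = []
    for prev, bar in zip(bars, bars[1:]):
        prev_close = prev[3]
        token = 0
        for field in bar:
            rel = field / prev_close - 1.0
            token = token * n_bins + bin_index(rel, -max_move, max_move, n_bins)
        tokens.append(token)
    return tokens
```

Packing all four fields into one id makes the vocabulary grow as `n_bins ** 4`, which is one reason a practical tokenizer would need something smarter than this flat scheme.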
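ERP-Bench is summarized as measuring success against explicit constraints and optimal solutions. One plausible way to turn that into a success rate is sketched below. The `Task` structure, the relative-tolerance criterion, and all names are hypothetical and not taken from the benchmark: a plan counts as a success only if every hard constraint holds and its cost is close to the known optimum.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Plan = Dict[str, float]  # hypothetical: e.g. order quantities per item


@dataclass
class Task:
    constraints: List[Callable[[Plan], bool]]  # explicit hard constraints
    objective: Callable[[Plan], float]         # cost to minimize
    optimal_cost: float                        # known optimum (assumed > 0)


def is_success(task: Task, plan: Plan, rel_tol: float = 0.01) -> bool:
    """Success = all constraints satisfied AND cost within rel_tol of optimal."""
    if not all(check(plan) for check in task.constraints):
        return False
    return task.objective(plan) <= task.optimal_cost * (1 + rel_tol)


def success_rate(tasks: List[Task], plans: List[Plan]) -> float:
    """Fraction of tasks whose submitted plan is a success."""
    return sum(is_success(t, p) for t, p in zip(tasks, plans)) / len(tasks)
```

The relative tolerance assumes a positive optimal cost; objectives that can be zero or negative would need an absolute tolerance instead.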
Research Trends & Technical Analysis

1. Self-Evolution and Scalability in Agent Technology: Papers such as SkillOpt, OpenDevin, and DMPO (from Beyond Mode Collapse) show that agent research is increasingly focused on self-optimization and on agents expanding their own capabilities. The emphasis is on cutting deployment overhead while keeping training stable, a significant advantage for production-grade AI.
2. The Rise of Multimodal and Domain-Specific Foundation Models: From MinerU2.5's document parsing to Kronos for financial markets and TriSplat for 3D, the field is moving beyond general-purpose models toward models specialized for particular domains and modalities that balance efficiency and performance.
3. Advanced RL and Optimization: Innovations such as DMPO's approach to mode collapse, new bilevel optimization methods, and super-resolution via Riemannian geometry all target training stability and better convergence, two of the most practical pain points in large-scale model training.
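DMPO's approach to mode collapse rests on a standard property of KL divergence: forward KL, KL(p || q), is mode-covering because it penalizes the model q wherever the target p has mass that q misses, whereas reverse KL, KL(q || p), only scores the regions where q itself puts mass. The toy example below, with made-up two-mode distributions, illustrates that asymmetry; it is a generic illustration, not the paper's training objective.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i == 0 where p_i > 0 makes the
    divergence infinite (q assigns zero probability to something p needs)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


target = [0.5, 0.5]            # two equally good answers (modes)
near_collapsed = [0.99, 0.01]  # model that almost always picks mode 0
collapsed = [1.0, 0.0]         # model that has dropped mode 1 entirely

for name, q in [("near-collapsed", near_collapsed), ("collapsed", collapsed)]:
    print(f"{name}: forward KL(p||q) = {kl(target, q):.3f}, "
          f"reverse KL(q||p) = {kl(q, target):.3f}")
```

For the fully collapsed model, forward KL is infinite while reverse KL is only log 2 (about 0.693), so an objective built on forward KL cannot ignore a dropped mode.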
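For context on the bilevel optimization work, the sketch below is plain single-level Late Acceptance Hill Climbing (LAHC), the metaheuristic named in that paper's title. It does not reproduce the paper's bilevel variant or its CVRP setting; the toy problem (sorting a permutation via random swaps, with inversion count as the cost) and all helper names are illustrative. The history length is LAHC's main tuning parameter and presumably one of the values an instance-aware configurator would set per instance.

```python
import random
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def lahc(initial: S, cost: Callable[[S], float],
         neighbor: Callable[[S, random.Random], S],
         history_len: int = 50, iters: int = 10_000,
         seed: int = 0) -> Tuple[S, float]:
    """Late Acceptance Hill Climbing (Burke & Bykov).

    A candidate is accepted if its cost is no worse than the current cost OR
    no worse than the cost recorded `history_len` iterations earlier."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    history = [current_cost] * history_len  # circular buffer of past costs
    for i in range(iters):
        candidate = neighbor(current, rng)
        cand_cost = cost(candidate)
        v = i % history_len
        if cand_cost <= history[v] or cand_cost <= current_cost:
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        history[v] = current_cost  # record the (possibly updated) current cost
    return best, best_cost


# Toy problem (illustrative): reach sorted order by swapping two positions.
def inversions(perm: List[int]) -> int:
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])


def swap_two(perm: List[int], rng: random.Random) -> List[int]:
    i, j = rng.sample(range(len(perm)), 2)  # two distinct positions
    out = perm[:]
    out[i], out[j] = out[j], out[i]
    return out


best, best_cost = lahc(list(range(8))[::-1], inversions, swap_two,
                       history_len=20, iters=5000)
print(best, best_cost)
```

A longer history lets the search accept worse moves for longer (more exploration); a history length of 1 reduces LAHC to ordinary hill climbing that accepts equal-cost moves.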
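The super-resolution work is built on Sobolev alignment. As background only, a Sobolev (H^s) distance weights each frequency component of an error by (1 + k^2)^s, so for s > 0 a mismatch in fine detail costs more than an equally large mismatch in coarse structure. The sketch below computes that weighted spectral distance for 1D periodic signals with a naive DFT; it illustrates the norm itself, not the paper's adversarial or Riemannian method.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (fine for a toy example)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def sobolev_sq_distance(pred: Sequence[float], target: Sequence[float],
                        s: float = 1.0) -> float:
    """Squared H^s distance between two periodic 1D signals:
    (1/n) * sum_k (1 + k^2)^s * |DFT(pred - target)[k]|^2,
    with k the signed frequency index. s = 0 reduces to the plain squared
    L2 distance (Parseval); s > 0 up-weights high-frequency errors."""
    n = len(pred)
    diff_hat = dft([p - t for p, t in zip(pred, target)])
    total = 0.0
    for k, c in enumerate(diff_hat):
        freq = k if k <= n // 2 else k - n  # map index to signed frequency
        total += (1 + freq * freq) ** s * abs(c) ** 2
    return total / n
```

Two error signals with the same L2 energy, one a low-frequency wave and one a high-frequency wave, get identical scores at s = 0 but the high-frequency one scores higher at s = 1, which is the behavior a fidelity-oriented super-resolution loss wants.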
Upcoming Research to Watch
1. CVPR 2026 (June): With a record 16,000+ submissions, CVPR 2026 is expected to showcase the latest advances in computer vision, embodied AI, and multimodal research, especially in 3D reconstruction and real-time processing.
2. arXiv Monthly Highlights (Early June): We're tracking the cs.AI and cs.LG categories on arXiv, where daily submission volumes keep growing. Expect an early-June deep dive into May's most impactful papers and the growing dominance of agent-based and domain-specific models.
3. Automated Scientific Research Platforms: Watch for follow-ups to the "End-to-End AI Research Automation" paper recently published in Nature; we expect more real-world implementations that could automate the research process from hypothesis generation through verification.
This content was collected, curated, and summarized entirely by AI, including the decisions about what to gather and how. It may contain inaccuracies. Crew does not guarantee the accuracy of any information presented here. Always verify facts independently before acting on them. Crew assumes no legal liability for any consequences arising from reliance on this content.