MLA V3 - Build Domain-Specific SOTA-Level AI Agents

🌟 Introduction

infiAgent, also called MLA (Multi-Level Agent), is an agent framework designed to run indefinitely without the tool-calling chaos or system crashes caused by accumulating task resources and conversation history. With MLA, you can build powerful general-purpose and semi-specialized agents simply by writing configuration files.

Key Features

✅ Unlimited Runtime: No degradation from context accumulation
✅ Multi-Level Agent Hierarchy: Serial execution with tree-structured agent orchestration
✅ Zero Context Compression: File-based state management eliminates the need for context compression
✅ Task Memory: Persistent memory across sessions using workspace as task ID
✅ Complete Research Workflows: From literature search to experiments, plotting, and LaTeX papers

Default Configuration

The default configuration in this repository is a research-oriented semi-specialized agent capable of:

📝 Academic Paper Writing: Complete end-to-end workflow from research to LaTeX submission
✅ Human-Level Quality: Papers can pass EI/IEEE conference peer reviews
🧪 Scientific Computing: ECM protein simulation, logistics scheduling, assignment grading, etc.
🔬 Full Research Pipeline: Literature collection, experiments, figures, and paper drafting

Updates & News 🔥

If you pulled the image or code before the most recent update listed below, review the fixes and, if they affect you, pull the latest image and code again.

[2026/01/13] Added breakpoint recovery for program errors (the original Ctrl+C resume function is retained). To use it, start the CLI and type /resume.
[2026/01/08] Our paper "InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents" has been released.
[2026/01/07] Web UI: temporary fix for the error "处理事件异常: 'int' object has no attribute 'get'" ("event handling exception"). The error is still displayed, but it does not affect subsequent agent output or operation. A full fix is pending.
[2026/01/06] Web UI: added an entry-agent selector next to Task ID so you can choose the root agent for the conversation, with an agent list and a visual agent tree for the selected root.
[2026/01/05] Fixed a global freeze caused by prolonged unresponsiveness of the primary token. Please update the code or pull the latest Docker image.
[2026/01/04] Agent output now follows the language of the user input.
[2026/01/03] Improved LiteLLM's native retry mechanism with error-aware retry prompts to raise call success rates for small models; added connection timeout detection to reduce the risk of task interruption.
[2026/01/02] For an installation and usage video, see infiagent:全自动写作工具 ("infiagent: fully automated writing tool").
[2026/01/02] Fixed several bugs in reference management. Please clone the latest repo or pull the latest Docker image: chenglinhku/mlav3.
[2026/01/01] Added support for web_ui and the Qwen API, and fixed some problems with third-party OpenAI-format APIs. Please use the latest chenglinhku/mlav3 Docker image and see the example configs.
[2025/12/31] Added support for Gemini API keys from Google AI Studio. See the Gemini config in the repository.

Attention: Coding tasks currently support Python projects only; other languages may be supported later. In older versions, execute_command only allowed safe commands such as cd or grep; it now allows every command, including rm. If your task may edit system files, run it in Docker mode.

🎬 Outputs

Complete academic papers generated by MLA:

Demo 1:

Demo 2:

Demo 3:

MLA handles the entire research workflow - from literature search and experiment design to code execution, figure generation, and LaTeX paper writing. All automatically orchestrated through multi-level agents.

🚀 Quick Start

Video of Docker Mode:

infiagent:全自动写作工具 ("infiagent: fully automated writing tool")

Option 1: Docker (Recommended - No Python Required)

1. Install Docker

Mac/Windows: Docker Desktop
Linux: curl -fsSL https://get.docker.com | sh

2. Pull Image

docker pull chenglinhku/mlav3:latest

3. Choose Your Mode

Option A: Web UI Mode (Recommended)

After starting the container, open localhost:9641 to set your API keys and base URL.

cd /your/workspace
# 5002 is an optional port for agent web development (replace it with your own port if needed)
docker run -d --name mla \
  -e HOST_PWD=$(pwd) \
  -v $(pwd):/workspace$(pwd) \
  -v ~/.mla_v3:/root/mla_v3 \
  -v mla-config:/mla_config \
  -p 8002:8002 \
  -p 9641:9641 \
  -p 4242:4242 \
  -p 5002:5002 \
  chenglinhku/mlav3:latest webui && docker logs -f mla

Then open your browser at http://localhost:4242. Default username: user. Default password: password.

📖 Web UI usage & UI details: see web_ui/README.md.

Option B: CLI Mode

cd /your/workspace
# 5002 is an optional port for agent web development (replace it with your own port if needed)
docker run -it --rm \
  -e HOST_PWD=$(pwd) \
  -v $(pwd):/workspace$(pwd) \
  -v ~/.mla_v3:/root/mla_v3 \
  -v mla-config:/mla_config \
  -p 8002:8002 \
  -p 9641:9641 \
  -p 5002:5002 \
  chenglinhku/mlav3:latest cli

Windows Users:

Windows users need to manage conversation IDs manually. Different task IDs maintain different memories.

# CLI Mode (PowerShell)
docker run -it --rm `
  -e HOST_PWD="/{your_conversation_id}" `
  -v "${PWD}:/workspace/{your_conversation_id}" `
  -v "${HOME}\.mla_v3:/root/mla_v3" `
  -v mla-config:/mla_config `
  -p 8002:8002 `
  -p 9641:9641 `
  -p 5002:5002 `
  chenglinhku/mlav3:latest cli

# Web UI Mode (PowerShell)
docker run -d --name mla-webui `
  -e HOST_PWD="/{your_conversation_id}" `
  -v "${PWD}:/workspace/{your_conversation_id}" `
  -v "${HOME}\.mla_v3:/root/mla_v3" `
  -v mla-config:/mla_config `
  -p 8002:8002 `
  -p 9641:9641 `
  -p 4242:4242 `
  -p 5002:5002 `
  chenglinhku/mlav3:latest webui

# Then open browser: http://localhost:4242
# View logs: docker logs -f mla-webui

4. Configure API Key

Open browser: http://localhost:9641

Edit run_env_config/llm_config.yaml, fill in your API key, and save.

🎉 Done! Start using MLA CLI.

📖 Complete Docker Guide

Option 2: Local Installation (Python Required)

1. Install the package

# Ensure Python version > 3.10
cd install_path
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
git clone https://github.com/ChenglinPoly/infiAgent.git
cd infiAgent
pip install -e .

2. Install Playwright

playwright install chromium

3. Configure API Key

mla-agent --config-set api_key "your-api-key"

4. Start Tool Server

mla-tool-server start

5. Start CLI

cd /your/workspace
mla-agent --cli

📖 Complete CLI Guide

🎯 How It Works

MLA's design philosophy is "Provide short but high-value context for the next step." To achieve this, the framework implements multiple innovations:

1. 🌲 Serial Multi-Agent System

MLA deploys agents in a tree-structured hierarchy (e.g., Grandparent → Parent → Child). This ensures:

✅ Single-purpose agents: Each agent has a focused role
✅ Minimal tool sets: Agents only access necessary tools
✅ Task alignment: Serial execution prevents parallel conflicts
✅ Clear delegation: Parent agents orchestrate child agents

Example Hierarchy:

alpha_agent (Level 3)
  ├── data_collection_agent (Level 2)
  │   └── web_search_agent (Level 1)
  ├── coder_agent (Level 2)
  └── material_to_document_agent (Level 2)
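
The hierarchy above can be pictured as a tree that is walked one node at a time. The following is a minimal sketch of that idea only: the `Agent` class and `run_serial` function are hypothetical names invented for illustration and are not taken from MLA's code, and the "work" each agent does here is just recording its name.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for an agent node (not MLA's real class)."""
    name: str
    level: int
    children: list["Agent"] = field(default_factory=list)


def run_serial(agent: Agent, trace: list[str]) -> list[str]:
    """Visit the agent, then each child in order, one at a time."""
    trace.append(f"L{agent.level}:{agent.name}")
    for child in agent.children:
        run_serial(child, trace)  # a child finishes before its sibling starts
    return trace


tree = Agent("alpha_agent", 3, [
    Agent("data_collection_agent", 2, [Agent("web_search_agent", 1)]),
    Agent("coder_agent", 2),
    Agent("material_to_document_agent", 2),
])

order = run_serial(tree, [])
print(order)
```

Because every child call returns before the next sibling is visited, no two agents act on the workspace at the same time, which is the property the serial design relies on.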

2. 🎯 Nested Attention Mechanism

Long documents (PDFs, novels, papers) are never directly loaded into context. Instead:

✅ Use answer_from_pdf, answer_from_document tools
✅ Query-driven content extraction
✅ Only relevant excerpts or summaries enter context
✅ Application-layer attention allocation through tools

Traditional Approach:

Load entire 50-page PDF → Agent processes everything → Token overflow

MLA Approach:

Agent asks: "What is the methodology?"
→ Tool extracts relevant sections (2 pages)
→ Returns concise answer → Minimal token usage
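
To make the query-driven idea concrete, here is a toy sketch that returns only the paragraphs most relevant to a question instead of the whole document. It is an assumption-laden illustration: `answer_from_text` is a hypothetical helper, and plain keyword overlap is swapped in for whatever extraction the real answer_from_pdf and answer_from_document tools perform.

```python
import re


def answer_from_text(document: str, question: str, top_k: int = 2) -> list[str]:
    """Return up to top_k paragraphs sharing the most words with the question."""
    query_words = set(re.findall(r"[a-z]+", question.lower()))
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]

    def overlap(paragraph: str) -> int:
        words = set(re.findall(r"[a-z]+", paragraph.lower()))
        return len(words & query_words)

    ranked = sorted(paragraphs, key=overlap, reverse=True)
    # Paragraphs with no shared word never enter the caller's context.
    return [p for p in ranked[:top_k] if overlap(p) > 0]


doc = (
    "Introduction: agents often run out of context.\n\n"
    "Methodology: we store state in files and query documents on demand.\n\n"
    "Results: long tasks completed without overflow."
)
excerpts = answer_from_text(doc, "What is the methodology?", top_k=1)
print(excerpts)
```

Only the returned excerpt would be placed in the agent's context; the rest of the document stays on disk.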

3. 📁 File-Centric Architecture

"Files are everything." All outputs and interactions are saved to the file system:

✅ Web scraping → Saves as Markdown files
✅ PDF parsing → Extracts to structured documents
✅ Sub-agent results → Stored as files
✅ No immediate returns cluttering context

Benefits:

Clear audit trail
Reusable artifacts
Context-free state representation

4. ⚡ Ten-Step Strategy (No Context Compression)

A key insight: The current file system state represents the effect of all historical actions.

✅ A separate thinking module updates file space state every 10 steps
✅ Agents only retain the last 10 actions (since last state update)
✅ No need for context compression
✅ Historical actions are reflected in file system, not conversation history

Traditional LLM Agents:

Step 1: Create file A
Step 2: Edit file B
...
Step 100: Context overflow → Compression needed → Information loss

MLA Approach:

Steps 1-10: Actions recorded
Step 10: Thinking module updates "Current State: Files A, B, C exist with..."
Steps 11-20: Only these + Current State kept
→ No compression, no information loss
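
The windowing behavior described above can be sketched in a few lines. This is an illustration under stated assumptions, not MLA's implementation: `StepWindow` and `summarize_stub` are hypothetical names, and the stub stands in for the thinking module, which in the real system would describe the file space.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepWindow:
    """Keeps a state summary plus only the actions since the last summary."""
    window: int = 10
    state: str = "empty workspace"
    recent: list[str] = field(default_factory=list)

    def record(self, action: str, summarize: Callable[[str, list[str]], str]) -> None:
        self.recent.append(action)
        if len(self.recent) == self.window:
            # Fold the full window into the state, then drop the actions.
            self.state = summarize(self.state, self.recent)
            self.recent.clear()

    def context(self) -> list[str]:
        """What the agent would see on its next step."""
        return [f"Current State: {self.state}", *self.recent]


def summarize_stub(state: str, actions: list[str]) -> str:
    # Placeholder for the thinking module; it only names the last action.
    return f"workspace as of {actions[-1]}"


w = StepWindow()
for i in range(1, 24):
    w.record(f"step {i}", summarize_stub)

print(w.context())
```

After 23 steps the context holds one state line and three recent actions, so its size is bounded by the window rather than by the length of the run.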

5. 🔧 Batch File Operations

Inspired by Claude Code, MLA uses list-based tool parameters to save tokens:

✅ Read multiple files in one call
✅ Batch operations reduce cumulative overhead
✅ Significant token savings on repeated actions

Example:

# Traditional: 3 separate calls
file_read(path="file1.txt")
file_read(path="file2.txt")
file_read(path="file3.txt")

# MLA: 1 batch call
file_read(paths=["file1.txt", "file2.txt", "file3.txt"])
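
As a rough sketch of what a list-based read tool might look like on the tool side, the function below reads several files in one call and reports failures per path. The return shape (`ok`, `content`, `error`) and the `base_dir` parameter are assumptions made for illustration; the real file_read tool may differ.

```python
import tempfile
from pathlib import Path


def file_read(paths: list[str], base_dir: str = ".") -> dict[str, dict]:
    """Read several files in one call; a missing file does not fail the batch."""
    results: dict[str, dict] = {}
    for rel in paths:
        try:
            text = (Path(base_dir) / rel).read_text(encoding="utf-8")
            results[rel] = {"ok": True, "content": text}
        except OSError as exc:
            results[rel] = {"ok": False, "error": str(exc)}
    return results


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "file1.txt").write_text("alpha", encoding="utf-8")
    Path(tmp, "file2.txt").write_text("beta", encoding="utf-8")
    demo = file_read(["file1.txt", "file2.txt", "missing.txt"], base_dir=tmp)

print({name: entry["ok"] for name, entry in demo.items()})
```

Returning one entry per path lets the agent see partial success in a single tool result instead of issuing a follow-up call for each file.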

6. 💾 Long-Term Memory with Task ID

✅ Task ID = Workspace absolute path (not user-configurable)
✅ Same task ID allows unlimited conversation sessions
✅ Agents remember all historical tasks in the workspace
✅ Persistent memory across interruptions and restarts

Usage:

# First session
mla-agent --task_id ~/research --user_input "Collect papers on Transformers"
# → Stores conversation in ~/mla_v3/conversations/{hash}_research_*

# Second session (days later)
mla-agent --task_id ~/research --user_input "Summarize the collected papers"
# → Agent remembers previous session and accesses collected files
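
One way to picture how a workspace path can act as a stable task ID is to normalize the path and hash it. The sketch below is a guess at the general idea only: `conversation_prefix` is a hypothetical helper, and SHA-256 truncated to 8 characters is an assumption, since the actual hashing scheme behind the `{hash}_research_*` names is not described here.

```python
import hashlib
from pathlib import Path


def conversation_prefix(workspace: str) -> str:
    """Derive a stable '{hash}_{folder}' prefix from a workspace path."""
    absolute = Path(workspace).expanduser().resolve()
    digest = hashlib.sha256(str(absolute).encode("utf-8")).hexdigest()[:8]
    return f"{digest}_{absolute.name}"


# Two spellings of the same directory map to the same prefix.
first = conversation_prefix("research")
second = conversation_prefix("./research/../research")
print(first, second)
```

Normalizing before hashing is what makes a later session in the same workspace land on the same stored conversation.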

7. 📊 Call Graph-Based Shared Context

The hierarchy_manager maintains a dynamic call relationship graph:

✅ Tracks parent-child agent relationships
✅ Injects call graph into shared context
✅ Prevents agents from overstepping boundaries
✅ Maintains task alignment across multi-agent system

Call Graph Example:

{
  "current_agent": "coder_agent",
  "parent": "alpha_agent",
  "siblings": ["data_collection_agent", "material_to_document_agent"],
  "allowed_tools": ["python_run", "file_write", "file_read"]
}

This ensures coder_agent won't accidentally call web_search (not in its scope) or interfere with sibling agents.
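
A boundary check over a call graph of this shape could look like the sketch below. The `check_call` function and its messages are hypothetical and written only to illustrate the rule; they are not the hierarchy_manager's API.

```python
call_graph = {
    "current_agent": "coder_agent",
    "parent": "alpha_agent",
    "siblings": ["data_collection_agent", "material_to_document_agent"],
    "allowed_tools": ["python_run", "file_write", "file_read"],
}


def check_call(graph: dict, target: str) -> tuple[bool, str]:
    """Decide whether the current agent may call `target`."""
    if target in graph["allowed_tools"]:
        return True, "allowed tool"
    if target in graph["siblings"]:
        return False, "sibling agents are orchestrated by the parent"
    return False, "not in this agent's scope"


for name in ("python_run", "web_search", "data_collection_agent"):
    print(name, check_call(call_graph, name))
```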

📸 Interface Screenshots

CLI Interface

MLA provides a rich interactive CLI with real-time task monitoring, HIL handling, and agent switching:

System Selection:

Tool Mode Configuration:

Starting Tasks:

Interactive CLI with prompt_toolkit and rich terminal UI - featuring multi-turn conversations, automatic HIL detection, and tool execution confirmation.

VS Code Plugin

Build powerful IDE extensions using MLA's JSONL mode:

VS Code extension powered by MLA - seamless integration with workspace context and real-time streaming output.

⚙️ Configuration Guide

MLA uses YAML files for agent and tool configuration. Configuration files are located in:

config/
├── agent_library/
│   └── Default/                    # Default agent system
│       ├── general_prompts.yaml    # Shared prompts
│       ├── level_-1_judge_agent.yaml  # Judge agent
│       ├── level_0_tools.yaml      # Tool definitions
│       ├── level_1_agents.yaml     # Low-level agents
│       ├── level_2_agents.yaml     # Mid-level agents
│       └── level_3_agents.yaml     # Top-level agents
└── run_env_config/
    ├── llm_config.yaml             # LLM settings
    └── tool_config.yaml            # Tool server settings

Key Configuration Files

1. `llm_config.yaml` - LLM Configuration

api_key: "your-api-key"
base_url: "https://openrouter.ai/api/v1"
models:
  - "openai/anthropic/claude-sonnet-4"
  - "openai/anthropic/claude-haiku-4.5"
temperature: 0.7
max_tokens: 8000
figure_models:
  - "google/gemini-2.0-flash-thinking-exp-01-21"

Note: Copy llm_config.example.yaml to llm_config.yaml to get started.

2. Agent Hierarchy

MLA organizes agents into levels:

Level 3: Top-level orchestrators (e.g., alpha_agent)
Level 2: Functional specialists (e.g., data_collection_agent, coder_agent)
Level 1: Basic executors (e.g., web_search_agent)
Level 0: Tool definitions
Level -1: Quality control (e.g., judge_agent)

3. Creating Custom Agents

Edit YAML files to customize agent behavior:

news_agent:
  type: llm_call_agent
  level: 1
  model_type: "advanced"
  available_tools:
    - data_collection_agent
    - coder_agent
    ...
  system_prompt: |
    You are a newspaper agent.

💻 CLI Interface

Interactive Mode

Start the CLI for a conversational experience:

mla-agent --cli

Key Features:

🔄 Multi-turn conversations with persistent context
🤖 Agent switching with @agent_name syntax
🔔 Automatic HIL detection with audio alerts
⚠️ Tool execution confirmation in manual mode
⏸️ Interrupt and resume support (Ctrl+C to pause)
🎨 Rich terminal UI powered by prompt_toolkit and rich

Usage Examples:

# Direct task input (uses default agent)
[alpha_agent] > Collect papers on Transformers

# Switch agent and execute task
[alpha_agent] > @data_collection_agent Search for recent NLP papers

# Switch default agent only
[alpha_agent] > @coder_agent
✅ Switched to: coder_agent
[coder_agent] >

CLI Commands:

| Command | Description |
| --- | --- |
| `/help` | Show help and available commands |
| `/agents` | List all available agents |
| `/resume` | Resume interrupted tasks |
| `/quit` or `/exit` | Exit CLI mode |
| `Ctrl+C` | Interrupt current task (stays in CLI) |
| `Ctrl+D` | Exit CLI immediately |

Human-in-Loop (HIL) Handling:

When an agent requests human input, the CLI automatically detects it:

🔔🔔🔔 Detected HIL task! Press Enter to handle... 🔔🔔🔔
================================================================================
🔔 Human Interaction Task (HIL)
================================================================================
📝 Task ID: upload_file_20250124
📋 Instruction: Please upload the required dataset files...
================================================================================
💡 Enter your response (any text)
   Type /skip to skip this task
================================================================================

[alpha_agent] HIL Response > Files uploaded successfully
✅ HIL task responded

Tool Confirmation (Manual Mode):

When --auto-mode false is set, each tool execution requires confirmation:

⚠️⚠️⚠️ Detected tool execution request! Press Enter to confirm... ⚠️⚠️⚠️
================================================================================
⚠️  Tool Execution Confirmation Request
================================================================================
🔧 Tool Name: python_run
📝 Confirmation ID: confirm_12345
📋 Parameters:
     code: import numpy as np...
     timeout: 300
================================================================================
💡 Choose action:
   yes / y - Approve execution
   no / n  - Reject execution
================================================================================

[alpha_agent] Confirm [yes/no] > yes
✅ Approved tool execution: python_run

Command-Line Mode

For scripting and automation:

mla-agent \
  --task_id /path/to/workspace \
  --user_input "Your task description" \
  --agent_name alpha_agent

Common Parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| `--task_id` | Workspace path (absolute) | Required |
| `--user_input` | Task description | Required |
| `--agent_name` | Agent to invoke | `alpha_agent` |
| `--agent_system` | Agent library name | `Default` |
| `--cli` | Interactive CLI mode | `false` |
| `--jsonl` | JSONL output mode | `false` |
| `--force-new` | Clear all state and start fresh | `false` |
| `--auto-mode` | Tool execution mode (`true`/`false`) | Auto-detect |

Auto-Mode Examples:

# Automatic tool execution (no confirmation needed)
mla-agent --task_id ~/project --user_input "Task" --auto-mode true

# Manual confirmation for each tool
mla-agent --task_id ~/project --user_input "Task" --auto-mode false
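
For scripted runs from Python, the argument list can be assembled programmatically and handed to `subprocess`. The sketch below uses only the flags listed in the parameter table above; `build_command` itself is a hypothetical helper, and the final launch is left as a comment because it requires mla-agent to be installed.

```python
import shlex
from typing import Optional


def build_command(
    task_id: str,
    user_input: str,
    agent_name: str = "alpha_agent",
    auto_mode: Optional[bool] = None,
    jsonl: bool = False,
) -> list[str]:
    """Assemble an mla-agent argument list from the documented flags."""
    cmd = ["mla-agent", "--task_id", task_id, "--user_input", user_input,
           "--agent_name", agent_name]
    if auto_mode is not None:  # None leaves the CLI's auto-detect in place
        cmd += ["--auto-mode", "true" if auto_mode else "false"]
    if jsonl:
        cmd.append("--jsonl")
    return cmd


cmd = build_command("/path/to/workspace", "Task", auto_mode=False)
printable = shlex.join(cmd)  # shell-quoted form, useful for logs
print(printable)
# subprocess.run(cmd, check=True) would launch it where mla-agent is installed
```

Passing a list rather than a shell string avoids quoting problems when the task description contains spaces or quotes.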

Managing Tool Server

# Start server (background)
mla-tool-server start

# Check status
mla-tool-server status

# Stop server
mla-tool-server stop

# Restart server
mla-tool-server restart

🔌 SDK Integration

MLA provides two SDK options: Python SDK for direct integration and JSONL mode for IDE plugins.

Python SDK

Import and use MLA components directly in your Python code:

from pathlib import Path
from utils.config_loader import ConfigLoader
from core.hierarchy_manager import get_hierarchy_manager
from core.agent_executor import AgentExecutor

# Initialize components
task_id = str(Path.home() / "my_project")
agent_system = "Default"

config_loader = ConfigLoader(agent_system)
hierarchy_manager = get_hierarchy_manager(task_id)

# Get agent configuration
agent_config = config_loader.get_tool_config("alpha_agent")

# Create and run agent
agent = AgentExecutor(
    agent_name="alpha_agent",
    agent_config=agent_config,
    config_loader=config_loader,
    hierarchy_manager=hierarchy_manager
)

# Execute task
result = agent.run(
    task_id=task_id,
    user_input="Write a survey paper on Transformers"
)

print(f"Status: {result['status']}")
print(f"Output: {result['output']}")

Advanced: Custom Agent with Tool Permissions

# Set tool execution mode
agent.tool_executor.set_task_permission(task_id, auto_mode=True)

# Run with custom configuration
result = agent.run(task_id, user_input)

if result['status'] == 'success':
    print("Task completed successfully!")
else:
    print(f"Error: {result.get('error_information')}")

Use Cases for Python SDK:

🔧 Building custom workflows
🤖 Embedding agents in existing applications
📊 Batch processing multiple tasks
🔬 Research experiments with programmatic control

JSONL Mode for IDE Plugins

MLA provides a JSONL streaming mode for real-time integration with IDEs and editors:

mla-agent \
  --task_id $(pwd) \
  --user_input "Optimize code performance" \
  --jsonl 2>/dev/null

Output Format:

{"type":"start","call_id":"c-1760936557-474c43","project":"~/project","agent":"alpha_agent","task":"Optimize..."}
{"type":"token","text":"[alpha_agent] Analyzing code..."}
{"type":"progress","phase":"execution","pct":30}
{"type":"token","text":"Calling tool: code_analyzer"}
{"type":"result","ok":true,"summary":"Optimization complete"}
{"type":"end","status":"ok","duration_ms":5432}

Event Types:

| Event Type | Description | Key Fields |
| --- | --- | --- |
| `start` | Task begins | `call_id`, `agent`, `task` |
| `token` | Streaming text output | `text` |
| `progress` | Progress update | `phase`, `pct` |
| `result` | Task result | `ok`, `summary` |
| `end` | Task completed | `status`, `duration_ms` |
| `error` | Error occurred | `message` |
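
A consumer of this stream mainly needs to split on newlines, parse each line as JSON, and dispatch on `type`. Here is a minimal Python sketch of that loop, run against sample lines shaped like the output format above; `parse_events` is a hypothetical helper, and skipping malformed lines is a design choice made for this example rather than documented MLA behavior.

```python
import json
from typing import Iterable, Iterator


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSONL event line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output
        if isinstance(event, dict) and "type" in event:
            yield event


sample = [
    '{"type":"start","call_id":"c-1760936557-474c43","agent":"alpha_agent"}',
    '{"type":"token","text":"[alpha_agent] Analyzing code..."}',
    "not json",
    '{"type":"progress","phase":"execution","pct":30}',
    '{"type":"end","status":"ok","duration_ms":5432}',
]
events = list(parse_events(sample))
text = "".join(e["text"] for e in events if e["type"] == "token")
print([e["type"] for e in events], text)
```

In a live integration, `lines` would be the stdout of the mla-agent process read line by line instead of a list.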

TypeScript/JavaScript Integration

import { spawn } from 'child_process';

interface AgentEvent {
  type: 'start' | 'token' | 'progress' | 'result' | 'end' | 'error';
  [key: string]: any;
}

function runAgent(
  workspacePath: string,
  userInput: string,
  onEvent: (event: AgentEvent) => void
): Promise<AgentEvent> {
  return new Promise((resolve, reject) => {
    const child = spawn('mla-agent', [
      '--task_id', workspacePath,
      '--user_input', userInput,
      '--jsonl'
    ]);

    // Holds a partial line until its trailing newline arrives
    let buffer = '';

    child.stdout.on('data', (data) => {
      buffer += data.toString();
      const lines = buffer.split('\n');
      buffer = lines.pop() || '';

      lines.forEach(line => {
        if (!line.trim()) return;

        try {
          const event: AgentEvent = JSON.parse(line);
          onEvent(event);

          if (event.type === 'end') {
            resolve(event);
          } else if (event.type === 'error') {
            reject(new Error(event.message));
          }
        } catch (e) {
          console.error('Failed to parse event:', line);
        }
      });
    });

    child.stderr.on('data', (data) => {
      // Forward the agent's stderr output
      console.error(data.toString());
    });

    child.on('error', reject);

    // Settle the promise if the process exits without an 'end' event
    // (a no-op when the promise has already been resolved or rejected)
    child.on('close', (code) => {
      reject(new Error(`mla-agent exited with code ${code} before an end event`));
    });
  });
}

// Usage
await runAgent('/path/to/workspace', 'Write unit tests', (event) => {
  switch (event.type) {
    case 'start':
      console.log(`Task started: ${event.task}`);
      break;
    case 'token':
      process.stdout.write(event.text);
      break;
    case 'progress':
      updateProgressBar(event.pct);
      break;
    case 'result':
      console.log(`\nResult: ${event.summary}`);
      break;
  }
});

VS Code Extension Example

Build your own Cursor/VS Code extension using MLA:

Extension Features:

🤖 Agent commands in command palette
💬 Inline chat with workspace context
📝 Automatic code generation and refactoring
🔍 Literature search within editor
🔔 HIL task handling with UI prompts

Basic Extension Structure:

// extension.ts
import * as vscode from 'vscode';
import { runAgent } from './mla-client';

export function activate(context: vscode.ExtensionContext) {
  const disposable = vscode.commands.registerCommand(
    'mla.executeTask',
    async () => {
      const workspace = vscode.workspace.workspaceFolders?.[0].uri.fsPath;
      const input = await vscode.window.showInputBox({
        prompt: 'Enter task description'
      });

      if (!workspace || !input) return;

      // Show progress
      await vscode.window.withProgress({
        location: vscode.ProgressLocation.Notification,
        title: 'MLA Agent',
        cancellable: true
      }, async (progress, token) => {
        // progress.report() sums increments, so report only the change in pct
        let lastPct = 0;

        await runAgent(workspace, input, (event) => {
          if (event.type === 'token') {
            vscode.window.showInformationMessage(event.text);
          } else if (event.type === 'progress') {
            progress.report({ increment: event.pct - lastPct });
            lastPct = event.pct;
          }
        });
      });
    }
  );

  context.subscriptions.push(disposable);
}

📊 Example Outputs

Academic Paper Output

MLA can generate complete research papers with the following structure:

upload/
├── paper.tex               # Main LaTeX document
├── references.bib          # Bibliography
├── figures/
│   ├── architecture.png
│   ├── results_comparison.png
│   └── ablation_study.png
└── supplementary/
    └── detailed_results.pdf

Quality Metrics:

✅ Passes peer review at EI/IEEE conferences
✅ Proper citation formatting
✅ High-quality figures (300 DPI)
✅ Coherent structure and flow

Other Capabilities

1. Scientific Computing

ECM protein composition simulation
Logistics company shift scheduling
Student assignment grading with feedback

2. General Tasks

Web scraping and data extraction
Code generation and debugging
Document conversion and processing

📖 Documentation

Tool Server API Documentation - 18 available tools
Human-in-the-Loop API - User interaction integration
Configuration Examples - Agent YAML templates

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

📄 License

See LICENSE for details.

📄 Citation

If you use InfiAgent in your research, please cite our paper:

@article{yu2026infiagent,
  title={InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents},
  author={Yu, Chenglin and Wang, Yuchen and Wang, Songmiao and Yang, Hongxia and Li, Ming},
  journal={arXiv preprint arXiv:2601.03204},
  year={2026}
}

🙏 Acknowledgments

Built with LiteLLM for unified LLM access
Uses Crawl4AI for web scraping

📬 Contact

Author: @yuchenglin

Thanks to contributors: @wangyuchen @wangsongmiao @yuyang @lijinjia

Email: yuchenglin96@qq.com/cl0415@connect.hku.hk/chenglin.yu@poly.edu.h

GitHub: MLA V3 Repository
