Documentation Index
Fetch the complete documentation index at: https://novita.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
This document covers advanced features and best practices for Novita Agent Runtime.
Table of Contents
Configuration File Reference
.novita-agent.yaml Structure
The .novita-agent.yaml configuration file uses Kubernetes-style YAML format:
apiVersion: v1
kind: Agent
metadata:
name: my-agent # Agent name (must consist of lowercase letters, numbers, and hyphens only)
version: 1.0.0 # Agent version (semantic versioning)
author: dev@example.com # Author email (required)
description: My AI Agent # Agent description (optional)
created: '2025-10-23T10:30:00Z' # Creation time (auto-generated)
spec:
entrypoint: app.py # Python entry file (must be .py file)
# Environment variables configuration (optional)
envVars:
MODEL_NAME: deepseek/deepseek-v3-0324
TEMPERATURE: '0.7'
# Runtime configuration (optional, applied to the built sandbox template)
runtime:
timeout: 300 # Startup timeout in seconds (1-3600, default 300)
memory_limit: 1Gi # Memory limit (supports "512Mi", "1Gi", etc.)
cpu_limit: '1' # CPU limit (supports "1", "1000m", etc.)
# Status field (maintained by the system, should not be modified manually by users)
status:
phase: deployed # Current phase: pending/building/deployed/failed
agent_id: agent-xxxxx # Agent ID (auto-generated after deployment)
last_deployed: '2025-10-23T10:35:00Z' # Last deployment time
build_id: build_xyz789 # Build ID (auto-generated after deployment)
Modifying Configuration
Modifying CPU and Memory Settings
Modify resource configuration under spec.runtime in .novita-agent.yaml:
spec:
runtime:
# CPU configuration
cpu_limit: '2' # 2 CPU cores
# Memory configuration
memory_limit: 2Gi # 2 GB memory
Modifying Environment Variables
The spec.envVars in .novita-agent.yaml is only used for the CLI’s agent invoke command and will not be passed to the deployed sandbox template.
Modify environment variables under spec.envVars in .novita-agent.yaml:
spec:
envVars:
# LLM configuration
MODEL_NAME: deepseek/deepseek-v3-0324
TEMPERATURE: '0.7'
Note:
- ⚠️ Do not store sensitive information (such as API Keys) in
.novita-agent.yaml
- You can also pass environment variables via the
--env parameter when running the agent invoke command
Redeploy to Apply Configuration Changes
After modifying resource specifications in .novita-agent.yaml, redeploy is required:
# Redeploy (creates a new version)
npx novita-sandbox-cli agent launch
Environment Variables Management
There are several ways to pass environment variables to Agents running in sandbox instances:
Method 1: Define in Configuration File (CLI invocation only)
Define environment variables under spec.envVars in .novita-agent.yaml:
spec:
envVars:
MODEL_NAME: deepseek/deepseek-v3-0324
TEMPERATURE: '0.7'
Method 2: Pass Dynamically via SDK
When invoking an Agent using the SDK’s invoke_agent_runtime method, pass them dynamically via the envVars parameter:
import os
from novita_sandbox.agent_runtime import AgentRuntimeClient
client = AgentRuntimeClient(api_key=os.getenv("NOVITA_API_KEY"))
response = await client.invoke_agent_runtime(
agentId="agent-xxxxx",
payload=payload,
envVars={
# Read sensitive information from environment variables
"NOVITA_API_KEY": os.getenv("NOVITA_API_KEY"),
"DATABASE_PASSWORD": os.getenv("DATABASE_PASSWORD"),
# Or pass directly
"MODEL_NAME": "deepseek/deepseek-v3-0324",
"TEMPERATURE": "0.7"
}
)
Streaming Responses
Implementing Streaming with Synchronous Generators
Use Python generators to implement streaming responses:
from novita_sandbox.agent_runtime import AgentRuntimeApp
app = AgentRuntimeApp()
@app.entrypoint
def streaming_agent(request: dict):
"""Synchronous streaming response"""
prompt = request.get("prompt", "")
# Use generator to return chunks
for i, chunk in enumerate(generate_response(prompt)):
yield {
"chunk": chunk,
"type": "content",
"index": i
}
# Send end marker
yield {"chunk": "", "type": "end"}
Implementing Streaming with Async Generators
Use Python async generators:
import asyncio
@app.entrypoint
async def async_streaming_agent(request: dict):
"""Async streaming response"""
prompt = request.get("prompt", "")
async for chunk in async_generate_response(prompt):
yield {
"chunk": chunk,
"type": "content"
}
yield {"chunk": "", "type": "end"}
LangChain Streaming Response Example
Complete example using LangChain for streaming responses:
import os
from langchain_openai import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler
from novita_sandbox.agent_runtime import AgentRuntimeApp
app = AgentRuntimeApp()
class StreamingHandler(BaseCallbackHandler):
"""Streaming callback handler"""
def __init__(self):
self.tokens = []
def on_llm_new_token(self, token: str, **kwargs):
self.tokens.append(token)
@app.entrypoint
def langchain_streaming_agent(request: dict):
"""LangChain streaming response"""
prompt = request.get("prompt", "")
# Create streaming-enabled LLM
llm = ChatOpenAI(
api_key=os.getenv("NOVITA_API_KEY"),
streaming=True
)
# Stream invocation
for chunk in llm.stream(prompt):
if chunk.content:
yield {
"chunk": chunk.content,
"type": "content"
}
yield {"chunk": "", "type": "end"}
Invoking a Streaming Agent
Invoke a streaming Agent using the SDK:
import asyncio
import json
import os
from novita_sandbox.agent_runtime import AgentRuntimeClient
async def call_streaming_agent():
client = AgentRuntimeClient(api_key=os.getenv("NOVITA_API_KEY"))
payload = json.dumps({
"prompt": "Tell me a story"
}).encode()
response = await client.invoke_agent_runtime(
agentId="agent-xxxxx",
payload=payload
)
# Process streaming response
print("Streaming response:")
print(response)
Version Management
Deploying a New Agent Version
Modify the version number and deploy a new version:
# Modify version number
npx novita-sandbox-cli agent configure --agent-version 1.1.0
# Deploy new version
npx novita-sandbox-cli agent launch
After successful deployment, a new agent_id is generated. Each deployment generates a unique agent_id that corresponds to a specific version.
Health Checks
Default Health Check Endpoint
AgentRuntimeApp automatically provides a /ping health check endpoint:
from novita_sandbox.agent_runtime import AgentRuntimeApp
app = AgentRuntimeApp()
# Default health check automatically responds with {"status": "Healthy"}
Custom Health Checks
Use the @app.ping decorator to customize health check logic:
@app.ping
def custom_health_check():
"""Custom health check"""
# Check dependent services
db_ok = check_database_connection()
llm_ok = check_llm_service()
if db_ok and llm_ok:
return {"status": "Healthy"}
elif db_ok or llm_ok:
return {"status": "HealthyBusy"} # Partially available
else:
return {"status": "Unhealthy"} # Unavailable
def check_database_connection():
"""Check database connection"""
try:
# Simulate database check
return True
except:
return False
def check_llm_service():
"""Check LLM service"""
try:
# Simulate LLM service check
return True
except:
return False
Supported Health Check Statuses
Agents can return the following health statuses:
| Status | Description | HTTP Status Code |
|---|
Healthy | Agent is fully available | 200 |
HealthyBusy | Agent is partially available (e.g., processing heavy load) | 200 |
Unhealthy | Agent is unavailable | 503 |
Multi-turn Conversations
Using Session ID for Multi-turn Conversations
Use the runtimeSessionId parameter to route multiple requests to the same sandbox instance:
import uuid
import json
import os
from novita_sandbox.agent_runtime import AgentRuntimeClient
async def multi_turn_conversation():
runtime_session_id = str(uuid.uuid4())
client = AgentRuntimeClient(api_key=os.getenv("NOVITA_API_KEY"))
agent_id = "agent-xxxxx"
# First turn
response1 = await client.invoke_agent_runtime(
agentId=agent_id,
payload=json.dumps({"prompt": "Hello, my name is John"}).encode(),
runtimeSessionId=runtime_session_id,
)
print(f"AI: {response1}")
# Second turn (sent to the same sandbox instance, Agent remembers the context)
response2 = await client.invoke_agent_runtime(
agentId=agent_id,
payload=json.dumps({"prompt": "What's my name?"}).encode(),
runtimeSessionId=runtime_session_id,
)
print(f"AI: {response2}") # Should answer "John"
Example Projects
We provide a complete example project based on LangGraph, demonstrating how to build real AI applications with Novita Agent Runtime.
Project Repository
🔗 https://github.com/novitalabs/Novita-CollabHub/tree/main/examples/agent-runtime/agentic-frameworks/langgraph