Monitoring & Debugging

When running Agent systems in production, opacity is the biggest enemy. You need to know what the Agent is doing, why it is doing it, and how to recover when things go wrong. LangGraphGo provides monitoring and debugging tools at several levels.

1. Listeners

Background & Functionality

Listeners allow you to subscribe to graph lifecycle events in a non-intrusive way. You can observe when a node starts and ends, what its input and output are, and whether an error occurred.

This is useful for integrating logging libraries (such as Zap or Logrus), distributed tracing (such as OpenTelemetry), or real-time monitoring dashboards.

Implementation Principle

The runtime triggers events at key execution points (before node start, after node end, on error). Registered listener functions are called synchronously or asynchronously.

Code Showcase

// Define listener function
listener := func(event graph.Event) {
    switch event.Type {
    case graph.NodeStart:
        log.Printf("[START] Node: %s, Input: %v", event.Node, event.Input)
    case graph.NodeEnd:
        log.Printf("[END] Node: %s, Output: %v", event.Node, event.Output)
    case graph.Error:
        log.Printf("[ERROR] Node: %s, Err: %v", event.Node, event.Error)
    }
}

// Register listener
runnable.AddListener(listener)

2. Durable Execution

Background & Functionality

Agent tasks can run for a long time (minutes or even hours). If the process crashes or the server restarts, we don't want to start from scratch. Durable execution lets a task resume automatically from its last successful Checkpoint.

Implementation Principle

This relies on the Checkpointing mechanism. Every state update is persisted. When the system restarts and resumes with the same ThreadID, it first loads the latest Checkpoint and continues execution from there.

Code Showcase

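// 1. Get latest state
latestCheckpoint, err := store.GetLatest(threadID)
if err != nil {
    // A store failure is not the same as "no history": don't silently start over
    log.Fatalf("loading latest checkpoint for thread %v: %v", threadID, err)
}
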
if latestCheckpoint != nil {
    // 2. If history exists, resume
    log.Println("Resuming from checkpoint...")
    runnable.ResumeFromCheckpoint(ctx, latestCheckpoint.ID)
} else {
    // 3. Otherwise start new
    log.Println("Starting new execution...")
    runnable.Invoke(ctx, input)
}

3. Debugging Tools

Beyond the mechanisms above, LangGraphGo recommends using standard Go tooling, such as the Delve debugger and the pprof profiler, to analyze Agent behavior in depth.
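As a small illustration of the profiling side, the sketch below uses the standard library's runtime/pprof package to capture a snapshot of all goroutine stacks, which can help when an Agent appears stuck. The dumpGoroutines helper and the blocked goroutine are hypothetical names and setup for this example; nothing here is specific to LangGraphGo.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// dumpGoroutines is a hypothetical helper that returns a text snapshot
// of the stacks of all current goroutines.
func dumpGoroutines() (string, error) {
	var buf bytes.Buffer
	// debug=1 selects a human-readable text format instead of the
	// binary protobuf format.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Stand-in for an Agent node that is blocked waiting on something.
	block := make(chan struct{})
	go func() { <-block }()

	dump, err := dumpGoroutines()
	if err != nil {
		fmt.Fprintln(os.Stderr, "goroutine dump failed:", err)
		os.Exit(1)
	}
	fmt.Print(dump)
	close(block)
}
```

For a long-running service, importing net/http/pprof for its side effects registers the same profiles as HTTP handlers under /debug/pprof/ on the default mux, where go tool pprof can fetch them. For interactive debugging, Delve can attach to a running process with dlv attach followed by the process ID.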