# Apache Camel Java Agent Specification
## 1. Overview
To achieve "zero-code change" data extraction for Apache Camel applications (similar to the nJAMS agent for MuleSoft/TIBCO), we use the JVM's `java.lang.instrument` API via the `-javaagent` flag. The agent is loaded at JVM startup, hooks into `CamelContext` initialization, and injects native Camel telemetry SPIs (`InterceptStrategy` and `EventNotifier`) without requiring developers to alter their Camel routes or application code.
## 2. Technical Implementation: The `-javaagent` Hook
### JVM Instrumentation & Bytecode Manipulation
When the application starts with `-javaagent:/path/to/camel-agent.jar`, the JVM invokes the agent's `premain` method before the application's `main` method.
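As a minimal sketch of that entry point, using only the JDK: the class name `CamelAgent` and the `isTarget` helper are hypothetical, and the transformer here only recognises the target classes without rewriting them (the actual rewriting would be delegated to ByteBuddy). The agent JAR's manifest would need a `Premain-Class` entry naming this class.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;

// Hypothetical agent entry point (package-private here to keep the sketch in one file).
final class CamelAgent {

    // Internal (slash-separated) names of the classes this spec proposes to instrument.
    static final Set<String> TARGETS = Set.of(
            "org/apache/camel/impl/engine/AbstractCamelContext",
            "org/apache/camel/impl/DefaultCamelContext");

    // className can be null for some generated classes, so guard before the lookup.
    static boolean isTarget(String internalName) {
        return internalName != null && TARGETS.contains(internalName);
    }

    // Invoked by the JVM before the application's main method.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                if (isTarget(className)) {
                    // A real agent would return rewritten bytecode here.
                    System.err.println("[camel-agent] would instrument " + className);
                }
                return null; // null tells the JVM to keep the class bytes unchanged
            }
        });
    }
}
```

Returning `null` for every class keeps the sketch side-effect free; it only shows where the transformer is registered and how target classes are matched.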
1. **Bytecode Interception (ByteBuddy / ASM):**
The agent registers a `ClassFileTransformer` via the `Instrumentation` API. Using a library like [ByteBuddy](https://bytebuddy.net/) (which is safer and easier to maintain than raw ASM), we instrument `org.apache.camel.impl.engine.AbstractCamelContext` (Camel 3.x and later) or `org.apache.camel.impl.DefaultCamelContext` (Camel 2.x), depending on the Camel version found on the classpath.
2. **Hooking Context Initialization:**
We intercept the `start()` or `build()` methods of the `CamelContext`.
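```java
import net.bytebuddy.asm.Advice;
import org.apache.camel.CamelContext;

// Sketch of ByteBuddy advice (class name is illustrative); inlined at method entry.
public class CamelContextStartAdvice {
    @Advice.OnMethodEnter
    public static void onStart(@Advice.This CamelContext context) {
        // Inlined code runs in the target class, so the registrar must be visible there.
        CamelAgentRegistrar.register(context);
    }
}
```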
3. **Registering Native Camel SPIs:**
Inside `CamelAgentRegistrar.register(context)`, we inject our monitoring components directly into the context before the routes start processing messages. Because we are inside the JVM, we have direct access to the `CamelContext` object:
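* **`EventNotifier`:** Added via `context.getManagementStrategy().addEventNotifier(...)`. This captures macro lifecycle events (`ExchangeCreatedEvent`, `ExchangeCompletedEvent`, `ExchangeFailedEvent`; nested in `org.apache.camel.spi.CamelEvent` from Camel 3 onward) at comparatively low overhead, because it fires per exchange rather than per step.
* **`InterceptStrategy`:** Added via `context.addInterceptStrategy(...)`. From Camel 3 onward this method is exposed through `ExtendedCamelContext` rather than `CamelContext` itself, so the exact call is version-dependent. The strategy wraps every `Processor` in the route definition, which lets us capture the payload (`Exchange.getIn().getBody()`), headers, and properties before and after each discrete step in the integration flow.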
This approach provides zero-code instrumentation while remaining entirely "Camel Native": only the context class is touched by bytecode manipulation, which avoids the fragility of intercepting arbitrary application methods.
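One practical detail of the hook: if both `start()` and `build()` are instrumented, or a context is stopped and started again, the advice may fire more than once for the same context. A hedged sketch of a guard that runs the SPI installation at most once per context follows. `OncePerContext` and its `installer` callback are hypothetical names, and the context type is left generic so the example compiles without Camel.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.function.Consumer;

// Hypothetical guard: invokes the installer at most once per context instance.
final class OncePerContext<C> {

    // Weak keys so the agent never keeps a discarded context (and its class loader) alive.
    private final Set<C> seen = Collections.synchronizedSet(
            Collections.newSetFromMap(new WeakHashMap<C, Boolean>()));
    private final Consumer<C> installer;

    OncePerContext(Consumer<C> installer) {
        this.installer = installer;
    }

    // Returns true if the installer was invoked, false if this context was seen before.
    boolean register(C context) {
        if (!seen.add(context)) {
            return false;
        }
        try {
            installer.accept(context);
        } catch (RuntimeException | LinkageError e) {
            // Monitoring must not break application startup; report and carry on.
            System.err.println("[camel-agent] registration failed: " + e);
        }
        return true;
    }
}
```

In the agent, `CamelAgentRegistrar.register(context)` could delegate to such a guard, with the installer adding the `EventNotifier` and `InterceptStrategy`.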
## 3. High-Performance Payload Extraction
Capturing payloads at every step of a Camel route can introduce severe GC (Garbage Collection) pauses, CPU spikes, and memory bloat. We must extract data cheaply.
### Serialization Strategy
1. **Type-Aware Truncation (The First Line of Defense):**
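Before attempting deep serialization, we inspect the payload type. If it is a `String`, `byte[]`, or `InputStream` (all common in Camel), we capture only the first *N* bytes (e.g., a 4 KiB limit) and ignore the rest. Reading an `InputStream` consumes it, so streams are only sampled when they can be rewound afterwards (for example, a Camel `StreamCache`, which supports `reset()`); otherwise the route would see a partially consumed body. We use cached or pooled buffers to read streams without allocating a new byte array per exchange.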
2. **Fast Object Serialization (Kryo):**
If the payload is a complex POJO, standard Java serialization or Jackson/Gson JSON mapping can be too slow and too allocation-heavy to run on every step.
* We embed [Kryo](https://github.com/EsotericSoftware/kryo), a fast, efficient object graph serialization framework.
* Kryo instances are not thread-safe, so they are pooled (e.g., one per thread via `ThreadLocal`, or a fast object pool). Reusing instances avoids repeated setup cost and keeps serialization fast, with compact binary output.
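The type-aware truncation step (item 1 above) can be sketched with JDK classes only. `PayloadTruncator` and `preview` are hypothetical names, and the sketch makes two assumptions: streams are sampled only if they support `mark`/`reset`, and the result is returned as a fresh array for simplicity (the agent would more likely copy into a pre-allocated slot). Truncating UTF-8 at a byte limit can cut a multi-byte character in half, which is acceptable for a preview.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PayloadTruncator {

    static final int MAX_BYTES = 4 * 1024; // the 4 KiB example limit

    // Reusable per-thread scratch buffer for stream reads.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[MAX_BYTES]);

    // Returns at most MAX_BYTES of the body, or null if the body is not handled here.
    static byte[] preview(Object body) {
        if (body instanceof byte[]) {
            byte[] bytes = (byte[]) body;
            return Arrays.copyOf(bytes, Math.min(bytes.length, MAX_BYTES));
        }
        if (body instanceof String) {
            String s = (String) body;
            // Every char yields at least one UTF-8 byte, so MAX_BYTES chars suffice.
            byte[] utf8 = s.substring(0, Math.min(s.length(), MAX_BYTES))
                    .getBytes(StandardCharsets.UTF_8);
            return Arrays.copyOf(utf8, Math.min(utf8.length, MAX_BYTES));
        }
        if (body instanceof InputStream && ((InputStream) body).markSupported()) {
            InputStream in = (InputStream) body;
            byte[] buf = SCRATCH.get();
            int total = 0;
            try {
                in.mark(MAX_BYTES);
                int n;
                while (total < MAX_BYTES
                        && (n = in.read(buf, total, MAX_BYTES - total)) > 0) {
                    total += n;
                }
                in.reset(); // rewind so the route still sees the full stream
            } catch (IOException e) {
                return null; // capture is best-effort; never propagate into the route
            }
            return Arrays.copyOf(buf, total);
        }
        return null; // unmarkable streams and POJOs are left for other strategies
    }
}
```

For example, `PayloadTruncator.preview(new byte[10_000])` yields a 4096-byte array, and a `ByteArrayInputStream` passed in is rewound to its starting position afterwards.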
### Memory & GC Pause Prevention
1. **Asynchronous Ring Buffers (LMAX Disruptor):**
Instead of creating new Event objects on the heap for every intercepted payload and doing synchronous I/O, the `InterceptStrategy` writes the extracted (and truncated) data directly into a pre-allocated, lock-free Ring Buffer (e.g., LMAX Disruptor or a simple circular array).
* A dedicated, low-priority background thread consumes from this ring buffer, batches the events, and flushes them to the local OTEL Collector or Log storage.
* **Load Shedding:** If the system is under extreme load and the buffer fills up, the agent **drops** the telemetry data rather than blocking the Camel routing threads. Monitoring must never block or crash the host application.
2. **Conditional Capture (Dynamic Sampling):**
To further reduce overhead, the agent queries the local Appliance Hub for rules:
* *Error-Only Mode:* Payloads are cached in a tiny thread-local circular buffer and only serialized/retained if the Exchange fails (`Exchange.isFailed()`).
* *Sampling:* Only 1 in 100 exchanges is deeply inspected.
* *Step Filtering:* Only capture payloads at the ingress and egress endpoints of the route, ignoring intermediate data transformation steps unless debug mode is triggered.
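The load-shedding and conditional-capture rules above can be sketched together with JDK classes only. This is an illustration under stated assumptions, not the agent's implementation: `TelemetryBuffer` uses `ArrayBlockingQueue` as a stand-in for the Disruptor (it is lock-based rather than lock-free, but `offer` never waits for space), and `CapturePolicy` hard-codes rules that the spec says would come from the Appliance Hub. Both class names are hypothetical, and error-only mode is not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Stand-in for the ring buffer: bounded, never waits for space, counts what it sheds.
final class TelemetryBuffer<E> {
    private final BlockingQueue<E> queue;
    private final LongAdder dropped = new LongAdder();

    TelemetryBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called on Camel routing threads: drops the event instead of waiting when full.
    boolean publish(E event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.increment();
        }
        return accepted;
    }

    // Called on the background thread: removes up to maxBatch events for one flush.
    List<E> drainBatch(int maxBatch) {
        List<E> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    long droppedCount() {
        return dropped.sum();
    }
}

// Hypothetical rule set covering the sampling and step-filtering modes.
final class CapturePolicy {
    private final int sampleEvery;      // e.g. 100 means 1 in 100 exchanges
    private final boolean edgesOnly;    // step filtering: ingress/egress only
    private volatile boolean debugMode; // lifts the step filter when true
    private final AtomicLong exchangeCounter = new AtomicLong();

    CapturePolicy(int sampleEvery, boolean edgesOnly) {
        if (sampleEvery < 1) {
            throw new IllegalArgumentException("sampleEvery must be >= 1");
        }
        this.sampleEvery = sampleEvery;
        this.edgesOnly = edgesOnly;
    }

    void setDebugMode(boolean on) {
        debugMode = on;
    }

    // Intended to be called once per exchange, so a sampled exchange keeps all its steps.
    boolean sampleExchange() {
        return exchangeCounter.getAndIncrement() % sampleEvery == 0;
    }

    // Per-step decision for an exchange that was already sampled.
    boolean captureStep(boolean isIngressOrEgress) {
        return !edgesOnly || debugMode || isIngressOrEgress;
    }
}
```

An `InterceptStrategy` would consult `sampleExchange()` when an exchange enters the route, check `captureStep(...)` at each step, and hand the truncated payload to `publish(...)`. The dropped-event counter makes shedding visible without slowing the routing threads.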