Writing a plugin for the Elastic APM Java agent
Dec 7, 2023
As we all know, observability is something we rely heavily on in our ever more complex landscape. In order to know if our applications are still doing what they should be doing, we can't live without it. But also, when something does go wrong, it can provide us with very valuable insight into what is going on inside our application.
One of the tools we can use to improve the observability of our applications is an Application Performance Monitoring (APM) agent. These agents provide information on what is going on inside our applications. They generally do this by instrumenting our code, without requiring any changes to the code itself. Most providers of APM tooling have agents available for a wide range of programming languages, such as Java, .NET, Python, and PHP. Using the instrumentation, the agents can record events like HTTP requests, database queries, and messaging events.
To record events, for example, when an HTTP request starts and ends, the agent will need to hook into the web framework of your choice. The agent provider generally will provide a list of supported technologies and frameworks. In the case of the HTTP request example, the agent will dynamically add some code around the web framework to record when an HTTP request starts and ends.
So what do you actually get by using these agents? With Elastic APM, for example, you'll be able to see a timeline visualization like the one below.
We can see which HTTP requests are fired and what database calls are being made. This information can be extremely useful when trying to understand what's happening within your application, for example, when looking into a performance issue.
But what can we do when our framework of choice is not supported by the agent? If we do nothing, we might miss out on valuable information. Luckily, there is something we can do! Most APM agents offer some way of adding manual instrumentation to your code. This might be suitable if you want to add some instrumentation for a very specific use case. But what if, for example, your database framework is not supported? Adding manual instrumentation for each query is not an ideal solution.
Today, we'll be looking at an example where a Java application was using R2DBC (a database framework) and Elastic as the APM provider. The Elastic APM agent offers a plugin API, so we can write our own instrumentation without needing to instrument each individual query.
Before we look into how we can write a plugin for our example case, let's dive into how the Elastic APM agent works. The Elastic APM agent uses bytecode manipulation to instrument code. By using bytecode manipulation, it can modify Java classes at runtime, allowing the agent to change a class without recompiling it. What will typically happen is that the agent will add some code when an instrumented method is entered and exited.
A good example to demonstrate how this works in practice is the Servlet API. The Servlet API in Java is the main entry point for most HTTP servers. There are many implementations, but by instrumenting on the API level, it doesn't matter which implementation is used. The main entry point for the Servlet API is the service method[^1]. We could write instrumentation that would indicate the start of an HTTP request the moment the service method is entered and indicate the end of the HTTP request when the method is exited.
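To make the enter/exit idea concrete, here is a minimal sketch using only the JDK. It uses a dynamic proxy rather than bytecode manipulation, so it is an analogy for what the agent does, not how the agent is implemented. The `RequestHandler` interface and `EnterExitDemo` class are hypothetical names made up for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Servlet API's entry point.
interface RequestHandler {
    String service(String path);
}

final class EnterExitDemo {
    // Events recorded by the "instrumentation", in order.
    static final List<String> EVENTS = new ArrayList<>();

    // Wraps a handler so that extra code runs when a method is entered and exited.
    static RequestHandler instrument(RequestHandler target) {
        InvocationHandler handler = (proxy, method, args) -> {
            EVENTS.add("enter " + method.getName());     // e.g. the HTTP request starts
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();                      // rethrow the handler's own exception
            } finally {
                EVENTS.add("exit " + method.getName());  // e.g. the HTTP request ends
            }
        };
        return (RequestHandler) Proxy.newProxyInstance(
                RequestHandler.class.getClassLoader(),
                new Class<?>[] {RequestHandler.class},
                handler);
    }

    public static void main(String[] args) {
        RequestHandler plain = path -> "200 OK for " + path;
        RequestHandler instrumented = instrument(plain);
        System.out.println(instrumented.service("/orders"));
        System.out.println(EVENTS);
    }
}
```

The handler itself is unchanged; only the wrapper knows about the recording. The agent achieves the same effect by rewriting the class at load time, so no wrapper object is needed.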
We now have a basic idea of how the APM agent works, so let's look at how we could write a plugin for the problem at hand. The aim is to record the queries that have been executed, including their SQL statements. We first need to identify the entry point to instrument. The most obvious choice seems to be io.r2dbc.spi.Statement#execute[^2]. The Javadoc describes this method as follows: "Executes one or more SQL statements and returns the Results." There is one problem, however: the Statement API doesn't have any reference to the SQL statement. There are probably some ways around this, but to keep it simple, we'll instrument a specific implementation: io.r2dbc.postgresql.PostgresqlStatement. This implementation has an execute method which takes the SQL statement as a parameter.
Now that we know what to instrument, how do we actually record a query that has been executed? Elastic uses OpenTelemetry[^2] to record events. In OpenTelemetry terminology, a request is called a 'trace', and a trace can contain multiple, possibly nested, 'spans'. A span describes a single unit of work, for example, a database query. OpenTelemetry also defines conventions on how to record data about a specific unit of work[^3].
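The relationship between a trace and its spans can be sketched with a small, JDK-only model. This is not the OpenTelemetry API: `SimpleSpan` is a hypothetical class made up for this example, and the `db.statement` attribute name is used purely as an illustration of a convention-style key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified span: a named unit of work with optional child spans.
final class SimpleSpan {
    final String name;
    final List<SimpleSpan> children = new ArrayList<>();
    final Map<String, String> attributes = new LinkedHashMap<>();
    private final long startNanos = System.nanoTime();
    private long durationNanos;
    private boolean ended;

    SimpleSpan(String name) {
        this.name = name;
    }

    // Starts a nested span, for example a database query inside an HTTP request.
    SimpleSpan startChild(String childName) {
        SimpleSpan child = new SimpleSpan(childName);
        children.add(child);
        return child;
    }

    // Marks the unit of work as finished and records how long it took.
    void end() {
        durationNanos = System.nanoTime() - startNanos;
        ended = true;
    }

    boolean isEnded() {
        return ended;
    }

    long durationNanos() {
        return durationNanos;
    }

    // Renders the span tree with two spaces of indentation per nesting level.
    String render(int depth) {
        StringBuilder sb = new StringBuilder("  ".repeat(depth)).append(name).append('\n');
        for (SimpleSpan child : children) {
            sb.append(child.render(depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleSpan request = new SimpleSpan("GET /orders");          // root span of the trace
        SimpleSpan query = request.startChild("SELECT FROM orders"); // nested span
        query.attributes.put("db.statement", "SELECT * FROM orders WHERE id = $1");
        query.end();
        request.end();
        System.out.print(request.render(0));
    }
}
```

The timeline view in an APM tool is essentially a rendering of such a tree, with each span's start time and duration drawn as a bar.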
Elastic offers a plugin API. To define the plugin, we need to extend ElasticApmInstrumentation. This offers a couple of overrides to define the plugin. First of all, let's define the matchers, describing what we want to instrument:
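As a rough stand-in for those matchers, the sketch below expresses the same two conditions with plain JDK reflection and predicates. The plugin API itself expresses them with Byte Buddy element matchers, so treat this only as an illustration of the idea. `FakeStatement` and `MatcherSketch` are hypothetical names, with `FakeStatement` playing the role of io.r2dbc.postgresql.PostgresqlStatement.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical class standing in for io.r2dbc.postgresql.PostgresqlStatement.
final class FakeStatement {
    String execute(String sql) {
        return "executed: " + sql;
    }

    void bind(int index, Object value) {
        // Not instrumented; present only to show that it is not matched.
    }
}

final class MatcherSketch {
    // Condition 1: which type to instrument, selected by its class name.
    static Predicate<Class<?>> typeMatcher(String className) {
        return type -> type.getName().equals(className);
    }

    // Condition 2: which method on that type, selected by name and parameter types.
    static Predicate<Method> methodMatcher(String methodName, Class<?>... parameterTypes) {
        return method -> method.getName().equals(methodName)
                && Arrays.equals(method.getParameterTypes(), parameterTypes);
    }

    // Returns the names of the declared methods that satisfy both conditions.
    static List<String> matchedMethods(Class<?> type,
                                       Predicate<Class<?>> types,
                                       Predicate<Method> methods) {
        List<String> result = new ArrayList<>();
        if (!types.test(type)) {
            return result;
        }
        for (Method method : type.getDeclaredMethods()) {
            if (methods.test(method)) {
                result.add(method.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchedMethods(
                FakeStatement.class,
                typeMatcher(FakeStatement.class.getName()),
                methodMatcher("execute", String.class)));
    }
}
```

Matching on the class name as a string, rather than on a class reference, means the instrumentation does not need the target library on its own classpath at definition time.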
We match on two things here: the class name of the type we want to instrument and the method we want to match on. Now we need to define what to do when that method is called:
We wrap the Reactor type here so that the span starts before we subscribe to the query result and is closed once the result completes. The span contains the statement (query) being executed. The query will now show up in our timeline (like in the screenshot earlier), and we'll be able to see a summary of the statement and how long it took to execute.
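The wrapping technique can be sketched without Reactor by using the JDK's own `java.util.concurrent.Flow` interfaces. This is an assumption-laden simplification: `TracedPublisher` and `TracedPublisherDemo` are hypothetical names, the "span" is just a log entry, the in-memory source ignores backpressure, and cancellation is not handled, which a real plugin would need to take care of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Flow;

// Wraps a publisher so a "span" starts on subscribe and ends on completion or error.
final class TracedPublisher<T> implements Flow.Publisher<T> {
    // Stand-in for the APM backend: records span lifecycle events in order.
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    private final Flow.Publisher<T> source;
    private final String statement;

    TracedPublisher(Flow.Publisher<T> source, String statement) {
        this.source = source;
        this.statement = statement;
    }

    @Override
    public void subscribe(Flow.Subscriber<? super T> downstream) {
        LOG.add("span start: " + statement);   // started before subscribing to the source
        source.subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) { downstream.onSubscribe(s); }
            @Override public void onNext(T item) { downstream.onNext(item); }
            @Override public void onError(Throwable t) {
                LOG.add("span end (error): " + statement);
                downstream.onError(t);
            }
            @Override public void onComplete() {
                LOG.add("span end: " + statement);
                downstream.onComplete();
            }
        });
    }
}

final class TracedPublisherDemo {
    // Subscribes to a traced in-memory publisher and returns everything it emitted.
    static List<String> run(List<String> rows, String sql) {
        Flow.Publisher<String> source = subscriber -> subscriber.onSubscribe(new Flow.Subscription() {
            private boolean done;
            @Override public void request(long n) {
                if (done) return;
                done = true;
                rows.forEach(subscriber::onNext);   // ignores backpressure for brevity
                subscriber.onComplete();
            }
            @Override public void cancel() { done = true; }
        });
        List<String> received = new ArrayList<>();
        new TracedPublisher<>(source, sql).subscribe(new Flow.Subscriber<String>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(String row) { received.add(row); }
            @Override public void onError(Throwable t) { received.add("<error>"); }
            @Override public void onComplete() { received.add("<complete>"); }
        });
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("row1", "row2"), "SELECT FROM orders"));
        System.out.println(TracedPublisher.LOG);
    }
}
```

The key point is that nothing is measured when the publisher is merely created. With reactive types the work only begins on subscription, so that is where the span has to start.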
The signature parser being used is a copy of the SignatureParser class from the JDBC plugin of the APM agent[^4]. A few more steps are needed to package the plugin; you can find them on the Elastic website: https://www.elastic.co/guide/en/apm/agent/java/current/plugin-api.html
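To give a feel for the kind of statement summary a signature parser produces, here is a deliberately naive sketch. It is not the agent's SignatureParser, which handles far more SQL than this; `StatementSummary` is a hypothetical class, and the "SELECT FROM table" output shape is an assumption made for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, naive statement summarizer; not the agent's SignatureParser.
final class StatementSummary {
    // Matches a statement that starts with SELECT and captures the first table after FROM.
    private static final Pattern SELECT_FROM =
            Pattern.compile("(?is)^\\s*SELECT\\b.*?\\bFROM\\s+([\\w.]+)");

    // Reduces a SQL statement to a short, low-cardinality name suitable for a span.
    static String summarize(String sql) {
        Matcher matcher = SELECT_FROM.matcher(sql);
        if (matcher.find()) {
            return "SELECT FROM " + matcher.group(1);
        }
        String trimmed = sql.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        // Fall back to the leading keyword, e.g. UPDATE or DELETE.
        return trimmed.split("\\s+")[0].toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(summarize("select id, name from orders where id = $1"));
        System.out.println(summarize("UPDATE orders SET status = 'shipped'"));
    }
}
```

Summarizing matters because the full statement often contains many distinct variations, while the span name should group similar queries together.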