
Garbage collection of event subscribers

  • adrogin
  • Apr 12
  • 5 min read

I once wrote a post on event subscribers and the presumed performance benefits of single instance subscriber codeunits, but the topic keeps reappearing every now and then, and I feel it could be a good idea to add a few more arguments to the same old debate. Because why not run a few useless, but highly entertaining tests on a Saturday evening.


One statement in defense of single instance event subscribers is that the subscriber codeunit does not have to be disposed of after the execution and remains in memory, while non-single instance codeunits are re-instantiated and disposed on every execution. In long, complex processes where hundreds of events are raised, all the disposed event subscribers may start triggering garbage collection so frequently that it becomes a performance concern. My (purely speculative) response is that this could equally be an argument against declaring any local variables of reference types in C#, because the garbage collector has to take care of those too. In the case of AL, this statement could be extended even further. Considering that any variable declaration in an AL object turns into a field of the respective parent class after transpiling, even value type variables are allocated on the heap and are subject to garbage collection. That is not to mention all the other stuff that the transpiler adds to AL objects. But, as I said, this is all speculation. Let's crunch some numbers and see what part garbage collection plays in the lifetime of an AL event subscriber.


The guinea pig for the test is a codeunit publishing one event and calling it in a loop.

codeunit 56110 "Event Publisher" // the codeunit ID is an assumption; the subscriber below refers to this object by name
{
    trigger OnRun()
    var
        I: Integer;
    begin
        // The iteration count is varied between 10 000 and 5 000 000 in the tests below
        for I := 1 to 1000000 do
            OnTestEvent();
    end;

    [IntegrationEvent(false, false)]
    local procedure OnTestEvent()
    begin
    end;
}

And another codeunit subscribing to the published event and doing nothing at all except instantiating two variables.

codeunit 56111 "Event Subscriber"
{
    [EventSubscriber(ObjectType::Codeunit,
        Codeunit::"Event Publisher", OnTestEvent, '', false, false)]
    local procedure HandleTestEvent()
    var
        I: Integer;
    begin
    end;

    var
        Item: Record Item;
} 

While the event loop is running, I am capturing system events with PerfView which gives perfect insights into the process heap.
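In case you want to reproduce the capture: a GC-focused trace can be collected from the PerfView command line with something along these lines (the switches below are the typical ones for a GC-only collection - check them against your PerfView version):

PerfView.exe /GCCollectOnly /AcceptEULA /nogui collect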


Let's start with 10 000 iterations in the loop.

The test codeunit runs for 130 ms, and the garbage collection report comes back completely blank after this first run.

10 000 iterations are too insignificant. An event subscriber codeunit with one method and a record variable in it is not enough to trigger any garbage collection. And it does not change when the number of iterations is raised to 100 000 - still no garbage collection events in the trace. So I bump the number up to one million and run the same test again.

Now the process runs for 1.55 seconds and I finally see a lone GC event in the trace. And we can see that all it took to clean up one million codeunit instances is 3.8 ms.


When rerun multiple times, the same test can show different results depending on the state of the process memory: whether the garbage collection stops at Gen0 (then it can be as fast as 4 ms, like the picture above) or continues to Gen1 (and then it takes 130 - 160 ms). On a freshly restarted BC service, one million iterations of the loop may not even trigger any garbage collection at all.


The GC heap analyzer chart below shows another example of a GC event stemming from the same event loop, but with more prior thread activity. This is a Gen1 GC with a 138 ms pause.


A 200 megabyte memory block promoted to Gen2 is not something we would expect from this event subscriber. All objects in the test are short-lived and should not survive the Gen0 sweeping. And this is indeed what the chart shows if we look at the ephemeral generations only: the amount of committed memory quickly grows to approximately 1700 MB and is then completely eliminated during the GC. Gen1 memory also drops to nearly 0. The promoted 200 MB could be server objects and open pages (the role center, the page with the action that starts the test, etc.). But so far there is quite a lot of interference, and we have to guess which part of this memory actually belongs to the subscriber codeunit. How about 5 000 000 iterations then? Can we derive anything from an even longer loop and separate the event subscriber memory from the rest of the server activity?


With 5 000 000 iterations, the loop runs for 8.17 seconds, and the total GC pause extends to 357 ms.


Now this is something to look into. The first event mirrors the previous one - a (relatively) long 331 ms pause, once again promoting a large block (runtime objects and open pages?) to Gen2, whilst the other two Gen0 events are exactly what I've been looking for: sweeping of short-lived event subscribers with no memory promoted to higher generations. And the total GC pause for these two garbage collections is 7.6 milliseconds. This is an example illustrating that garbage collection of consecutively allocated blocks without the compaction phase is extremely fast.


How about a single instance subscriber then? Once again, I am calling the same event publisher five million times in a loop, but the subscriber is single instance this time.
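For clarity, here is what the single instance version of the subscriber looks like - the only change needed for this test is the SingleInstance property, the rest of the codeunit stays the same as above:

codeunit 56111 "Event Subscriber"
{
    SingleInstance = true;

    [EventSubscriber(ObjectType::Codeunit,
        Codeunit::"Event Publisher", OnTestEvent, '', false, false)]
    local procedure HandleTestEvent()
    var
        I: Integer;
    begin
    end;

    var
        Item: Record Item;
}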


We can see the difference now - Gen1 size is increased during the first GC event: instead of being freed, more objects are promoted from Gen0. And the second GC pushes this memory to Gen2, raising the total Gen2 volume to almost exactly 1 GB. Our single instance event subscriber must be hiding somewhere there in this gigabyte. Total GC time is 182 ms, slightly lower indeed. But remember that this is the time amortized over 5 000 000 event calls.

And the whole event loop runs noticeably faster compared to the normal non-single instance codeunit: only 4.72 seconds vs 8.17. Which means approximately 0.7 microsecond of performance improvement per event call. And the charts above indicate that the difference does not come from GC, but probably from the initialization of the subscriber codeunit and the subscription binding.
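For reference, the 0.7 microsecond figure is simply the run time difference spread over the number of event calls:

(8.17 s - 4.72 s) / 5 000 000 ≈ 0.69 µs per call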


So to sum up my pointless tests, garbage collection can take as much as 8-10% of doing nothing five million times. This test is very much simplified, of course - it is only intended to estimate the share of time that memory management takes in a bare event call with no other activities. I am also only dealing with the "fire and forget" kind of events here. The event call leaves no trace in memory, all subscribers are allocated consecutively and can be eliminated by a single move of the heap pointer. This could be different in a situation where the memory allocation becomes holed, with short-lived objects interspersed with long-lived ones. This could happen, for example, if each event subscriber fetched a fresh database record which is saved in the server cache, whose life span exceeds the lifetime of the subscriber itself. But when it comes to database queries, memory allocation/deallocation is definitely not going to be the bottleneck.
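To make the last point a bit more concrete, here is a purely hypothetical variation of the subscriber (not part of the tests above): each call reads a record, and the fetched data ends up in the server data cache, where it lives longer than the subscriber instance that fetched it.

    [EventSubscriber(ObjectType::Codeunit,
        Codeunit::"Event Publisher", OnTestEvent, '', false, false)]
    local procedure HandleTestEventWithDataAccess()
    var
        Item: Record Item;
    begin
        // Hypothetical example: the record read here is kept in the server data cache,
        // which outlives the short-lived subscriber instance
        if Item.FindFirst() then;
    end;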


Analysis of memory allocation is something we could do out of curiosity, but hardly as a practical performance improvement exercise. It makes more sense to consider code maintainability first, and only if performance becomes paramount, and the event is called millions of times, can we think about things like memory management and garbage collection to fight for a few more milliseconds.


This is not to say that variable declarations per se are never a concern. There are situations when a mere declaration of an array variable can cause a performance setback. But let me leave it for another post.
