It has been said so many times that filtering on FlowFields is bad for performance that we are used to avoiding this kind of filter in AL at all costs. So whenever developers need to select a subset of records based on criteria derived from a FlowField, they are likely to write a loop checking each value one by one rather than use SetFilter or SetRange.
So instead of code similar to this:
Item.SetFilter(Inventory, '>0');
Item.FindSet();
we can encounter a loop:
Item.SetAutoCalcFields(Inventory);
Item.FindSet();
repeat
if Item.Inventory > 0 then
...
until Item.Next() = 0;
or sometimes even an alternative version of it:
Item.FindSet();
repeat
Item.CalcFields(Inventory);
if Item.Inventory > 0 then
...
until Item.Next() = 0;
Things we do to avoid this cursed filter! But is it really so bad that we should avoid it no matter what? Let's see.
To verify the assumptions, I prepared a test running all the code samples above:
Select Item records filtered on the Inventory field;
Call FindSet on the Item with SetAutoCalcFields and check the value of the Inventory field in a loop;
And call FindSet with CalcFields on each record inside the loop.
I have also seen records being counted in a loop just to avoid filtering on a FlowField, so for this test I added a fourth scenario: filter the Item table on the Inventory field and call Count.
Item.SetFilter(Inventory, '>%1', InventoryFilter);
Item.Count();
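For reference, below is a minimal sketch of how one of these scenarios can be wrapped in a timing measurement; the procedure name and the use of CurrentDateTime for timing are illustrative assumptions rather than the exact harness behind the numbers that follow.
local procedure MeasureFilterOnFlowField(InventoryFilter: Decimal): Duration
var
    Item: Record Item;
    StartTime: DateTime;
begin
    StartTime := CurrentDateTime();

    // Scenario 1: filter directly on the Inventory FlowField
    Item.SetFilter(Inventory, '>%1', InventoryFilter);
    if Item.FindSet() then
        repeat
            // process the record
        until Item.Next() = 0;

    // The other three scenarios are timed the same way, replacing the body above
    exit(CurrentDateTime() - StartTime);
end;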
Test
To test the performance of all four cases, I prepared 10,000 items and 1,000,000 item ledger entries with a randomly selected item assigned to each entry, so there are between 64 and 136 item ledger entries posted for each item.
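I won't reproduce the full data generation code here, but a rough sketch of how such a dataset could be created follows. The direct inserts into Item Ledger Entry (bypassing the posting routines), the item numbering pattern, and the quantity values are illustrative assumptions only.
local procedure CreateTestData()
var
    Item: Record Item;
    ItemLedgerEntry: Record "Item Ledger Entry";
    i: Integer;
begin
    // 10,000 items with zero-padded numbers: TEST00001 .. TEST10000
    for i := 1 to 10000 do begin
        Item.Init();
        Item."No." := StrSubstNo('TEST%1', Format(i, 0, '<Integer,5><Filler Character,0>'));
        Item.Insert();
    end;

    // 1,000,000 ledger entries, each assigned to a randomly selected item
    for i := 1 to 1000000 do begin
        ItemLedgerEntry.Init();
        ItemLedgerEntry."Entry No." := i;
        ItemLedgerEntry."Item No." := StrSubstNo('TEST%1', Format(Random(10000), 0, '<Integer,5><Filler Character,0>'));
        ItemLedgerEntry.Quantity := Random(10); // positive quantities, so Inventory grows with each entry
        ItemLedgerEntry.Insert();
    end;
end;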
And just to see how the number of selected records impacts the runtime, I ran a series of measurements, adjusting the Inventory filter such that each step adds 1000 items to the result.
In all tests I limit the number of selected fields by calling SetLoadFields with three fields included: Description, "Unit Cost", "Unit Price".
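In AL terms, the FlowField filter scenario with the restricted field set boils down to this (InventoryFilter being the threshold value that changes on every step):
// Load only the three fields needed, then filter on the FlowField
Item.SetLoadFields(Description, "Unit Cost", "Unit Price");
Item.SetFilter(Inventory, '>%1', InventoryFilter);
Item.FindSet();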
When all tests were completed, I received the following results.

The time for the loop with CalcFields inside fluctuates significantly between 3 and 4.5 seconds, but in any case it is the slowest method of checking the condition (but we knew this, didn't we?). All other lines tend to stick together, suggesting that anything is better than calling CalcFields in a loop. Only the Count function seems to behave slightly better than the rest. Are they really all the same? Let's take a closer look at the queries behind these calculations to find out.
SQL queries behind AL code
CalcFields in a loop
In the case of the CalcFields function called in a loop, the whole process is split into two parts: one query selecting the items and another query calculating the Inventory value for each item.
The first query is a simple SELECT with the list of columns requested by SetLoadFields, plus additional columns loaded by the BC server.
SELECT
"27"."timestamp","27"."No_","27"."Description","27"."Unit Price",
"27"."Unit Cost","27"."Item Category Code",
"27"."$systemId","27"."$systemCreatedAt","27"."$systemCreatedBy",
"27"."$systemModifiedAt","27"."$systemModifiedBy"
FROM "CRONUS".dbo."New$Item$..." "27" WITH(READUNCOMMITTED)
ORDER BY "No_" ASC OPTION(FAST 50)
The second query sums the quantity for the item from the SIFT view VSIFT$Key2, and this query is invoked 10,000 times - once for each item.
SELECT SUM("SUM$Quantity")
FROM "CRONUS".dbo."New$Item Ledger Entry$...$VSIFT$Key2" WITH(READUNCOMMITTED,NOEXPAND)
WHERE ("Item No_"=@0)
Loop with SetAutoCalcFields
If we compare the query generated by FindSet with SetAutoCalcFields against the previous two queries, we notice that these two are now combined into a single query with the help of the OUTER APPLY operator.
SELECT
"Item"."timestamp" AS "timestamp","Item"."No_" AS "No_",
"Item"."Description" AS "Description",
"Item"."Unit Price" AS "Unit Price",
"Item"."Unit Cost" AS "Unit Cost",
"Item"."Item Category Code" AS "Item Category Code",
"Item"."$systemId" AS "$systemId",
"Item"."$systemCreatedAt" AS "SystemCreatedAt",
"Item"."$systemCreatedBy" AS "SystemCreatedBy",
"Item"."$systemModifiedAt" AS "SystemModifiedAt",
"Item"."$systemModifiedBy" AS "SystemModifiedBy",
ISNULL("SUB$Inventory"."Inventory$Item Ledger Entry$SUM$Quantity",0.0) AS "Inventory"
FROM "CRONUS".dbo."New$Item$..." AS "Item" WITH(READUNCOMMITTED)
OUTER APPLY (
SELECT TOP (1) SUM("Inventory$Item Ledger Entry"."SUM$Quantity") AS
"Inventory$Item Ledger Entry$SUM$Quantity"
FROM "CRONUS".dbo."New$Item Ledger Entry$...$VSIFT$Key2" AS
"Inventory$Item Ledger Entry" WITH(READUNCOMMITTED,NOEXPAND)
WHERE ("Inventory$Item Ledger Entry"."Item No_"="Item"."No_")) AS "SUB$Inventory"
ORDER BY "No_" ASC OPTION(FAST 50)
OUTER APPLY executes the subquery for each value returned by the main query, meaning that for each item, the sum of the values in the SUM$Quantity column of the VSIFT$Key2 view is calculated. This is somewhat similar to the first example where I call CalcFields 10,000 times in a loop. If we look at the query execution plan and the details of the index seek operation on the SIFT view, we can see that it has also been executed 10,000 times.

But even though the number of seeks is the same, the overhead of sending 10,000 separate queries from the client makes a big difference in the execution time.
Filter on the Inventory FlowField
The query with the filter on the Inventory field looks almost the same as the previous one where I called SetAutoCalcFields before FindSet.
SELECT
"Item"."timestamp" AS "timestamp","Item"."No_" AS "No_",
"Item"."Description" AS "Description",
"Item"."Unit Price" AS "Unit Price","Item"."Unit Cost" AS "Unit Cost",
"Item"."Item Category Code" AS "Item Category Code",
"Item"."$systemId" AS "$systemId",
"Item"."$systemCreatedAt" AS "SystemCreatedAt",
"Item"."$systemCreatedBy" AS "SystemCreatedBy",
"Item"."$systemModifiedAt" AS "SystemModifiedAt",
"Item"."$systemModifiedBy" AS "SystemModifiedBy"
FROM "CRONUS".dbo."New$Item$..." AS "Item" WITH(READUNCOMMITTED)
OUTER APPLY (
SELECT TOP (1) SUM("Inventory$Item Ledger Entry"."SUM$Quantity") AS
"Inventory$Item Ledger Entry$SUM$Quantity"
FROM "CRONUS".dbo."New$Item Ledger Entry$...$VSIFT$Key2" AS
"Inventory$Item Ledger Entry" WITH(READUNCOMMITTED,NOEXPAND)
WHERE ("Inventory$Item Ledger Entry"."Item No_"="Item"."No_")) AS
"SUB$Inventory"
WHERE (
ISNULL(
"SUB$Inventory"."Inventory$Item Ledger Entry$SUM$Quantity",0.0)>=@0)
ORDER BY "No_" ASC OPTION(FAST 50)
This query also uses the OUTER APPLY operator to calculate quantities for each item, and the only difference is the final WHERE condition. Essentially, this condition is exactly what we do in AL code when comparing the returned value with the expected one, only moved to the SQL Server side.
In fact, the effect of shifting the comparison from AL to SQL is quite observable. If I remove SetLoadFields from my code and repeat the same tests with the full record, the first thing I notice is that all times have increased. The second effect that catches the eye is the gradual decrease of the execution time for FindSet with the filter on the FlowField: the less data that has to be sent back to Business Central from SQL Server, the faster the result.
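In terms of AL code, the only change in this second series is that the SetLoadFields call is gone, so the filtered scenario, for instance, becomes:
// No SetLoadFields: the query now returns all table fields for each item
Item.SetFilter(Inventory, '>%1', InventoryFilter);
Item.FindSet();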

FindSet with SetFilter on the Inventory field takes approximately the same time as FindSet with SetAutoCalcFields and the selection condition on the AL side when all items in the table satisfy the condition. And the time decreases steadily as more items are excluded by the filter. At its lowest point, it matches the Count function with the filter on Inventory, which is now unequivocally the fastest.
Conclusion
Filtering on a FlowField is obviously not the fastest possible way of selecting records, because it triggers calculation of the FlowField value for every record. Filters on a stored field (especially an indexed one) will always be faster. But if you have to select records based on a FlowField, don't be scared to apply a filter directly on the field. In the worst case, it behaves as well as the auto-calculated field with the condition in AL. In the best case, though, it can be noticeably faster.
The bigger the record that we want to retrieve from the table, the more beneficial the database-side filter is compared to sending the data from the DB server to the client for analysis.
And as for Count, there is certainly no need to send records from SQL to BC just to count them. Filtering on the FlowField and calling the Count function is certainly faster.