
SIFT

SIFT, which stands for Sum Index Flow Technology, was originally a feature of the native Navision database: a data structure that stored precalculated sums directly in index nodes, so that aggregated data could be accessed very quickly without the need to read data rows and calculate sums on the fly. To maintain the same data aggregations in a SQL Server database, an indexed view is created for each SIFT key defined in the table.

Although it is not the SIFT it used to be in the Navision database, the SumIndexFields property on table keys is still widely used, even if SIFT now stands for SumIndexField Technology instead of Sum Index Flow. In this post and a few following ones, I want to touch upon the important topic of indexed views supporting data aggregations: their pros and cons, and the performance implications of these views, positive as well as negative.

This post is about the bright side of things: what a SIFT key actually is and how SIFTs help us retrieve sums of field values from large tables in the blink of an eye.


For the following demo, I will use a made-up table with some code and date fields to filter on, and a decimal field Amount which I want to sum.

table 50501 "My Sales Table"
{
    DataClassification = CustomerContent;

    fields
    {
        field(1; "Entry No."; Integer)
        {
            Caption = 'Entry No.';
        }
        field(2; "Customer No."; Code[20])
        {
            Caption = 'Customer No.';
        }
        field(3; "Item No."; Code[20])
        {
            Caption = 'Item No.';
        }
        field(4; "Item Category Code"; Code[20])
        {
            Caption = 'Item Category Code';
        }
        field(5; "Customer Group Code"; Code[20])
        {
            Caption = 'Customer Group Code';
        }
        field(6; "Posting Date"; Date)
        {
            Caption = 'Posting Date';
        }
        field(7; Amount; Decimal)
        {
            Caption = 'Amount';
        }
    }

    keys
    {
        key(PK; "Entry No.")
        {
            Clustered = true;
        }
        key(CustomerItemDate; "Customer No.", "Item No.", "Posting Date")
        {
        }
    }
}

This is a hypothetical table of entries containing both customer and item information (which I actually fill with GUIDs truncated to 20 characters). I will be summarizing amounts by customers, items, and dates, so I add an index that can help with this; for now, though, it is a normal index without SumIndexFields. Before adding SumIndexFields, I want to run a few queries without it to collect some baseline data for comparison.


Aggregations without SumIndexFields


Now I will insert one million records into the table and retrieve some aggregated amounts in a period, summarized by customer and item. I call the fields "Customer No." and "Item No." just to refer to something familiar, but in fact these are nothing but GUIDs and can as well be called "Code 1" and "Code 2". Let's start by calling CalcSums on a record, filtering one customer, one item, and picking records from one day.

MySalesTable.SetRange("Customer No.", CustomerNo);
MySalesTable.SetRange("Item No.", ItemNo);
MySalesTable.SetRange("Posting Date", 20260101D);
MySalesTable.CalcSums(Amount);

For this query, I intentionally pick filter values that match a significant number of entries for the given combination of "Customer No.", "Item No.", and "Posting Date". To be precise, this combination of filters returns 1550 records.

Quite as expected, this code generates a SQL query which invokes the SUM function on a filtered subset of table rows.

SELECT SUM("50501"."Amount")
FROM "CRONUS".dbo."CRONUS International Ltd_$My Sales Table$382b9cf1-7c89-48eb-bdc6-be609426126a" "50501" WITH(READUNCOMMITTED)
WHERE (
  "Customer No_"=@0
  AND "Item No_"=@1
  AND "Posting Date">=@2 AND "Posting Date"<=@3)

The execution plan confirms that the index on Customer, Item, and Date facilitates the query: the query optimizer uses it to run an index seek operation. Still, the database engine has to perform a key lookup into the clustered index to retrieve the values from the Amount column. And, as we can see, the key lookup is by far the most expensive operation in the query.

So far, it doesn't outweigh the benefits of the index seek, though.

And how about the timing? It doesn't look so scary yet. The time it took to sum up the amounts from 1550 records is 63 ms.



Now I will increase the number of records the query has to read by expanding the date filter. I keep the same customer and item, but summarize amounts in 11 days instead of just one.

MySalesTable.SetRange("Customer No.", CustomerNo);
MySalesTable.SetRange("Item No.", ItemNo);
MySalesTable.SetRange("Posting Date", 20260203D, 20260213D);
MySalesTable.CalcSums(Amount);

The total number of records in the filtered period is 18458. It is not huge, but it is sufficient to steer the query optimizer towards an execution plan based on a clustered index scan. Now the key lookup becomes too expensive, eliminating the positive effect of fast index seeks.


Query execution time in the client statistics is almost 900 ms now.

This looks slow for a single sum operation, especially compared to 63 ms in the previous run. On the other hand, 900 ms is all it takes to scan 1 000 000 records from an SSD drive into memory and find the 18458 that satisfy the query predicates. We can sum up 50 000 records, 500 000, or all of them - the query can't get much slower, since the whole table is cached.

Now we've seen that somewhere along the way between 1550 records and 18458 records fetched by the sum query, SQL Server decides that the index is not helpful anymore and the full table scan is the way forward.

And this is the situation where SIFT views come to aid if we want to improve performance of our aggregation queries.


Adding SumIndexFields


After running a few baseline tests on the table without SumIndexFields, let's add one SIFT key and see how this changes the calculation.

In order to create the SIFT, I simply add the SumIndexFields property to the existing key, specifying the field which I want to be aggregated.

key(CustomerItemDate; "Customer No.", "Item No.", "Posting Date")
{
    SumIndexFields = Amount;
}

Once I add the SumIndexFields property on the key and deploy the code, I can find a new view created in the database. The name of the view is a combination of the table name and the key name, joined by the $VSIFT$ marker. So the full name of the view is this bulky string:

CRONUS International Ltd_$My Sales Table$382b9cf1-7c89-48eb-bdc6-be609426126a$VSIFT$CustomerItemDate

This is an indexed view, which means that it has a clustered index and is, basically, a table whose content is stored in the database and is automatically updated by the database engine when INSERT / UPDATE / DELETE statements are executed against its source table "My Sales Table".

The view structure follows the definition of the key, with all the key fields and the SumIndexFields included in the view. One additional field, $Cnt, stores the number of records that constitute each sum; it facilitates the Count function as well as Sum.
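Although the Business Central server generates and maintains this view automatically, it is easier to reason about once written out. Below is a rough T-SQL sketch of what such a view could look like. The column names SUM$Amount and $Cnt come from the generated queries; the view body and the index name are my assumptions, based on the standard requirements for indexed views (SCHEMABINDING, COUNT_BIG, a unique clustered index):

```sql
-- A sketch, not the exact server-generated definition.
-- The long table/view names are shortened here for readability.
CREATE VIEW dbo."...$My Sales Table$...$VSIFT$CustomerItemDate"
WITH SCHEMABINDING AS
SELECT
    "Customer No_",
    "Item No_",
    "Posting Date",
    SUM("Amount") AS "SUM$Amount",  -- pre-aggregated SumIndexField
    COUNT_BIG(*)  AS "$Cnt"         -- rows per group; supports Count
FROM dbo."...$My Sales Table$..."
GROUP BY "Customer No_", "Item No_", "Posting Date";
GO

-- The unique clustered index is what materializes the view
-- (the index name here is hypothetical).
CREATE UNIQUE CLUSTERED INDEX "VSIFT$CustomerItemDate"
ON dbo."...$My Sales Table$...$VSIFT$CustomerItemDate"
    ("Customer No_", "Item No_", "Posting Date");
```

Because every group is stored as a single row, a sum over a date range reads one view row per date instead of thousands of table rows.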


After defining the SIFT, I can run the same AL code calculating the total amount in My Sales Table, and the Business Central server is smart enough to identify the best suited SIFT view and change the query, selecting the sum from the indexed view instead of querying the underlying table.


SELECT SUM("SUM$Amount")
FROM "CRONUS".dbo."CRONUS International Ltd_$My Sales Table$382b9cf1-7c89-48eb-bdc6-be609426126a$VSIFT$CustomerItemDate" WITH(READUNCOMMITTED,NOEXPAND)
WHERE (
  "Customer No_"=@0
  AND "Item No_"=@1
  AND "Posting Date">=@2 AND "Posting Date"<=@3)

This is the new query selecting the amount summarized by customer, item, and date. Note that the data is selected from the $VSIFT$CustomerItemDate view, not the data table, as it was in previous queries.

And the time statistics change significantly: the total execution time goes down to 13 ms from 900 ms previously.

Of course, this significant improvement is due to the fact that SQL Server now needs to read only 11 records from the indexed view instead of 18000 table records.


SIFT maintenance


A quick note on how SIFT values are updated and maintained. As I mentioned previously, all updates happen automatically on any data modification in the source table (which is My Sales Table). Let's have a closer look at an example of such an update.

The next screenshot shows a few entries in the indexed view based on My Sales Table.

The highlighted line displays the amount 44799 and a single source entry for this amount. Now I insert another entry with the same combination of Customer No., Item No., and Posting Date.

MySalesTable."Entry No." += 1;
MySalesTable."Customer No." := '662960EE-6CCD-41DC-B';
MySalesTable."Item No." := '23371360-BB9D-48F8-8';
MySalesTable."Posting Date" := 20251231D;
MySalesTable.Amount := 10000;
MySalesTable.Insert();

After the Insert, another Select with the same predicates shows that the row now aggregates two source entries, with the new amount 54799.
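This can be verified by querying the view directly with the NOEXPAND hint, as in the server-generated query above. A hypothetical check of the affected group, with the view name shortened:

```sql
-- After the insert, this group holds two source rows:
-- $Cnt = 2 and SUM$Amount = 44799 + 10000 = 54799.
SELECT "Posting Date", "$Cnt", "SUM$Amount"
FROM dbo."...$VSIFT$CustomerItemDate" WITH (NOEXPAND)
WHERE "Customer No_" = '662960EE-6CCD-41DC-B'
  AND "Item No_" = '23371360-BB9D-48F8-8'
  AND "Posting Date" = '2025-12-31';
```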

This way, SQL Server maintains the amount totals, shifting calculations from select to update queries to allow fast data retrieval.


SIFT and FlowFields


Although SIFT keys can be very efficient in CalcSums calculations, they are best suited as a supporting mechanism for FlowFields. Just like CalcSums, FlowField calculations can use available SIFT keys to fetch the data from a view instead of the table.

To illustrate a FlowField calculation backed by a SIFT, I declared a table which I called My Sales Stats with a FlowField defined as follows:

field(6; "Item Sales per Customer"; Decimal)
{
    FieldClass = FlowField;
    CalcFormula =
        sum("My Sales Table".Amount
        where("Customer No." = field("Customer Filter"),
              "Item No." = field("Item Filter"),
              "Posting Date" = field("Date Filter")));
}
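The CalcFormula above references three filter fields that are not shown in the snippet. Assuming the usual pattern, the My Sales Stats table would declare them as FlowFilter fields, roughly like this (a sketch with hypothetical field numbers, not the exact declaration from the demo app):

```al
field(10; "Customer Filter"; Code[20])
{
    FieldClass = FlowFilter;
}
field(11; "Item Filter"; Code[20])
{
    FieldClass = FlowFilter;
}
field(12; "Date Filter"; Date)
{
    FieldClass = FlowFilter;
}
```

FlowFilter fields store no data themselves; they only pass filters from the record into the CalcFormula.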

The FlowField definition is based on My Sales Table, the same table that was used in the previous examples. To change the test conditions a little, I calculate the sums for each day in a two-month period and measure the total time from the start of the loop till its end.


trigger OnAction()
var
    MySalesStats: Record "My Sales Stats";
    ReportDate: Date;
    StartTime: Time;
begin
    // Filters are applied to the FlowFilter fields referenced in the CalcFormula
    MySalesStats.SetRange("Customer Filter", CustomerNo);
    MySalesStats.SetRange("Item Filter", ItemNo);

    StartTime := Time();
    for ReportDate :=
        CalcDate('<-1M>', WorkDate()) to CalcDate('<+1M>', WorkDate()) do
    begin
        MySalesStats.SetRange("Date Filter", ReportDate);
        MySalesStats.CalcFields("Item Sales per Customer");
    end;

    Message(Format(Time() - StartTime) + ' ms');
end;

First, I run this code without SumIndexFields on the index.



Then I reinstate the indexed view and run the same calculation again.



The difference is quite impressive: 18 ms with the SIFT view vs 3 seconds without it. Here, we can see an example of how inefficient a series of queries can be when each query is executed independently and the query optimizer tries to find the best execution plan for each query separately. I showed previously that a full scan of My Sales Table takes less than one second, and subsequent calculations, once the table is in memory, are blazingly fast. SQL Server can estimate this performance gain and switch the execution plan to a clustered index scan for a single query that reads a large number of records, but it does not have sufficient information for this estimate when each number is calculated in a separate query.

SIFT is one way of improving this calculation: the total run time with SIFT drops to an impressive 18 ms. But it is not the only possible method, and it has its downsides too. I will leave the details for the next post, though.


Summary


To sum up, the SumIndexFields property creates an indexed view based on the table where the key is defined. This view shifts aggregation work from read queries to updates: every data modification statement (insert, update, or delete) on the table also updates the sums in the relevant views as necessary. This way, AL data aggregation functions like CalcSums and CalcFields do not have to read the data table, but can get the pre-calculated values from the view - and this is what makes CalcSums and CalcFields based on SIFTs so fast.


But at the same time, we need to remember the applicability limits of SIFTs. First of all, every field in the table filter or the FlowField definition that is supposed to benefit from a SIFT must match a field in the key; otherwise the amount cannot be derived from the view.

To illustrate this statement, let's look at the following code:

MySalesTable.SetRange("Customer No.", CustomerNo);
MySalesTable.SetRange("Item No.", ItemNo);
MySalesTable.SetRange("Item Category Code", CategoryCode);
MySalesTable.SetRange("Posting Date", 20260203D, 20260213D);
MySalesTable.CalcSums(Amount);

This is a variation of the same calculation that I used in previous examples, but with one extra filter on "Item Category Code" in the snippet above. With this set of filters, the SIFT view cannot be used anymore, and Business Central will send a query selecting the sum from the table itself. The Item Category Code field is not included in the key, and the view simply does not have the necessary information to produce the required result. Therefore, SIFTs are not the best tool when the filters for aggregations are not strictly defined. If, for example, the set of filters for CalcSums comes from a UI page and users can set any filters they want, there is a good chance that the SIFT will never be used. And this is the main reason SumIndexFields works so well together with FlowFields, where the possible set of filters is strictly limited by the field declaration.
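With the extra filter in place, the generated SQL presumably falls back to the table-level query from the baseline tests, with one more predicate that the view cannot answer. A sketch mirroring the earlier generated query, with the table name shortened:

```sql
-- "Item Category Code" is not among the view's grouping columns,
-- so SQL Server has to aggregate the table rows again.
SELECT SUM("Amount")
FROM dbo."...$My Sales Table$..." WITH (READUNCOMMITTED)
WHERE "Customer No_" = @0
  AND "Item No_" = @1
  AND "Item Category Code" = @2
  AND "Posting Date" >= @3 AND "Posting Date" <= @4
```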


Another thing to note is that SumIndexFields defined on the table's clustered index usually do not result in such an impressive performance boost. Seeks and aggregations on the clustered index are fast enough to calculate the required sums with acceptable performance even without a SIFT view, so the performance gain from such a view would be marginal in most cases, whilst the negative impact on updates can be quite significant. It is generally good practice to avoid unnecessary indexed views and keep the number of SIFT keys as low as practically possible because of the overhead they cause in updates. But more on this in the next post.


4 comments


artexs
Aug 06

Can you please compare this to a key with "IncludedFields=Amount"?

adrogin
Aug 07
Replying to

Thanks for the suggestion! This is exactly my idea. But this will probably be the third post of the series - I want to do a post on possible alternatives after the negative impact of SumIndexFields.

This is what I usually do when I find SIFTs that cause too much trouble on updates - I try to replace them with covering indexes with IncludedFields and run some tests to see the effect. Often, it is a good compromise. Although, it's not a 100% replacement for SIFT and sometimes performance of aggregations suffers a lot.

lohndorf
Aug 06

Great article, thanks for taking the time to document this in such a detailed way, I am already looking forward to the next one. I also like your way to run through dates:


for MyDate := Date1 to Date2 do begin


I did not know we could do a For-To loop like this, just based on a begin and an end date.

adrogin
Aug 06
Replying to

Thanks for your comment! I will do my best to publish it soon. I've been lazy with blog posts for a while, need to catch up. :-)

Dates in AL are very nicely cast to integers in arithmetic operations.
