According to this article on the Microsoft Learn portal, non-clustered columnstore indexes (NCCI) are seen as a possible future replacement for Business Central's SumIndexFields (SIFT views). Previously, I wrote a post about SIFTs and their impact on concurrent inserts and updates: they do indeed slow down updates and increase the chances of lock timeouts and deadlocks in concurrent sessions. Another post in this series explored included fields in indexes as an alternative solution, and now I want to share my experience of applying columnstore indexes in Business Central, with some pros and cons.
What a columnstore index is and what it is not
I think I started understanding the concept of columnstore indexes better when I mentally dropped the word "index". Do not think of them as pointers to table data; a columnstore index is nothing like a B-tree index. True, the clustered index in SQL Server is not a tree of pointers either, but the table data itself. And similarly, a columnstore index is rather an alternative way to store table data, designed for large data warehouse tables. "Large" in this context means really large, at a scale of billions of rows.
When a columnstore index is created, its columns are compressed and stored in LOB allocation units rather than data or index pages. Each column included in the index is stored separately, independently from the other columns - hence the name "columnstore". This allows the database engine to access each column separately and limit the disk I/O to reading only the columns referenced in the query.
And these are two of the three pillars that the high performance of columnstore indexes rests upon: data compression and selective column reads. The third one is segment elimination, which I will also touch upon below.
Storage and data access mechanisms for columnstore indexes are very different from traditional rowstore data and are optimized for scans and aggregations of large amounts of data for DWH reporting. Columnstore indexes are not designed as a replacement for rowstore data and B-tree indexes, and usually perform worse in searches by arbitrary predicates.
If you want to read more about columnstore indexes, there is a lot of information available online; the overview article on Microsoft Learn is probably the best place to start. Here, I will share my tests and their outcomes.
Test data
Data distribution is an important factor in query performance, and as I will show in a few introductory examples, it is even more crucial when a columnstore index is involved.
For the following tests, I will be using my favourite table "My Sales Table" with this structure:
field(1; "Entry No."; Integer) {}
field(2; "Customer No."; Code[20]) {}
field(3; "Item No."; Code[20]) {}
field(4; "Item Category Code"; Code[20]) {}
field(5; "Customer Group Code"; Code[20]) {}
field(6; "Posting Date"; Date) {}
field(7; Amount; Decimal) {}
"Entry No." here is the primary key, and other keys will be created as necessary for each test.
To generate data resembling a natural distribution in an ERP system, I assign Posting Date values incrementally, whilst Customer No. and Item No. are selected randomly from dictionaries with 10 000 distinct values each. Item Category Code and Customer Group Code are also assigned randomly, although, strictly speaking, these values should be correlated with Item No. and Customer No. respectively. But I don't use groups and categories in the tests, so I will just leave these two fields random.
The table contains 5 000 000 records spanning 500 days, 10 000 entries per day. During the data initialisation, the date is incremented after every 10 000 records, so the clustered index is ordered on the posting date as well as on the Entry No.
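Below is a minimal sketch of the data generation loop described above. The helper functions picking random dictionary values and the starting date are illustrative assumptions, not the exact code of my test app.

local procedure FillMySalesTable()
var
    MySalesTable: Record "My Sales Table";
    PostingDate: Date;
    EntryNo: Integer;
begin
    PostingDate := 20240101D; // illustrative starting date
    for EntryNo := 1 to 5000000 do begin
        MySalesTable.Init();
        MySalesTable."Entry No." := EntryNo;
        MySalesTable."Customer No." := GetRandomCustomerNo(); // hypothetical helper: random value from a 10 000-entry dictionary
        MySalesTable."Item No." := GetRandomItemNo(); // hypothetical helper: random value from a 10 000-entry dictionary
        MySalesTable."Item Category Code" := GetRandomItemCategory(); // hypothetical helper
        MySalesTable."Customer Group Code" := GetRandomCustomerGroup(); // hypothetical helper
        MySalesTable."Posting Date" := PostingDate;
        MySalesTable.Amount := Random(100000) / 100; // random amount, illustrative
        MySalesTable.Insert();

        if EntryNo mod 10000 = 0 then
            PostingDate := PostingDate + 1; // move to the next day after every 10 000 entries
    end;
end;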
Unlike my previous post, where I compared the performance of SIFT views vs indexes with included fields, the distribution of entries per customer/item in this dataset is more sparse. On average, there are between 0.8 and 1.1 records per customer per day, or roughly 500 records per customer over the whole period.
Columnstore indexes and parallel plans
This time I am not running my tests in a Docker container as I usually do. The SQL Server Express edition shipped with Docker images does not support parallel query execution, which is crucial for the performance of queries based on columnstore indexes, so I set up a test environment with a standalone SQL Server Developer edition to make use of parallel execution plans.
Before moving on to the test results, I will share a small example to demonstrate the impact of parallel execution on the query performance.
For the demo, I created the following index on my test table:
ColumnStoreIndex =
"Customer No.", "Item No.", "Item Category Code",
"Customer Group Code", "Posting Date", Amount;
And ran a simple query to summarize amounts in a certain date range (no BC yet, running it in SSMS for now). To simulate my Docker environment, I force the query optimizer to use a single-threaded plan by setting the MAXDOP option to 1.
select sum(Amount)
from [dbo].[CRONUS$My Sales Table]
where [Posting Date] >= '2025-04-21'
and [Posting Date] <= '2025-06-01'
option (maxdop 1)
And this is the plan it comes up with.
Rerunning the same query without limiting the degree of parallelism, SQL Server suggests a different execution plan that completes in less time. Although the hash match operator and the gathering of data from parallel streams take additional time, both the hash match and the batch-mode index scan can make use of parallel threads.
The difference between 512 ms and 200 ms does not look too impressive, but it becomes more notable if I filter the data on a range of items instead of dates. The same query as above, but filtering on Item No. instead of Posting Date, takes 5.6 seconds:
When selecting rows filtered on the Item No., the database engine reads all 5 000 000 rows and sends them all downstream to the filter operator, and all this reading and filtering takes over 5 seconds.
Running this query again without limiting the degree of parallelism yields a result very similar to the query with the date filter. Time is even a little lower for the item filter - probably because of a lower number of rows satisfying the predicate (although 18 milliseconds could actually be blamed on some background process suddenly waking up).
The secret behind such a big difference in running time between the two filters (Posting Date and Item No.) is index ordering; a few more words about this later on.
The first conclusion drawn from this example is that running a query against a columnstore index on SQL Server Express, or otherwise limiting the degree of parallelism to a single thread, is a really, really bad idea. Even MAXDOP = 2 can make a big difference compared to single-threaded execution, all thanks to the batch execution mode optimized for parallel multithreaded processing.
The execution mode is reflected in the properties of the index scan operator. In batch mode the scan operation is split between multiple threads.
When the max degree of parallelism is limited to 1, the index scan operator falls back to the single-threaded row mode.
Data ordering and segment elimination
Query performance depends to a great extent on data ordering in the columnstore index. If the table data is sorted on the values of the column in the search predicate, this can significantly reduce the query execution time. And when we speak about entry tables, we usually have this kind of natural sorting key, which is the entry posting date. There are always some outliers, backdated entries, but they never reach too far into the past. We do not post in closed periods covered with dust or far ahead into the misty future. And this ordering enables the optimisation technique called segment elimination.
I ran the previous queries with the columnstore index on My Sales Table as Business Central creates it, without any tweaks on the SQL Server side. To show one little trick and the effect of sorting, I rebuilt my columnstore index from SSMS, limiting the degree of parallelism to 1 to ensure proper ordering.
alter index [$NCCI] on [dbo].[CRONUS$My Sales Table]
rebuild with (maxdop = 1)
Similarly to the previous query example, MAXDOP = 1 will slow down the index rebuild. But in this case, a slow index rebuild has a positive effect on the queries using this index.
Now I run the same query summarizing amounts in a date range, but with the freshly updated index.
And we can see how ordering of the index further reduces the time of the scan operator.
We can find the key difference if we look at the IO stats output. These are the stats without the MAXDOP limit and with the index created from BC:
Table 'CRONUS$My Sales Table$382b9cf1-7c89-48eb-bdc6-be609426126a'. Segment reads 5, segment skipped 1.
And this is the IO stats output after rebuilding the index with MAXDOP = 1:
Table 'CRONUS$My Sales Table$382b9cf1-7c89-48eb-bdc6-be609426126a'. Segment reads 2, segment skipped 3.
In the first case, index rows are split into 6 segments, and the query reads 5 of those. In the second case, when the index is built with MAXDOP 1, rows are packed in 5 segments, and 3 of the segments are skipped during the query execution, so the query reads only 2 index segments.
This happens due to the natural data ordering in the demo data - the posting date increases monotonically, so if the index preserves the same ordering as the table's clustered key, each date value is found in one row group or two adjacent row groups, but is unlikely to be present in any others. A sequential index build with MAXDOP = 1 preserves this ordering, which gives the query a good performance boost. Parallel index restructuring does not guarantee any particular order of index rows, therefore we find dates spread throughout multiple row groups, and the database engine has to scan more segments.
And of course, this rebuilding has no effect on queries where the filter is applied to other fields that do not have this inherent ordering imposed by business rules.
If I run the same query with a filter on an item or a customer, I won't see any performance improvement from rebuilding the index, because these values remain scattered throughout the index. But if values in a column tend to be clustered together rather than spread uniformly, this can be a good performance benefit.
Anyway, this is just a theoretical detour to demonstrate the effect of ordering on the performance of columnstore indexes. Since in BC we cannot force a specific order or degree of parallelism on the index build task, this does not apply to BC indexes. In the following tests, I will be using indexes as they are created by Business Central, without any optimisation techniques.
Test results
I'm finally done with the necessary introductions, which took a good half of the post, and am now ready to present the test results.
Test 1: Group by customers with increasing date range
The first test summarizes amounts for 100 customers over increasing time periods using two different methods (both sketched below):
Calling CalcSums for each customer in a loop;
A single GROUP BY query (an AL query object) with a filter on the same 100 customers.
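For clarity, here is a rough sketch of both methods. The object ID, variable names, and the customer filter are illustrative assumptions, not the exact test code.

// Method 1: CalcSums for each customer in a loop
local procedure SumAmountsWithCalcSums(CustomerFilter: Text; StartDate: Date; EndDate: Date)
var
    Customer: Record Customer;
    MySalesTable: Record "My Sales Table";
begin
    Customer.SetFilter("No.", CustomerFilter); // the 100 test customers
    if Customer.FindSet() then
        repeat
            MySalesTable.SetRange("Customer No.", Customer."No.");
            MySalesTable.SetRange("Posting Date", StartDate, EndDate);
            MySalesTable.CalcSums(Amount); // one aggregation query per customer
        until Customer.Next() = 0;
end;

// Method 2: a single GROUP BY query object
query 50100 "Amount by Customer"
{
    elements
    {
        dataitem(MySalesTable; "My Sales Table")
        {
            column(CustomerNo; "Customer No.") { } // grouping column
            column(Amount; Amount) { Method = Sum; } // aggregated column
            filter(PostingDate; "Posting Date") { } // date range applied at runtime
        }
    }
}

At each step, the query object is opened with the same customer and date filters, and a single Read loop returns all 100 totals.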
The same tests are repeated 3 times with different indexes:
Rowstore index with included fields
SIFT view
Columnstore index
In all cases the results in the series are filtered on the posting date spanning different date ranges, starting from 10 days and adding 10 days at each step. In terms of the number of records, this means 100 000 records falling within the first range, growing to 1 000 000 at the last step. The number of entries per customer is random, ranging between 0.8 and 1.1 records per customer per day on average, and each step adds around 800 to 1100 records to the set of aggregated rows.
The first graph shows the time in milliseconds it takes to calculate amounts grouped by customers, while increasing the included date range. There are 100 customer records, so all queries yield 100 output records, but the number of records included in the aggregation is increased at each step.
And the first thing we see is that calling CalcSums in a loop on a table with a columnstore index is by far the worst way of calculating the sums. Remember that each CalcSums call triggers a scan of the columnstore index (or at least some of its segments). It is fast, much faster than scanning a traditional rowstore index, but still not as fast as a B-tree search.
Just to see all the other lines at a comprehensible scale, the next graph shows the same numbers without the Columnstore + CalcSums line.
For this type of data distribution, SIFT and the covering index with included fields perform at the same level, with the "group by" query being slightly faster than 100 separate CalcSums calls.
And the query on the columnstore index sits very close to the CalcSums lines. Since the query requires only a single scan to calculate all the values, this brings the measurement closer to the other methods, closing the gap.
Test 2: Group by customers and dates
Here I reduce the date range, adding only one day at each test step, but aggregate amounts on two fields instead of just one: Customer No. and Posting Date. With the same 100 customers included in the test, the first iteration yields 100 groups, and the last one brings this number up to 1000 (100 customers by 10 days).
Once again, CalcSums on the columnstore index did not earn a good mark: 1000 invocations of CalcSums in the last iteration mean 1000 scans of the columnstore.
When the line depicting the slowest calculation is removed from the graph, we can see the rest of the picture.
With this sparse data distribution (one entry per customer per day and only a handful of records to aggregate in total), SIFT cannot use its superpower and push the aggregations to the update side. It's still just half a second per thousand calls, but the query object is almost insensitive to the number of records. The number of records is still too low to make a difference, quite the opposite of the results in my previous post, where the amount of data to sum was much higher.
Test 3: SetAutoCalcFields
Test number 3 is a variation of Test 1, but with SetAutoCalcFields before the loop. So far, I have not followed BC performance best practices and called CalcSums inside the loop. But if I declare a FlowField calculated from My Sales Table, I can use SetAutoCalcFields to join the aggregation with the selection in a single query. Let's see how a "Smart SQL" query improves performance.
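As a rough sketch (the table and field names follow the SQL query shown further below and are illustrative assumptions), the FlowField and the loop look like this:

// FlowField on the looping table; "Date Filter" is an assumed FlowFilter field
field(10; Amount; Decimal)
{
    FieldClass = FlowField;
    CalcFormula = sum("My Sales Table".Amount
        where("Customer No." = field("Dim Value"),
              "Posting Date" = field("Date Filter")));
}

// The test loop: the FlowField is calculated by the same SELECT statement that FindSet issues
local procedure SumWithAutoCalcFields(StartDate: Date; EndDate: Date)
var
    MyDimTable: Record MyDimTable;
begin
    MyDimTable.SetAutoCalcFields(Amount);
    MyDimTable.SetRange("Date Filter", StartDate, EndDate);
    if MyDimTable.FindSet() then
        repeat
            // MyDimTable.Amount is already populated - no CalcSums or CalcFields inside the loop
        until MyDimTable.Next() = 0;
end;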
Similarly to Test 1, here I filter records on the Posting Date field, increasing the range from 10 to 100 days and calculate sums for 100 customers.
It is indeed noticeably faster for both included fields and SIFT, and performs similarly to a query object. But hey! Where is the line for the columnstore index?
This time the line was so far away that I did not even try to show it on the same graph. This line would be at around 64727 milliseconds.
Yes, FindSet with AutoCalcFields and a columnstore index took over one minute.
Let's have a look at the query behind the autocalculated FlowFields and its execution plan to see how one good practice can sometimes turn into a performance disaster.
First, the query. When SetAutoCalcFields is called before FindSet, the Business Central runtime modifies the SELECT statement issued by FindSet, attaching a subquery with an OUTER APPLY operator. This subquery is still executed for each row of the left table expression, but unlike separate CalcFields calls on each row, the query optimizer has all the information it needs to find the best execution plan to calculate all the required values.
This is what the query looks like. I removed a part of it to make it more readable.
SELECT
...
ISNULL("SUB$Amount"."Amount$My Sales Table$SUM$Amount",0.0) AS "Amount"
FROM "CRONUS$MyDimTable$" AS "MyDimTable"
OUTER APPLY (
SELECT TOP (1) SUM("Amount$My Sales Table"."Amount")
FROM "CRONUS$My Sales Table$" AS "Amount$My Sales Table"
WHERE (
"Amount$My Sales Table"."Customer No_"=
"MyDimTable"."Dim Value"
AND "Amount$My Sales Table"."Posting Date"=@3))
AS "SUB$Amount"
WHERE (
"MyDimTable"."Dim Code"=@0 AND
"MyDimTable"."Dim Value">=@1 AND "MyDimTable"."Dim Value"<=@2
...
And its execution plan. Click on the image to see the details. The plan is pretty long and does not fit into the post format.
Here the optimizer preferred to build a temporary index and calculate the aggregations from it, inserting an index spool operator into the plan. Index spooling consumed 98% of the query time, while 2% was spent on the columnstore scan, and all other operations completed in nearly zero time.
The cardinality estimate is off the mark for this query no matter what kind of index we use, but the rowstore index happens to be more resilient to this kind of estimation error. With a rowstore index, the query optimizer still suggests an index seek for the same cardinality estimate.
In the case of a columnstore index, this number is sufficient to tip the plan towards eager index spooling.
I have always said that Business Central's Smart SQL with AutoCalcFields is more efficient than running CalcFields in a loop, and this is true if we use rowstore indexes. But be especially careful when running AutoCalcFields with a columnstore index, and twice as cautious when applying range filters on data whose distribution is non-uniform.
Test 4: Group by posting date

And now, finally, the scenario where the columnstore index rules its dominion, unreachable to any competition - the kind of workload it was designed for: massive index scans.
Here I'm running another query that calculates amounts grouped by the posting date only. My test data spans 500 days with 10 000 entries per day, so each step in the chart adds 50 days and 500 000 records. The first point is the time it takes to aggregate 500 000 rows, the second - 1 000 000, and so on, up to the full table load of 5 000 000 rows.
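A minimal sketch of such a query object (the object ID and names are illustrative): it groups on Posting Date only and sums Amount, so a single index scan produces all the daily totals.

query 50101 "Amount by Posting Date"
{
    elements
    {
        dataitem(MySalesTable; "My Sales Table")
        {
            column(PostingDate; "Posting Date") { } // grouping column - one output row per day
            column(Amount; Amount) { Method = Sum; } // total amount per day
        }
    }
}

The date range for each step is then applied with SetRange on the PostingDate column before opening the query.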
Probably I should add a caveat here. I'm not comparing read performance with SumIndexFields, because in this scenario the aggregated values in SIFTs would be calculated during inserts, and selects from the view would still be faster than all other methods. In order to properly compare SumIndexFields with a columnstore index, I would need to capture the time for the whole update/select cycle.
I also ran the test for the included fields with two different indexes: the first one has only the Posting Date field indexed and Amount as an included field:
key(PostingDate; "Posting Date")
{
    IncludedFields = Amount;
}
Unlike the previous examples, I use a different index because it better suits the modified query (the red line on the graph): there is no filtering or grouping on customers, therefore the customer field is not required for the query.
The yellow line shows a similar index, but with two more fields:
key(DateCustomerItem; "Posting Date", "Customer No.", "Item No.")
{
    IncludedFields = Amount;
}
These two additional fields are irrelevant for the query, but they increase the size of the index. And as the test shows, they have a noticeable impact on heavy scans, increasing the total time by circa 1.5 seconds when the query aggregates data for all 500 days.
As a conclusion, I can say that in my experience, columnstore indexes can be a real game changer for certain types of tasks, but at the same time, they are not a direct replacement for SIFT. In some cases, removing a SIFT and plugging in a columnstore index can ruin the performance of the app.
Columnstore indexes shine in the kind of task where large amounts of data must be aggregated and the query requires an index scan rather than a seek. The real benefit of a columnstore index is that it makes scanning a large table really fast. But when a task requires extracting a few specific values, a B-tree index is still a better fit for the purpose.
Similarly, when a task requires fast calculations on a small subset of data, especially repeated many times (for example, calculating a customer's balance or the remaining item stock during transaction posting), columnstore is not an alternative to SIFT.
On the other hand, matrix pages presenting aggregated data for a large list of entities in a single view can greatly benefit from it - but, again, not as a direct replacement. This kind of overview and summary page usually relies on SIFT views, with lots of CalcSums or FlowFields calculating each individual value, or AutoCalcFields aggregating all values in subqueries. Separate calculations of each aggregated value are much slower when a columnstore index is at play. The best performance can be achieved with a query object that calculates the aggregations from a large table with a single index scan.