Following my series of posts where I have been addressing various aspects of table indexing, this post delves into the same old argument from a different point of view. Previously, I measured timings of select queries and explored some concurrency problems arising from indexed views, and now I want to ask a question: how indexing impacts query performance from the insert/update side.
To refresh the context of the post, you can refer to my previous posts on the related topics.
And the code samples for this post are in my GitHub repository.
Indexes must be updated during data modifications, and this operation requires time, thus slowing down updates and inserts. We know that each new index on a table drags the update time up, but how intense is this effect? This is the question I am trying to answer. The focus of all tests and measurements below is pure insert time in a table with varying number of indexes - concurrency and contention aside, just inserts in a single session.
In all the following tests, I use a very simple "entry-like" table with an Integer field for the primary key, one decimal field, and ten code fields.
fields
{
field(1; "Entry No."; Integer) {}
field(2; Amount; Decimal) {}
field(3; "Dimension Value 1"; Code[20]) {}
field(4; "Dimension Value 2"; Code[20]) {}
field(5; "Dimension Value 3"; Code[20]) {}
field(6; "Dimension Value 4"; Code[20]) {}
field(7; "Dimension Value 5"; Code[20]) {}
field(8; "Dimension Value 6"; Code[20]) {}
field(9; "Dimension Value 7"; Code[20]) {}
field(10; "Dimension Value 8"; Code[20]) {}
field(11; "Dimension Value 9"; Code[20]) {}
field(12; "Dimension Value 10"; Code[20]) {}
}
The primary key is on the Entry No. field, and for each test run, I keep adding one key on one of the Code fields:
key(Dim1; "Dimension Value 1") { }
key(Dim2; "Dimension Value 2") { }
...
So my first series of inserts runs on a table with the clustered key only, the next one adds a single key, and in the last run, SQL Server has to maintain ten additional non-clustered indexes.
Each test run is actually a series of ten tests, every one of them measuring the time it takes to insert the given number of records: from 1000 to 10000.
In the end, each test run forms one line in the chart below, and each line represents a series with a certain number of indexes created on the table - from 0 to 10.
As a result, I have this bunch of lines starting very close to each other at one thousand rows and gradually diverging as the number of rows grows.
The effect of each individual index on the insert is hardly noticeable - every index adds circa 2.5 - 2.7% to the insert time, and this difference can be easily obscured by irregularities of the testing conditions. But the effect multiplies as the number of indexes increases, so inserts with ten indexes are 27% slower compared to inserts without indexes.
This was the timing for a set of simple indexes, without any complications. Now what happens if I add an included column to each index? Included columns (or IncludedFields, as this property is called in Business Central) consume additional space in index pages bloating the index size. Will it impact the insert time as well? Let's run another series of measurements.
In this chart, I did not include the baseline test with no indexes, since it would be a copy of the line from the previous picture, so this set starts from one index. And the results are practically the same: 2.7% of extra time per index on average.
In fact, the difference between two adjacent lines in the first chart averaged on 2.5%, whilst the second set results in average 2.7%, and the total increase from 0 to 10 indexes changed from 25.4% to 29.2%. These numbers could indicate that addition of more included columns has an impact on the update time, and probably ten more fields would reveal a more obvious trend. But so far, we can see that one included field in the index does not have any apparent effect.
Now, how will this dependency change when indexes are replaced with SIFT keys? Here is another test.
First thing we can see looking at this chart is that the lines for different numbers of keys are spread noticeably wider. Each new SIFT key on a table adds approximately 6-7% to the insert time, and an insert in a table with 10 keys is almost twice slower compared to a table with a single key (and 128% slower than the same insert in a table without any indexes).
Next set of lines shows a series of inserts with a columnstore index. This test is not quite aligned with all the rest. Since it is not possible to define more than one columnstore index on a table, I could not follow the same setup and add another index for each test series. Therefore, in this test, each series adds one field to the index, starting from two fields: Dimension Code 1 and Amount, and up to all 10 dimension fields, plus Amount.
ColumnStoreIndex =
"Dimension Value 1", "Dimension Value 2", "Dimension Value 3",
"Dimension Value 4", "Dimension Value 5", "Dimension Value 6",
"Dimension Value 7", "Dimension Value 8", "Dimension Value 9",
"Dimension Value 10",
Amount;
And what we can see from this chart, is that there is no strong correlation between the number of columns in the index and the insert time: all 10 series are closely grouped around one line.
Whilst the lower left side of the chart is packed closer compared to the top-right, and there are some outliers and irregularities in the middle, these anomalies do not exhibit any trend of increasing the time depending on the number of index columns.
The next chart represents the same dataset from a different angle: it shows the dependency of the insert time from the number of indexes when inserting 10000 rows. This perspective can help in understanding the impact of different types of indexes on the same insert operation.
There are a few things we this chart confirms (most of these conclusions were drawn from previous graphics, so I will just reiterate here):
For rowstore indexes, insert time grows by approximately 2-3% with every new index added;
Adding one included field to the index does not impose any penalty on the insert time
SIFT keys have the highest impact on inserts, and the growth is much steeper, with 10 keys almost doubling the insert time compared to one key.
Adding more fields to the columnstore index did not change the time at all.
And finally, this chart demonstrates the ratio between insert times for a table with one columnstore index and a table with multiple rowstore indexes. One columnstore index is roughly equivalent to 4-5 rowstore indexes in terms of the time it requires for the update.
And the last test with its own chart to summarize everything said above. Here, I am measuring time while inserting 100 000 records with some selected sets of indexes, and the chart ranks the timing from the fastest version to the slowest one.