Table Scans and Index Scans affect more than the table they access

SQL Server only queries data that is in memory (the data cache). If the data needed is not cached, SQL Server retrieves it from disk, loads it into the data cache, and then works from the cache.

I have a general guideline that Table Scans and Index Scans are bad. This may not be an issue for small tables, but for large tables scans can cause significant performance problems. For example, if a query accesses a table that is 20 GB in size and a scan occurs, there is a good chance that all of the data for that entity will be loaded into memory. If the data is not already in memory, SQL Server must fetch it from disk, which is usually an expensive IO operation. And if there is not enough free space in the data cache, SQL Server will remove (flush) data from the cache to make room for the data being read from disk.

Data that is used often and is already cached can be pushed out of the cache by poor queries or by the lack of an index. Here’s a contrived example. We have two tables. The first table is Orders; it contains 10 million records and requires 3 GB of disk space. The Orders table is extremely important: it is used in most queries, and queries that access it must return very quickly. The second table, TaskLog, contains 200 million records and requires 7 GB of disk space. For simplicity, neither table has any nonclustered indexes.
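For reference, here is a minimal sketch of what the two tables might look like; the column names and data types are illustrative assumptions, and only the clustered primary keys exist:

-- Orders: ~10 million rows, ~3 GB on disk, used by most queries (illustrative schema)
CREATE TABLE dbo.Orders (
    OrderID     INT IDENTITY(1,1) NOT NULL,
    CustomerID  INT               NOT NULL,
    OrderDate   DATETIME          NOT NULL,
    TotalAmount DECIMAL(18, 2)    NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);

-- TaskLog: ~200 million rows, ~7 GB on disk, rarely queried (illustrative schema)
CREATE TABLE dbo.TaskLog (
    TaskLogID BIGINT IDENTITY(1,1) NOT NULL,
    TaskType  TINYINT              NOT NULL,
    LoggedAt  DATETIME             NOT NULL,
    Detail    VARCHAR(200)         NULL,
    CONSTRAINT PK_TaskLog PRIMARY KEY CLUSTERED (TaskLogID)
);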

Let’s presume that the server has 8 GB of memory. If all queries are executed against the Orders table, eventually most of the data from the Orders table would be in the data cache. There would be little need for SQL Server to touch the disk, and queries would execute fairly fast.

Now UserA queries the TaskLog table. The query gets counts by TaskType (see the example query below). When the user executes this query, a table scan is used. Since the data is not in memory, SQL Server transfers it from disk to memory. The problem is that there is not enough memory to hold both the Orders and TaskLog tables, so SQL Server flushes Orders data from memory and replaces it with TaskLog data.
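A minimal sketch of that kind of query, assuming TaskLog has a TaskType column (the column name is an assumption for illustration):

-- Counts per TaskType; with no supporting index, SQL Server has to scan the
-- entire TaskLog clustered index (all ~7 GB of it) to answer this.
SELECT TaskType, COUNT(*) AS TaskCount
FROM dbo.TaskLog
GROUP BY TaskType;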

Now any query that needs Orders data will have to retrieve it from disk, and performance will suffer.

There are many options for solving this problem: indexes could be created on the Orders and TaskLog tables, more memory could be added, and there are probably other options as well.
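For instance, a narrow nonclustered index on TaskType (a sketch; the index name is an assumption) would let the count query above scan a small index instead of the entire 7 GB table, leaving far more of the cache available for Orders:

-- A narrow nonclustered index; the GROUP BY/COUNT query can be satisfied by
-- scanning this index rather than the full TaskLog clustered index.
CREATE NONCLUSTERED INDEX IX_TaskLog_TaskType
    ON dbo.TaskLog (TaskType);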

But how do you identify whether memory allocation is a problem? Below is a query that retrieves the space used by all clustered and nonclustered indexes. It shows the size of each entity on disk and how much of that entity is in memory.
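Here is one way such a query can be put together: a sketch built on sys.dm_os_buffer_descriptors, sys.allocation_units, sys.partitions, and sys.indexes, with column aliases matching the sample output below.

-- Compare each index's size on disk with how much of it currently sits in the
-- buffer pool (data cache) for the current database.
;WITH BufferedPages AS (
    SELECT allocation_unit_id, COUNT(*) AS buffered_page_count
    FROM sys.dm_os_buffer_descriptors
    WHERE database_id = DB_ID()
    GROUP BY allocation_unit_id
)
SELECT
    OBJECT_NAME(p.object_id)                      AS [Table Name],
    i.name                                        AS [Index Name],
    SUM(au.total_pages) / 128                     AS [Index MB],    -- 128 8 KB pages per MB
    ISNULL(SUM(bp.buffered_page_count), 0) / 128  AS [Buffer MB],
    CAST(ISNULL(SUM(bp.buffered_page_count), 0) * 100.0
         / NULLIF(SUM(au.total_pages), 0) AS INT) AS [Index In Buffer Percent]
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.allocation_units AS au
    ON au.container_id = p.hobt_id                -- in-row and row-overflow data
LEFT JOIN BufferedPages AS bp
    ON bp.allocation_unit_id = au.allocation_unit_id
WHERE OBJECTPROPERTY(p.object_id, 'IsUserTable') = 1
GROUP BY p.object_id, i.name
ORDER BY [Buffer MB] DESC;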

Here is a sample result from the query (names have been changed to protect the innocent):

Table Name   Index Name    Index MB   Buffer MB   Index In Buffer Percent
Table1       PK_Table1       211875       20586   10%
Table2       PK_Table2         3711        3348   90%
Table3       PK_Table3        27689        2246   8%
Table4       IX_Table4_A      52181        1675   3%
Table5       PK_Table5       278409        1436   1%
Table4       IX_Table4_B      28585        1418   5%
Table2       IX_Table2_A        725         745   103%
Table6       PK_Table6          572         572   100%
Table3       IX_Table3_A      15701         493   3%
Table3       IX_Table3_B      17756         467   3%
Table7       PK_Table7          461         461   100%

Table2 is equivalent to our Orders table in the example; it is very important that results from this table are returned quickly. As we can see, 90% of the data for PK_Table2 is in memory, which is good.

PK_Table1 is 211 GB, and 20 GB of it is in memory. In this example, speed in retrieving data from this table isn’t that important, and 20 GB in memory seems like too much. This could be an indication that a scan is being used to access the data, or that someone is running a query they shouldn’t. Either way, it gives me good information to continue my investigation.
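One way to confirm that suspicion is to look at sys.dm_db_index_usage_stats, which tracks seeks, scans, and lookups per index since the last SQL Server restart; a sketch:

-- Indexes read mostly via scans (rather than seeks) are candidates for the
-- kind of cache-flushing behavior described above.
SELECT
    OBJECT_NAME(us.object_id) AS [Table Name],
    i.name                    AS [Index Name],
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.last_user_scan
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
    ON i.object_id = us.object_id AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
ORDER BY us.user_scans DESC;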

One bad query can affect not just the performance of a single table but the performance of the system as a whole.
