Не думайте, что одна секунда быстро для выполнения запроса

Я продолжаю сталкиваться с ситуациями, когда пользователи RDBMS думают, что одна секунда для выполнения запроса — что-то почти быстрое. Совсем недавно, в этом вопросе переполнения стека:

Предложение Hibernate SQL In делает загрузку процессора до 100%

Первоначальный вопрос автора заключался в том, почему аналогичный запрос выполняется за одну секунду при выполнении в SQL Server Management Studio, тогда как (по-видимому) тот же запрос выполняется за 60 секунд при выполнении из Hibernate. Запрос выглядит примерно так:

select Student_Id 
from Student_Table 
where Roll_No in ('A101','A102','A103',.....'A250');

Там может быть много причин для этого различия. Скорее всего, где-то спрятана проблема Hibernate N + 1 . Но самое яркое сообщение здесь:

Пожалуйста, не верьте, что 1 секунда — это быстро.

Базы данных — это невероятно быстрые и простые запросы, подобные приведенному выше, которые должны выполняться практически мгновенно, даже на посредственных серверах. Даже на вашем ноутбуке! Маркус Винанд сказал, что в 80% всех проблем с производительностью все, что вам нужно сделать, это добавить этот отсутствующий индекс . И это также здесь!

Исходная таблица автора содержит только два указателя:

CREATE TABLE student_table (
       Student_Id BIGINT NOT NULL IDENTITY
     , Class_Id BIGINT NOT NULL
     , Student_First_Name VARCHAR(100) NOT NULL
     , Student_Last_Name VARCHAR(100)
     , Roll_No VARCHAR(100) NOT NULL
     , PRIMARY KEY (Student_Id)
     , CONSTRAINT UK_StudentUnique_1 
         UNIQUE  (Class_Id, Roll_No)
);

Существует индекс для реализации PRIMARY KEY, и есть другой индекс для UNIQUEограничения, но оба индекса на самом деле не очень полезны, так как фильтры предикатов запроса включены Roll_No, что является только вторым столбцом UNIQUEограничения. При выполнении вышеупомянутого запроса для примерно 8 миллионов строк я получаю сканирование индекса по UNIQUEиндексу, и запрос выполняется через три секунды:

Эта операция «индексного сканирования» совсем не годится. Я на самом деле бегу через весь индекс, чтобы найти все применимые Roll_Noзначения в каждом узле листа индекса. Это хорошо объяснено на странице «Использование индекса» Люка о составных индексах.

Решение

But the good news is, SQL Server Management Studio gives you immediate tuning advice. Just right click on the execution plan and choose “Missing Index Details…” to get the following advice:

/*
Missing Index Details from SQLQuery1.sql -
    LUKAS-ENVY\SQLEXPRESS.test
The Query Processor estimates that implementing the 
    following index could improve the query cost 
    by 87.5035%.
*/

/*
USE [test]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index>]
ON [dbo].[student_table] ([Roll_No])
INCLUDE ([Student_Id])
GO
*/

This doesn’t necessarily mean that the above index is the optimal choice for all your queries, but the fact that you’re querying using a predicate on Roll_No should be a strong-enough indicator that you should have an index on at least this Roll_No column. The simplest possible index here is simply:

CREATE INDEX i_student_roll_no
ON student_table (Roll_No);

With that index in place, we’ll now get an “Index Seek” operation, which runs instantly:

Covering index

In this particular case, a “covering index” as suggested by Vlad Mihalcea in his answer might be appropriate. For instance:

CREATE INDEX i_student_roll_no
ON student_table (Roll_No, Student_Id);

The advantage of a covering index is that all the information needed for the query execution is already contained in the index. This is true in this particular case, but it can also be dangerous as:

The covering index needs more space, and if the table is already large, space might become an issue
The covering index only adds value as long as the query doesn’t also use additional columns (e.g. for projections, calculations, additional filtering, sorting, etc.). This is probably not the case in this simple example, which might change quickly in the near future

Thus, a covering index shouldn’t be your default choice in such cases. Better be conservative and add only those columns in the index, that add immediate value for filtering.

Conclusion

I’d like to repeat this time and again:

Do NOT think that one second is fast

In fact:

Do NOT think that anything beyond 2-3ms is fast!

Unless you’re doing heavy reporting or batch processing, where processing time obviously might take a bit longer, simple queries like these should never be anything slower than instant. And most of the time, you can achieve this speed by adding an index.

On a side-note

The poster of the aforementioned question obviously has other issues as well. None of the above explain the execution speed difference between Hibernate and “plain SQL”. But again, the message here is that if your “plain SQL” already takes more than one second, you have a very low-hanging fruit to fix.

Не думайте, что одна секунда быстро для выполнения запроса

Решение

Covering index

Conclusion

On a side-note

Категории

Последние статьи

Рефакторинг Hudson God Class

Альтернативы синтаксиса Java лямбда

Morphia и MongoDB: развивающиеся структуры документов

OpenShift Express: развертывание приложения Java EE (с поддержкой AS7)

Интеграция jqGrid, REST, AJAX и Spring MVC