The N + 1 Query Problem

https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/

What is N+1 Query Problem

The N+1 query problem is a common performance anti-pattern. It looks like this:

1
2
3
4
5
6
7
Set<division> divisions = getAllDivisions();
divisions.foreach (
d -> {
Set<Employee> employees = getEmployeesByDivision(d)Ï
// ...
}
);

Assuming getAllDivisions() has an implementation that boils down to:

1
SELECT * FROM divisions;

and getEmployeesByDivision(d) has an implementation something like this:

1
SELECT * FROM employees WHERE divisionId = d.id;

You will issue “N+1” queries when the code executes, where N is the number of divisions:

1
2
3
4
5
6
7
SELECT * FROM divisions;
SELECT * FROM employees WHERE divisionID = 1
SELECT * FROM employees WHERE divisionID = 2
SELECT * FROM employees WHERE divisionID = 3
SELECT * FROM employees WHERE divisionID = 4
SELECT * FROM employees WHERE divisionID = 5
...

The problem with this is that each query has quite a bit of overhead. It is much faster to issue 1 query which returns 100 results than to issue 100 queries which each return 1 result. This is particularly true if your database is on a different machine which is, say, 1-2ms away on the network. In this case, issuing 100 queries serially has a minimum cost of 100-200ms, even if they can be satisfied instantly by MySQL. This is far higher than the entire server-side generation cost for most Phabricator pages should be.

Solution – Eager Loading

We solve this problem by loading all your data before iterating through it. Note that reading from memory is much faster than reading from database (mainly disk).

1
2
3
4
5
6
7
8
Set<Division> divisions = getAllDivisions();
Set<Employee> employees = getAllEmployees();

divisions.forEach(
d -> {
Set<employee> divisionEmployees = getEmployeesByDivision(employees, d.getId());
}
);

That is, issue these queries:

1
2
SELECT * FROM divisions;
SELECT * FROM employees WHERE divisionID IN (1, 2, 3, 4, 5, ...)

In this case, the total number of queries issued is always 2, no matter how many objects there are. You’ve removed the “N” part from the page’s query plan, and are no longer paying the overhead of issuing hundreds of extra queries. This will perform much better (although, as with all performance changes, you should verify this claim by measuring it).

But loading all data from database into memory also deserves a careful consideration. Is the size of all data going to fit into your memory? If you have a 8 GB memory, perhaps 1 or 2 GB are the maximum you can spare for in-memory data set.

Detecting the Problem

Beyond reasoning about it while figuring out how to load the data you need, the easiest way to detect this issue is to check the “Services” tab in DarkConsole (see Using DarkConsole), which lists all the service calls made on a page. If you see a bunch of similar queries, this often indicates an N+1 query issue (or a similar kind of query batching problem). Restructuring code so you can run a single query to fetch all the data at once will always improve the performance of the page.