Hugo Alliaume

Doctrine, QueryBuilder::setMaxResults() and collections associations are on a boat

tip

This blog post is associated with some code that you can find here.

Today, we gonna talk about an issue with DBMS, Doctrine, QueryBuilder::setMaxResults() and collections associations. I've faced this issue while trying to optimize my code to reduce the number of SQL requests after identifying a N+1 query issue.

Some context...

I will try to simplify as much as possible, I can't leak code from my job, but imagine we have 2 entities:

  1. Pet, which contains 3 fields id, pictures (contains 0 or more Picture instances) and createdAt:
 1// src/Core/Entity/Pet.php
 2<?php
 3
 4namespace App\Core\Entity;
 5
 6use Doctrine\Common\Collections\ArrayCollection;
 7use Doctrine\Common\Collections\Collection;
 8use Doctrine\ORM\Mapping as ORM;
 9
10/**
11 * @ORM\Entity()
12 * @ORM\Table(name="adoption_pet")
13 */
14class Pet
15{
16    /**
17     * @ORM\Id
18     * @ORM\Column(type="integer")
19     * @ORM\GeneratedValue(strategy="AUTO")
20     */
21    protected ?int $id = null;
22
23    /**
24     * @ORM\OneToMany(targetEntity="Picture", mappedBy="pet", cascade={"all"}, orphanRemoval=true)
25     */
26    protected Collection $pictures;
27
28    /**
29     * @ORM\Column(type="datetime_immutable", name="created_at")
30     */
31    protected \DateTimeImmutable $createdAt;
32
33    public function __construct()
34    {
35        $this->pictures = new ArrayCollection();
36    }
37
38    /**
39     * @return int|null
40     */
41    public function getId(): ?int
42    {
43        return $this->id;
44    }
45
46    public function getPictures(): Collection
47    {
48        return $this->pictures;
49    }
50
51    public function addPicture(Picture $picture): void
52    {
53        if (!$this->pictures->contains($picture)) {
54            $this->pictures->add($picture);
55            $picture->setPet($this);
56        }
57    }
58
59    public function setCreatedAt(\DateTimeImmutable $createdAt): void
60    {
61        $this->createdAt = $createdAt;
62    }
63}
  1. Picture, which contains 2 fields id and pet (linked to 1 Pet):
 1// src/Core/Entity/Picture.php
 2<?php
 3
 4namespace App\Core\Entity;
 5
 6use Doctrine\ORM\Mapping as ORM;
 7
 8/**
 9 * @ORM\Entity()
10 * @ORM\Table(name="adoption_pet_picture")
11 */
12class Picture
13{
14    /**
15     * @ORM\Id
16     * @ORM\Column(type="integer")
17     * @ORM\GeneratedValue(strategy="AUTO")
18     */
19    protected ?int $id = null;
20
21    /**
22     * @ORM\ManyToOne(targetEntity="Pet", inversedBy="pictures")
23     * @ORM\JoinColumn(onDelete="CASCADE")
24     */
25    protected ?Pet $pet = null;
26
27    // ...
28
29    public function getId(): ?int
30    {
31        return $this->id;
32    }
33
34    public function setPet(?Pet $pet): void
35    {
36        $this->pet = $pet;
37    }
38
39    public function getPet(): ?Pet
40    {
41        return $this->pet;
42    }
43}

And we have a seeded database like this;

  • the 1st Pet has 4 Picture,
  • the 2nd Pet has 2 Picture,
  • the 3rd and last Pet has 0 Picture:
 1$entityManager = ...;
 2
 3$pet1 = new Pet();
 4$pet1->setCreatedAt(new \DateTimeImmutable('+1 min'));
 5$pet1->addPicture($picture1 = new Picture());
 6$pet1->addPicture($picture2 = new Picture());
 7$pet1->addPicture($picture3 = new Picture());
 8$pet1->addPicture($picture4 = new Picture());
 9$entityManager->persist($pet1);
10
11$pet2 = new Pet();
12$pet2->setCreatedAt(new \DateTimeImmutable('+2 min'));
13$pet2->addPicture($picture1 = new Picture());
14$pet2->addPicture($picture2 = new Picture());
15$entityManager->persist($pet2);
16
17$pet3 = new Pet();
18$pet3->setCreatedAt(new \DateTimeImmutable('+3 min'));
19$entityManager->persist($pet3);
20
21$entityManager->flush();

We are ready!

So what's the issue?

If we want to fetch the 1st created entity Pet from the database, we can do something like this:

 1$qb = $entityManager->createQueryBuilder()
 2    ->select('pet')
 3    ->from(Pet::class, 'pet')
 4    ->addOrderBy('pet.createdAt', 'asc')
 5    ->setMaxResults(1);
 6
 7$pet = $qb->getQuery()->getSingleResult();
 8
 9echo $pet->getId(); // outputs "1"
10echo count($pet->getPictures()); // outputs "4"

$pet is an instance of Pet, $pet->getPictures() returns 4 Picture, everything is fine... except we have a N+1 query issue when using $pet->getPictures(), because we didn't join the field pictures.

So let's join pictures and select it aswell:

 1$qb = $entityManager->createQueryBuilder()
 2    ->select('pet')
 3    ->addSelect('pictures')
 4    ->from(Pet::class, 'pet')
 5    ->leftJoin('pet.pictures', 'pictures')
 6    ->addOrderBy('pet.createdAt', 'asc')
 7    ->setMaxResults(1);
 8
 9$pet = $qb->getQuery()->getSingleResult();
10
11echo $pet->getId(); // outputs "1"
12echo count($pet->getPictures()); // outputs "1"... wait what?

$pet is still as instance of Pet, but $pet->getPictures() returns only 1 Picture instead of 4.

Why?

This is easily understandable if we take a look to the SQL query generated by Doctrine:

1SELECT
2       a0_.id AS id_0,
3       a0_.created_at AS created_at_1,
4       a1_.id AS id_2,
5       a1_.pet_id AS pet_id_3
6FROM adoption_pet a0_
7    LEFT JOIN adoption_pet_picture a1_ ON a0_.id = a1_.pet_id
8ORDER BY a0_.created_at ASC
9LIMIT 1

The issue comes from the LIMIT 1 clause. It allows us to fetch only 1 Pet, but is also half-initialize our Pictures:

id_0 created_at_1 id_2 pet_id_3
1 2022-01-08 09:31:48 1 1

This is documented on Doctrine website, but I wasn't aware at all about this behaviour:

"If your query contains a fetch-joined collection specifying the result limit methods are not working as you would expect. Set Max Results restricts the number of database result rows, however in the case of fetch-joined collections one root entity might appear in many rows, effectively hydrating less than the specified number of results."

First and Max Result Items (DQL Query Only):

How to avoid that?

The Doctrine's Paginator

Doctrine provides a class Paginator which allows to use ->setMaxResults() and collections associations:

 1$qb = $entityManager->createQueryBuilder()
 2    ->select('pet')
 3    ->addSelect('pictures')
 4    ->from(Pet::class, 'pet')
 5    ->leftJoin('pet.pictures', 'pictures')
 6    ->addOrderBy('pet.createdAt', 'asc')
 7    ->setMaxResults(1);
 8
 9$paginator = new \Doctrine\ORM\Tools\Pagination\Paginator($qb->getQuery(), $fetchJoinCollection = true);
10$pet = $paginator->getIterator()->current();
11
12echo $pet->getId(); // outputs "1"
13echo count($pet->getPictures()); // outputs "4"

Internally, the Paginator clone your Query and use it for two SQL requests:

  1. Fetch a or multiple identifier(s) (if you used ->setMaxResults(>1)):
1SELECT DISTINCT id_0
2FROM (SELECT DISTINCT id_0, created_at_1
3      FROM (SELECT a0_.id AS id_0, a0_.created_at AS created_at_1, a1_.id AS id_2, a1_.pet_id AS pet_id_3
4            FROM adoption_pet a0_
5                     LEFT JOIN adoption_pet_picture a1_ ON a0_.id = a1_.pet_id) dctrn_result_inner
6      ORDER BY created_at_1 ASC) dctrn_result
7LIMIT 1;

Which returns:

id_0
1
  1. And then, it uses this result to nicely fetch your entities with their collections associations:
1SELECT a0_.id AS id_0, a0_.created_at AS created_at_1, a1_.id AS id_2, a1_.pet_id AS pet_id_3
2FROM adoption_pet a0_
3         LEFT JOIN adoption_pet_picture a1_ ON a0_.id = a1_.pet_id
4WHERE a0_.id IN (?)
5ORDER BY a0_.created_at ASC

Which returns 4 rows (because our 1st Pet has 4 Picture):

id_0 created_at_1 id_2 pet_id_3
1 2022-01-08 09:31:48 1 1
1 2022-01-08 09:31:48 2 1
1 2022-01-08 09:31:48 3 1
1 2022-01-08 09:31:48 4 1

Keep the control

If you don't want to use the Doctrine's Paginator to keep the control (or if you don't find logical to use a paginator to fetch only 1 entity), then you can use 2 separate SQL queries:

 1$qb = $entityManager->createQueryBuilder()
 2    ->select('pet')
 3    ->addSelect('pictures')
 4    ->from(Pet::class, 'pet')
 5    ->leftJoin('pet.pictures', 'pictures')
 6    ->addOrderBy('pet.createdAt', 'asc');
 7    // ^ we removed the usage of `->setMaxResults(1)` here
 8
 9// Clone the previous QueryBuilder, we only select the identifier
10$qbId = (clone $qb)
11    ->select('pet.id')
12    ->setMaxResults(1); // select 1 result
13
14// Then we use the previously fetched identifier
15$pet = $qb
16    ->andWhere('pet.id = :id')
17    ->setParameter('id', $qbId->getQuery()->getSingleScalarResult())
18    ->getQuery()
19    ->getSingleResult();
20
21echo $pet->getId(); // outputs "1"
22echo count($pet->getPictures()); // outputs "4"

And if you need to fetch multiple results:

 1$qb = $entityManager->createQueryBuilder()
 2    ->select('pet')
 3    ->addSelect('pictures')
 4    ->from(Pet::class, 'pet')
 5    ->leftJoin('pet.pictures', 'pictures')
 6    ->addOrderBy('pet.createdAt', 'asc');
 7    // ^ we removed the usage of `->setMaxResults(1)` here
 8
 9// Clone the previous QueryBuilder, we only select the identifiers
10$qbId = (clone $qb)
11    ->select('pet.id')
12    ->setMaxResults(2); // select 2 results
13
14// Then we use the previously fetched identifiers
15$pets = $qb
16    ->andWhere('pet.id IN (:id)')
17    ->setParameter('id', array_column($qbId->getQuery()->getScalarResult(), 'id'));
18    ->getQuery()
19    ->getResult();
20
21echo $pets[0]->getId(); // outputs "1"
22echo count($pets[0]->getPictures()); // outputs "4"
23
24echo $pets[1]->getId(); // outputs "2"
25echo count($pets[1]->getPictures()); // outputs "2"

tip

Even if it looks like a good idea to keep the control, @Ocramius recommends to use the Doctrine's Paginator:

<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>