Doctrine, QueryBuilder::setMaxResults() and collections associations are on a boat
tip
This blog post is associated with some code that you can find here.
Today, we gonna talk about an issue with DBMS, Doctrine, QueryBuilder::setMaxResults()
and collections associations.
I've faced this issue while trying to optimize my code to reduce the number of SQL requests after identifying a N+1 query issue.
Some context...
I will try to simplify as much as possible, I can't leak code from my job, but imagine we have 2 entities:
-
Pet
, which contains 3 fieldsid
,pictures
(contains 0 or morePicture
instances) andcreatedAt
:
1<?php 2 3namespace App\Core\Entity; 4 5use Doctrine\Common\Collections\ArrayCollection; 6use Doctrine\Common\Collections\Collection; 7use Doctrine\ORM\Mapping as ORM; 8 9/** 10 * @ORM\Entity() 11 * @ORM\Table(name="adoption_pet") 12 */ 13class Pet 14{ 15 /** 16 * @ORM\Id 17 * @ORM\Column(type="integer") 18 * @ORM\GeneratedValue(strategy="AUTO") 19 */ 20 protected ?int $id = null; 21 22 /** 23 * @ORM\OneToMany(targetEntity="Picture", mappedBy="pet", cascade={"all"}, orphanRemoval=true) 24 */ 25 protected Collection $pictures; 26 27 /** 28 * @ORM\Column(type="datetime_immutable", name="created_at") 29 */ 30 protected \DateTimeImmutable $createdAt; 31 32 public function __construct() 33 { 34 $this->pictures = new ArrayCollection(); 35 } 36 37 /** 38 * @return int|null 39 */ 40 public function getId(): ?int 41 { 42 return $this->id; 43 } 44 45 public function getPictures(): Collection 46 { 47 return $this->pictures; 48 } 49 50 public function addPicture(Picture $picture): void 51 { 52 if (!$this->pictures->contains($picture)) { 53 $this->pictures->add($picture); 54 $picture->setPet($this); 55 } 56 } 57 58 public function setCreatedAt(\DateTimeImmutable $createdAt): void 59 { 60 $this->createdAt = $createdAt; 61 } 62}
-
Picture
, which contains 2 fieldsid
andpet
(linked to 1Pet
):
1<?php 2 3namespace App\Core\Entity; 4 5use Doctrine\ORM\Mapping as ORM; 6 7/** 8 * @ORM\Entity() 9 * @ORM\Table(name="adoption_pet_picture") 10 */ 11class Picture 12{ 13 /** 14 * @ORM\Id 15 * @ORM\Column(type="integer") 16 * @ORM\GeneratedValue(strategy="AUTO") 17 */ 18 protected ?int $id = null; 19 20 /** 21 * @ORM\ManyToOne(targetEntity="Pet", inversedBy="pictures") 22 * @ORM\JoinColumn(onDelete="CASCADE") 23 */ 24 protected ?Pet $pet = null; 25 26 // ... 27 28 public function getId(): ?int 29 { 30 return $this->id; 31 } 32 33 public function setPet(?Pet $pet): void 34 { 35 $this->pet = $pet; 36 } 37 38 public function getPet(): ?Pet 39 { 40 return $this->pet; 41 } 42}
And we have a seeded database like this;
- the 1st
Pet
has 4Picture
, - the 2nd
Pet
has 2Picture
, - the 3rd and last
Pet
has 0Picture
:
1$entityManager = ...; 2 3$pet1 = new Pet(); 4$pet1->setCreatedAt(new \DateTimeImmutable('+1 min')); 5$pet1->addPicture($picture1 = new Picture()); 6$pet1->addPicture($picture2 = new Picture()); 7$pet1->addPicture($picture3 = new Picture()); 8$pet1->addPicture($picture4 = new Picture()); 9$entityManager->persist($pet1); 10 11$pet2 = new Pet(); 12$pet2->setCreatedAt(new \DateTimeImmutable('+2 min')); 13$pet2->addPicture($picture1 = new Picture()); 14$pet2->addPicture($picture2 = new Picture()); 15$entityManager->persist($pet2); 16 17$pet3 = new Pet(); 18$pet3->setCreatedAt(new \DateTimeImmutable('+3 min')); 19$entityManager->persist($pet3); 20 21$entityManager->flush();
We are ready!
So what's the issue?
If we want to fetch the 1st created entity Pet
from the database, we can do something like this:
1$qb = $entityManager->createQueryBuilder() 2 ->select('pet') 3 ->from(Pet::class, 'pet') 4 ->addOrderBy('pet.createdAt', 'asc') 5 ->setMaxResults(1); 6 7$pet = $qb->getQuery()->getSingleResult(); 8 9echo $pet->getId(); // outputs "1" 10echo count($pet->getPictures()); // outputs "4"
$pet
is an instance of Pet
, $pet->getPictures()
returns 4 Picture
, everything is fine... except we have a N+1 query issue when using $pet->getPictures()
, because we didn't join the field pictures
.
So let's join pictures
and select it aswell:
1$qb = $entityManager->createQueryBuilder() 2 ->select('pet') 3 ->addSelect('pictures') 4 ->from(Pet::class, 'pet') 5 ->leftJoin('pet.pictures', 'pictures') 6 ->addOrderBy('pet.createdAt', 'asc') 7 ->setMaxResults(1); 8 9$pet = $qb->getQuery()->getSingleResult(); 10 11echo $pet->getId(); // outputs "1" 12echo count($pet->getPictures()); // outputs "1"... wait what?
$pet
is still as instance of Pet
, but $pet->getPictures()
returns only 1 Picture
instead of 4.
Why?
This is easily understandable if we take a look to the SQL query generated by Doctrine:
1SELECT 2 a0_.id AS id_0, 3 a0_.created_at AS created_at_1, 4 a1_.id AS id_2, 5 a1_.pet_id AS pet_id_3 6FROM adoption_pet a0_ 7 LEFT JOIN adoption_pet_picture a1_ ON a0_.id = a1_.pet_id 8ORDER BY a0_.created_at ASC 9LIMIT 1
The issue comes from the LIMIT 1
clause. It allows us to fetch only 1 Pet
, but is also half-initialize our Pictures
:
id_0 | created_at_1 | id_2 | pet_id_3 |
---|---|---|---|
1 | 2022-01-08 09:31:48 | 1 | 1 |
This is documented on Doctrine website, but I wasn't aware at all about this behaviour:
"If your query contains a fetch-joined collection specifying the result limit methods are not working as you would expect. Set Max Results restricts the number of database result rows, however in the case of fetch-joined collections one root entity might appear in many rows, effectively hydrating less than the specified number of results."
How to avoid that?
The Doctrine's Paginator
Doctrine provides a class Paginator
which allows to use ->setMaxResults()
and collections associations:
1$qb = $entityManager->createQueryBuilder() 2 ->select('pet') 3 ->addSelect('pictures') 4 ->from(Pet::class, 'pet') 5 ->leftJoin('pet.pictures', 'pictures') 6 ->addOrderBy('pet.createdAt', 'asc') 7 ->setMaxResults(1); 8 9$paginator = new \Doctrine\ORM\Tools\Pagination\Paginator($qb->getQuery(), $fetchJoinCollection = true); 10$pet = $paginator->getIterator()->current(); 11 12echo $pet->getId(); // outputs "1" 13echo count($pet->getPictures()); // outputs "4"
Internally, the Paginator
clone your Query
and use it for two SQL requests:
- Fetch a or multiple identifier(s) (if you used
->setMaxResults(>1)
):
1SELECT DISTINCT id_0 2FROM (SELECT DISTINCT id_0, created_at_1 3 FROM (SELECT a0_.id AS id_0, a0_.created_at AS created_at_1, a1_.id AS id_2, a1_.pet_id AS pet_id_3 4 FROM adoption_pet a0_ 5 LEFT JOIN adoption_pet_picture a1_ ON a0_.id = a1_.pet_id) dctrn_result_inner 6 ORDER BY created_at_1 ASC) dctrn_result 7LIMIT 1;
Which returns:
id_0 |
---|
1 |
- And then, it uses this result to nicely fetch your entities with their collections associations:
1SELECT a0_.id AS id_0, a0_.created_at AS created_at_1, a1_.id AS id_2, a1_.pet_id AS pet_id_3 2FROM adoption_pet a0_ 3 LEFT JOIN adoption_pet_picture a1_ ON a0_.id = a1_.pet_id 4WHERE a0_.id IN (?) 5ORDER BY a0_.created_at ASC
Which returns 4 rows (because our 1st Pet
has 4 Picture
):
id_0 | created_at_1 | id_2 | pet_id_3 |
---|---|---|---|
1 | 2022-01-08 09:31:48 | 1 | 1 |
1 | 2022-01-08 09:31:48 | 2 | 1 |
1 | 2022-01-08 09:31:48 | 3 | 1 |
1 | 2022-01-08 09:31:48 | 4 | 1 |
Keep the control
If you don't want to use the Doctrine's Paginator
to keep the control (or if you don't find logical to use a paginator to fetch only 1 entity), then you can use 2 separate SQL queries:
1$qb = $entityManager->createQueryBuilder() 2 ->select('pet') 3 ->addSelect('pictures') 4 ->from(Pet::class, 'pet') 5 ->leftJoin('pet.pictures', 'pictures') 6 ->addOrderBy('pet.createdAt', 'asc'); 7 // ^ we removed the usage of `->setMaxResults(1)` here 8 9// Clone the previous QueryBuilder, we only select the identifier 10$qbId = (clone $qb) 11 ->select('pet.id') 12 ->setMaxResults(1); // select 1 result 13 14// Then we use the previously fetched identifier 15$pet = $qb 16 ->andWhere('pet.id = :id') 17 ->setParameter('id', $qbId->getQuery()->getSingleScalarResult()) 18 ->getQuery() 19 ->getSingleResult(); 20 21echo $pet->getId(); // outputs "1" 22echo count($pet->getPictures()); // outputs "4"
And if you need to fetch multiple results:
1$qb = $entityManager->createQueryBuilder() 2 ->select('pet') 3 ->addSelect('pictures') 4 ->from(Pet::class, 'pet') 5 ->leftJoin('pet.pictures', 'pictures') 6 ->addOrderBy('pet.createdAt', 'asc'); 7 // ^ we removed the usage of `->setMaxResults(1)` here 8 9// Clone the previous QueryBuilder, we only select the identifiers 10$qbId = (clone $qb) 11 ->select('pet.id') 12 ->setMaxResults(2); // select 2 results 13 14// Then we use the previously fetched identifiers 15$pets = $qb 16 ->andWhere('pet.id IN (:id)') 17 ->setParameter('id', array_column($qbId->getQuery()->getScalarResult(), 'id')); 18 ->getQuery() 19 ->getResult(); 20 21echo $pets[0]->getId(); // outputs "1" 22echo count($pets[0]->getPictures()); // outputs "4" 23 24echo $pets[1]->getId(); // outputs "2" 25echo count($pets[1]->getPictures()); // outputs "2"
tip
Even if it looks like a good idea to keep the control, @Ocramius recommends to use the Doctrine's Paginator
:
What's important is that:
— @ocramius@mastodon.social 🇺🇦🍥 (@Ocramius) January 5, 2022
* if you do a fetch-join, **DO** use the paginator
* if you don't do a fetch-join, setMaxResults() should suffice