Generate PDFs on Amazon AWS with PHP and Puppeteer
EDIT: 21st April 2020
This article originally compared three solutions and described solution #1.
On 21st April 2020, a fourth solution was added, and it's definitely the best one: see solution #4.
Some context
Over the last few months at work, for a big new feature in our CMS, we had to figure out how to generate a lot of PDFs (~1000, and more in the future) in a very short amount of time. Our servers are great, but they were neither powerful nor scalable enough to generate that many PDFs without degrading performance. That's why we went with AWS, using Amazon Simple Queue Service (SQS) and AWS Lambda.
I assume you already have some knowledge of AWS SQS/Lambda and the Symfony Messenger component. For more information, see Symfony Messenger on AWS Lambda.
This is the plan:
- our CMS (Symfony) generates a message and sends it to the SQS queue, thanks to the Messenger component, happyr/message-serializer, sroze/messenger-enqueue-transport and enqueue/sqs
- the SQS queue receives messages and passes them to the lambda
- our lambda consumes the message, generates a PDF and saves it on Scaleway (similar to Amazon S3, but cheaper and easier to use)
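To make the first step of the plan more concrete, here is a minimal sketch of what the message could look like. The real GeneratePdfMessage class is not shown in this article, so the two fields below (documentId and targetPath) are hypothetical and only serve as an illustration.

```php
<?php declare(strict_types=1);

namespace App\Message;

/**
 * Hypothetical message carried through the SQS queue.
 * The field names are assumptions made for this example.
 */
final class GeneratePdfMessage
{
    private $documentId;
    private $targetPath;

    public function __construct(int $documentId, string $targetPath)
    {
        $this->documentId = $documentId;
        $this->targetPath = $targetPath;
    }

    // Identifier of the document to render (assumed field)
    public function getDocumentId(): int
    {
        return $this->documentId;
    }

    // Where the generated PDF should be stored (assumed field)
    public function getTargetPath(): string
    {
        return $this->targetPath;
    }
}
```

On the CMS side, an instance would then be handed to the Messenger bus, for example `$bus->dispatch(new GeneratePdfMessage(42, 'invoices/42.pdf'))` with a `MessageBusInterface` injected as `$bus`.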
The lambda
To handle the message from the queue, the lambda has to run PHP and Symfony, because the current Messenger component only supports Symfony apps (read and vote for the RFC "Improve Messenger to support other app consuming/recieving the message").
We will use Bref to run PHP on our lambda. Bref is a Serverless plugin, and Serverless is a framework to build and operate serverless applications. Here is a simplified version of our Serverless configuration file:
# serverless.yml
service: app

provider:
    name: aws
    runtime: provided
    region: eu-west-2
    stage: ${opt:stage,'dev'} # we had two stages "dev" (default) and "prod"
    environment:
        APP_ENV: ${self:provider.stage}

plugins:
    # Include Bref plugin
    - ./vendor/bref/bref

package:
    exclude:
        # Excluding those files/directories will reduce deploy time and lambda size a lot
        - bin/.phpunit/**
        - vendor/bin/.phpunit/**
        - var/log/**
        - var/storage/**
        - var/cache/**
        - "!var/cache/${opt:stage,'dev'}/**" # include cache of targeted stage
        - var/cache/*/profiler/**

functions:
    generate_pdf:
        handler: bin/consume-generate-pdf
        reservedConcurrency: 50 # 50 lambda invocations at the same time
        timeout: 60
        layers:
            # Use the Bref layer, see https://bref.sh/docs/runtimes
            # Layer ARNs are region-specific: the region in the ARN must match provider.region
            - 'arn:aws:lambda:us-west-1:209497400698:layer:php-74:1'
        events:
            - sqs:
                arn: <arn SQS>
                # We tell Amazon SQS to send only 1 message from the queue to the function,
                # otherwise if we send more than 1 message and one of them fails, then ALL messages are put again in the queue.
                batchSize: 1
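With batchSize: 1, each invocation receives an event containing a single record. As a rough sketch (not the actual bin/consume-generate-pdf handler, which is not shown here), the message body can be pulled out of the standard SQS event payload like this. The function name extractSqsBodies and the sample payload are assumptions made for illustration.

```php
<?php declare(strict_types=1);

/**
 * Returns the raw message bodies found in an SQS-triggered Lambda event.
 * Records coming from another event source are ignored.
 */
function extractSqsBodies(array $event): array
{
    $bodies = [];

    foreach ($event['Records'] ?? [] as $record) {
        $isSqs = ($record['eventSource'] ?? '') === 'aws:sqs';

        if ($isSqs && isset($record['body'])) {
            $bodies[] = (string) $record['body'];
        }
    }

    return $bodies;
}

// Sample payload (hypothetical values) shaped like an SQS event with one record
$event = [
    'Records' => [
        [
            'eventSource' => 'aws:sqs',
            'messageId' => 'hypothetical-id',
            'body' => '{"documentId":42}',
        ],
    ],
];

$bodies = extractSqsBodies($event);
```

In our setup, the body is then deserialized back into a message object by the Messenger serializer rather than being decoded by hand.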
How to generate a PDF?
We didn't want to use wkhtmltopdf/KnpLabs/KnpSnappyBundle, because we had enough trouble installing and using it in the past (missing shared Linux libraries, crashes on SSL errors, unpredictable rendering that can differ from what Chrome renders...).
Instead, we thought about using Puppeteer and Browsershot. Puppeteer is a Node.js library which provides an API to control Chrome, and Browsershot is a nice PHP wrapper around Puppeteer.
$ yarn add puppeteer
$ composer require spatie/browsershot
But Puppeteer won't work because the lambda doesn't have Node.js support yet. To fix this, we used a layer provided by lambci/node-custom-lambda:
# ...

functions:
    generate_pdf:
        handler: bin/consume-generate-pdf
        # ...
        layers:
            - 'arn:aws:lambda:<region>:553035198032:layer:nodejs12:21'
            # Use the Bref layer, see https://bref.sh/docs/runtimes
            - 'arn:aws:lambda:us-west-1:209497400698:layer:php-74:1'
        # ...
Then run serverless deploy
and... uh? the lambda size is too big?
Yup, it's too big because of the Chrome binary that has been downloaded when installing puppeteer:
➜ puppeteer-deps l node_modules/puppeteer/.local-chromium/linux-706915/chrome-linux
total 279M
drwxr-xr-x 7 kocal kocal 4,0K janv. 2 10:09 .
drwxr-xr-x 3 kocal kocal 4,0K janv. 2 10:09 ..
-rwxr-xr-x 1 kocal kocal 229M janv. 2 10:09 chrome
-rw-r--r-- 1 kocal kocal 1,2M janv. 2 10:09 chrome_100_percent.pak
-rw-r--r-- 1 kocal kocal 1,5M janv. 2 10:09 chrome_200_percent.pak
-rwxr-xr-x 1 kocal kocal 326K janv. 2 10:09 chrome_sandbox
-rwxr-xr-x 1 kocal kocal 5,0K janv. 2 10:09 chrome-wrapper
drwxr-xr-x 3 kocal kocal 4,0K janv. 2 10:09 ClearKeyCdm
-rwxr-xr-x 1 kocal kocal 1,5M janv. 2 10:09 crashpad_handler
-rw-r--r-- 1 kocal kocal  10M janv. 2 10:09 icudtl.dat
-rwxr-xr-x 1 kocal kocal 345K janv. 2 10:09 libEGL.so
-rwxr-xr-x 1 kocal kocal  12M janv. 2 10:09 libGLESv2.so
drwxr-xr-x 2 kocal kocal 4,0K janv. 2 10:09 locales
drwxr-xr-x 2 kocal kocal 4,0K janv. 2 10:09 MEIPreload
-rwxr-xr-x 1 kocal kocal 4,3M janv. 2 10:09 nacl_helper
-rwxr-xr-x 1 kocal kocal 9,5K janv. 2 10:09 nacl_helper_bootstrap
-rwxr-xr-x 1 kocal kocal 3,7M janv. 2 10:09 nacl_helper_nonsfi
-rwxr-xr-x 1 kocal kocal 3,7M janv. 2 10:09 nacl_irt_x86_64.nexe
-rw-r--r-- 1 kocal kocal    1 janv. 2 10:09 natives_blob.bin
-rw-r--r-- 1 kocal kocal 2,5K janv. 2 10:09 product_logo_48.png
drwxr-xr-x 3 kocal kocal 4,0K janv. 2 10:09 resources
-rw-r--r-- 1 kocal kocal  12M janv. 2 10:09 resources.pak
drwxr-xr-x 2 kocal kocal 4,0K janv. 2 10:09 swiftshader
-rw-r--r-- 1 kocal kocal 619K janv. 2 10:09 v8_context_snapshot.bin
-rwxr-xr-x 1 kocal kocal  37K janv. 2 10:09 xdg-mime
-rwxr-xr-x 1 kocal kocal  33K janv. 2 10:09 xdg-settings
According to the AWS Lambda limits page, the deployment package size is limited to:
- 50 MB (zipped)
- 250 MB (unzipped)
But when we zip the Chrome binary and its libraries, the archive weighs about 106 MB, more than twice the zipped limit, so the deployment fails:
➜ puppeteer-deps l node_modules/puppeteer/.local-chromium/linux-706915
total 106M
drwxr-xr-x 3 kocal kocal 4,0K janv. 2 10:13 .
drwxr-xr-x 3 kocal kocal 4,0K janv. 2 10:09 ..
drwxr-xr-x 7 kocal kocal 4,0K janv. 2 10:09 chrome-linux
-rw-r--r-- 1 kocal kocal 106M janv. 2 10:14 chrome-linux.zip
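The arithmetic behind that failure can be written down as a tiny check. This is only an illustrative sketch: fitsInLambda and the two constants are hypothetical names, and the numbers are the ones from the limits and the listings above (106 MB zipped, 279 MB unzipped, for Chrome alone).

```php
<?php declare(strict_types=1);

// Deployment package limits quoted above, in MB
const LAMBDA_MAX_ZIPPED_MB = 50;
const LAMBDA_MAX_UNZIPPED_MB = 250;

/**
 * Tells whether a package of the given sizes respects both limits.
 */
function fitsInLambda(int $zippedMb, int $unzippedMb): bool
{
    return $zippedMb <= LAMBDA_MAX_ZIPPED_MB
        && $unzippedMb <= LAMBDA_MAX_UNZIPPED_MB;
}

// Puppeteer's Chrome alone: 106 MB zipped, 279 MB unzipped
$chromeFits = fitsInLambda(106, 279);

// How far over each limit we are, in MB
$zippedOverflow = 106 - LAMBDA_MAX_ZIPPED_MB;
$unzippedOverflow = 279 - LAMBDA_MAX_UNZIPPED_MB;
```

Both limits are exceeded before even counting the PHP application and its vendor directory, so trimming a few files would not be enough.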
What can we do?
Use a Brotli-fied Chrome
While researching how to run Chrome on AWS Lambda, I found chrome-aws-lambda, a Node.js package that:
- ships a Brotli-fied Chrome (~36 MB) which can run on AWS Lambda (see the bin/ directory)
- provides a small wrapper around Puppeteer which uncompresses Chrome on the fly
Okay great, we have a Chrome that can be used on AWS Lambda, but now we have several possible solutions.
Solution #1
Download the brotlified Chrome, commit it in our project, and write some PHP to uncompress Chrome at runtime.
Pros:
- Fastest solution
- We have total control over the Chrome binaries
Cons:
- Chrome updates must be applied manually
Solution #2
(I thought of this solution while writing this article, not while working on the lambda 3 or 4 months ago.)
Install the chrome-aws-lambda package and write some PHP to uncompress Chrome at runtime.
Pros:
- Chrome updates are automatically applied
Cons:
- The binaries are hidden inside chrome-aws-lambda, which means you can't rely on them without using the provided wrapper. Between v1.20.1 and v1.20.2, the bin/ directory structure was modified and the shared libraries are now archived with tar. If we had installed chrome-aws-lambda without a fixed version constraint (e.g. 1.20.1), PDF generation might have failed, which would have been really critical for us.
Solution #3
Fork chrome-aws-lambda, write a PHP wrapper, and open a pull request.
Pros:
- The PHP wrapper would be available to more users
Cons:
- No way to know how long a merge would take, and we had a deadline for our big new feature
- The PR might have been rejected
- Two wrappers to maintain and test
EDIT 21/04/2020: Solution #4
I've found a better solution: using chrome-aws-lambda through a bridge.
Pros:
- No manual updates
- No need to handle the uncompressing of Chrome binaries ourselves
Cons:
- I haven't found any yet
Please read the article Generate PDFs on Amazon AWS with PHP and Puppeteer: The Best Way to learn more.
Use Chrome, Browsershot and Puppeteer on Amazon AWS
We went with Solution #1 for its stability and for lack of time.
Don't deploy Puppeteer's Chrome binary
Since the Chrome binary from the puppeteer package is too large, we could replace the package with puppeteer-core (the same as puppeteer, but without the Chrome binary). Unfortunately, Browsershot is only compatible with puppeteer.
A solution is to configure Serverless to exclude Puppeteer's Chrome binary folder like this:
#...

package:
    exclude:
        # ...
        - node_modules/puppeteer/.local-chromium/** # we will ship a brotli-compressed Chrome binary

#...
Download brotlified Chrome binary
When we were working on the lambda, the latest version of chrome-aws-lambda was 1.20.1 (see binary files).
We created a chromium directory, downloaded the .br files and arranged them like this:
➜ the-lambda git:(master) tree chromium
chromium
├── chromium-78.0.3882.0.br
└── swiftshader
    ├── libEGL.so.br
    └── libGLESv2.so.br

1 directory, 3 files
Uncompress Chrome binary on-the-fly
Install Brotli binary
We use vdechenaux/brotli-bin-amd64 to download the brotli binary.
composer require vdechenaux/brotli-bin-amd64
The file bin/brotli-bin-amd64 should now exist.
Create a Chromium class
I prefer to manipulate an object rather than a scalar value. If we later need to store the Chrome version, for example, having an object will make things easier.
<?php declare(strict_types=1);

namespace App\Chromium;

/**
 * Value object holding the path to the uncompressed Chrome binary.
 */
class Chromium
{
    private $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function getPath(): string
    {
        return $this->path;
    }
}
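As an illustration of that idea, the version could be derived from the file name of the binary. This is only a sketch: parseChromiumVersion is a hypothetical helper that is not part of our lambda, and it assumes file names shaped like chromium-78.0.3882.0.br.

```php
<?php declare(strict_types=1);

/**
 * Extracts the dotted version from a Chromium file name,
 * with or without the ".br" extension.
 * Returns null when the name does not follow the expected pattern.
 */
function parseChromiumVersion(string $filename): ?string
{
    $pattern = '/^chromium-(\d+(?:\.\d+)*)(?:\.br)?$/';

    if (preg_match($pattern, $filename, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

$compressed = parseChromiumVersion('chromium-78.0.3882.0.br');
$inflated = parseChromiumVersion('chromium-78.0.3882.0');
$unrelated = parseChromiumVersion('libEGL.so.br');
```

The Chromium class could then accept the version as a second constructor argument without changing the code that only needs getPath().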
Create a ChromiumFactory class
This is the class which will uncompress Chrome at runtime into the /tmp/chromium folder.
We profiled this part of the code: it takes ~2-3 seconds on a fresh lambda, but it is much faster when the lambda is re-used (/tmp is not cleared, so the uncompressed Chrome is still there).
<?php declare(strict_types=1);

namespace App\Chromium\Factory;

use App\Chromium\Chromium;
use Symfony\Component\Finder\Finder;
use Symfony\Component\Finder\SplFileInfo;
use Symfony\Component\Process\Exception\ProcessFailedException;
use Symfony\Component\Process\Process;

class ChromiumFactory
{
    private $binDir;
    private $tmpDir;
    private $chromiumDir;

    public function __construct(string $binDir, string $tmpDir, string $chromiumDir)
    {
        $this->binDir = $binDir;
        $this->tmpDir = $tmpDir;
        $this->chromiumDir = $chromiumDir;
    }

    public function initialize(): Chromium
    {
        $finder = new Finder();
        $finder->name('chromium-*')->files()->in($this->chromiumDir);

        // Keep the first matching file
        foreach ($finder as $chromiumFile) {
            break;
        }

        if (!isset($chromiumFile) || !($chromiumFile instanceof SplFileInfo)) {
            throw new \RuntimeException(sprintf(
                'Unable to find Chromium binary in "%s" directory.',
                $this->chromiumDir
            ));
        }

        $this->inflate($chromiumFile->getFilename());
        $this->inflate('swiftshader/libEGL.so.br');
        $this->inflate('swiftshader/libGLESv2.so.br');

        // Same name as the compressed file, without the ".br" extension
        $chromiumPath = $this->tmpDir.'/'.$chromiumFile->getFilenameWithoutExtension();

        $this->markAsExecutable($chromiumPath);

        return new Chromium($chromiumPath);
    }

    protected function inflate(string $filename): void
    {
        $extension = '.br';
        $extensionLength = strlen($extension);

        if (substr($filename, -$extensionLength) !== $extension) {
            throw new \InvalidArgumentException('Not a brotli file.');
        }

        $outputFilename = $this->tmpDir.'/'.substr($filename, 0, -$extensionLength);
        $outputDir = dirname($outputFilename);

        if (!is_dir($outputDir) && !mkdir($outputDir, 0777, true) && !is_dir($outputDir)) {
            throw new \RuntimeException(sprintf('Unable to create "%s" directory.', $outputDir));
        }

        // Inflate file only if output file does not exist (it may still be there on a re-used lambda)
        if (!file_exists($outputFilename)) {
            // Adjust the binary name to the file actually present in your bin directory
            $process = new Process(["{$this->binDir}/brotli-amd64", '-d', "{$this->chromiumDir}/{$filename}", '-o', $outputFilename]);
            $process->run();

            if (!$process->isSuccessful()) {
                throw new ProcessFailedException($process);
            }
        }
    }

    protected function markAsExecutable(string $filename): void
    {
        $process = new Process(['chmod', '+x', $filename]);
        $process->run();

        if (!$process->isSuccessful()) {
            throw new ProcessFailedException($process);
        }
    }
}
and configure it like this:
services:
    # default configuration for services in *this* file
    _defaults:
        autowire: true      # Automatically injects dependencies in your services.
        autoconfigure: true # Automatically registers your services as commands, event subscribers, etc.

    # ... your Symfony services ...

    App\Chromium\Factory\ChromiumFactory:
        arguments:
            $tmpDir: '/tmp/chromium' # it is probably better to use `sys_get_temp_dir()`
            $binDir: '%kernel.project_dir%/bin'
            $chromiumDir: '%kernel.project_dir%/chromium'
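The caching behaviour of the factory is easier to see in isolation. The sketch below reproduces the two ideas behind inflate() without Symfony: deriving the output path from the .br file name, and skipping the work when the output already exists. inflatedPath, inflateOnce and the placeholder inflater are hypothetical names used for this example only; the real class calls the brotli binary instead of writing a placeholder file.

```php
<?php declare(strict_types=1);

/**
 * Maps a ".br" file name to the path of its uncompressed counterpart.
 */
function inflatedPath(string $tmpDir, string $filename): string
{
    $extension = '.br';

    if (substr($filename, -strlen($extension)) !== $extension) {
        throw new \InvalidArgumentException('Not a brotli file.');
    }

    return rtrim($tmpDir, '/').'/'.substr($filename, 0, -strlen($extension));
}

/**
 * Runs $inflate only when $outputFilename is missing.
 * Returns true when the callback ran, false when the file was re-used.
 */
function inflateOnce(string $outputFilename, callable $inflate): bool
{
    if (file_exists($outputFilename)) {
        return false;
    }

    $dir = dirname($outputFilename);

    if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
        throw new \RuntimeException(sprintf('Unable to create "%s".', $dir));
    }

    $inflate($outputFilename);

    return true;
}

// Stand-in for the brotli process: it only writes a placeholder file
$fakeInflate = function (string $path): void {
    file_put_contents($path, 'placeholder');
};

// A unique directory under the system temp dir, so the demo starts "cold"
$tmpDir = sys_get_temp_dir().'/chromium-demo-'.uniqid();
$output = inflatedPath($tmpDir, 'swiftshader/libEGL.so.br');

$firstRun = inflateOnce($output, $fakeInflate);  // fresh lambda: inflates
$secondRun = inflateOnce($output, $fakeInflate); // re-used lambda: skipped
```

The second call returns immediately because the file is already there, which is the reason a re-used lambda skips the 2-3 seconds of decompression.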
Use Browsershot with the ChromiumFactory
This is an example of how to use Browsershot and the ChromiumFactory inside a message handler (specific to the Symfony Messenger component), but you can use them anywhere you want.
I've used league/flysystem-bundle and configured a Scaleway filesystem adapter in order to save my PDF on Scaleway.
<?php declare(strict_types=1);

namespace App\MessageHandler;

use App\Chromium\Factory\ChromiumFactory;
use App\Message\GeneratePdfMessage;
use League\Flysystem\FilesystemInterface;
use Psr\Log\LoggerAwareInterface;
use Psr\Log\LoggerAwareTrait;
use Spatie\Browsershot\Browsershot;
use Symfony\Component\Messenger\Handler\MessageHandlerInterface;

class GeneratePdfMessageHandler implements MessageHandlerInterface, LoggerAwareInterface
{
    use LoggerAwareTrait;

    private $chromiumFactory;
    private $s3Storage;

    public function __construct(ChromiumFactory $chromiumFactory, FilesystemInterface $s3Storage)
    {
        $this->chromiumFactory = $chromiumFactory;
        $this->s3Storage = $s3Storage;
    }

    public function __invoke(GeneratePdfMessage $message): void
    {
        $pdf = $this->getBrowsershot()
            ->setHtml('My html...')
            ->pdf();
        // $pdf contains binary file content

        // Let's save it on Scaleway!
        $this->s3Storage->put('my-file.pdf', $pdf);
    }

    protected function getBrowsershot(): Browsershot
    {
        // Uncompress Chrome into the temporary directory (skipped when already done)
        $chromium = $this->chromiumFactory->initialize();

        $browsershot = (new Browsershot())
            ->setChromePath($chromium->getPath())

            // recommended arguments
            ->addChromiumArguments([
                'disable-dev-shm-usage', // https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md#tips
                'disable-gpu',
                'single-process',
                'no-sandbox',
            ])

            // we needed those options in our lambda to prevent issues, but you can ignore them
            ->ignoreHttpsErrors()
            // consider the page loaded as soon as `DOMContentLoaded` fires, without waiting
            // for external resources that take longer to load (or time out after 2 min)
            ->setOption('waitUntil', 'domcontentloaded')
        ;

        return $browsershot;
    }
}
And voilà! When this code runs, a PDF is generated with Browsershot and Puppeteer and saved on Scaleway.