Generate PDFs on Amazon AWS with PHP and Puppeteer: The Best Way


warning

This article is a follow-up to Generate PDFs on Amazon AWS with PHP and Puppeteer. Consider reading that article before going further.

Several months ago, I wrote my first article explaining how to use Browsershot and Puppeteer on AWS Lambda. We saw how to ship a brotli-fied Chrome with our lambda, how to un-brotlify Chrome at runtime, and how to use it with Browsershot.

But yesterday, I had to update the Chrome version and I faced many issues:

  • I had to download the Chrome binary and the SwiftShader libraries from chrome-aws-lambda and do the update manually
  • Since the binaries are not the same, I had to update the ChromiumFactory to handle the file swiftshader.tar.br
  • I had to update the Chrome flags list by using those from chrome-aws-lambda.

This is the first and the last time I want to do that.

Why should I redo what chrome-aws-lambda already does well? Isn't it possible to use chrome-aws-lambda with Browsershot?

After many hours, I was able to use chrome-aws-lambda through a bridge between the Browsershot PHP class and Browsershot's bin/browser.js, thanks to the method Browsershot#setBinPath, which allows us to use a custom .js file.

Cleaning

First, let's clean up a few things:

  • delete the chromium/ directory
  • delete the Chromium and ChromiumFactory classes (and remove them from the Symfony services configuration)
  • uninstall the vdechenaux/brotli-bin-amd64 dependency: composer remove vdechenaux/brotli-bin-amd64

Installing chrome-aws-lambda

You can't install just any version of chrome-aws-lambda or puppeteer: the two must be compatible with each other. See chrome-aws-lambda's versioning table.

At the time of writing, I went with chrome-aws-lambda@~2.0.0 (which uses Chrome 79):

package.json
  {
    ...
    "dependencies": {
+     "chrome-aws-lambda": "~2.0.0",
      "puppeteer": "~2.0.0"
    }
  }
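As a quick sanity check, here is a minimal sketch that compares the two version ranges from package.json. It assumes that chrome-aws-lambda follows puppeteer's major.minor numbering, which is how I read the versioning table; the table stays the authority. The helpers majorMinor and versionsAligned are hypothetical names made up for this example.

```javascript
// Hypothetical helper: extracts "major.minor" from a range such as "~2.0.0".
function majorMinor(range) {
  const match = /(\d+)\.(\d+)/.exec(range);
  if (!match) {
    throw new Error(`Cannot read a major.minor version from "${range}"`);
  }
  return `${match[1]}.${match[2]}`;
}

// Returns true when both dependencies are declared with the same major.minor.
// Assumption: chrome-aws-lambda versions track puppeteer's major.minor.
function versionsAligned(dependencies) {
  const chrome = dependencies['chrome-aws-lambda'];
  const puppeteer = dependencies['puppeteer'];
  if (!chrome || !puppeteer) {
    return false;
  }
  return majorMinor(chrome) === majorMinor(puppeteer);
}

// Example with the dependencies used in this article.
const aligned = versionsAligned({
  'chrome-aws-lambda': '~2.0.0',
  puppeteer: '~2.0.0',
});
console.log(aligned ? 'versions aligned' : 'check the versioning table');
```

In a real project you would probably pass require('./package.json').dependencies instead of the inline object.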

Creating the bridge

The most important thing is to handle the input and the output the same way Browsershot does. This means:

  • your binary must be able to handle the argument -f <file.json> or JSON passed as the first argument
  • your binary must output data in base64 when needed

It may sound hard, but in fact it's not.
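To make the input side of that contract concrete, here is a small self-contained sketch that exercises both forms without needing Chrome. The parseRequest helper and the example payload are hypothetical; the sketch assumes, as Browsershot's own parsing suggests, that the value after -f is a file:// URL rather than a bare path.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper covering the two input forms described above.
function parseRequest(arg) {
  if (arg.startsWith('-f ')) {
    // Assumption: the rest of the argument is a file:// URL to a JSON dump.
    return JSON.parse(fs.readFileSync(new URL(arg.substring(3)), 'utf8'));
  }
  // Otherwise the argument itself is the JSON dump.
  return JSON.parse(arg);
}

// Made-up payload, only for this demonstration.
const payload = { url: 'https://example.com/', action: 'screenshot', options: {} };

// Form 1: JSON passed directly as the first argument.
const inline = parseRequest(JSON.stringify(payload));

// Form 2: JSON written to a temporary file, passed with the -f flag.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'bridge-'));
const tmpFile = path.join(tmpDir, 'request.json');
fs.writeFileSync(tmpFile, JSON.stringify(payload));
const fromFile = parseRequest(`-f ${pathToFileURL(tmpFile).href}`);

// Clean up the temporary file and directory.
fs.unlinkSync(tmpFile);
fs.rmdirSync(tmpDir);

console.log(inline.action, fromFile.action);
```

Both calls should yield the same request object, which is all the rest of the bridge needs.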

I've created a bin/browser.js file which:

  • gets the input (request) like Browsershot does (literally a copy/paste)
  • updates this request with chrome-aws-lambda's data (Chrome path and flags)
  • overrides process.argv[2] with the new JSON request
  • and runs the original Browsershot JS file

bin/browser.js
#!/usr/bin/env node

const fs = require('fs');
const chromium = require('chrome-aws-lambda');

const [, , ...args] = process.argv;

/**
 * There are two ways for Browsershot to communicate with puppeteer:
 * - By giving an options JSON dump as an argument
 * - Or by providing a temporary file with the options JSON dump,
 *   the path to this file is then given as an argument with the flag -f
 */
const request = args[0].startsWith('-f ')
  ? JSON.parse(fs.readFileSync(new URL(args[0].substring(3))))
  : JSON.parse(args[0]);

async function bridge() {
  // merge Browsershot options with chrome-aws-lambda options
  request.options.executablePath = await chromium.executablePath;
  request.options.args = [
    ...chromium.args,
    // the request may not carry any args, so fall back to an empty list
    ...(request.options.args || []),
    '--disable-dev-profile',
    '--user-data-dir=/dev/null',
  ];

  // override process arguments
  process.argv[2] = JSON.stringify(request);

  // then execute Browsershot's initial binary
  return require('../vendor/spatie/browsershot/bin/browser');
  // or, if you use Browsershot ^3.38, replace the line above with the one below,
  // see https://github.com/spatie/browsershot/pull/399
  // return require('../vendor/spatie/browsershot/bin/browser').callBrowser(chromium.puppeteer);
}

bridge();

This is a real bridge between Browsershot's PHP side and its JS side.
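The bridge simply concatenates the flag lists, so the same flag can end up in the final list twice. If that bothers you, one possible variation (not what the bridge above does) is to de-duplicate by flag name and let the later list win. This is only a sketch: mergeArgs is a hypothetical helper, the sample flags are stand-ins for chromium.args, and "later wins" is a design choice of this example, not a statement about how Chrome resolves duplicates.

```javascript
// Hypothetical helper: merges flag lists, keeping one entry per flag name.
// When a flag appears more than once, the last occurrence is kept.
function mergeArgs(...lists) {
  const byName = new Map();
  for (const list of lists) {
    // Tolerate a missing list (e.g. a request without args).
    for (const flag of list || []) {
      const name = flag.split('=')[0];
      // Delete first so the re-inserted flag moves to the end of the Map.
      byName.delete(name);
      byName.set(name, flag);
    }
  }
  return [...byName.values()];
}

const merged = mergeArgs(
  ['--no-sandbox', '--window-size=1920,1080'], // stand-in for chromium.args
  undefined, // a request without args
  ['--window-size=800,600', '--disable-dev-shm-usage'],
);
console.log(merged);
```

In the bridge, this would replace the array literal assigned to request.options.args.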

Using the bridge

In your PHP code, point Browsershot at the bridge:

$myBrowsershotInstance->setBinPath('/path/to/bin/browser.js');

It is also possible to run this file manually, the same way Browsershot does:

$ PATH=$PATH:/usr/local/bin NODE_PATH=`npm root -g` node 'bin/browser.js' \
    '{"url":"https:\/\/google.fr\/","action":"screenshot","options":{"type":"png","args":["--disable-dev-shm-usage"],"viewport":{"width":1920,"height":1080},"ignoreHttpsErrors":true,"waitUntil":"domcontentloaded"}}'

If some base64 code is shown, then the bridge is working correctly!
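If you would rather not judge the output by eye, a small check can decode it and look for the PNG signature. This is a minimal sketch that assumes the screenshot type is png, as in the command above; looksLikeBase64Png is a hypothetical helper, and the sample string is just the base64 encoding of the eight PNG signature bytes, not a real screenshot.

```javascript
// The eight bytes every PNG file starts with.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: decodes the output as base64 and compares
// the first bytes with the PNG signature.
function looksLikeBase64Png(output) {
  const bytes = Buffer.from(output.trim(), 'base64');
  return (
    bytes.length >= PNG_SIGNATURE.length &&
    bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
  );
}

// Base64 of the signature bytes alone, standing in for real output.
console.log(looksLikeBase64Png('iVBORw0KGgo='));
// Anything else should be rejected.
console.log(looksLikeBase64Png('not a screenshot'));
```

Feeding the command's standard output to this function gives a yes/no answer instead of a wall of characters.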