Robust Segment implementation in a serverless CDN-cached Laravel deployment

Published in Programming on Jan 16, 2021

How to implement Segment with CloudFlare and AWS Lambda (Laravel Vapor).

Segment is an analytics service that lets you connect data between multiple analytics platforms without having to collect data for each one individually. It's a great service, but integrating it well can be quite hard.

I spent quite a bit of time on this so I'm documenting the process for everyone else.

Plan

Here's the tech that we'll be using. It's picked in a way that allows us to:

  • have virtually 100% of the visit data
  • process requests & serve responses with very high performance
  • work on AWS Lambda, specifically with Laravel Vapor
  • make use of edge & browser response caching without losing any data

Note: Segment also supports email pixel tracking, but we won't be doing that here. That's the same for all apps and here we're focusing on the "HTTP aspect" of things with this specific deployment setup.

PHP SDK

Segment has an official SDK for PHP that lets you easily track page visits and log events in Segment without having to write API calls manually.

That said, the SDK is a bit old and we'll have to make some manual changes to make it work with our setup.

DynamoDB

The SDK ships with several drivers (called consumers) for logging events.

The most basic one is lib_curl which, as the name suggests, just makes curl requests to the Segment API while your request is being handled.

Another one is file, which stores events in a temporary file. The temporary file can be sent to Segment's API & cleaned in a scheduled job.

Files are preferable over curl, but we can't use them in a serverless environment like Vapor.

We could queue the payload for Segment on each request, but it could easily make our queue invocation count massive.

So we need some temporary storage that will be read from & sent to Segment using a scheduled job, but it needs to support Vapor.

For that, we'll use DynamoDB. It's a serverless key-value store and it's free-tier eligible. On each request we'll add the Segment payload to the "buffer", and in our scheduled job we'll flush the buffer and send its contents over to the API.
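To make the buffer-and-flush idea concrete, here's a minimal, framework-free sketch. It's not the code used later in this article: flushBuffer(), $send and $delete are hypothetical names, and the batch size of 100 is an arbitrary assumption. In a real app, $send would POST the batch to Segment and $delete would remove the rows from DynamoDB.

```php
<?php

declare(strict_types=1);

/**
 * Flush buffered Segment payloads in batches.
 *
 * @param array<int, array{messageId: string, payload: array}> $rows Buffered rows
 * @param callable $send   Receives a list of payloads, returns true on success (hypothetical)
 * @param callable $delete Receives a list of messageIds to remove from the buffer (hypothetical)
 * @return int Number of messages that were sent and removed from the buffer
 */
function flushBuffer(array $rows, callable $send, callable $delete, int $batchSize = 100): int
{
    $flushed = 0;

    foreach (array_chunk($rows, $batchSize) as $batch) {
        $payloads = array_column($batch, 'payload');

        // Only delete rows once the batch was accepted, so a failed
        // send leaves them in the buffer for the next scheduled run.
        if (! $send($payloads)) {
            break;
        }

        $delete(array_column($batch, 'messageId'));
        $flushed += count($batch);
    }

    return $flushed;
}
```

Because rows are deleted only after a successful send, a failed run retries the same messages next time. That makes delivery at-least-once rather than exactly-once.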

JS + PHP fallback

Segment lets you track actions both using JS and PHP.

Generally, you want to use JS for tracking page visits, and you want to use PHP for tracking internal events.

JS is also better because it doesn't slow down the response. Having to send data to Segment from PHP on each request can slow down page loads for no good reason.

So, if you can use JS, use it.

But, don't use only JS. The Segment JS analytics get very often blocked by adblocker extensions.

So we'll have a PHP fallback in case JS gets blocked.

Implementation

DynamoDB

Create a DynamoDB table called e.g. segment_messages. Each Segment event will be a row in this table.

DynamoDB is a schemaless database, so you just need to specify a key name. Go with messageId for consistency with Segment's API.
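If you'd rather describe the table in code than click through the console, it could look roughly like this. This is a sketch, not the setup used in this article: segmentTableDefinition() is a hypothetical helper, the capacity numbers are placeholder assumptions, and the array follows the parameter shape of the createTable operation in the AWS SDK for PHP.

```php
<?php

declare(strict_types=1);

/**
 * Build createTable parameters for the Segment buffer table.
 * Hypothetical helper: only the key attribute has to be declared,
 * because DynamoDB is schemaless for everything else.
 */
function segmentTableDefinition(string $tableName = 'segment_messages'): array
{
    return [
        'TableName' => $tableName,
        'AttributeDefinitions' => [
            ['AttributeName' => 'messageId', 'AttributeType' => 'S'], // S = string
        ],
        'KeySchema' => [
            ['AttributeName' => 'messageId', 'KeyType' => 'HASH'], // partition key
        ],
        // Assumption: small provisioned capacity, adjust to your traffic
        'ProvisionedThroughput' => [
            'ReadCapacityUnits' => 5,
            'WriteCapacityUnits' => 5,
        ],
    ];
}
```

With the AWS SDK installed, this array could be passed to Aws\DynamoDb\DynamoDbClient::createTable(). Creating the table by hand in the AWS console works just as well.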

Next, install the baopham/laravel-dynamodb package (its Composer name is baopham/dynamodb).

It lets you create Eloquent-like models that represent DynamoDB rows. It's pretty cool.

composer require baopham/dynamodb

Once it's installed, disable autodiscovery for this package by adding this bit of JSON to your composer.json file:

"extra": {
    "laravel": {
        "dont-discover": [
            "baopham/dynamodb"
        ]
    }
},

The reason for that is that we'll need the package to use AWS keys that are not part of .env but are injected at runtime into $_ENV. This is not explained in Vapor docs, but seems to be the case as the official vapor-core package uses this to inject AWS credentials into the DynamoDB cache driver's config.

Now publish the package's config file:

php artisan vendor:publish

And make the connections array look like this:

'connections' => [
    'aws' => [
        'credentials' => [
            'key' => env('AWS_ACCESS_KEY_ID'),
            'secret' => env('AWS_SECRET_ACCESS_KEY'),
            // If using as an assumed IAM role, you can also use the `token` parameter
            'token' => env('AWS_SESSION_TOKEN'),
        ],
        'region' => env('AWS_DEFAULT_REGION'),
        'debug' => env('DYNAMODB_DEBUG', false),
    ],
],

And finally, create the model:

<?php

namespace App\Models;

use BaoPham\DynamoDb\DynamoDbModel;

/**
 * @property string $messageId
 * @property array<string, mixed> $payload
 */
class SegmentMessage extends DynamoDbModel
{
    protected $primaryKey = 'messageId';
    public $incrementing = false;

    public $fillable = [
        'messageId',
        'payload',
    ];

    public $casts = [
        'messageId' => 'string',
        'payload' => 'array',
    ];

    public function getTable()
    {
        // You may want to read from config if you have a staging deploy that uses a different table
        // return config('services.segment.table', 'segment_messages_test');
    
        return 'segment_messages';
    }
}

Segment SDK

Next, we'll need to install and customize the Segment SDK.

We'll be making some slightly ugly changes due to some mistakes in the SDK's design, but overall it's not too ugly.

First require the package:

composer require segmentio/analytics-php

Next, create app/Providers/SegmentServiceProvider.php that looks like this:

<?php

namespace App\Providers;

use App\SegmentDynamoDBConsumer;
use Illuminate\Foundation\Application;
use Illuminate\Support\Facades\Config;
use Illuminate\Support\ServiceProvider;
use ReflectionClass;
use ReflectionObject;
use Segment;

class SegmentServiceProvider extends ServiceProvider
{
    /** @var Application */
    public $app;

    /**
     * Register services.
     *
     * @return void
     */
    public function register()
    {
        //
    }

    /**
     * Bootstrap services.
     *
     * @return void
     */
    public function boot()
    {
        $this->configureDynamoDBForVapor();

        if ($key = config('services.segment.key')) {
            if ($this->app->runningInConsole() && ! $this->app->runningUnitTests()) {
                return;
            }

            Segment::init($key, ['consumer' => 'file']);

            if (! $this->app->runningUnitTests()) {
                $client = (new ReflectionClass(Segment::class))->getStaticPropertyValue('client');

                // Replace the consumer with our own class
                $property = (new ReflectionObject($client))->getProperty('consumer');
                $property->setAccessible(true);
                $property->setValue($client, new SegmentDynamoDBConsumer($key));

                $this->app->terminating(function () {
                    Segment::flush();
                });
            }
        }
    }

    protected function configureDynamoDBForVapor()
    {
        if (! isset($_ENV['VAPOR_SSM_PATH'])) {
            return;
        }

        Config::set('dynamodb.connections.aws', array_merge(Config::get('dynamodb.connections.aws') ?? [], [
            'credentials' => [
                'key' => $_ENV['AWS_ACCESS_KEY_ID'] ?? null,
                'secret' => $_ENV['AWS_SECRET_ACCESS_KEY'] ?? null,
                'token' => $_ENV['AWS_SESSION_TOKEN'] ?? null,
            ],
            'region' => $_ENV['AWS_DEFAULT_REGION'] ?? 'us-east-1',
        ]));
    }
}

The service provider does these tasks:

  • it sets the DynamoDB package's config credentials to the environment AWS keys
  • it initializes the Segment SDK for web requests and for tests, but skips all other console commands
  • the hacky part: outside of tests, it swaps the SDK's consumer with our own class using Reflection
  • it registers a terminating hook that sends the data to DynamoDB. This hook can be executed after the response was sent to the browser if the webserver is configured in a way that allows this, so the write to DynamoDB doesn't necessarily delay the response.
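If the Reflection part looks like magic, here's the same trick on a tiny standalone example. DemoFacade, DemoClient and the two consumer classes are made-up stand-ins, not the SDK's real classes. They only mirror the shape the provider relies on: a private static client that holds a private consumer.

```php
<?php

declare(strict_types=1);

// Stand-ins for the SDK's classes (hypothetical, for illustration only)
final class DefaultConsumer {}
final class CustomConsumer {}

final class DemoClient
{
    private object $consumer;

    public function __construct(object $consumer)
    {
        $this->consumer = $consumer;
    }

    public function consumerName(): string
    {
        return get_class($this->consumer);
    }
}

final class DemoFacade
{
    private static ?DemoClient $client = null;

    public static function init(): void
    {
        self::$client = new DemoClient(new DefaultConsumer());
    }

    public static function consumerName(): string
    {
        return self::$client->consumerName();
    }
}

DemoFacade::init();

// 1. Read the private static client off the facade class
$clientProperty = (new ReflectionClass(DemoFacade::class))->getProperty('client');
$clientProperty->setAccessible(true); // required before PHP 8.1, a no-op afterwards
$client = $clientProperty->getValue();

// 2. Overwrite the client's private consumer with our own object
$consumerProperty = (new ReflectionObject($client))->getProperty('consumer');
$consumerProperty->setAccessible(true);
$consumerProperty->setValue($client, new CustomConsumer());

echo DemoFacade::consumerName(), PHP_EOL; // CustomConsumer
```

The facade never exposes a setter, yet after step 2 every call goes through the replacement object. The price is that the code depends on private property names, so an SDK update can silently break it.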

Now that this is done, let's create the consumer class:

<?php

namespace App;

use App\Models\SegmentMessage;
use Segment_Consumer;

class SegmentDynamoDBConsumer extends Segment_Consumer
{
    public array $messages = [];

    public function getConsumer()
    {
        return 'DynamoDB';
    }

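    public function flush()
    {
        if ($this->messages === []) {
            return;
        }

        $rows = [];

        foreach ($this->messages as $message) {
            $rows[] = [
                'messageId' => $message['messageId'],
                'payload' => $message,
            ];
        }

        SegmentMessage::createMany($rows);

        // Empty the buffer so a second flush() can't write the same messages twice
        $this->messages = [];
    }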

    public function write(array $message)
    {
        $this->messages[] = $message;
    }

    public function track(array $message) { $this->write($message); }
    public function identify(array $message) { $this->write($message); }
    public function group(array $message) { $this->write($message); }
    public function page(array $message) { $this->write($message); }
    public function screen(array $message) { $this->write($message); }
    public function alias(array $message) { $this->write($message); }

    public function __destruct()
    {
        return;
    }
}

Again, slightly hacky. The logic of the class is that we store any events inside the $messages property and on flush() we create a SegmentMessage model (DynamoDB row) for each of them. The rest of the class is there for compatibility with the abstract class, including some methods that aren't specified in the contract (e.g. __destruct() is called directly from the Segment SDK).

And now let's register our SegmentServiceProvider and the DynamoDB package's service provider, in this specific order, in config/app.php:

App\Providers\SegmentServiceProvider::class,
BaoPham\DynamoDb\DynamoDbServiceProvider::class,

Add those lines to the end of the providers array.

Finally, add your Segment keys to .env:

SEGMENT_KEY=...
SEGMENT_DYNAMO_TABLE=segment_messages

In dev/staging you can use a separate table and set the SEGMENT_KEY to some nonexistent key like test. The HTTP calls won't fail and you'll be able to observe the data flow in DynamoDB. The data just won't end up in Segment, which is what you want in a testing environment.

Now that this is set up, Segment for PHP is done. You can use any of the Segment SDK methods and they will work.

JS

Add a JavaScript source in your Segment dashboard and copy the generated snippet. Paste it into your HTML layout.

You can make this script smarter by identifying authenticated users:

<script>
    !function(){var analytics=window.analytics=...;
    analytics.load("your key");
    analytics.page();

    @auth
        if (analytics.user && analytics.user().properties().name === undefined) {
            analytics.identify(,
                @json([
                    'name' => auth()->user()->name,
                    'email' => auth()->user()->email,
                ])
            );
        }
    @endauth
    }}();
</script>

PHP fallback

A fun enhancement we can do is to send a request to our server if the JS script gets adblocked.

Create a script tag like this:

{{-- If JS analytics get blocked --}}
<script>
    window.segPinged = false;

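    window.seg = function () {
        // Only ping once, even if several analytics scripts fail to load
        if (window.segPinged) {
            return;
        }

        window.segPinged = true;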

        document.cookie = '_seg=';

        if (! document.cookie.includes('_segid')) {
            document.cookie = '_segid=';
        }

        const segframe = document.createElement('img');
        segframe.setAttribute('src', "{!! route('segment.track', [
            'u' => request()->fullUrl(),
            't' => $title,
            'r' => request()->route()->getName(),
        ]) !!}");
        segframe.setAttribute('style', 'display: none');
        document.getElementsByTagName('body')[0].appendChild(segframe);
    }
</script>

And add onerror="window.seg()" to your JS analytics tags. The global segPinged variable makes sure that this call is not made multiple times, even if you use multiple analytics scripts.

In Segment's case, this tag is generated using JS, so in your pasted Segment snippet, make this change:

-t.src="https://cdn.segment.com/analytics.js/v1/" + key + "/analytics.min.js";
+t.src="https://cdn.segment.com/analytics.js/v1/" + key + "/analytics.min.js";t.setAttribute('onerror', "window.seg()");

When the function is called, it will create an <img> tag that requests the image using a URL like /seg?u=url-of-this-page&t=title-of-this-page&r=route-name-of-this-page.
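As a rough illustration of what that URL carries, here's how the same query string could be built and read back in plain PHP. buildPingUrl() is a hypothetical helper. In the article itself, the URL comes from Laravel's route() helper.

```php
<?php

declare(strict_types=1);

/** Build the tracking-pixel URL for a page (hypothetical helper). */
function buildPingUrl(string $url, string $title, ?string $routeName): string
{
    // http_build_query() percent-encodes the values, so a page URL that
    // contains its own "?" and "&" survives as a single "u" parameter.
    return '/seg?' . http_build_query([
        'u' => $url,
        't' => $title,
        'r' => $routeName ?? '', // null values would be dropped entirely
    ]);
}

$ping = buildPingUrl('https://example.com/pricing?plan=pro&ref=home', 'Pricing & Plans', 'pricing');

// Reading it back, the way the server sees it
parse_str((string) parse_url($ping, PHP_URL_QUERY), $query);

echo $query['u'], PHP_EOL; // https://example.com/pricing?plan=pro&ref=home
echo $query['t'], PHP_EOL; // Pricing & Plans
```

The page's own query string has to be encoded like this, otherwise plan and ref would show up as separate parameters of the /seg request.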

Now, let's create the controller for our segment.track route:

<?php

namespace App\Http\Controllers;

// use App\Http\Middleware\CacheControl;
use Exception;
use Illuminate\Http\Request;
use Segment;
use Illuminate\Support\Str;

class SegmentController extends Controller
{
    public function __invoke(Request $request)
    {
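        $signature = null;

        try {
            // Normalize the signed URL (trims trailing "/" and "=")
            $signature = $this->cleanUrl(decrypt($request->cookie('_seg')));
        } catch (Exception $e) {
            // Missing or tampered cookie: $signature stays null and the check below fails
        }

        $referer = $this->cleanUrl((string) $request->header('referer'));

        if ($signature !== null && $signature === $referer &&
            Str::of($referer)->startsWith(config('app.url'))) {
            $this->logRequest($request, $referer);
        }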

        // In my case, I use response caching by default, so I have to disable it here.
        // You can read more about that in my previous blog post.
        // CacheControl::forceNoCache();

        // 1x1 gif
        return response(base64_decode('R0lGODlhAQABAJAAAP8AAAAAACH5BAUQAAAALAAAAAABAAEAAAICBAEAOw=='), 200, [
            'Content-Type' => 'image/gif'
        ]);
    }

    protected function cleanUrl(string $url): string
    {
        return rtrim($url, '/=');
    }

    protected function logRequest(Request $request, string $referer)
    {
        [
            'u' => $url,
            't' => $title,
            'r' => $routeName,
        ] = $request->only(['u', 't', 'r']);

        $userId = auth()->id();
        $anonymousId = $request->cookie('_segid');

        if ($userId || $anonymousId) {
            $identification = [];

            if ($userId) {
                $identification += [
                    'userId' => $userId,
                    'traits' => [
                        'name' => auth()->user()->name,
                        'email' => auth()->user()->email,
                    ]
                ];
            }

            if ($anonymousId) {
                $identification += compact('anonymousId');
            }

            Segment::identify($identification);

            $search = Str::of($referer)->contains('?')
                ? $request->normalizeQueryString(Str::of($referer)->after('?'))
                : '';

            Segment::page($identification + [
                'name' => $title,
                'properties' => [
                    'url' => $url,
                    'title' => $title,
                    'routeName' => $routeName,

                    // Path of the visited page, not of the /seg endpoint itself
                    'path' => parse_url((string) $url, PHP_URL_PATH) ?: '/',
                    'referer' => $request->header('referer'),
                    'search' => $search,
                ],
            ]);
        }
    }
}

This controller does some checks to make sure that the request wasn't obviously forged, and it reads the user's data from the request. We're only sending the page metadata (url, title, Laravel route name) and the user's identifier (_segid cookie).
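Stripped of the framework, the "not obviously forged" check boils down to the sketch below. It assumes that the _seg cookie carries an encrypted copy of the page URL, which is what the controller's comparison implies. isTrustedPing() is a hypothetical helper, not part of the controller.

```php
<?php

declare(strict_types=1);

/** Strip trailing "/" and "=" so equivalent URLs compare equal. */
function cleanUrl(string $url): string
{
    return rtrim($url, '/=');
}

/**
 * Decide whether a fallback ping should be logged (hypothetical helper).
 *
 * @param string|null $signedUrl URL recovered from the decrypted _seg cookie, null if decryption failed
 * @param string|null $referer   Value of the Referer header, null if absent
 * @param string      $appUrl    The application's base URL
 */
function isTrustedPing(?string $signedUrl, ?string $referer, string $appUrl): bool
{
    if ($signedUrl === null || $referer === null) {
        return false;
    }

    $referer = cleanUrl($referer);

    // The page that set the cookie must be the page that sent the ping,
    // and that page must belong to our own site.
    return cleanUrl($signedUrl) === $referer
        && strpos($referer, $appUrl) === 0;
}

var_dump(isTrustedPing('https://example.com/pricing/', 'https://example.com/pricing', 'https://example.com')); // bool(true)
var_dump(isTrustedPing(null, 'https://example.com/pricing', 'https://example.com'));                          // bool(false)
```

This only filters out casual abuse, such as someone hotlinking the pixel from another site. It doesn't prove that a real person viewed the page.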

Also note that we're setting the cookie on the frontend, which means that the page response itself doesn't depend on the visitor and we can use edge caching services such as CloudFlare.

And of course, let's create a route for this controller:

Route::get('/seg', SegmentController::class)->name('segment.track');

That's it. You should now have a robust PHP analytics fallback for when JS analytics fail, and all the quality JS data (and performance & cost benefits) when JS is available.

We're also not sending any sensitive data. Just a user identifier so that we're able to track an anonymous visitor's path on our website, and page metadata to know where they were.

This is extremely valuable, as services like Google Analytics often have only a fraction of the visit data, skewing many metrics.

Speaking of Google Analytics, you can easily configure Segment to send both JS and PHP data to Google Analytics. This will let you work with all data in the Google Analytics dashboard.

Hope this helps and good luck optimizing your websites with all this newfound data :)
