GCP & SFMC
This is the first in what will likely be many posts about how to use GCP to build, interrogate and extend your SFMC environment.
The aim of these posts is to share my learnings, solutions and code with like-minded technical SFMC architects who have a curiosity and desire to push their solutions beyond SaaS.
A little Intro…
Like many of my colleagues I’ve been working with SFMC for several years and have a fairly deep understanding of the platform, its various nuances, quirks and limitations.
During the first COVID lockdown of 2020 (there would be many) I decided to commit myself to learning, and becoming certified in, something new that I’d been meaning to make time for. That thing was GCP (Google Cloud Platform).
You may ask “why not AWS or MS Azure?”
Well for me, I saw some instant low hanging fruit and correlations to my existing skillset along with some product gaps within existing SFMC integrations that I wanted to understand how I could extend or enhance.
Some of these include:
- Big Query (GCP’s data warehousing product) uses standard SQL. We use T-SQL in SFMC, so there’s an immediate overlap (BQ also offers machine learning capabilities using SQL, which I’ll get to in future articles).
- If you have Google Analytics 360 (or GA4) there is a near-native integration that feeds intra-day web data into BQ.
This means you now have the ability to use SQL to aggregate, interrogate and manipulate web behaviour data that is highly likely to include a click-through and an SFMC Id at some point.
- GCP Auto ML (Tables): this product has been in beta for a while, but it allows a non-technical, non-data-science user to create, train and deploy machine learning models quite easily.
If I had a dollar for every client I’ve worked with over the years who would die for a propensity-to-….. score or a predicted LTV score in their available sales/marketing data…
*Also note that Auto ML uses GCP’s AI to build and train your models, which gives the resulting models very high accuracy.
- It’s free and well documented.
GCP offers a “free tier” with $500 (this may have changed) of free credit a month. Like other cloud providers, GCP bills based on computation and storage, unlike the licence-based costs that typical SaaS platforms charge (see the GCP price calculator here: https://cloud.google.com/products/calculator). This is great for building and managing your own tests and deployments, plus it’s linked back to your Google account (authorised login and access via your company’s directory if you work with Gmail etc.).
- Simplicity. Google has always had a somewhat user-friendly approach to its products, and this is no different with their cloud solutions. GCP has A LOT of products and A LOT of vendors/associated partners (ISVs etc.).
However, there seem to be fewer than AWS has, and they also seem to be easier to navigate in the console. There was also a clear delineation of when and where to use each product, which I found quite simple to follow and memorise while studying for the GCP certs (Professional Data Engineer & Architect).
If you sum up the above points (there are a few more to do with GA and media use cases), there’s a fairly compelling argument for why someone who works on SFMC and in digital marketing might want to learn about GCP and how to leverage it to enhance a Martech stack.
I’ve always found the easiest way to learn something new is to take something you’re familiar with and find a like-for-like comparison to understand how it works or what it does.
So, if we apply a similar lens to GCP’s products (right now we’ll touch on only a few) it will become easier to rationalise and piece together a solution like we would do in SFMC.
Cloud Storage is essentially a data lake consisting of “buckets” which can contain folders and files. Much like an SFTP site, it doesn’t matter what type of files you store here.
This is GCP’s answer to Amazon S3, and there are various config options for storage location (regional, multi-region etc.), levels of access, frequency of usage (which dictates the storage pricing), public vs private, etc.
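To make the SFTP comparison concrete, here's a minimal sketch of moving a file into a bucket. It assumes the `google-cloud-storage` package is installed and you are already authenticated (e.g. via `gcloud auth application-default login`); the bucket and file names are placeholders.

```python
def gs_uri(bucket_name: str, blob_name: str) -> str:
    """Build the gs:// URI for an object - the form other GCP products expect."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_file(bucket_name: str, local_path: str, blob_name: str) -> str:
    """Upload a local file to a Cloud Storage bucket and return its gs:// URI."""
    # Imported inside the function so gs_uri() above works even without GCP installed.
    from google.cloud import storage

    client = storage.Client()
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(local_path)
    return gs_uri(bucket_name, blob_name)


# Example (placeholder names):
# upload_file("my-sfmc-exports", "subscribers.csv", "imports/subscribers.csv")
```

Unlike an SFTP drop, the object is immediately addressable by that `gs://` URI from the rest of GCP (BigQuery load jobs, Cloud Functions triggers, etc.).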
Pub/Sub is a cloud-hosted message ingestion tool, much like Apache Kafka, which allows for high-volume message streaming and processing with a “publish/subscribe” model. This can be handy for some Martech solutions if you are posting data into GCP, and it is also something we can use to connect products together natively rather than via APIs.
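Publishing to a topic is only a few lines. The sketch below assumes the `google-cloud-pubsub` package is installed; the project id, topic name and payload fields are all placeholders.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Pub/Sub message bodies are bytes, so serialise the payload to JSON first."""
    return json.dumps(payload).encode("utf-8")


def publish(project_id: str, topic_id: str, payload: dict) -> str:
    """Publish one message to a topic and return the server-assigned message id."""
    # Imported here so encode_message() stays usable without GCP installed.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, encode_message(payload))
    return future.result()  # blocks until Pub/Sub accepts the message


# Example (placeholder names):
# publish("my-project", "sfmc-events", {"subscriberKey": "0013x00000", "event": "click"})
```

Anything subscribed to that topic, such as a Cloud Function, then receives the message without either side knowing about the other.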
Big Query is a petabyte-scale relational enterprise data warehouse. BQ uses standard SQL syntax and can ingest data from Cloud Storage, via streaming inserts, or via its own import tool from locations such as AWS S3, among various other options. It can also store and schedule SQL activities, with the resulting output being a new table OR the results appended into an existing table (sounds familiar, right??)
For now let’s just compare this to Data Extensions & SQL Activities in SFMC.
BQ also has additional capability with BQML for machine learning, data management with partitioned and clustered tables, and a few other features that help manage performance.
Storage within BQ is also extremely cheap even in comparison to Cloud Storage.
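To show the SQL Activity parallel, here's a hedged sketch of running a query and appending the results into an existing table, the BQ equivalent of targeting a DE with an append action. It assumes `google-cloud-bigquery` is installed; the project, dataset and table names are placeholders.

```python
def qualified_table(project: str, dataset: str, table: str) -> str:
    """Build the fully-qualified `project.dataset.table` id a destination needs."""
    return f"{project}.{dataset}.{table}"


def run_append_query(sql: str, destination: str) -> None:
    """Run `sql` and append its result rows into the `destination` table."""
    # Imported here so qualified_table() above runs without GCP installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        destination=destination,
        write_disposition="WRITE_APPEND",  # append rows, like an SFMC "Append" target
    )
    client.query(sql, job_config=job_config).result()  # wait for the job to finish


# Example (placeholder names):
# run_append_query(
#     "SELECT subscriber_key, COUNT(*) AS clicks FROM `my_project.web.events` GROUP BY 1",
#     qualified_table("my_project", "marketing", "click_counts"),
# )
```

Schedule that call with Cloud Scheduler, or use BQ's own scheduled queries feature, and you have something very close to an Automation Studio SQL Activity.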
GCP has various compute options, ranging from VMs on Compute Engine, to Kubernetes, App Engine, and lastly Cloud Functions.
Cloud Functions are serverless functions that run on request. This is not a new concept (AWS and Azure both have like-for-like equivalents), but this product will allow us to stitch together various products and invoke requests with a limited amount of effort.
Cloud Functions can have up to 8GB of memory and a max runtime of 9 mins (540 seconds). The infrastructure can also be hosted in any of the typical GCP data centres globally.
9 mins doesn’t sound like a long time, but in my experience 9 mins of GCP compute > 30 mins of SSJS runtime on SFMC.
Because CFs are serverless, we don’t have to do much to configure or maintain them. They also scale to zero, which means you don’t pay for computation time while they aren’t running — very cost effective.
CFs are kind of like the SSJS activities of GCP (an invokable script that performs a certain task or tasks).
We can invoke a CF by a few different methods:
1. HTTPS — simply by posting to the dedicated URL that the CF is hosted on.
2. Pub/Sub — by subscribing to a Pub/Sub topic and subsequently handling its messages (these don’t necessarily need to come from an external party; they can also come from within your GCP project).
3. Cloud Storage — much like SFMC we can kick off a script when a file lands in storage. We can also invoke a CF if a file within a bucket is updated or deleted. We can then listen for the data and context of the file to use in later processing within the CF.
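A minimal sketch of method 1: an HTTP-triggered function using the 1st-gen Python signature, where the entry point receives a `flask.Request`-like object. The payload field is purely illustrative.

```python
def handler(request):
    """HTTP entry point: any POST to the function's dedicated URL lands here."""
    payload = request.get_json(silent=True) or {}  # tolerate empty/non-JSON bodies
    subscriber_key = payload.get("subscriberKey", "unknown")
    # ...do the real work with the payload here...
    return {"received": subscriber_key}
```

Deployed with something like `gcloud functions deploy my-fn --runtime python311 --trigger-http --entry-point handler`, any system that can make an HTTPS POST (including SFMC via SSJS or a Journey custom activity) can invoke it.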
Unlike SFMC, we don’t need to import a file into a DE to then process its contents; files can be in any format with any contents. We can also programmatically unpack, repack and post files or data anywhere across the web.
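Here's what that looks like for method 3: a Cloud Storage-triggered function that reads a freshly landed CSV straight from the bucket, with no DE import step. It assumes `google-cloud-storage` is available in the deployed runtime; the column names are placeholders.

```python
import csv
import io


def parse_csv(text: str):
    """Turn raw CSV text into a list of row dicts, ready to post onward."""
    return list(csv.DictReader(io.StringIO(text)))


def on_file_landed(event, context=None):
    """1st-gen GCS trigger entry point: `event` carries the bucket and file name."""
    # Imported here so parse_csv() above stays testable without GCP installed.
    from google.cloud import storage

    blob = storage.Client().bucket(event["bucket"]).blob(event["name"])
    rows = parse_csv(blob.download_as_text())
    # ...unpack, repack or post `rows` anywhere from here (e.g. an SFMC REST endpoint)...
    return rows
```

The `event` dict is populated by the trigger itself, so the function knows which file arrived without any polling or import configuration.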
** Note that, unlike Automation Studio, there is no scheduled invocation process for CFs. However, you can use Cloud Scheduler to invoke Pub/Sub, or use Stackdriver to listen for an event and then invoke Pub/Sub, e.g. a new table being created in Big Query.
Given that Cloud Functions run on servers, you will need to write your function in a server-side language such as Python, Node.js, Java etc.
In my next posts I’ll share some of the architectural patterns I use, and Python scripts to do things like:
- Run an Auto Machine Learning model via a batch method
- Read a csv file in a Cloud Function and post the results to a SFMC DE via API
- Programmatically build a SFMC Data Extension from a csv file and import the contents of the file into the DE
- Build a Cloud Function to find and replace text values across various Content Builder and Automation Studio Activities.
- Add SFMC Transactional Triggered Email activities into your Cloud Functions to manage processing notifications.