machine learning model bigquery

Create Machine Learning model to predict online purchase conversion

This tutorial helps you to learn how to create machine learning model to predict online purchase conversion using Google Analytics and BigQuery.

If you had setup Google Analytics for your website to track visitors and you wanted to predict whether visitors will make a transaction/purchase, then you are at the right place.

1. Setup BigQuery export

You can export ga session and hit data from Google Analytics 360 account (Note, currently the integration is available only for Google Analytics 360) in to Google BigQuery.

machine learning model bigquery

Here are the steps to link Google Analytics to BigQuery and setup BigQuery export.

Note, the standard version of Google Analytics is completely free to use, but Google Analytics 360 is the premium version and it is not free. So for now let us use ga_sessions_ sample analytics data available in the google_analytics_sample public dataset for our learning purpose.

Navigate to Google Cloud Console -> BigQuery -> Resources -> bigquery-public-data (GCP Project) -> google_analytics_sample (public dataset) -> ga_sessions_

2. Create a BigQuery dataset

Create a dataset using the following command

[email protected]:~ (sneppets-gcp)$ bq mk bqml_example
Dataset 'sneppets-gcp:bqml_example' successfully created.

[email protected]:~ (sneppets-gcp)$ bq ls
   datasetId
 --------------
  bqml_example

3. Create a model

Run the following query in the Query Editor to create machine learning model to predict whether a visitor on the website will make online purchase.

CREATE OR REPLACE MODEL `bqml_example.sample_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170631'
LIMIT 100000;

In the above query, the criteria considered to check whether online purchase has been made are visitor’s mobile device OS, visitor’s country and number of page views.

For instance, bqml_example is the name of the dataset and sample_model is the name of the model. The model type specified in the above query is binary logistic regression.

Note, when you run CREATE MODEL query, it actually creates a Query Job which runs asynchronously.

4. Evaluate the model

Run the following query

SELECT
  *
FROM
  ml.EVALUATE(MODEL `bqml_example.sample_model`, (
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));

As a result, you would see the following table metrics for logistic regression model.

Row precision recall accuracy f1_score log_loss roc_auc
1

0.432

0.05027933

0.98532971

0.090075

0.048936587

0.981565

5. Use the model

For instance, to predict number of online purchases each visitor makes, you can run the following query. As a result this query will return top 10 visitors by purchases made.

SELECT
  fullVisitorId,
  SUM(predicted_label) as total_predicted_purchases
FROM
  ml.PREDICT(MODEL `bqml_example.sample_model`, (
SELECT
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(totals.pageviews, 0) AS pageviews,
  IFNULL(geoNetwork.country, "") AS country,
  fullVisitorId
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY fullVisitorId
ORDER BY total_predicted_purchases DESC
LIMIT 10;
Row fullVisitorId total_predicted_purchases
1
9417857471295131045
3
2
8388931032955052746
2
3
112288330928895942
2
4
0376394056092189113
2
5
1280993661204347450
2
6
806992249032686650
2
7
057693500927581077
2
8
2969418676126258798
2
9
7420300501523012460
2
10
1589021726696497303
1

Further Learning

References

Leave a Reply

avatar
  Subscribe  
Notify of