51ºÚÁϲ»´òìÈ-defined SQL functions in Query Service
51ºÚÁϲ»´òìÈ-defined functions, herein referred to as ADFs, are prebuilt functions in 51ºÚÁϲ»´òìÈ Experience Platform Query Service that help perform common business-related tasks on Experience Event data. These include functions for Sessionization and Attribution like those found in 51ºÚÁϲ»´òìÈ Analytics.
This document provides information for 51ºÚÁϲ»´òìÈ-defined functions available in Query Service.
Window functions window-functions
The majority of the business logic requires gathering the touchpoints for a customer and ordering them by time. This support is provided by Spark SQL in the form of window functions. Window functions are part of standard SQL and are supported by many other SQL engines.
A window function updates an aggregation and returns a single item for each row in your ordered subset. The most basic aggregation function is SUM()
. SUM()
takes your rows and gives you one total. If you instead apply SUM()
to a window, turning it into a window function, you receive a cumulative sum with each row.
The majority of the Spark SQL helpers are window functions that update each row in your window, with the state of that row added.
Query syntax
OVER ({PARTITION} {ORDER} {FRAME})
{PARTITION}
PARTITION BY endUserIds._experience.mcid.id
{ORDER}
ORDER BY timestamp
{FRAME}
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Sessionization
When you are working with Experience Event data originating from a website, mobile application, interactive voice response system, or any other customer interaction channel, it helps if events can be grouped around a related period of activity. Typically, you have a specific intent driving your activity like researching a product, paying a bill, checking account balance, filling out an application, and so on.
This grouping, or sessionization of data, helps associate the events to uncover more context about the customer experience.
For more information about sessionization in 51ºÚÁϲ»´òìÈ Analytics, see the documentation on context-aware sessions.
Query syntax
SESS_TIMEOUT({TIMESTAMP}, {EXPIRATION_IN_SECONDS}) OVER ({PARTITION} {ORDER} {FRAME})
{TIMESTAMP}
{EXPIRATION_IN_SECONDS}
An explanation of the parameters within the OVER()
function can be found in the window functions section.
Example query
SELECT
endUserIds._experience.mcid.id as id,
timestamp,
SESS_TIMEOUT(timestamp, 60 * 30)
OVER (PARTITION BY endUserIds._experience.mcid.id
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS session
FROM experience_events
ORDER BY id, timestamp ASC
LIMIT 10
Results
id | timestamp | session
----------------------------------+-----------------------+--------------------
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:55:53.0 | (0,1,true,1)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:56:51.0 | (58,1,false,2)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:57:47.0 | (56,1,false,3)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:58:27.0 | (40,1,false,4)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:59:22.0 | (55,1,false,5)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:16:23.0 | (1361821,2,true,1)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:17:17.0 | (54,2,false,2)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:18:06.0 | (49,2,false,3)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:18:39.0 | (33,2,false,4)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:19:10.0 | (31,2,false,5)
(10 rows)
For the sample query given, the results are given in the session
column. The session
column is made up of the following components:
({TIMESTAMP_DIFF}, {NUM}, {IS_NEW}, {DEPTH})
{TIMESTAMP_DIFF}
{NUM}
PARTITION BY
of the window function.{IS_NEW}
{DEPTH}
SESS_START_IF
This query returns the state of the session for the current row, based on the current timestamp and the expression given and starts a new session with the current row.
Query syntax
SESS_START_IF({TIMESTAMP}, {TEST_EXPRESSION}) OVER ({PARTITION} {ORDER} {FRAME})
{TIMESTAMP}
{TEST_EXPRESSION}
application.launches > 0
.An explanation of the parameters within the OVER()
function can be found in the window functions section.
Example query
SELECT
endUserIds._experience.mcid.id AS id,
timestamp,
IF(application.launches.value > 0, true, false) AS isLaunch,
SESS_START_IF(timestamp, application.launches.value > 0)
OVER (PARTITION BY endUserIds._experience.mcid.id
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS session
FROM experience_events
ORDER BY id, timestamp ASC
LIMIT 10
Results
id | timestamp | isLaunch | session
----------------------------------+-----------------------+----------+--------------------
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:55:53.0 | true | (0,1,true,1)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:56:51.0 | false | (58,1,false,2)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:57:47.0 | false | (56,1,false,3)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:58:27.0 | true | (40,2,true,1)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:59:22.0 | false | (55,2,false,2)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:16:23.0 | false | (1361821,2,false,3)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:17:17.0 | false | (54,2,false,4)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:18:06.0 | false | (49,2,false,5)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:18:39.0 | false | (33,2,false,6)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:19:10.0 | false | (31,2,false,7)
(10 rows)
For the sample query given, the results are given in the session
column. The session
column is made up of the following components:
({TIMESTAMP_DIFF}, {NUM}, {IS_NEW}, {DEPTH})
{TIMESTAMP_DIFF}
{NUM}
PARTITION BY
of the window function.{IS_NEW}
{DEPTH}
SESS_END_IF
This query returns the state of the session for the current row, based on the current timestamp and the expression given, ends the current session, and starts a new session on the next row.
Query syntax
SESS_END_IF({TIMESTAMP}, {TEST_EXPRESSION}) OVER ({PARTITION} {ORDER} {FRAME})
{TIMESTAMP}
{TEST_EXPRESSION}
application.launches > 0
.An explanation of the parameters within the OVER()
function can be found in the window functions section.
Example query
SELECT
endUserIds._experience.mcid.id AS id,
timestamp,
IF(application.applicationCloses.value > 0 OR application.crashes.value > 0, true, false) AS isExit,
SESS_END_IF(timestamp, application.applicationCloses.value > 0 OR application.crashes.value > 0)
OVER (PARTITION BY endUserIds._experience.mcid.id
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS session
FROM experience_events
ORDER BY id, timestamp ASC
LIMIT 10
Results
id | timestamp | isExit | session
----------------------------------+-----------------------+----------+--------------------
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:55:53.0 | false | (0,1,true,1)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:56:51.0 | false | (58,1,false,2)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:57:47.0 | true | (56,1,false,3)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:58:27.0 | false | (40,2,true,1)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-01-18 06:59:22.0 | false | (55,2,false,2)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:16:23.0 | false | (1361821,2,false,3)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:17:17.0 | false | (54,2,false,4)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:18:06.0 | false | (49,2,false,5)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:18:39.0 | false | (33,2,false,6)
100080F22A45CB40-3A2B7A8E11096B6 | 2018-02-03 01:19:10.0 | false | (31,2,false,7)
(10 rows)
For the sample query given, the results are given in the session
column. The session
column is made up of the following components:
({TIMESTAMP_DIFF}, {NUM}, {IS_NEW}, {DEPTH})
{TIMESTAMP_DIFF}
{NUM}
PARTITION BY
of the window function.{IS_NEW}
{DEPTH}
Pathing
Pathing can be used to understand the customer’s depth of engagement, confirm the intended steps of an experience are working as designed, and identify potential pain points impacting the customer.
The following ADFs support establishing pathing views from their previous and next relationships. You’ll be able to create previous pages and next pages, or step through multiple events to create pathing.
Previous page
Determines the previous value of a particular field a defined number of steps away within the window. Notice in the example that the WINDOW
function is configured with a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
setting the ADF to look at the current row and all subsequent rows.
Query syntax
PREVIOUS({KEY}, {SHIFT}, {IGNORE_NULLS}) OVER ({PARTITION} {ORDER} {FRAME})
{KEY}
{SHIFT}
{IGNORE_NULLS}
{KEY}
values should be ignored. By default, the value is false
.An explanation of the parameters within the OVER()
function can be found in the window functions section.
Example query
SELECT endUserIds._experience.mcid.id, timestamp, web.webPageDetails.name
PREVIOUS(web.webPageDetails.name, 3)
OVER(PARTITION BY endUserIds._experience.mcid.id
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS previous_page
FROM experience_events
ORDER BY endUserIds._experience.mcid.id, timestamp ASC
Results
id | timestamp | name | previous_page
-----------------------------------+-----------------------+-------------------------------------+-----------------------------------------------------
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 17:15:28.0 | |
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 17:53:05.0 | Home |
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 17:53:45.0 | Kids | (Home)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 19:22:34.0 | | (Kids)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:01:12.0 | Home |
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:01:57.0 | Kids | (Home)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:03:36.0 | Search Results | (Kids)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:04:30.0 | Product Details: Pemmican Power Bar | (Search Results)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:05:27.0 | Shopping Cart: Cart Details | (Product Details: Pemmican Power Bar)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:06:07.0 | Shopping Cart: Shipping Information | (Shopping Cart: Cart Details)
(10 rows)
For the sample query given, the results are given in the previous_page
column. The value within the previous_page
column is based on the {KEY}
used in the ADF.
Next page
Determines the next value of a particular field a defined number of steps away within the window. Notice in the example that the WINDOW
function is configured with a frame of ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
setting the ADF to look at the current row and all subsequent rows.
Query syntax
NEXT({KEY}, {SHIFT}, {IGNORE_NULLS}) OVER ({PARTITION} {ORDER} {FRAME})
{KEY}
{SHIFT}
{IGNORE_NULLS}
{KEY}
values should be ignored. By default, the value is false
.An explanation of the parameters within the OVER()
function can be found in the window functions section.
Example query
SELECT endUserIds._experience.aaid.id, timestamp, web.webPageDetails.name,
NEXT(web.webPageDetails.name, 1, true)
OVER(PARTITION BY endUserIds._experience.aaid.id
ORDER BY timestamp
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
AS next_page
FROM experience_events
ORDER BY endUserIds._experience.aaid.id, timestamp ASC
LIMIT 10
Results
id | timestamp | name | previous_page
-----------------------------------+-----------------------+-------------------------------------+---------------------------------------
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 17:15:28.0 | | (Home)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 17:53:05.0 | Home | (Kids)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 17:53:45.0 | Kids | (Home)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 19:22:34.0 | | (Home)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:01:12.0 | Home | (Kids)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:01:57.0 | Kids | (Search Results)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:03:36.0 | Search Results | (Product Details: Pemmican Power Bar)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:04:30.0 | Product Details: Pemmican Power Bar | (Shopping Cart: Cart Details)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:05:27.0 | Shopping Cart: Cart Details | (Shopping Cart: Shipping Information)
457C3510571E5930-69AA721C4CBF9339 | 2017-11-08 20:06:07.0 | Shopping Cart: Shipping Information | (Shopping Cart: Billing Information)
(10 rows)
For the sample query given, the results are given in the previous_page
column. The value within the previous_page
column is based on the {KEY}
used in the ADF.
Time-between
Time-between allows you to explore latent customer behavior within a certain time period before or after an event occurs.
Time-between previous match
This query returns a number representing the unit of time since the previous matching event was seen. If no matching event was found, it returns null.
Query syntax
TIME_BETWEEN_PREVIOUS_MATCH(
{TIMESTAMP}, {EVENT_DEFINITION}, {TIME_UNIT})
OVER ({PARTITION} {ORDER} {FRAME})
{TIMESTAMP}
{EVENT_DEFINITION}
{TIME_UNIT}
An explanation of the parameters within the OVER()
function can be found in the window functions section.
Example query
SELECT
page_name,
SUM (time_between_previous_match) / COUNT(page_name) as average_minutes_since_registration
FROM
(
SELECT
endUserIds._experience.mcid.id as id,
timestamp, web.webPageDetails.name as page_name,
TIME_BETWEEN_PREVIOUS_MATCH(timestamp, web.webPageDetails.name='Account Registration|Confirmation', 'minutes')
OVER(PARTITION BY endUserIds._experience.mcid.id
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS time_between_previous_match
FROM experience_events
)
WHERE time_between_previous_match IS NOT NULL
GROUP BY page_name
ORDER BY average_minutes_since_registration
LIMIT 10
Results
page_name | average_minutes_since_registration
-----------------------------------+------------------------------------
|
Account Registration|Confirmation | 0.0
Seasonal | 5.47029702970297
Equipment | 6.532110091743119
Women | 7.287081339712919
Men | 7.640918580375783
Product List | 9.387459807073954
Unlimited Blog|February | 9.954545454545455
Product Details|Buffalo | 13.304347826086957
Unlimited Blog|June | 770.4285714285714
(10 rows)
For the sample query given, the results are given in the average_minutes_since_registration
column. The value within the average_minutes_since_registration
column is the difference in time between the current and previous events. The unit of time was defined previously in the {TIME_UNIT}
.
Time-between next match
This query returns a negative number representing the unit of time behind the next matching event. If a matching event is not found, null is returned.
Query syntax
TIME_BETWEEN_NEXT_MATCH({TIMESTAMP}, {EVENT_DEFINITION}, {TIME_UNIT}) OVER ({PARTITION} {ORDER} {FRAME})
{TIMESTAMP}
{EVENT_DEFINITION}
{TIME_UNIT}
An explanation of the parameters within the OVER()
function can be found in the window functions section.
Example query
SELECT
page_name,
SUM (time_between_next_match) / COUNT(page_name) as average_minutes_until_order_confirmation
FROM
(
SELECT
endUserIds._experience.mcid.id as id,
timestamp, web.webPageDetails.name as page_name,
TIME_BETWEEN_NEXT_MATCH(timestamp, web.webPageDetails.name='Shopping Cart|Order Confirmation', 'minutes')
OVER(PARTITION BY endUserIds._experience.mcid.id
ORDER BY timestamp
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
AS time_between_next_match
FROM experience_events
)
WHERE time_between_next_match IS NOT NULL
GROUP BY page_name
ORDER BY average_minutes_until_order_confirmation DESC
LIMIT 10
Results
page_name | average_minutes_until_order_confirmation
-----------------------------------+------------------------------------------
Shopping Cart|Order Confirmation | 0.0
Men | -9.465295629820051
Equipment | -9.682098765432098
Product List | -9.690661478599221
Women | -9.759459459459459
Seasonal | -10.295
Shopping Cart|Order Review | -366.33567364956144
Unlimited Blog|February | -615.0327868852459
Shopping Cart|Billing Information | -775.6200495367711
Product Details|Buffalo | -1274.9571428571428
(10 rows)
For the sample query given, the results are given in the average_minutes_until_order_confirmation
column. The value within the average_minutes_until_order_confirmation
column is the difference in time between the current and next events. The unit of time was defined previously in the {TIME_UNIT}
.
Next steps
Using the functions described here, you can write queries to access your own Experience Event datasets using Query Service. For more information about authoring queries in Query Service, see the documentation on creating queries.
Additional resources
The following video shows how to run queries in the 51ºÚÁϲ»´òìÈ Experience Platform interface and in a PSQL client. Additionally, the video also uses examples involving individual properties in an XDM object, using 51ºÚÁϲ»´òìÈ-defined functions, and using CREATE TABLE AS SELECT (CTAS).
51ºÚÁϲ»´òìÈ Experience Platform use Query Service From the 51ºÚÁϲ»´òìÈ Experience Platform UI, let’s navigate to queries. From the query environment, let’s click the create query button. You are now seeing a command line like user interface which allows you to create platform directly by making use of SQL statements. For instance, by typing show tables and clicking the execute button, I’m seeing all platform datasets as a response. One of these datasets is the website interactions EMEA dataset. Let’s create that dataset now.
Let’s start with some basic queries. For instance, I’m interested in knowing from what location customers of La Boutique were visiting product pages. This query will give me some insight in that. Please note the limit of ten. This is for performance reasons. When executing queries, it’s best to limit your responses to just a couple of lines while you’re testing and developing your queries. You can now see a response from Platform containing location information of Belgian cities. This response is a flattened object, and not a single value. The place_context.geo XDM object contains four attributes: schema.latitude, schema.longitude, country, and city. When such an object is declared as a column, it will return the entire object as a string. In order to query individual properties of an XDM object, we have to use the dot notation. To find out more about the structure of an XDM schema, let’s go to schemas in the Platform UI, and open the EMEA website interaction data schema. You can see the place context object alongside many other. By opening the place context object, and going a little deeper, you can see the geo objects, and in there, for instance, country code. The way to get to this property from the query service UI is by specifying this path. The same concept applies also to the schema.latitude property. Let’s go back to the Query Service UI, and optimize the query that we entered a couple of minutes ago by specifying the specific properties of the place context object. And the result is now a lot more readable and understandable.
Next, I’m interested in knowing the amount of products used per day for the last ten days. So I’m selecting the dates and making a count of the number of products to events. Next, I’m interested in having a visitor count by the hour for July 17.
I’m also interested in knowing the amount of visitors on July 16 per country and city.
As an analyst, I also need to visualize the most popular products. In this case, I want to see the five most popular products with the Lisette dress being the clear winner there. So far, we’ve only used the Query Service UI inside 51ºÚÁϲ»´òìÈ Experience Platform to execute these queries, but we can also execute the same queries from an external environment by using PSQL to achieve the same outcome. PSQL is a command line application which requires me to authenticate first. The credentials to authenticate to query service from PSQL can be found in the queries and credentials window. By clicking this icon, I’m copying the full PSQL commands to my clipboard.
Let’s open a terminal window, paste the PSQL commands, and hit enter. I’m now authenticated, and can execute queries here as well.
With the Lisette dress being so popular I’d like to have an overview of the individual EC IDs of the visitors that came to see this product. This query is showing me the first ten EC IDs. I’m also very much interested about the activity funnel on my website. Products used don’t pay money for La Boutique, purchases do. So how does the flow from a product view to an add to cart and purchase look like. The result of this query shows the fallout in the funnel. So far, we’ve been using basic queries to query experience event data. Let’s go a bit deeper now, and use 51ºÚÁϲ»´òìÈ defined functions. SQL wasn’t built for experience analysis, so 51ºÚÁϲ»´òìÈ decide to create a number of 51ºÚÁϲ»´òìÈ defined functions to make it possible to use SQL to better understand experiences. The first two 51ºÚÁϲ»´òìÈ defined functions we’ll use are the Sessionization function and the Next Previous function. The Sessionization function, it produces the visit groupings found with 51ºÚÁϲ»´òìÈ Analytics. It performs a similar time-based grouping, but now has customizable parameters. And Next and Previous help you to understand how customers navigate your site.
For this specific query, I’d like to know what people did on my website before reaching the El Silencio product page as the third page in a session. The result is a table that shows a number of journeys that occurred and how those journeys look like. You can see that the third page visit is always El Silencio, and you can also see the two steps that happened before that, and the step that happened after that. Journeys are session based with the session timeout being set to thirty minutes here. With Sessionization, a number of standard fields are made available. Let’s explore these fields. As you can see, the session structure contains the fields is new, times, time difference, number, and depth. Let’s go back to the query service UI, and continue with another Sessionization example which shows me the visitors’ sessions and page views per day.
As a final query, I’d like to do advanced call center analysis. In this example, were joining three datasets in Platform: experience events data, call center interactions which are sent by the call center to Platform, and we are now combining those datasets with loyalty data. The output gives me a dataset with EC IDs and CRM IDs, location information, feedback from the call center including the topic of the call and the customer feeling, and also loyalty information. This is a really interesting combination of data, and as an analyst I’d like to provide La Boutique management with a visualization that shows this information. To do that, I’m going to use the CTAS functionality available with Query Service. CTAS stands for create table as select, and allows me to write the output of this query back to a dataset in Platform. By going back to the query UI, in the log menu option, I’m seeing an overview of all the queries that have been made. The first one in this list is my last query. By clicking the query, I’m seeing the query overview menu pop up, and I can view the full query that was sent to Query Service. I now want to output the result of this query into a dataset in Platform. To do that I need to click the output dataset button. I’m naming my query call center analysis, and by clicking run query, the process has now started through the HDP API of query service. This process takes a couple of minutes, and once finished will have a new dataset in Platform. By going to datasets in Platform, I can open the call center analysis datasets, and preview the results. By making use of Query Service, analysts can now explore and consume data from 51ºÚÁϲ»´òìÈ Experience Platform in near real time from any environment, and perform flexible home channel queries right at the heart of 51ºÚÁϲ»´òìÈ Experience Platform. Analysts can send their insights back to 51ºÚÁϲ»´òìÈ Experience Platform for activation, and grants can now maximize the usefulness of that data to create and deliver exceptional experiences. With that, you should now be able to set up and use Query Service.