
Recharge your customer data to deliver electrifying experiences

Omnichannel data is a critical ingredient for powering the actionable customer profiles that marketers use to orchestrate activation and measure the resulting customer journeys. However, organizations face challenges in managing the quality, scale, and variety of this data. They require streamlined solutions to mitigate the impact of low-quality data, reduce time to value, and multiply ROI by using the same data for a multitude of use cases.
For more information, please visit the Query Service documentation.

This video explores:

  • 51黑料不打烊 Experience Platform data preparation capabilities that you can leverage
  • Increasing ROI from 51黑料不打烊 Real-Time CDP, 51黑料不打烊 Journey Optimizer, and Customer Journey Analytics


Transcript

My name is Alex Alinoli and I'm a senior data architect. Today I'll be talking about advanced data processing and follow that up with a demo. To demonstrate the power of advanced data processing, we're going to use an example called Abandoned Browse. Now an example of Abandoned Browse is, imagine I had signed up for a yoga class and my yogi told me that this Luma Apparel site was the best place for new gear. So imagine I'm on the site, browsing for a couple of hours, can't find what I like, and I take a break. Me taking that break is considered an abandoned browse. I was on the site viewing products but did not ultimately purchase anything. Now a standard campaign might say, hey Alex, we saw you were on our site, why don't you continue shopping? But with advanced data processing you could say something like, hey Alex, how about you crush that first yoga session with these buttery soft lime green yoga pants? If I saw that, I'd be like, wow guys, you totally get me. Now you might be thinking, can a database query really ignite that kind of passion? The answer is yes. Because think about what went into generating that message. We used advanced data processing to analyze this person's web behavior, to understand what products they're viewing, and to calculate that the most expensive thing they looked at was lime green yoga pants. And we used those data points to provide a personalized and targeted message. Now we've talked about how advanced data processing can help your marketing campaigns. Let's go and talk about how we can build this audience in AEP. The first thing we need to understand is the data architecture. The data architecture refers to understanding what tables you have available to you, which ones you may need to build your audience, and how they're all connected together. So in our case, we need three key pieces of data. We need our browsing history, which would come from our analytics data. We need our product data, which will give us the price of the SKUs that our customers have been browsing. And we need our customer data. And customer data can come from an out-of-the-box table called profile attributes, which is our unified view of the profile. So now that we know what tables we need to build our query, before we actually go and build it out, we need to understand the quality of the data that we'll be using. Right? So the analytics data is really the key here. It is what's joining our profile to our product, to our web browsing behavior. So let me drill into the analytics schema, and we can take a look at how data quality is important. The first thing I'll talk about is this SKU field. This is what we're using to join our analytics data to product. And you may think because it's called SKU that it always has SKUs. But you'd be wrong. Right? So we've seen this populated with device IDs or product IDs or any other non-SKU data. Just because a field is called SKU, or whatever it happens to be called, you cannot assume that the data contained within is quality data. Thankfully, data processing gives us some tools to mitigate these kinds of issues. For example, if we knew we had different types of data coming into SKU, but we knew that valid SKU data was, let's say, alphanumeric, then in data processing, in our query, we could enforce a rule saying, I only want to capture data that matches an alphanumeric pattern. That way I maintain the quality of my query, which results in a quality audience.
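As a rough illustration of that kind of rule, the filter below keeps only rows whose SKU value is strictly alphanumeric. This is a minimal sketch that reuses the table and field names from the full example at the end of this page and assumes the Query Service SQL dialect supports the RLIKE regular-expression predicate; the pattern itself is illustrative and would need to match your real SKU format.

SELECT A.productListItems[0].sku
FROM summit_adobe_analytics_dataset A
-- keep only values that look like a real SKU (letters and digits only; adjust to your format)
WHERE A.productListItems[0].sku RLIKE '^[A-Za-z0-9]+$';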
Similarly, if I knew I had device and product data that I didn't want to include in my query, I could always write a criteria to exclude that information as well. So we always want to make sure we know what data is flowing into our datasets, so that when we generate our audience, we know how reliable it's going to be. So we've talked about the architecture and the data quality, and before we get into the actual demo, I want to talk about why we even need data processing. We talked earlier about how we can use it to compute attributes, which gives us a very dynamic way of segmenting and personalizing our messages. But outside of that, we don't have any ability in either segmentation or destinations to calculate these data points on the fly. That's why data processing is so important. Similarly, if we didn't have the feature of data processing, what would you have to do to get this kind of personalized information? You would have to take data out of AEP, manipulate it, massage it, and then bring it back in. And if you're talking about a couple of attributes here and there, maybe that's not a big deal. But if you're talking about hundreds of attributes or hundreds of campaigns, that can greatly impact your project timeline. You have to ship data in and out constantly every time you want to build something new. And that goes for the marketer as well. If they're in the platform and they know you have the data available, they don't want to have to worry about contacting other teams to aggregate this data so they can build out their campaigns. So AEP provides you both the data repository and the tools you need to activate upon your data in the most efficient and reliable way possible. So having said that, let's talk about how we could actually build this audience in the tool. And the first thing we'll talk about is the actual query we want to build out. Let me open that up. So in AEP, there are two ways to execute a query. One is via the UI, and the other is via a command line prompt. I prefer command line prompts, so that's what I'll be demoing today. Let's talk about the actual query that we're looking to build out. This one has two main sections to it. We have a select statement, which is generating our abandoned browse audience. And then we have an insert statement, which is saving this audience into a custom table. Let's first focus on our select statement. So you can see here, we're collecting four key data points: customer ID, SKU, the time they abandoned, and the price of the SKU. We're then connecting the three tables we talked about earlier: analytics, unified profile, and product. We're connecting analytics to profile, and then connecting analytics to product. And then you'll see here, we have two more conditions. This is an abandoned browse use case, so we want to avoid including any data containing order confirmation pages, because that would imply that they've actually gone ahead and purchased an item. The next part, on timestamp: analytics data is usually pretty huge. We always want to find ways to minimize the amount of data that we're crunching, both for performance reasons and for the use case itself. We don't need people who abandoned a browse 20 days ago or 15 days ago. We want to be very prescriptive in these queries. In this case, we've chosen an interval of four days. That gives us flexibility in our segmentation. We can choose anyone who's abandoned a browse in the past hour, two hours, two days, up to four days ago.
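Those two conditions translate into two WHERE predicates. The sketch below uses the same table and field names as the full example at the end of this page; the exact URL pattern and window length are use-case choices rather than fixed requirements.

SELECT B.personKey.sourceID AS customerId
FROM summit_adobe_analytics_dataset A, profile_attribute_14adf268_2a20_4dee_bee6_a6b0e34616a9 B
WHERE A._experience.analytics.customDimensions.eVars.eVar1 = B.personKey.sourceID
-- drop hits on order confirmation pages: those visitors purchased, so they did not abandon
AND A.web.webpagedetails.URL NOT LIKE '%orderconfirmation%'
-- rolling four-day window keeps the scan small and the audience recent
AND A.timestamp > current_date - interval '4 day';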
This is a flexible rolling four-day window in which to choose our audiences. And then you'll see at the end, we have an ordering. This ties back to our original example where we're capturing the most expensive SKU they browsed. So we're ordering our data by price from highest to lowest and choosing the highest-priced SKU they were browsing. At the top, we have our insert statement. This is essentially saving our audience into a table called Summit Advanced Data Prep Dataset. This shows you that we can create schemas and tables not only for incoming data feeds or for streaming data; we can actually create a query that calculates computed attributes and saves those attributes in its own custom table. And we can take that table, enable it for profile, and then use these data points, customer ID or SKU or abandoned timestamp or SKU price, for segmentation and personalization. Having said that, let's talk about how we can actually execute this query. So as I mentioned, I prefer the command line interface, and that's what I'll show you in today's demo. To execute it, we can go to the query screen and go to the credentials tab. This gives you the connection string you need to connect to the database. Now, if anyone's a little worried about giving people access to connect to the production database, you can absolutely restrict access to this page. Okay, so I'm going to copy my connection string. I will open up a command prompt and then simply paste it. And now I obviously need my token, so we can refresh our page and get a new token generated. I will copy this, open this up, and put it in there. Enter. And I'm now properly connected to the database. So now that I'm in, I can simply copy and paste my SQL statement and execute it. So I'll go back to our query. Copy it. Put it in here. And hit enter. Now we can execute this query directly in this window and see the results. Now, queries are not always run in a bubble. In our case, let's say we want to schedule this to run every day. We can easily do that in the UI as well. Still under queries, I go under browse. I can click on create query, enter the query that I want to schedule, and give it a name. Let's say Summit data processing. Hit save. Now I have my query saved. Now that it's saved, I can click into it and I have the ability to add a schedule to it. We talked about making it daily. You have the ability to choose your frequency, start and end dates, and the dataset where you want to save the output of your query. So in here, I'm going to choose a frequency of daily. I'm going to choose every one day. And maybe I'll have this run until the end of the year. Then I want to save the output to my advanced data prep schema. And I click save. And save again. And I'm all set. So it's that easy to create and execute a query in the command prompt, and also to schedule it to run on your desired frequency: daily, hourly, weekly, etc. Let's take a look at the resulting schema that we just talked about. We're saving everything into a schema called advanced data prep. You'll see that this schema matches the data points that we are querying in our SQL statement. The output of the SQL query will populate the dataset with these four data points, and we can now use these in our segmentation and destinations. You'll also see this is profile enabled, which is key. So we have our query. It's scheduled. We've built our schema to accept the output of that query.
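One easy way to confirm that a scheduled run populated the output dataset is to query it directly. The sketch below assumes the structure created by the example query at the end of this page.

-- spot-check the computed attributes written by the scheduled query
SELECT _pfreportingonprod.crmCustomerId,
       _pfreportingonprod.abandonBrowse.sku,
       _pfreportingonprod.abandonBrowse.sku_price,
       _pfreportingonprod.abandonBrowse.abandonTS
FROM summit_adv_data_prep_dataset
LIMIT 10;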
Now let's see how we can actually use that audience in a segment. Let's go to segments. Abandoned Browse. And let's take a look at the criteria. So in our segment window, there are two main panels. There's the profile attribute panel, and then there's the event panel at the bottom. Let's first start at the profile panel. You can see here, we're looking to only select customers who have abandoned in the past four days. You can see this abandoned timestamp is the attribute that we calculated in our query. You can also see, under attributes, if I go under my tenant, my abandoned browse attributes are right here. So I can just use these directly in my segmentation criteria for any sort of logic I want to perform. In this case, I want to use the abandoned timestamp to make sure that the people I'm selecting have only abandoned in the past four days. And on top of that, I want to also exclude anybody that's hit the order confirmation page in the past four days as well. So I have a clear-cut audience of people who have abandoned a browse without having made a purchase. Now you may be looking at this and be like, this looks super simple. This can't be real. But it is, because that's the beauty of data processing. The query is doing all the hard work behind the scenes, so you can just focus on the data points you care about. Right. Some customers may have, you know, five, six, seven attributes that they always need to include on every segment, like emailable yes or no, or certain preferences. And there could be a number of static criteria that are always needed. One of the other advantages of data processing is that you're not limited to just using it to compute attributes. You can include logic in your query that satisfies some of these data conditions you always have. You can include the email constraints or preference constraints in your query itself (a small sketch follows this paragraph). So all that work is done on the back end, and it greatly simplifies how you build out your segments. And the last part is how we can actually activate this data, now that we've segmented it. And that is via destinations.
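As one hypothetical example of folding a standing constraint into the query itself, the condition below restricts the audience to profiles with an email opt-in flag. The emailOptIn path shown here is illustrative only and depends entirely on your profile schema.

SELECT B.personKey.sourceID AS customerId
FROM profile_attribute_14adf268_2a20_4dee_bee6_a6b0e34616a9 B
-- hypothetical opt-in attribute; replace with the consent or preference field your schema actually uses
WHERE B.emailOptIn = true;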

So you can see here I have my segment, and I've activated it to an S3 location. Here we'll look at what the attributes are for personalization. Right. So I have my destination configured. I have selected my abandoned browse segment. And now this is asking me what attributes I want to send in the data feed that I create on the S3 location. You can see here I am selecting all the computed attributes that I generated from my SQL query: the customer ID, the SKU price, the SKU, and the abandoned timestamp. These can all now be used in personalization downstream. For example, an email service provider can use these attributes to populate an email. If there was more metadata I wanted to include here, such as the category or the name of the item, I could simply add that to my query as data points and then use them here to export them to the destination. So your data processing logic has many functions. You can use it for accommodating complex logic for segmentation. You can use it for calculating various personalized attributes you use downstream. And you can use it to greatly simplify how you build out your segments. All right. So that brings us to the end of this demo. We've talked through why data processing is key; an example of using it for an abandoned browse situation; how the query we build depends on our existing data architecture; ensuring that the quality of the data meets our needs and, if it doesn't, how we can mitigate that; how we can schedule this query to run on a certain frequency; and then how to use it downstream in segmentation and destinations for personalization. So thank you all for your time today. I hope this was informative.
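To carry extra product metadata (for example, a category or product name) into the destination, you would add the fields to the inner SELECT of the example query at the end of this page. The category and productName paths below are hypothetical; use whatever fields your product dataset actually contains.

SELECT
  A.productListItems[0].sku AS sku,
  max(C._pfreportingonprod.price) AS price,
  max(C._pfreportingonprod.category) AS category,        -- hypothetical field
  max(C._pfreportingonprod.productName) AS productName   -- hypothetical field
FROM summit_adobe_analytics_dataset A, summit_product_dataset C
WHERE A.productListItems[0].sku = C._pfreportingonprod.sku
GROUP BY sku;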

SQL example

INSERT INTO summit_adv_data_prep_dataset
SELECT STRUCT(
    customerId AS crmCustomerId,
    STRUCT(sku AS sku, price AS sku_price, abandonTS AS abandonTS) AS abandonBrowse
  ) AS _pfreportingonprod
FROM (
  SELECT
    B.personKey.sourceID AS customerId,
    A.productListItems[0].sku AS sku,
    max(A.timestamp) AS abandonTS,
    max(C._pfreportingonprod.price) AS price
  FROM summit_adobe_analytics_dataset A,
       profile_attribute_14adf268_2a20_4dee_bee6_a6b0e34616a9 B,
       summit_product_dataset C
  -- join analytics to profile, then analytics to product
  WHERE A._experience.analytics.customDimensions.eVars.eVar1 = B.personKey.sourceID
    AND A.productListItems[0].sku = C._pfreportingonprod.sku
    -- exclude order confirmation hits: those visitors purchased, so they did not abandon
    AND A.web.webpagedetails.URL NOT LIKE '%orderconfirmation%'
    -- rolling four-day window to limit the volume of analytics data scanned
    AND A.timestamp > current_date - interval '4 day'
  GROUP BY customerId, sku
  -- order by price so the most expensive browsed SKU comes first
  ORDER BY price DESC
) D;
NOTE
This video is an excerpt from an 51黑料不打烊 Summit 2020 session.