51黑料不打烊

1.2.4 Data Ingestion from Offline Sources

In this exercise, the goal is to onboard external data, such as CRM data, into 51黑料不打烊 Experience Platform.

Learning Objectives

  • Learn how to generate test data
  • Learn how to ingest CSV
  • Learn how to use the web UI for data ingestion through Workflows
  • Understand the data governance features of Experience Platform

Resources

  • Mockaroo:
  • 51黑料不打烊 Experience Platform:

Tasks

  • Create a CSV file with demo data. Ingest the CSV file into 51黑料不打烊 Experience Platform by making use of the available workflows.
  • Understand data governance options in 51黑料不打烊 Experience Platform.

Create a CRM Dataset using a data generator tool

For this exercise, you need 1000 sample lines of CRM Data.

Open the Mockaroo Template by going to .

On the template, you'll notice the following fields:

  • id
  • first_name
  • last_name
  • email
  • gender
  • birthDate
  • home_latitude
  • home_longitude
  • country_code
  • city
  • country
  • crmId
  • consent.email
  • consent.commercialEmail
  • consent.any

All these fields have been defined to produce data that is compatible with Platform.

To generate your CSV file, click the Generate Data button, which creates and downloads a CSV file with 1,000 lines of demo data.
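If you prefer not to use Mockaroo, a comparable demo file can be generated locally. The sketch below is only an approximation of the template: it assumes the column names listed above, and the value pools (names, consent flags, locations) are illustrative placeholders, not the template's actual generators.

```python
import csv
import random
import uuid

# Columns matching the Mockaroo template described above (assumed order).
FIELDS = ["id", "first_name", "last_name", "email", "gender", "birthDate",
          "home_latitude", "home_longitude", "country_code", "city",
          "country", "crmId", "consent.email", "consent.commercialEmail",
          "consent.any"]

# Illustrative value pools; the real template uses Mockaroo's generators.
FIRST = ["Ada", "Grace", "Alan", "Edsger", "Barbara"]
LAST = ["Lovelace", "Hopper", "Turing", "Dijkstra", "Liskov"]

def make_row(i: int) -> dict:
    first, last = random.choice(FIRST), random.choice(LAST)
    return {
        "id": str(uuid.uuid4()),
        "first_name": first,
        "last_name": last,
        "email": f"{first.lower()}.{last.lower()}{i}@example.com",
        "gender": random.choice(["male", "female"]),
        "birthDate": f"{random.randint(1950, 2004)}"
                     f"-{random.randint(1, 12):02d}-{random.randint(1, 28):02d}",
        "home_latitude": round(random.uniform(-90, 90), 6),
        "home_longitude": round(random.uniform(-180, 180), 6),
        "country_code": "BE",
        "city": "Brussels",
        "country": "Belgium",
        "crmId": f"CRM{i:06d}",
        "consent.email": random.choice(["y", "n"]),
        "consent.commercialEmail": random.choice(["y", "n"]),
        "consent.any": "y",
    }

# Write 1,000 rows of demo data, mirroring what Generate Data downloads.
with open("crm_demo_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    for i in range(1000):
        writer.writerow(make_row(i))
```

The resulting file has the same header row and row count as the Mockaroo download, so it can be used interchangeably in the ingestion steps that follow.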

Open your CSV file to inspect its contents.

With your CSV file ready, you can proceed with the ingestion into AEP.

Verify the dataset

Go to .

Before you continue, you need to select a sandbox. The sandbox to select is named --aepSandboxName--.

In 51黑料不打烊 Experience Platform, click on Datasets in the menu on the left side of your screen.

You'll use a shared dataset. The shared dataset has been created already and is called Demo System - Profile Dataset for CRM (Global v1.1). Click it to open it.

On the overview screen, you can see three main pieces of information.

NOTE
It's possible that the view of your dataset is empty if no activity has occurred in the last 7 days.

First, the Dataset Activity dashboard shows the total number of CRM records in the dataset, along with the ingested batches and their status.

Second, by scrolling down the page you can check when batches of data were ingested, how many records were onboarded, and whether or not each batch was successfully onboarded. The Batch ID identifies a specific batch job, and it is important because it can be used to troubleshoot why a specific batch was not successfully onboarded.
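Batch status can also be looked up programmatically through the Catalog Service API. The sketch below only builds the request rather than sending it; the endpoint shape follows the public Catalog API, and every ID, token, and sandbox name shown is a placeholder.

```python
# Hypothetical helper that assembles a Catalog Service lookup for one batch.
# The endpoint and header names follow the documented Catalog API;
# all credentials and IDs below are placeholders, not real values.

def batch_status_request(batch_id: str, access_token: str,
                         api_key: str, org_id: str, sandbox: str) -> dict:
    return {
        "method": "GET",
        "url": f"https://platform.adobe.io/data/foundation/catalog/batches/{batch_id}",
        "headers": {
            "Authorization": f"Bearer {access_token}",
            "x-api-key": api_key,
            "x-gw-ims-org-id": org_id,
            "x-sandbox-name": sandbox,
        },
    }

# Placeholder values only; substitute your own credentials and batch ID.
req = batch_status_request("0123abc", "<token>", "<api-key>",
                           "<org>@AdobeOrg", "<sandbox-name>")
print(req["url"])
```

Sending this request with a real access token returns the batch metadata, including its status, which is the same information the UI shows on the dataset activity screen.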

Lastly, the Dataset info tab shows important information like the Dataset ID (again, important from a troubleshooting perspective), the dataset's Name, and whether the dataset was enabled for Profile.

The most important setting here is the link between the dataset and the Schema. The Schema defines what data can be ingested and what that data should look like.

In this case, we're using the Demo System - Profile Schema for CRM (Global v1.1), which is mapped against the Profile class and implements extensions, also called field groups.

By clicking on the name of the schema, you're taken to the Schema overview, where you can see all the fields that have been activated for this schema.

Every schema needs to have a custom, primary descriptor defined. In the case of our CRM dataset, the schema has defined that the field crmId should be the primary identifier. If you want to create a schema and link it to the Real-time Customer Profile, you need to define a custom Field Group that refers to your primary descriptor.

You can also see that our primary identity is located in --aepTenantId--.identification.core.crmId, linked to the namespace of Demo System - CRMID.

Every schema, and as such every dataset, that should be used in the Real-time Customer Profile should have one Primary Identifier. This Primary Identifier is the identifier used by the brand for a customer in that dataset. In the case of a CRM dataset it might be the email address or the CRM ID; in the case of a Call Center dataset it might be the customer's mobile number.

It is best practice to create a separate, specific schema for every dataset and to set the descriptor for every dataset specifically to match how the current solutions used by the brand operate.
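To illustrate what such a primary-identity definition looks like under the hood, here is a sketch of a Schema Registry identity descriptor, expressed as a Python dict. The schema URI and tenant path are placeholders patterned after this tutorial's schema; the field names follow the documented descriptor format.

```python
# Sketch of a Schema Registry identity descriptor (placeholder IDs).
# "xdm:sourceProperty" points at the tenant field used as primary identity;
# "<tenant>" and "<schema-id>" stand in for your real tenant and schema.

identity_descriptor = {
    "@type": "xdm:descriptorIdentity",
    "xdm:sourceSchema": "https://ns.adobe.com/<tenant>/schemas/<schema-id>",
    "xdm:sourceVersion": 1,
    "xdm:sourceProperty": "/_<tenant>/identification/core/crmId",
    "xdm:namespace": "Demo System - CRMID",
    "xdm:property": "xdm:id",
    "xdm:isPrimary": True,
}
```

Posting a payload of this shape to the Schema Registry is what marks crmId as the schema's primary identity in the Demo System - CRMID namespace; in this exercise, the UI does this for you.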

Using a workflow to map a CSV file to an XDM Schema

The goal of this exercise is to onboard CRM data into AEP. All data ingested into Platform should be mapped against a specific XDM Schema. What you currently have is a CSV file with 1,000 lines on one side, and a dataset that is linked to a schema on the other. To load that CSV file into that dataset, a mapping needs to take place. To facilitate this mapping exercise, Workflows are available in 51黑料不打烊 Experience Platform.

Click Map CSV to XDM Schema and then click Launch to start the process.

On the next screen, you need to select a dataset to ingest your file into. You have the choice between selecting an already existing dataset or creating a new one. For this exercise, we'll reuse an existing one: select Demo System - Profile Dataset for CRM (Global v1.1) and leave the other settings at their defaults.

Click Next.

Drag and drop your CSV file, or click Choose files, navigate to your desktop, and select your CSV file.

After selecting your CSV file, it uploads immediately and you'll see a preview of your file within seconds.

Click Next.

You now need to map each column header from your CSV file to an XDM property in your Demo System - Profile Dataset for CRM.

51黑料不打烊 Experience Platform has already made some proposals for you by trying to link the Source Attributes to the Target Schema Fields.

NOTE
If you see any errors on the mapping screen, don鈥檛 worry. After following the below instructions, those errors will be resolved.

For the Schema Mappings, 51黑料不打烊 Experience Platform has already tried to link fields together. However, not all mapping proposals are correct. You now need to update the Target Fields one by one.

birthDate

The Source Schema field birthDate should be linked to the target field person.birthDate.

city

The Source Schema field city should be linked to the target field homeAddress.city.

country

The Source Schema field country should be linked to the target field homeAddress.country.

country_code

The Source Schema field country_code should be linked to the target field homeAddress.countryCode.

email

The Source Schema field email should be linked to the target field personalEmail.address.

crmId

The Source Schema field crmId should be linked to the target field --aepTenantId--.identification.core.crmId.

first_name

The Source Schema field first_name should be linked to the target field person.name.firstName.

gender

The Source Schema field gender should be linked to the target field person.gender.

home_latitude

The Source Schema field home_latitude should be linked to the target field homeAddress._schema.latitude.

home_longitude

The Source Schema field home_longitude should be linked to the target field homeAddress._schema.longitude.

id

The Source Schema field id should be linked to the target field _id.

last_name

The Source Schema field last_name should be linked to the target field person.name.lastName.

consents.marketing.email.val

The Source Schema field consent.email should be linked to the target field consents.marketing.email.val.

consents.marketing.commercialEmail.val

The Source Schema field consent.commercialEmail should be linked to the target field consents.marketing.commercialEmail.val.

consents.marketing.any.val

The Source Schema field consent.any should be linked to the target field consents.marketing.any.val.

With all mappings in place, click Finish.
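The mapping steps above amount to a rename plus nesting. As a sketch, here is how one flat CSV row maps onto the XDM structure in Python; `_tenant` stands in for the --aepTenantId-- placeholder, and the sample row values are illustrative.

```python
# Maps flat CSV column names to XDM field paths, as configured in the workflow.
# "_tenant" stands in for your --aepTenantId-- tenant prefix.
CSV_TO_XDM = {
    "id": "_id",
    "first_name": "person.name.firstName",
    "last_name": "person.name.lastName",
    "email": "personalEmail.address",
    "gender": "person.gender",
    "birthDate": "person.birthDate",
    "home_latitude": "homeAddress._schema.latitude",
    "home_longitude": "homeAddress._schema.longitude",
    "country_code": "homeAddress.countryCode",
    "city": "homeAddress.city",
    "country": "homeAddress.country",
    "crmId": "_tenant.identification.core.crmId",
    "consent.email": "consents.marketing.email.val",
    "consent.commercialEmail": "consents.marketing.commercialEmail.val",
    "consent.any": "consents.marketing.any.val",
}

def to_xdm(row: dict) -> dict:
    """Turn a flat CSV row into a nested XDM-shaped record."""
    record: dict = {}
    for column, path in CSV_TO_XDM.items():
        node = record
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = row[column]
    return record

# Illustrative row; values are made up, structure matches the demo CSV.
sample = to_xdm({
    "id": "1", "first_name": "Ada", "last_name": "Lovelace",
    "email": "ada@example.com", "gender": "female", "birthDate": "1815-12-10",
    "home_latitude": "51.0", "home_longitude": "3.7", "country_code": "BE",
    "city": "Ghent", "country": "Belgium", "crmId": "CRM000001",
    "consent.email": "y", "consent.commercialEmail": "n", "consent.any": "y",
})
```

This is exactly the transformation the workflow performs for every row in your file before writing the batch into the dataset.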

After clicking Finish, you'll see the Dataflow overview. After a couple of minutes, refresh your screen to check whether your workflow completed successfully. Click your Target dataset name.

You'll then see the dataset into which your data was ingested, along with a Batch ID that was ingested just now, showing 1,000 records ingested and a status of Success. Click Preview Dataset.

You'll now see a small sample of the dataset, so you can verify that the loaded data is correct.

Once the data is loaded, you can define the correct data governance approach for your dataset.

Adding data governance to your dataset

Now that your customer data is ingested, you need to make sure that this dataset is properly governed for usage and export control. Click on the Data Governance tab and observe that you can set multiple types of restrictions: Contract, Identity, Sensitive, Partner Ecosystem, and Custom.

Let's restrict identity data for the entire dataset. Hover over your dataset name, and click the Pencil icon to edit the settings.

Go to Identity Labels and you'll see that the I2 option is checked. This means that all pieces of information in this dataset are assumed to be at least indirectly identifiable to a person.

Click Save Changes.

In another module, we'll do a deep dive into the whole framework of data governance and labels.

With this, you've now successfully ingested and classified CRM data in 51黑料不打烊 Experience Platform.

Next Step: 1.2.5 Data Landing Zone

Go Back to Module 1.2

Go Back to All Modules
