Too many names for one customer, or Master Data Management, part I

Eka Ponkratova
Mar 6, 2022


He has been known by many names in the company. One record listed him as John A Walubo; another identified him as John Walubo. Open a third system and search by his national ID, and you will find him as John Walubu. It was not even clear how many duplicate records of John existed across all applications, because some of his entries had certain data points collected (e.g., date of birth) but others missing (e.g., national ID), and vice versa. Sound familiar?

What: Master Data (MD) and Master Data Management?

DAMA defines Master Data as “data about the business entities (e.g., employees, customers, products, financial structures, assets, and locations) that provide context for business transactions and analysis.” Does that mean virtually all data counts as master data? No: MD covers only the business entities that are core to business processes and operations. Master Data Management, then, can be defined as a set of practices and tools to consolidate, cleanse, govern, and share an organization's critical data about those core business entities.
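To make “consolidate” concrete, here is a minimal Python sketch that merges John’s partial records into a single “golden record”. It assumes the records have already been matched to the same person, and its survivorship rule (keep the first non-empty value, with the most trusted source listed first) is an illustrative assumption; real MDM tools let you configure such rules. `CustomerRecord`, `consolidate`, and the sample values are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CustomerRecord:
    """Hypothetical customer record as held by one source application."""
    source: str
    name: str
    national_id: Optional[str] = None
    date_of_birth: Optional[str] = None


def consolidate(records: list[CustomerRecord]) -> CustomerRecord:
    """Merge records already known to describe one customer into a golden record.

    Assumed survivorship rule: for each attribute, keep the first non-empty
    value in the order given, so the most trusted source should come first.
    """
    golden = {}
    for f in fields(CustomerRecord):
        if f.name == "source":
            continue  # the golden record gets its own source label
        golden[f.name] = next(
            (getattr(r, f.name) for r in records if getattr(r, f.name)), None
        )
    return CustomerRecord(source="golden", **golden)


# Made-up partial records: each one fills a gap the other leaves open.
records = [
    CustomerRecord("crm", "John A Walubo", date_of_birth="1980-01-01"),
    CustomerRecord("billing", "John Walubu", national_id="ID-123"),
]
print(consolidate(records))
# CustomerRecord(source='golden', name='John A Walubo', national_id='ID-123', date_of_birth='1980-01-01')
```

Because the CRM record is listed first, its spelling of the name survives; reordering the list changes which source wins a conflict.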


As per the DAMA-DMBOK2, the most common drivers for a Master Data Management program are:
— managing data quality to deal with data gaps, duplicate data, and data inconsistencies across applications.
— managing the cost of data integration and reducing risks by allowing data reuse and sharing between different systems.
— meeting organizational data requirements by creating and sharing a master dataset for all departments.
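The first driver can be measured before any MDM tool is in place. Below is a minimal Python sketch, assuming made-up extracts from three hypothetical applications (`crm`, `billing`, `support`). It counts gaps per attribute, identifiers shared by several records (duplicates), and identifiers whose records disagree on the name (inconsistencies). The function name `profile` and the choice of national ID as the matching key are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical extracts from three applications; None marks a data gap.
records = [
    {"source": "crm", "name": "John A Walubo", "national_id": None, "date_of_birth": "1980-01-01"},
    {"source": "billing", "name": "John Walubo", "national_id": "ID-123", "date_of_birth": None},
    {"source": "support", "name": "John Walubu", "national_id": "ID-123", "date_of_birth": None},
]


def profile(records, key="national_id"):
    """Report gaps, duplicates, and inconsistencies across source records."""
    # Gaps: number of missing values per attribute.
    gaps = defaultdict(int)
    for r in records:
        for attr, value in r.items():
            if value is None:
                gaps[attr] += 1

    # Group records that share a non-missing identifier.
    by_key = defaultdict(list)
    for r in records:
        if r[key] is not None:
            by_key[r[key]].append(r)

    # Duplicates: identifiers held by more than one record.
    duplicates = {k: len(v) for k, v in by_key.items() if len(v) > 1}

    # Inconsistencies: the same identifier carrying different names.
    inconsistent = {}
    for k, v in by_key.items():
        names = sorted({r["name"] for r in v})
        if len(names) > 1:
            inconsistent[k] = names

    return dict(gaps), duplicates, inconsistent


print(profile(records))
# ({'national_id': 1, 'date_of_birth': 2}, {'ID-123': 2}, {'ID-123': ['John Walubo', 'John Walubu']})
```

Note that the CRM record, which lacks a national ID, escapes both the duplicate and the inconsistency checks; matching on the identifier alone undercounts John.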

How (architecture)?

There are three main architectural models for MDM.

Inspired by*
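As an illustration of one commonly described model (named here as an assumption, since the models themselves are not listed in the text), a registry-style hub leaves the data in the source systems and keeps only a cross-reference index linking each source record to a master ID. The minimal Python sketch below shows that idea; `Registry`, its methods, and the local keys such as `C-17` are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Registry:
    """Hypothetical registry-style hub: stores links to source records, not the records."""
    # Mints new master IDs 1, 2, 3, ...
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))
    # master_id -> {(system, local_key), ...}
    xref: dict = field(default_factory=dict)
    # (system, local_key) -> master_id
    index: dict = field(default_factory=dict)

    def register(self, system: str, local_key: str, master_id: Optional[int] = None) -> int:
        """Link a source record to an existing master ID, or mint a new one."""
        if master_id is None:
            master_id = next(self._ids)
        self.xref.setdefault(master_id, set()).add((system, local_key))
        self.index[(system, local_key)] = master_id
        return master_id

    def lookup(self, system: str, local_key: str) -> set:
        """Return every source record linked to the same master ID."""
        return self.xref[self.index[(system, local_key)]]


reg = Registry()
john = reg.register("crm", "C-17")  # hypothetical local keys
reg.register("billing", "B-9042", master_id=john)
reg.register("support", "S-3", master_id=john)
print(reg.lookup("billing", "B-9042"))
```

The trade-off of this design is that the hub holds no attribute values, so assembling a full view of John means querying every linked source system at read time.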

How (Master Data Management tools)?

Master data can be grouped into data about parties (e.g., customer data); products and services (e.g., product functions and features); financial structures (e.g., ledger accounts and cost centers); and locations (e.g., addresses and GPS coordinates). In the past, MDM tools were separated by domain, for example, MDM of customer data or MDM of product data (see ‘Figure 1. Magic Quadrant for Master Data Management of Customer Data’). Since then, multidomain MDM, capable of handling multiple data domains in one tool, has grown in popularity (see ‘Figure 1. Magic Quadrant for Master Data Management’). Tibco, SAP, Informatica, and IBM remain in the quadrant year after year.

How (Master Data Management on AWS)?

As I’ve been working mostly with AWS, I got curious about vendors that listed their MDM-related products and services in AWS Marketplace.

Generally, AWS Marketplace offers two delivery methods, and MDM products are no exception:
- Amazon Machine Image (AMI): the seller creates an AMI with its product installed, and the buyer uses it to launch an EC2 instance and access it from their own AWS account;
- Software as a Service (SaaS): the seller hosts the software on AWS infrastructure, and the buyer accesses the product in the seller’s environment.

You will also see ‘Amazon SageMaker’ in the list: here, the seller develops a model and packages it in a Docker container, while the buyer accesses the model from their own AWS account.

Find the complete list in AWS Marketplace.

What if you are not ready to commit to an MDM program? In the next part, I take a look at finding and de-duplicating John’s records using AWS Glue (Part II).
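For a feel of what “finding” John’s records involves, here is a minimal local sketch in plain Python. It is not what Part II does with AWS Glue; it swaps in `difflib.SequenceMatcher` for name similarity and a union-find to group records that match on national ID or on a similar name. The 0.85 threshold, the function names, and the sample records are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Case-insensitive name similarity; the 0.85 cut-off is an arbitrary assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster(records: list[dict]) -> list[set[int]]:
    """Group record indices that match on national ID or on a similar name."""
    parent = list(range(len(records)))  # union-find forest, one tree per group

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair of records and merge their groups on a match.
    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        same_id = a.get("national_id") and a.get("national_id") == b.get("national_id")
        if same_id or similar(a["name"], b["name"]):
            parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


records = [
    {"name": "John A Walubo", "national_id": None},
    {"name": "John Walubo", "national_id": "ID-123"},
    {"name": "John Walubu", "national_id": "ID-123"},
    {"name": "Jane Smith", "national_id": "ID-999"},  # hypothetical unrelated customer
]
print(cluster(records))  # [{0, 1, 2}, {3}]
```

“John A Walubo” and “John Walubu” are not similar enough to match directly, yet they land in the same group through “John Walubo”; that transitivity is why a union-find is used rather than pairwise flags. Comparing every pair is quadratic, so record-linkage tools typically narrow the candidate pairs first.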

Other resources:

1. Case studies from AWS
2. Master Data Management course on Udemy (free)
3. Gartner’s Magic Quadrant for MDM (accessed on 3/6/2022)

*A system of record

*A system of entry: a tool used to enter data


