Introduction - a data fabric approach to governance and privacy


Data fabric is an architectural approach that helps ensure quality data can be accessed by the right people at the right time.

In addition to providing a strong foundation for multicloud data integration, 360-degree customer intelligence and trustworthy AI, the data governance and privacy capability of a data fabric applies automated governance and privacy controls that help maintain regulatory compliance no matter where data resides.

Strong governance makes the right, quality data easier to find for those who should have access to it, while keeping sensitive data hidden unless access is appropriate. Having insights into your business and customers is a competitive advantage. The Forrester Analytics Business Technographics® Data And Analytics Survey, 2020, found that advanced insights-driven businesses are more likely than beginner and intermediate firms to do three things: have a data governance strategy that involves defining, executing, training, and overseeing compliance; have an executive in charge of their data governance; and use AI to crowdsource and embed data stewardship in everyday data engagements.

Strong privacy parameters help increase readiness for compliance and data protection anywhere, on-premises or across clouds. They allow businesses to understand and quickly apply industry-specific regulatory policies and governance rules on data wherever it resides.

In this guide, we’ll look at the most common governance and privacy challenges modern organizations face, the building blocks of an effective approach, and the technology components you’ll need to build an automated, integrated data governance and privacy layer across all the data in your enterprise. We’ll also provide helpful resources such as a data governance and privacy trial.

Why establish automated data governance and privacy?

The building blocks of governance and privacy


Ultimately, the goal of governance is knowing where data comes from, what it is, who can access it and when it should be retired. Several key technology building blocks exist to meet the need to integrate and improve data privacy, access, quality and traceability for all the data in an organization.

Let’s look at what you’ll need.

Data cataloging
The quality of your data determines how confidently you can act on insights. If low-quality data goes into AI models, it could lead to inaccurate, noncompliant or discriminatory results. Getting the best insights means being able to access data that is fresh, clean and relevant, with a consistent taxonomy. A data catalog can help users easily find and use the right data through a rich, metadata-driven index of cataloged assets.

Automated metadata generation
Metadata tracks the origin, privacy level, age and potential uses of your data. Manually generating metadata is cumbersome, but with machine learning, data can be tagged with metadata automatically, reducing both human error and the amount of dark data. Automatic tagging allows for policy enforcement at the point of access, so that more sensitive data can be used in a nonidentifiable and compliant way. In addition, metadata is used to establish a common vocabulary of business terms that provide context to data and to link data from different sources. This context adds semantic meaning to data so that it becomes more findable, usable and consistent within the organization, a key factor when seeking data for analytics and AI.
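To make the idea of automatic tagging concrete, here is a minimal sketch in Python. It is an illustration only: it uses simple regular-expression rules as a stand-in for the machine learning classifiers described above, and the names PATTERNS, tag_column and tag_table are hypothetical, not part of any product API.

```python
import re

# Hypothetical detection rules. A real system would rely on trained
# classifiers; regular expressions are used here only to keep the sketch small.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,}\d$"),
}


def tag_column(values, threshold=0.8):
    """Return metadata for one column based on a sample of its values."""
    non_empty = [v for v in values if v]
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        # Tag the column when most sampled values match a known data class.
        if non_empty and hits / len(non_empty) >= threshold:
            return {"data_class": label, "sensitive": True}
    return {"data_class": "unclassified", "sensitive": False}


def tag_table(columns):
    """Tag every column of a table given as {column_name: [values]}."""
    return {name: tag_column(values) for name, values in columns.items()}


tags = tag_table({
    "contact": ["a@example.com", "b@example.org"],
    "city": ["Paris", "Oslo"],
})
```

Once each column carries a tag like this, an access layer can look up the tag at query time and decide whether to show, mask or hide the value.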

Automated governance of data access and lineage
Data lineage shows how data has been accessed and used, and by whom. Knowing where data comes from is useful not only for compliance reporting but also for building trustworthy and explainable AI models. And it can be automated without complicating access. With restrictions built directly into access points, only the data that users are authorized to access is visible to them. Additionally, sensitive data can be dynamically masked so that models and data sets can be shared without exposing private data to unauthorized users. This clarity around what data can and can’t be used supports self-service data demands and allows organizations to be nimble in responding to line-of-business needs.
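The access restrictions and dynamic masking described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the column tags, the role names and the read_row helper are all hypothetical, and a real platform would apply such rules inside the query engine rather than in application code.

```python
# Hypothetical column classifications and role clearances.
COLUMN_TAGS = {"name": "public", "email": "sensitive", "balance": "restricted"}
ROLE_CLEARANCE = {
    "analyst": {"public"},
    "steward": {"public", "sensitive"},
    "auditor": {"public", "sensitive", "restricted"},
}


def mask(value):
    """Replace every character with an asterisk, keeping the length."""
    return "*" * len(str(value))


def read_row(row, role):
    """Return the view of a row that the given role is allowed to see."""
    allowed = ROLE_CLEARANCE.get(role, set())
    visible = {}
    for column, value in row.items():
        # Columns without a tag default to the most restrictive class.
        tag = COLUMN_TAGS.get(column, "restricted")
        if tag in allowed:
            visible[column] = value        # cleared: show as-is
        elif tag == "sensitive":
            visible[column] = mask(value)  # not cleared: mask dynamically
        # Any other column the role is not cleared for is omitted entirely.
    return visible


row = {"name": "Ada", "email": "ada@example.com", "balance": 1200}
analyst_view = read_row(row, "analyst")
auditor_view = read_row(row, "auditor")
```

In this sketch the stored row never changes. The masking happens at read time, which is what lets the same data set be shared with users who hold different clearances.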

Data virtualization
Data virtualization connects data across all locations and makes the disparate data sources appear as a single database. This helps you ensure compliant access to the data through governed data access, regardless of where it lives and without moving it. Using the single virtualized governed layer, user access to data is defined in one place instead of at each source, reducing the complexity of access management.
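As a rough illustration of defining access in one place, the Python sketch below puts a single governed layer in front of two stand-in sources. The VirtualLayer class, its methods and the source names are assumptions made for this example; they do not represent a product interface, and the sources are plain in-memory functions rather than real databases.

```python
class VirtualLayer:
    """A toy governed layer that fronts several data sources."""

    def __init__(self):
        self._sources = {}  # table name -> callable that yields rows
        self._grants = {}   # table name -> roles allowed to read it

    def register(self, table, fetch, roles):
        """Expose a source under one table name with one access rule."""
        self._sources[table] = fetch
        self._grants[table] = set(roles)

    def query(self, table, role):
        """Check access centrally, then read from the source in place."""
        if role not in self._grants.get(table, set()):
            raise PermissionError(f"{role} may not read {table}")
        return list(self._sources[table]())


def on_prem_customers():
    # Stand-in for an on-premises warehouse.
    return [{"id": 1, "name": "Ada"}]


def cloud_orders():
    # Stand-in for a cloud object store that streams rows.
    yield {"id": 7, "customer_id": 1}
    yield {"id": 8, "customer_id": 1}


layer = VirtualLayer()
layer.register("customers", on_prem_customers, roles={"analyst"})
layer.register("orders", cloud_orders, roles={"analyst", "auditor"})
orders_for_auditor = layer.query("orders", "auditor")
```

The rows are fetched from each source only when queried, so nothing is copied ahead of time, and the access rule for every table lives in the layer rather than in each source.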

Reporting and auditing
Enterprises must comply with a wide variety of changing regulations that differ according to geography, industry and data type. These regulations need to be broken down into a catalog of requirements with a clear set of actions that businesses must take. Regulatory information should be automatically ingested, deduplicated, and applied to workflows. The secret to harmonizing all these data privacy and governance needs with business opportunity is aligning the technology components with a global data strategy and an open and holistic architecture.

By 2024, data fabric deployments will quadruple efficiency in data utilization while cutting human-driven data management tasks in half.

Data fabric - a holistic approach

Data governance and privacy success story

Financial services: ING

ING is a Dutch bank with over 57,000 employees serving around 39.3 million customers, corporate clients and financial institutions in over 40 countries. To bring his vision of data governance to life at ING, Ferd Scheepers, ING’s Chief Architect, wanted to implement a data fabric approach in the company’s hybrid cloud environment. ING needed to govern its data in the cloud consistently with its on-premises environment. As the data leader, Scheepers had specific goals:

  • Empower ING’s data citizens with fast and simple access to governed data and toolsets
  • Ensure strong governance and privacy parameters across a complex global ecosystem
  • Comply with business policy and multi-jurisdiction regulations with changing requirements

ING created a data fabric solution to help implement a single corporate operating model and streamline data management and applications across all operational countries. It runs across an open hybrid cloud environment that adapts to ING’s multi-platform, heterogeneous landscape. Applying data virtualization across existing on-premises investments, it removes data silos, enabling just-in-time access to the right data across any cloud and on-premises, at the optimum cost, with the appropriate level of governance.

Using its data fabric, ING can provide a consistent user experience to increase collaboration, streamline application management, and optimize licensing and IT costs.

Consider these components

IBM Cloud Pak for Data

IBM Cloud Pak® for Data is a platform built specifically with a data fabric architecture in mind to predict outcomes faster and allow you to collect, organize and analyze your data, no matter where it may reside. The platform thus helps to improve productivity and reduce complexity by building a data fabric that connects siloed data distributed across a hybrid cloud landscape.

IBM Watson Knowledge Catalog

IBM Watson® Knowledge Catalog provides intelligent cataloging, with automated metadata collection and policy management to ensure the details of a model are automatically collected and stored for maximum transparency and repeatability. It ensures that models are impartial, address bias, are explainable and adapt to changing model parameters.

IBM Watson Query

Applying sweeping governance rules across data lakes, databases, and warehouses is time-consuming, and often leaves users waiting a long time for access to the right data. Watson Query enforces governance policies when data is accessed across multiple sources, quickly providing data to your end applications through one view without manual changes, data movement or replication.