README.md: 7 additions & 7 deletions
@@ -44,7 +44,7 @@ The fixed structure must be technology-agnostic. The first fields of teh fixed s
* `Status: [Option[String]]` this is an enum representing the status of this version of the Data Product. Allowed values are: `[Draft|Published|Retired]`. This metadata communicates the overall status of the Data Product but does not reflect the actual deployment status.
* `Maturity: [Option[String]]` this is an enum that tells the consumer whether the Data Product is a tactical solution or a strategic one. It is especially useful during a migration from a Data Warehouse or Data Lake. Allowed values are: `[Tactical|Strategic]`.
* `Billing: [Option[Yaml]]` this is a free-form key-value area for information useful for resource tagging and billing.
-* `Tags: [Array[Yaml]]` Tag labels at DP level (please refer to OpenMetadata https://docs.open-metadata.org/metadata-standard/schemas/types/taglabel).
+* `Tags: [Array[Yaml]]` Tag labels at DP level (please refer to [OpenMetadata documentation](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/type/taglabel)).
* `Specific: [Yaml]` this is a custom section where we can put all the information strictly related to a specific execution environment. It can also refer to an additional file. At this level we also embed all the information needed to provision the general infrastructure (resource groups, networking, etc.) for a specific Data Product. For example, if a company decides to create a ResourceGroup for each data product and to have a subscription reference for each domain and environment, it will be specified at this level. It is also recommended to put general security settings here, such as Azure Policy or IAM policies, VPC/VNet, and subnets. This section is filled by merging data defined at the common level with values defined specifically for the selected environment.
The **unique identifier** of a Data Product is the concatenation of Domain, Name and Version. So we will refer to the `DP_UK` as a URN which ends in the following way: `$DPDomain:$DPName:$DPMajorVersion`.
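As an illustration of the identifier rule above, here is a minimal Python sketch that builds the trailing part of a `DP_UK`. The function name `dp_uk_suffix` and the assumption that the version is a dotted string whose first component is the major version are hypothetical and not part of the specification; the URN prefix is not stated in this excerpt, so the sketch only produces the suffix.

```python
def dp_uk_suffix(domain: str, name: str, version: str) -> str:
    """Return the trailing part of a DP_UK: $DPDomain:$DPName:$DPMajorVersion."""
    # Assumption: the version is a dotted string such as "1.2.0",
    # and its first component is the major version.
    major = version.split(".")[0]
    for part in (domain, name, major):
        # A colon inside a component would make the identifier ambiguous.
        if not part or ":" in part:
            raise ValueError(f"invalid identifier component: {part!r}")
    return f"{domain}:{name}:{major}"


if __name__ == "__main__":
    print(dp_uk_suffix("finance", "cashflow", "1.2.0"))
```

Using only the major version in the key means that minor and patch releases of a Data Product share the same identifier.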
@@ -75,7 +75,7 @@ Constraints:
* `StartDate: [Optional[String]]` the first business date present in the dataset. Leave it empty for events, or use a standard relative notation such as "-7D" or "-1Y".
* `ProcessDescription: [Option[String]]` the underlying process that contributes to generating the data exposed by this output port.
* `DataContract: [Yaml]`: Any change in this section is a breaking change, because the producer is breaking the contract; it requires creating a new version of the data product to keep backward compatibility.
-* `Schema: [Array[Yaml]]` when it comes to describing a schema, we propose to leverage the OpenMetadata specification: Ref https://docs.open-metadata.org/metadata-standard/schemas/entities/table#column. Each column can have a tag array, and you can choose between simple LabelTags, ClassificationTags or DescriptiveTags. Here is an example of a classification tag: https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/data/tags/piiTags.json.
+* `Schema: [Array[Yaml]]` when it comes to describing a schema, we propose to leverage the [OpenMetadata specification](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/entity/data/table#definitions). Each column can have a tag array, and you can choose between simple LabelTags, ClassificationTags or DescriptiveTags. Here is an example of a classification tag: https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-service/src/main/resources/json/data/tags/piiTags.json.
* `SLA: [Yaml]` Service Level Agreement, describes the quality of data delivery and the output port in general. It represents the producer's overall promise to the consumers.
* `IntervalOfChange: [Option[String]]` how often changes in the data are reflected.
* `Timeliness: [Option[String]]` the skew between the time that a business fact occurs and when it becomes visible in the data.
@@ -92,8 +92,8 @@ Constraints:
* `Limitations: [Option[String]]` If any limitation is present it must be made super clear to the consumers.
* `LifeCycle: [Option[String]]` Describe how the data will be historicized and how and when it will be deleted.
* `Confidentiality: [Option[String]]` Describe what a consumer should do to keep the information confidential and how to process and store it. It also covers permission to share or report it.
-* `Tags: [Array[Yaml]]` Tag labels at OutputPort level, here we can have security classification for example (please refer to OpenMetadata https://docs.open-metadata.org/metadata-standard/schemas/types/taglabel).
-* `SampleData: [Option[Yaml]]` provides sample data for your Output Port (please refer to OpenMetadata specification: https://docs.open-metadata.org/metadata-standard/schemas/entities/table#tabledata).
+* `Tags: [Array[Yaml]]` Tag labels at OutputPort level, here we can have security classification for example (please refer to [OpenMetadata documentation](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/type/taglabel)).
+* `SampleData: [Option[Yaml]]` provides sample data for your Output Port (please refer to [OpenMetadata specification](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/entity/data/table#properties)).
* `SemanticLinking: [Option[Yaml]]` here we can express semantic relationships between this output port and other output ports (also coming from other domains and data products). For example, we could say that column "customerId" of our SQL Output Port references the column "id" of the SQL Output Port of the "Customer" Data Product.
* `Specific: [Yaml]` this is a custom section where we must put all the information strictly related to a specific technology or dependent on a standard/policy defined in the federated governance.
@@ -120,7 +120,7 @@ Constraints:
* `Technology: [Option[String]]` represents which technology is used to define the workload, like: Spark, Flink, pySpark, etc. The underlying technology is useful to understand better how the workload processes data.
* `WorkloadType: [Option[String]]` explains what type of workload it is: Ingestion ETL, Streaming, Internal Process, etc.
* `ConnectionType: [Option[String]]` an enum with allowed values: `[HouseKeeping|DataPipeline]`. `HouseKeeping` is for all the workloads that act on internal data without any external dependency. `DataPipeline` is instead for workloads that read from output ports of other DPs or from external systems.
-* `Tags: [Array[Yaml]]` Tag labels at Workload level (please refer to OpenMetadata https://docs.open-metadata.org/metadata-standard/schemas/types/taglabel).
+* `Tags: [Array[Yaml]]` Tag labels at Workload level (please refer to [OpenMetadata documentation](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/type/taglabel)).
* `ReadsFrom: [Array[String]]` This is filled only for `DataPipeline` workloads, and it represents the list of Output Ports or external systems that the workload uses as input. Output Ports are identified with `DP_UK:$OutputPortName`, while external systems will be defined by a URN in the form `urn:dmb:ex:$SystemName`. This field may be elaborated further in the future into a more semantic structure.
Constraints:
* This array will only contain Output Port IDs and/or external system identifiers.
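The constraint on `ReadsFrom` can be sketched as a small check. This is an illustrative Python example, not part of the specification: the function name `classify_reads_from` is hypothetical, and because the full `DP_UK` prefix is not given in this excerpt, the Output Port check only assumes that an ID ends with at least four non-empty colon-separated segments (`$DPDomain:$DPName:$DPMajorVersion:$OutputPortName`).

```python
EXTERNAL_PREFIX = "urn:dmb:ex:"  # form given in the text: urn:dmb:ex:$SystemName


def classify_reads_from(entry: str) -> str:
    """Classify a ReadsFrom entry as 'external_system' or 'output_port'."""
    if entry.startswith(EXTERNAL_PREFIX):
        system_name = entry[len(EXTERNAL_PREFIX):]
        if system_name and ":" not in system_name:
            return "external_system"
        raise ValueError(f"malformed external system URN: {entry!r}")
    # Assumption: an Output Port ID is DP_UK:$OutputPortName, so it carries
    # at least domain, name, major version and output port name.
    parts = entry.split(":")
    if len(parts) >= 4 and all(parts):
        return "output_port"
    raise ValueError(f"not an Output Port ID or external system URN: {entry!r}")


if __name__ == "__main__":
    for item in ["urn:dmb:ex:crm", "finance:cashflow:1:sql_port"]:
        print(item, "->", classify_reads_from(item))
```

A stricter validator would also resolve each Output Port ID against the known Data Products, which is outside the scope of this sketch.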
@@ -146,7 +146,7 @@ Constraints:
* `Platform: [Option[String]]` represents the vendor: Azure, GCP, AWS, CDP on AWS, etc. It is a free field, but it is useful to understand better the platform where the component will be running.
* `Technology: [Option[String]]` represents which technology is used to define the storage area, like: S3, Kafka, Athena, etc. The underlying technology is useful to understand better how the data is internally stored.
* `StorageType: [Option[String]]` the specific type of storage: Files, SQL, Events, etc.
-* `Tags: [Array[Yaml]]` Tag labels at Storage area level (please refer to OpenMetadata https://docs.open-metadata.org/metadata-standard/schemas/types/taglabel).
+* `Tags: [Array[Yaml]]` Tag labels at Storage area level (please refer to [OpenMetadata documentation](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/type/taglabel)).
* `Specific: [Yaml]` this is a custom section where we can put all the information strictly related to a specific technology or dependent on a standard/policy defined in the federated governance.
@@ -161,7 +161,7 @@ Anyway is good to formalize what kind of information should be included and veri
* `Description: [String]` detailed explanation about what this observability is exposing
* `Endpoint: [URL]` this is the API endpoint that will expose the observability for each OutputPort
* `Completeness: [Yaml]` degree of availability of all the necessary information along the entire history
-* `DataProfiling: [Yaml]` volume, distribution of volume over time, range of values, column values distribution and other statistics. Please refer to OpenMetadata to get our default implementation https://docs.open-metadata.org/openmetadata/schemas/entities/table#tableprofile. Keep in mind that this is the kind of standard that a company needs to set based on its needs.
+* `DataProfiling: [Yaml]` volume, distribution of volume over time, range of values, column values distribution and other statistics. Please refer to [OpenMetadata documentation](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/type/taglabel) to get our default implementation. Keep in mind that this is the kind of standard that a company needs to set based on its needs.
* `Freshness: [Yaml]`
* `Availability: [Yaml]`
* `DataQuality: [Yaml]` describes the data quality rules that will be applied to the data, using the format you prefer.