Pentaho Data Catalog Analytics

What your data platform already knows, made visible.

A working demonstration of the Pentaho platform: 35 analytical dashboards — observability, governance, lineage, cost, and data quality — built on live Pentaho Data Catalog metadata.

35 live dashboards
Pentaho-native: CDA · CDF · CDE
16.7 TB of lineage modeled
576 glossary terms

The dashboard suite

Every screen below is a real, running Pentaho dashboard over the data catalog, not a mockup. Dashboards marked Custom are self-contained HTML dashboards over Pentaho CDA. Dashboards marked Framework are true Pentaho CDF dashboards with CCC charts. Where a dashboard exists in both builds, you can switch between them, so the same insight is delivered both the lightweight way and the fully platform-native way.

Observability across the estate · Governance & sensitivity · Lineage & data movement · Cost & sustainability · Data quality & key discovery · Cross-dashboard drill-through

Executive

Unified Executive Scorecard (Custom)
Every domain's headline KPI on one board, with trend sparklines and drill-through into each dashboard.
Executive Scorecard (Framework · CDF)
The same board built natively on the Pentaho CDF framework (CCC charts) — toggle Custom ⇄ Framework in the header.

Observability

Estate Observability Command Center (Custom)
Single pane of glass: catalog health, governance, sensitivity, freshness, completeness — every panel drills into its domain.
Observability Command Center (Framework · CDF)
The single-pane-of-glass view, rebuilt natively on Pentaho CDF: KPIs, CCC charts, and a source scorecard.
Catalog Observability (Framework · CDF)
True Pentaho CDF dashboard — CCC charts and CDF filters over CDA.
Data Freshness & Staleness (Framework · CDF)
CDF twin — scan-freshness pie, modified-age and stale-by-source bars.
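
The freshness views group assets by how long ago they were last scanned. The sketch below is a minimal version of that grouping. The band thresholds (7, 30 and 90 days) and the (source, last_scanned) input shape are hypothetical, because the page does not say which cut-offs the dashboards use.

```python
from datetime import datetime, timedelta

# Hypothetical freshness bands as (upper bound in days, label); the real
# dashboard cut-offs are not documented on this page.
BANDS = [(7, "fresh"), (30, "aging"), (90, "stale")]

def freshness_band(last_scanned: datetime, now: datetime) -> str:
    """Label an asset by the age of its last scan."""
    age_days = (now - last_scanned).days
    for limit, label in BANDS:
        if age_days <= limit:
            return label
    return "very stale"  # older than the last band

def stale_by_source(assets, now):
    """Count assets per source that fall outside the 'fresh' band.

    `assets` holds (source, last_scanned) pairs.
    """
    counts = {}
    for source, last_scanned in assets:
        if freshness_band(last_scanned, now) != "fresh":
            counts[source] = counts.get(source, 0) + 1
    return counts
```

A "stale-by-source" bar chart is then one bar per key of the returned dict.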
Catalog Growth & Discovery (Framework · CDF)
CDF twin — cumulative growth line, assets-by-source pie, per-month discovery bars, and recently-discovered assets.
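
A cumulative growth line and per-month discovery bars can come from the same series. The bars are the monthly counts, and the line is their running total. This is a minimal sketch that assumes a hypothetical mapping from 'YYYY-MM' month keys to newly discovered asset counts.

```python
from itertools import accumulate

def growth_series(discovered_per_month: dict[str, int]) -> list[tuple[str, int, int]]:
    """Pair each month's newly discovered asset count with the running total.

    Keys are 'YYYY-MM' strings, so sorting them puts the months in order.
    """
    months = sorted(discovered_per_month)
    new_counts = [discovered_per_month[m] for m in months]
    totals = accumulate(new_counts)
    return list(zip(months, new_counts, totals))
```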
Application & Access Reach (Framework · CDF)
CDF twin — which applications touch the catalog: top apps by access, type and source breakdowns, and a reach-detail table across 207 access events / 46.6 TB.
Catalog Adoption (Framework · CDF)
CDF twin — how actively the catalog is classified, governed and used: adoption coverage, top apps & policies, and classified assets by source.
Pipeline & Job Observability (Framework · CDF)
CDF twin — DataOps reliability: 8.2K runs at 92% success, success & volume trends, event lifecycle, by-integration health, and the top failing jobs.
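
By-integration health and the top failing jobs are two aggregations over the same run log. The sketch below is minimal and assumes each run arrives as a hypothetical (integration, job, succeeded) tuple. The page does not describe the actual run schema.

```python
from collections import Counter, defaultdict

def integration_health(runs):
    """Success rate per integration from (integration, job, succeeded) tuples."""
    totals = defaultdict(lambda: [0, 0])  # integration -> [successes, runs]
    for integration, _job, succeeded in runs:
        totals[integration][1] += 1
        if succeeded:
            totals[integration][0] += 1
    return {name: ok / n for name, (ok, n) in totals.items()}

def top_failing_jobs(runs, k=5):
    """The k jobs with the most failed runs, most failures first."""
    failures = Counter(job for _integration, job, succeeded in runs if not succeeded)
    return failures.most_common(k)
```

At the headline figures (8.2K runs at 92% success), the failure side works out to roughly 650 failed runs.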

Governance & Privacy

Sensitive Data Domains (Custom)
Governance risk map: which business domains hold HIGH- or MEDIUM-sensitivity data, shown as a domain × sensitivity heatmap.
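
A domain × sensitivity heatmap is a cross-tab: count assets per (domain, level) pair, then lay the counts out as one row per domain. This is a minimal sketch. The LOW level is an assumed third tier, since the page names only HIGH and MEDIUM, and the domain names are hypothetical.

```python
from collections import Counter

# HIGH and MEDIUM come from the page; LOW is an assumed third level.
LEVELS = ["HIGH", "MEDIUM", "LOW"]

def sensitivity_heatmap(assets):
    """Cross-tab of (domain, sensitivity) pairs: domain -> counts per level in LEVELS order."""
    counts = Counter((domain, level) for domain, level in assets)
    domains = sorted({domain for domain, _ in assets})
    return {d: [counts[(d, lvl)] for lvl in LEVELS] for d in domains}
```

The Framework twin's stacked CCC bar draws the same rows, with one stack segment per level.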
Sensitive Data Domains (Framework · CDF)
CDF twin rendering the cross-tab as a stacked CCC bar — same data, native framework.
Policy & Governance Coverage (Custom)
Governed vs ungoverned assets, policy assignments, and coverage trended over time.
Policy & Governance Coverage (Framework · CDF)
CDF twin — coverage pie, governed-by-source bars, policy types, and a 2-series coverage-over-time line.
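
The two series of a coverage-over-time line can be read as "assets known by month M" and "assets governed by month M". Their ratio is the coverage. This is a minimal sketch that assumes each asset carries hypothetical first-seen and governed-since months. The page does not describe how coverage history is actually stored.

```python
def coverage_over_time(assets, months):
    """Per month: (month, known assets, governed assets, coverage fraction).

    `assets` holds (first_seen, governed_since) pairs of 'YYYY-MM' strings;
    governed_since is None for assets that never had a policy assigned.
    'YYYY-MM' strings compare correctly as plain strings.
    """
    series = []
    for month in months:
        known = sum(1 for seen, _ in assets if seen <= month)
        governed = sum(
            1 for seen, gov in assets
            if seen <= month and gov is not None and gov <= month
        )
        series.append((month, known, governed, governed / known if known else 0.0))
    return series
```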
Sensitive Data & Privacy (Framework · CDF)
CDF twin — PII/classification reach, sensitivity scale, exposure by source, and a privacy term × source stacked bar over 625 classified assets.
Sensitive Data & Compliance Radar (Framework · CDF)
CDF twin — source-sensitivity→destination-governance stacked bar, sensitivity mix, bytes by sensitivity, restricted-flow risk, and a 2-series cross-boundary movement trend over 7.4K flows.
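
The page does not define "restricted-flow risk". One plausible reading is a flow that moves data from a HIGH-sensitivity source into a destination with no governance policy, and the sketch below takes that reading as an assumption. The Flow fields and the example system names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    source: str
    destination: str
    source_sensitivity: str    # e.g. "HIGH", "MEDIUM", "LOW"
    destination_governed: bool
    bytes_moved: int

def restricted_flows(flows, risky_levels=("HIGH",)):
    """Flows carrying sensitive data into an ungoverned destination, largest first.

    Assumed definition of risk: source sensitivity in `risky_levels`
    and no governance on the destination.
    """
    risky = [
        f for f in flows
        if f.source_sensitivity in risky_levels and not f.destination_governed
    ]
    return sorted(risky, key=lambda f: f.bytes_moved, reverse=True)
```

Widening `risky_levels` to ("HIGH", "MEDIUM") gives a broader view of the same risk.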

Data Quality & Structure

Data Quality & Metadata Completeness (Custom)
Completeness distribution, most-missing attributes, and an actionable lowest-completeness remediation list.
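
One way to compute a completeness score is to count how many expected metadata attributes an asset actually has populated. The same pass also gives the most-missing attributes and a lowest-first remediation list. This is a minimal sketch. The EXPECTED attribute set and the dict-per-asset shape are hypothetical, since the page does not list which attributes the dashboards check.

```python
from collections import Counter

# Hypothetical set of metadata attributes each asset is expected to carry.
EXPECTED = ("description", "owner", "glossary_term", "sensitivity", "policy")

def completeness(asset: dict) -> float:
    """Fraction of expected attributes that are present and non-empty."""
    filled = sum(1 for attr in EXPECTED if asset.get(attr))
    return filled / len(EXPECTED)

def most_missing(assets, k=3):
    """The k attributes missing from the most assets, with their counts."""
    return Counter(
        attr for a in assets for attr in EXPECTED if not a.get(attr)
    ).most_common(k)

def remediation_list(assets, k=10):
    """The k least complete assets, lowest score first."""
    return sorted(assets, key=completeness)[:k]
```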
Data Quality & Metadata Completeness (Framework · CDF)
CDF twin — completeness & profiling pies, most-missing attributes, and a remediation table.
Column Health & Key Discovery (Custom)
Key-candidate detection (cardinality ≈ rows), dead/constant columns, and type mix across 10K+ columns.
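
The key-candidate rule "cardinality ≈ rows" and the dead/constant-column check both work from ordinary column profile counts. This is a minimal sketch with some assumptions. The ColumnProfile fields and the 0.1% tolerance are hypothetical, and distinct_count is assumed to exclude NULLs, so an all-NULL column has zero distinct values.

```python
from dataclasses import dataclass

@dataclass
class ColumnProfile:
    table: str
    column: str
    row_count: int
    distinct_count: int  # assumed to exclude NULLs
    null_count: int

def is_key_candidate(p: ColumnProfile, tolerance: float = 0.001) -> bool:
    """Cardinality ≈ rows: no NULLs, and distinct values within `tolerance` of the row count."""
    if p.row_count == 0 or p.null_count > 0:
        return False
    return p.distinct_count >= p.row_count * (1 - tolerance)

def is_dead_or_constant(p: ColumnProfile) -> bool:
    """All-NULL (0 distinct values) or a single repeated value (1 distinct value)."""
    return p.row_count > 0 and p.distinct_count <= 1
```

The tolerance allows for approximate distinct counts from sampling profilers. Set it to 0 to require exact uniqueness.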
Column Health & Key Discovery (Framework · CDF)
CDF twin — health-mix pie, type bars, and key/dead-column tables on the framework.
Column Profiling & Statistics (Framework · CDF)
CDF twin — null-% and uniqueness distribution pies, by-source bars, and highest-cardinality columns over 2.9K profiled columns.
Schema & Structure Explorer (Framework · CDF)
CDF twin — catalog composition, column data types, key coverage, and the largest schemas across 754 tables / 11.5K columns.
Document & Unstructured Insights (Framework · CDF)
CDF twin — files by extension, file assets by source, scan-volume trend, and an extension-detail table across 3.1K file assets.

Storage, Cost & Lineage

Storage Footprint & Capacity (Custom)
TB by source and type, top-heavy objects, and a cumulative storage-growth trend.
Storage Footprint & Capacity (Framework · CDF)
CDF twin — TB by source, structured/unstructured, size distribution, and cumulative growth.
Cost Optimization & Sustainability (Custom)
Monthly cost by source and sensitivity, reclaimable spend, and CO₂e footprint.
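
The page does not say how cost or CO₂e is derived. The sketch below assumes a flat per-source storage price and a single emission factor per stored TB. Under those assumptions, a cost-by-source-and-sensitivity breakdown and a footprint figure both come from the same rows. The source names, rates and factor are placeholders, not real pricing.

```python
from collections import defaultdict

# Placeholder rates (hypothetical): USD per TB-month for each source,
# and one emission factor in kg CO2e per TB-month.
USD_PER_TB_MONTH = {"object_store": 23.0, "warehouse": 40.0, "hadoop": 15.0}
KG_CO2E_PER_TB_MONTH = 1.5

def cost_breakdown(assets):
    """Monthly USD by (source, sensitivity) from (source, sensitivity, tb) rows."""
    totals = defaultdict(float)
    for source, sensitivity, tb in assets:
        totals[(source, sensitivity)] += tb * USD_PER_TB_MONTH[source]
    return dict(totals)

def footprint_co2e_kg(assets):
    """Monthly kg CO2e, assuming emissions scale linearly with stored TB."""
    return sum(tb for _, _, tb in assets) * KG_CO2E_PER_TB_MONTH
```

Reclaimable spend falls out of the same function: pass only the rows flagged as reclaimable to `cost_breakdown`.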
Cost Optimization & Sustainability (Framework · CDF)
CDF twin — cost by source & sensitivity, reclaimable spend, and CO2e.
Redundancy & Duplicate Data (Framework · CDF)
CDF twin — 233 TB reclaimable storage / $27.5K-a-month: reclaimable by source & sensitivity, duplicate assets, and top duplicate objects.
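
The page does not say how duplicates are identified. One common approach, sketched here as an assumption, groups objects by a content fingerprint (such as a hash) and counts every copy beyond the first as reclaimable. The (path, fingerprint, size) input shape is hypothetical. The last line only re-derives a number from the card itself: 233 TB for roughly $27.5K a month implies a blended rate of about $118 per TB-month.

```python
from collections import defaultdict

def reclaimable_bytes(objects):
    """Bytes freed by keeping one copy per fingerprint.

    `objects` holds (path, fingerprint, size_bytes) tuples; objects sharing a
    fingerprint are treated as duplicates, and every copy after the first
    one seen is counted as reclaimable.
    """
    groups = defaultdict(list)
    for _path, fingerprint, size in objects:
        groups[fingerprint].append(size)
    return sum(sum(sizes) - sizes[0] for sizes in groups.values())

# Blended rate implied by the card's own (rounded) figures, in USD per TB-month.
implied_rate = 27_500 / 233
```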
Data Movement Observability (Framework · CDF)
Lineage throughput and pipeline reliability (16.7 TB moved) on the CDF framework.
Data Integration Health (Custom)
Platform-movement matrix, throughput trend, success rate, and top cross-platform flows.
Data Integration Health (Framework · CDF)
Platform-movement stacked bar, throughput trend, success rate, and top cross-platform flows on the CDF framework.

Glossary & Stewardship

Glossary Hierarchy & Term Stewardship (Custom)
Term adoption gap (576 defined terms vs. those actually in use), hierarchy, types, and reach across 23 business glossaries.
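
The adoption gap is a set difference: the terms that are defined but never assigned to an asset. This is a minimal sketch that assumes term assignments arrive as hypothetical (asset_id, term) pairs. Terms outside the defined set are ignored so that they cannot inflate the adoption rate.

```python
def adoption_gap(defined_terms, term_assignments):
    """Defined glossary terms never assigned to any asset, plus the adoption rate.

    `term_assignments` holds (asset_id, term) pairs; terms that are not in
    the defined set are ignored.
    """
    defined = set(defined_terms)
    in_use = {term for _asset, term in term_assignments} & defined
    unused = sorted(defined - in_use)
    rate = len(in_use) / len(defined) if defined else 0.0
    return unused, rate
```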
Glossary Hierarchy & Term Stewardship (Framework · CDF)
CDF twin — adoption pie, glossary/type/depth bars and reach on the framework.
Ownership & Stewardship Gaps (Framework · CDF)
CDF twin — owner-coverage pie, unowned backlog by source, governance coverage, and a per-source stewardship scorecard across 14K assets.
Business Glossary & Term Reach (Framework · CDF)
CDF twin — term reach by glossary, classification coverage, terms-per-glossary, and top terms across 161 in-use terms / 12 business glossaries.

How it's built — on the Pentaho platform

One metadata warehouse, three Pentaho delivery styles, fully interactive.

CDA — the data layer

Every dashboard reads Pentaho CDA queries over a managed JDBC connection to the catalog warehouse. One governed data layer, many front-ends.
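
Any front-end, including the self-contained Custom builds, can reach a CDA query over HTTP. The sketch below assumes a stock CDA install, where the doQuery endpoint takes `path`, `dataAccessId`, `outputType` and `param<Name>` values and returns JSON with `metadata` and `resultset` arrays. The server URL, `.cda` path, data access id and parameter name are hypothetical. The HTTP call and authentication are left out so the sketch stays self-contained.

```python
import json
from urllib.parse import urlencode

def cda_query_url(base_url, cda_path, data_access_id, **params):
    """Build a CDA doQuery URL; query parameters are passed as param<Name>=value."""
    query = {"path": cda_path, "dataAccessId": data_access_id, "outputType": "json"}
    query.update({f"param{name}": value for name, value in params.items()})
    return f"{base_url.rstrip('/')}/plugin/cda/api/doQuery?{urlencode(query)}"

def cda_rows(payload: str):
    """Turn a CDA JSON response into a list of dicts keyed by column name."""
    doc = json.loads(payload)
    names = [col["colName"] for col in doc["metadata"]]
    return [dict(zip(names, row)) for row in doc["resultset"]]
```

Keeping the query definitions in `.cda` files means each front-end only needs the path and data access id, which is the "one governed data layer, many front-ends" split.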

CDF & CDE — the framework

The Framework dashboards are true Pentaho CDF dashboards with CCC charts, authored in CDE (.wcdf/.cdfde) and editable in the Pentaho CDE designer.

Interactive by design

Cascading filters, light and dark themes, and click-to-drill that carries the selected filters from one dashboard into the next, so the platform connects the story end to end.
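
The page does not show how the filter state travels between dashboards. One common approach, sketched here as an assumption, is to encode the selections in the target dashboard's URL query string and read them back on arrival. A cascading filter then narrows the child options to those valid for the selected parent. The dashboard paths, filter names and option lists are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def drill_url(target_dashboard: str, filters: dict) -> str:
    """Link to another dashboard, carrying the current selections as query parameters."""
    # Sort the keys so the same selection always produces the same URL.
    return f"{target_dashboard}?{urlencode(sorted(filters.items()))}"

def filters_from_url(url: str) -> dict:
    """Recover the carried selections on the receiving dashboard (first value per key)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

def cascade_options(child_options: dict, parent_selection: str) -> list:
    """Cascading filter: offer only the child values valid for the selected parent."""
    return child_options.get(parent_selection, [])
```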