Databricks System Tables: The Observability Gap — What They Expose vs What You Actually Need for Cost Control
Databricks system tables look comprehensive on paper. Sixteen tables across ten schemas. Billing, compute, jobs, queries, lineage, audit. When Databricks deprecated Overwatch and pointed everyone to system tables, the message was clear: this is the future of observability.
But if you have ever tried to answer these questions using only system tables, you already know the gap:
- Why is my job's GC time at 15%? (System tables do not track GC time.)
- Which stage is spilling to disk? (System tables do not track per-stage spill.)
- Which executor is the memory bottleneck? (System tables do not track executor-level JVM metrics.)
- How many files does my Delta table have? Is it healthy? (System tables do not track Delta table physical metrics.)
- What did my serverless job's infrastructure look like? (System tables do not populate `node_timeline` for serverless.)
This post documents exactly what Databricks system tables contain — column by column — and exactly what is missing. Every claim is verifiable against Databricks' own documentation.
What System Tables Actually Contain
Databricks exposes system tables under the system catalog in Unity Catalog. Here are the tables relevant to compute observability and cost control, with their exact schemas.
system.compute.node_timeline — Per-Node Hardware Metrics
This is the closest thing to infrastructure monitoring in system tables. It records OS-level metrics at per-node, per-minute granularity.
| Column | Type | Description |
|---|---|---|
| `account_id` | string | Account ID |
| `workspace_id` | string | Workspace ID |
| `cluster_id` | string | Cluster ID |
| `instance_id` | string | Cloud instance ID (e.g., `i-1234a6c12a2681234`) |
| `start_time` | timestamp | Record start (UTC) |
| `end_time` | timestamp | Record end (UTC) |
| `driver` | boolean | Whether this is the driver node |
| `cpu_user_percent` | double | CPU time in userland (%) |
| `cpu_system_percent` | double | CPU time in kernel (%) |
| `cpu_wait_percent` | double | CPU time waiting for I/O (%) |
| `mem_used_percent` | double | OS memory used (%) — includes all processes |
| `mem_swap_percent` | double | Memory swap usage (%) |
| `network_sent_bytes` | bigint | Bytes sent over the network |
| `network_received_bytes` | bigint | Bytes received from the network |
| `disk_free_bytes_per_mount_point` | map | Free disk space by mount point |
| `node_type` | string | Cloud instance type |
Retention: 90 days. Scope: All-purpose and jobs compute only. Does NOT include serverless compute or SQL warehouses.
What this tells you: OS-level CPU, memory, network, and disk per node per minute.
What this does NOT tell you: Anything about Spark. No JVM heap. No GC time. No execution memory vs storage memory. No per-executor breakdown. The mem_used_percent includes all OS processes — it cannot distinguish between Spark execution memory, cached data, and background daemons.
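Within its limits, `node_timeline` is still useful for spotting chronically idle clusters. A minimal sketch of that check in plain Python — the rows below are hypothetical stand-ins for `system.compute.node_timeline` records, and the thresholds are illustrative, not Databricks defaults:

```python
# Sketch: flag underutilized clusters from node_timeline-style rows.
# In practice you would aggregate this in SQL over
# system.compute.node_timeline; the row shape mirrors the columns above.

def underutilized_clusters(rows, cpu_threshold=20.0, mem_threshold=30.0):
    """rows: dicts with cluster_id, cpu_user_percent, cpu_system_percent,
    and mem_used_percent. Returns cluster IDs whose average CPU and
    memory both sit below the thresholds."""
    stats = {}
    for r in rows:
        cpu = r["cpu_user_percent"] + r["cpu_system_percent"]
        s = stats.setdefault(r["cluster_id"], {"cpu": 0.0, "mem": 0.0, "n": 0})
        s["cpu"] += cpu
        s["mem"] += r["mem_used_percent"]
        s["n"] += 1
    return sorted(
        cid for cid, s in stats.items()
        if s["cpu"] / s["n"] < cpu_threshold and s["mem"] / s["n"] < mem_threshold
    )

rows = [
    {"cluster_id": "c1", "cpu_user_percent": 5.0, "cpu_system_percent": 2.0, "mem_used_percent": 18.0},
    {"cluster_id": "c1", "cpu_user_percent": 7.0, "cpu_system_percent": 1.0, "mem_used_percent": 22.0},
    {"cluster_id": "c2", "cpu_user_percent": 60.0, "cpu_system_percent": 10.0, "mem_used_percent": 75.0},
]
print(underutilized_clusters(rows))  # ['c1']
```

Note that even this check inherits the caveat above: low `mem_used_percent` does not prove low JVM heap pressure, only low OS-level memory use.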
system.billing.usage — DBU Consumption
The billing table records hourly DBU consumption with rich metadata about what consumed the DBUs.
| Column | Type | Key Detail |
|---|---|---|
| `sku_name` | string | SKU name (e.g., `STANDARD_ALL_PURPOSE_COMPUTE`) |
| `usage_quantity` | decimal | DBUs consumed |
| `usage_start_time` / `usage_end_time` | timestamp | Always 1-hour windows |
| `usage_metadata.cluster_id` | string | Populated for classic compute; NULL for serverless |
| `usage_metadata.node_type` | string | Instance type; NULL for serverless |
| `usage_metadata.job_id` | string | Populated for job compute and serverless jobs; NULL for all-purpose compute |
| `usage_metadata.job_run_id` | string | Same scope as `job_id` |
| `usage_metadata.warehouse_id` | string | Populated for SQL warehouses |
| `usage_metadata.notebook_id` | string | Populated for serverless notebooks |
| `product_features.is_serverless` | boolean | Whether this is serverless usage |
| `identity_metadata.run_as` | string | User who ran the workload |
Retention: Full account history. Granularity: Metered per-second, aggregated into hourly records.
Critical limitations:
- No per-query cost. There is no `statement_id` foreign key. You cannot join billing to `query.history` to get the exact DBU cost per query. The best you can do is proportional allocation within an hourly window — which is an approximation.
- No cost attribution for all-purpose compute. When multiple users run notebooks on a shared cluster, `usage_metadata.job_id` is NULL. There is no way to attribute the hourly DBU cost to individual workloads.
- Serverless has no infrastructure visibility. `cluster_id` and `node_type` are both NULL. You know how many DBUs were consumed but not what hardware ran your workload.
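The proportional-allocation workaround mentioned above can be sketched in a few lines. This assumes you have already pulled the queries that overlapped a given billing hour; the weighting by `total_task_duration_ms` is one reasonable heuristic, not a documented Databricks method:

```python
# Sketch: split one hourly billing record across the queries that ran in
# that hour, weighted by total_task_duration_ms. The numbers are
# illustrative; the result is an approximation, not exact attribution.

def allocate_hourly_dbus(hour_dbus, queries):
    """queries: list of (statement_id, total_task_duration_ms).
    Returns {statement_id: approximate_dbus}."""
    total = sum(ms for _, ms in queries)
    if total == 0:
        return {sid: 0.0 for sid, _ in queries}
    return {sid: hour_dbus * ms / total for sid, ms in queries}

alloc = allocate_hourly_dbus(12.0, [("q1", 600_000), ("q2", 200_000), ("q3", 200_000)])
print(alloc)  # q1 ~7.2 DBUs, q2 and q3 ~2.4 each
```

Because the weights come from task durations rather than actual resource consumption, two queries with equal durations but very different memory or shuffle profiles get the same cost — one more reason this is an approximation.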
system.lakeflow.job_run_timeline — Job Execution Data
| Column | Type | Description |
|---|---|---|
| `job_id` | string | Job ID |
| `run_id` | string | Run ID |
| `period_start_time` / `period_end_time` | timestamp | Run or hourly slice boundaries |
| `result_state` | string | SUCCEEDED, FAILED, CANCELLED, etc. |
| `setup_duration_seconds` | long | Time in setup phase |
| `queue_duration_seconds` | long | Time queued |
| `execution_duration_seconds` | long | Time executing |
| `cleanup_duration_seconds` | long | Time in cleanup |
| `trigger_type` | string | CRON, CONTINUOUS, ONETIME, etc. |
| `compute_ids` | array | Cluster IDs used |
Retention: 365 days. Runs exceeding 1 hour are sliced into multiple rows aligned to clock-hour boundaries.
What this tells you: Job-level duration breakdown (setup, queue, execution, cleanup) and success/failure status.
What this does NOT tell you: Anything about Spark stages, tasks, shuffle, spill, or GC within the job. A 2-hour job that spent 90 minutes on a single skewed stage looks identical to a healthy 2-hour job.
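One thing the duration columns can still surface is runs dominated by overhead rather than work. A small sketch, with a made-up row mirroring the columns above:

```python
# Sketch: from job_run_timeline-style duration columns, compute what
# fraction of a run was setup + queue + cleanup rather than execution.
# Values are illustrative, not real run data.

def overhead_ratio(run):
    overhead = (run["setup_duration_seconds"]
                + run["queue_duration_seconds"]
                + run["cleanup_duration_seconds"])
    total = overhead + run["execution_duration_seconds"]
    return overhead / total if total else 0.0

run = {"setup_duration_seconds": 120, "queue_duration_seconds": 300,
       "cleanup_duration_seconds": 30, "execution_duration_seconds": 150}
print(round(overhead_ratio(run), 2))  # 0.75
```

A ratio like 0.75 suggests the cluster spent most of its billed time not executing the job at all — a scheduling or provisioning problem, which is the one class of issue these columns can diagnose.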
system.query.history — SQL Query Metrics
This is the richest observability table, but it only covers SQL warehouses and serverless compute.
| Column | Type | Description |
|---|---|---|
| `statement_id` | string | Unique query ID |
| `statement_text` | string | Full SQL text |
| `execution_status` | string | FINISHED, FAILED, CANCELED |
| `compute.type` | string | WAREHOUSE or SERVERLESS_COMPUTE |
| `total_duration_ms` | bigint | Total execution time |
| `execution_duration_ms` | bigint | Time executing |
| `compilation_duration_ms` | bigint | Time optimizing |
| `waiting_for_compute_duration_ms` | bigint | Time waiting for compute |
| `waiting_at_capacity_duration_ms` | bigint | Time queued |
| `total_task_duration_ms` | bigint | Sum of all Spark task durations |
| `read_bytes` | bigint | Total data read |
| `read_rows` | bigint | Total rows read |
| `read_files` | bigint | Files read after pruning |
| `pruned_files` | bigint | Files eliminated by pruning |
| `read_partitions` | bigint | Partitions read |
| `shuffle_read_bytes` | bigint | Shuffle data read over the network |
| `spilled_local_bytes` | bigint | Data spilled to local disk |
| `written_bytes` / `written_rows` / `written_files` | bigint | Output metrics |
| `read_io_cache_percent` | int | IO cache hit rate |
| `from_result_cache` | boolean | Whether the result was served from cache |
Retention: 365 days. Scope: SQL warehouses and serverless compute only. Does NOT cover classic all-purpose or job clusters.
What this tells you: Query-level I/O, timing, shuffle reads, and spill — useful for identifying expensive queries.
What this does NOT tell you:
- No GC time. The single most important metric for memory right-sizing is not captured.
- No shuffle write bytes. Only shuffle reads are recorded — you cannot measure shuffle amplification.
- No per-stage breakdown. A query that scans 10 GB but spills 5 GB on one stage looks like a single number. You cannot identify which operator caused the spill.
- No peak memory per query. You cannot determine whether a query was close to OOM.
- No execution plan. You cannot detect plan regressions programmatically. There is no `system.query.profiles` table — the Query Profile is UI-only.
- No per-query DBU cost. There is no cost column. You must approximate by joining with hourly billing.
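The aggregate columns that do exist are still enough to rank spill-heavy queries. A minimal sketch, using illustrative tuples standing in for `system.query.history` rows:

```python
# Sketch: rank queries by spill ratio using the aggregate columns that
# query.history does expose. The 0.25 threshold is an arbitrary example.

def spill_heavy(queries, min_ratio=0.25):
    """queries: (statement_id, read_bytes, spilled_local_bytes).
    Returns statements whose spill exceeds min_ratio of bytes read,
    worst first."""
    flagged = [
        (sid, spilled / read)
        for sid, read, spilled in queries
        if read > 0 and spilled / read >= min_ratio
    ]
    return sorted(flagged, key=lambda x: -x[1])

qs = [("q1", 10_000_000, 5_000_000), ("q2", 8_000_000, 100_000)]
print(spill_heavy(qs))  # [('q1', 0.5)]
```

This tells you *which* query spilled heavily — but, per the limitations above, not which stage or operator inside it caused the spill.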
The Metrics That Are Completely Absent
Here is the systematic comparison. On the left: what Spark tracks internally via SparkListenerTaskEnd events. On the right: whether it appears in any system table.
Per-Executor JVM Metrics
Spark tracks these per executor via heartbeats and stage completion events:
| Metric | Available in System Tables? |
|---|---|
| `JVMHeapMemory` (peak heap usage) | No |
| `JVMOffHeapMemory` (peak off-heap) | No |
| `MinorGCCount` / `MinorGCTime` | No |
| `MajorGCCount` / `MajorGCTime` | No |
| `OnHeapExecutionMemory` (peak) | No |
| `OffHeapExecutionMemory` (peak) | No |
| `OnHeapStorageMemory` (peak) | No |
`node_timeline` provides `mem_used_percent` at the OS level — but this includes all processes, cannot distinguish heap from off-heap, and cannot separate Spark execution memory from storage memory. It is the wrong metric for memory right-sizing.
Per-Task Spark Metrics
Every SparkListenerTaskEnd event contains a TaskMetrics object with:
| Metric | Available in System Tables? |
|---|---|
| `executorRunTime` (wall clock per task) | No |
| `executorCpuTime` (CPU time per task) | No |
| `jvmGCTime` (GC time per task) | No |
| `peakExecutionMemory` | No |
| `diskBytesSpilled` | Aggregate only in `query.history` |
| `memoryBytesSpilled` | No |
| `shuffleReadMetrics.remoteBytesRead` | Aggregate only in `query.history` |
| `shuffleReadMetrics.localBytesRead` | No |
| `shuffleReadMetrics.fetchWaitTime` | No |
| `shuffleWriteMetrics.bytesWritten` | No |
| `shuffleWriteMetrics.writeTime` | No |
| `inputMetrics.bytesRead` | Aggregate only in `query.history` |
| `resultSerializationTime` | No |
| `taskLocality` (PROCESS_LOCAL, NODE_LOCAL, etc.) | No |
Every metric in this table is available in Spark event logs and the Spark UI. None of them appear in system tables. The entire TaskMetrics object from SparkListenerTaskEnd is absent from the system.compute namespace.
What This Means for Cost Control
Without these metrics, you cannot:
- Diagnose GC pressure — you do not know if executors are spending 15% of runtime garbage collecting (the #1 signal to increase memory)
- Detect data skew — you cannot see per-task duration distributions to find straggler tasks
- Identify spill sources — you see aggregate spill per query but not which stage or operator caused it
- Right-size executor memory — you do not know peak JVM heap usage, so you cannot calculate `recommended_memory = peak_heap * 1.15`
- Distinguish I/O-bound from CPU-bound tasks — you cannot compute `cpuTime / runTime` per task
- Measure shuffle efficiency — you have shuffle read bytes but not the local vs remote ratio or fetch wait time
These are the exact metrics that drive right-sizing formulas. System tables give you the bill. They do not give you the data to reduce it.
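To make the formulas concrete, here is a minimal sketch of the three checks named above, assuming you have obtained per-executor peak heap and per-task metrics from some other source (event logs, a custom SparkListener) — system tables do not provide the inputs. The thresholds and the max/median skew signal are common heuristics, not Databricks-defined values:

```python
# Sketch of the right-sizing checks described in the text.

def recommended_executor_memory_mb(peak_heap_mb, headroom=1.15):
    # The 15% headroom factor comes from the formula above.
    return int(peak_heap_mb * headroom)

def is_cpu_bound(cpu_time_ms, run_time_ms, threshold=0.7):
    # cpuTime / runTime close to 1.0 means the task rarely waited on I/O.
    return run_time_ms > 0 and cpu_time_ms / run_time_ms >= threshold

def skew_ratio(task_durations_ms):
    # max / median duration: a common straggler signal; > ~2 suggests skew.
    ordered = sorted(task_durations_ms)
    median = ordered[len(ordered) // 2]
    return max(ordered) / median if median else 0.0

print(recommended_executor_memory_mb(9000))             # 10350
print(is_cpu_bound(850, 1000))                          # True
print(round(skew_ratio([100, 110, 120, 130, 900]), 1))  # 7.5
```

Every input here — peak heap, per-task CPU time, per-task durations — appears in the "No" column of the tables above, which is the point: the formulas are simple; the missing piece is the data.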
The Delta Table Health Blind Spot
This gap is less discussed but equally impactful. Databricks system tables provide zero visibility into Delta table physical health.
What information_schema.TABLES Contains
The INFORMATION_SCHEMA.TABLES view has 15 columns:
```
TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE,
IS_INSERTABLE_INTO, COMMIT_ACTION, TABLE_OWNER, COMMENT,
CREATED, CREATED_BY, LAST_ALTERED, LAST_ALTERED_BY,
DATA_SOURCE_FORMAT, STORAGE_PATH, STORAGE_SUB_DIRECTORY
```
Notice what is absent: no table_size, no num_files, no num_rows, no num_partitions, no avg_file_size. The entire physical layer is invisible.
What You Cannot Monitor from System Tables
| Monitoring Need | Available via System Table? | Workaround |
|---|---|---|
| Current file count per table | No | DESCRIBE DETAIL per table |
| Current table size in bytes | No | DESCRIBE DETAIL per table |
| File size distribution | No | Parse _delta_log checkpoint Parquet |
| Per-partition file count | No | Query with _metadata.file_path per table |
| Small file detection | No | Compute sizeInBytes / numFiles per table |
| Tables needing OPTIMIZE | No | Custom logic comparing file counts to thresholds |
| Tables needing VACUUM | No | ANALYZE TABLE COMPUTE STORAGE METRICS (DBR 18+, not persisted) |
| Manual OPTIMIZE/VACUUM history | No | DESCRIBE HISTORY per table |
| Partition skew | No | Per-table _metadata.file_path queries |
| Z-ORDER effectiveness | No | No metrics exposed |
| Data skipping effectiveness | No | No metrics exposed |
The only storage-related system table is system.storage.predictive_optimization_operations_history, which logs automated OPTIMIZE, VACUUM, and ANALYZE operations performed by Predictive Optimization. It does NOT log manual operations, and it does NOT show current table state — only what the automated service has done.
How You Actually Get Delta Table Health Today
You must run per-table commands individually:
```sql
-- File count and total size (single table, single point in time)
DESCRIBE DETAIL my_catalog.my_schema.my_table;
-- Returns: numFiles, sizeInBytes, partitionColumns, format, etc.

-- Operation history (single table)
DESCRIBE HISTORY my_catalog.my_schema.my_table;
-- Returns: version, timestamp, operation, operationMetrics
-- OPTIMIZE metrics: numAddedFiles, numRemovedFiles, p50FileSize, etc.

-- Storage breakdown (DBR 18+, NOT persisted)
ANALYZE TABLE my_catalog.my_schema.my_table COMPUTE STORAGE METRICS;
-- Returns: active_bytes, vacuumable_bytes, time_travel_bytes
```
None of these results are persisted in any system table. To monitor table health across a catalog with hundreds of tables, you must build a custom pipeline that iterates over every table, runs these commands, and stores the results yourself.
Compare This to Apache Iceberg
Iceberg provides SQL-queryable metadata tables for every table:
| Iceberg Metadata Table | What It Exposes | Delta Equivalent |
|---|---|---|
| `table.files` | Every data file: path, size, row count, column-level min/max/null counts | None — must parse `_delta_log` |
| `table.partitions` | Per-partition: row count, file count, total size | None — must query `_metadata.file_path` |
| `table.snapshots` | All snapshots with added/removed files and rows | DESCRIBE HISTORY (per table, not a system table) |
| `table.manifests` | Manifest files with file counts, sizes, partition info | None |
| `table.all_data_files` | Files across ALL snapshots | None |
| `table.entries` | Manifest entries with status, snapshot, file info | None |
On Iceberg, detecting small files in a table is a single SQL query against its `files` metadata table:

```sql
SELECT COUNT(*) AS file_count,
       AVG(file_size_in_bytes) AS avg_file_size
FROM catalog.db.my_table.files
HAVING AVG(file_size_in_bytes) < 134217728  -- under 128 MB
```
On Delta, the same task requires iterating over every table with DESCRIBE DETAIL, parsing the results, storing them in a custom table, and building alerting on top. There is no catalog-wide query.
The Serverless Gap: Even Less Visibility
When you move to serverless compute, the observability picture gets significantly worse.
System Tables on Serverless vs Classic
| System Table | Classic Compute | Serverless |
|---|---|---|
| `node_timeline` (CPU, memory, disk) | Fully populated, per-node per-minute | Not populated at all |
| `compute.clusters` (config history) | Fully populated | Not populated |
| `billing.usage.cluster_id` | Populated | NULL |
| `billing.usage.node_type` | Populated (instance type) | NULL |
| `query.history` | SQL warehouses only | SQL warehouses + serverless notebooks/jobs |
| `lakeflow.job_run_timeline` | Populated | Populated |
On serverless, node_timeline is completely empty. There are no user-visible nodes. You cannot see CPU utilization, memory pressure, or network I/O for serverless workloads — the data simply does not exist in system tables.
What Else Disappears on Serverless
| Capability | Classic | Serverless |
|---|---|---|
| Spark UI | Full (stages, executors, DAGs, storage) | Query Profile only (operator-level DAG) |
| Spark event logs | Generated, exportable | Not generated |
| Executor logs | Accessible | Not accessible |
| Init scripts (custom monitoring) | Supported | Not supported |
| Ganglia / compute metrics UI | Available | Not available |
| Hard spending limit | Cluster policies, autoscale caps | Does not exist |
The last point is critical: there is no mechanism to set a hard spending cap on serverless compute. Budget alerts only notify, with up to a 24-hour delay; budget policies only apply tags. A runaway pipeline can accumulate unlimited DBU charges before anyone notices.
Per-Query Cost on Serverless
Calculating per-query cost on serverless requires joining system.query.history with system.billing.usage by timestamp window and warehouse ID. But billing records are hourly aggregates with no statement_id foreign key, and the documentation warns that "multiple records" may be "associated with a given serverless compute workload" due to the distributed architecture.
The result is an approximation at best. Exact per-query DBU attribution is not possible with current system tables.
The Overwatch Deprecation Problem
Before system tables, Databricks Labs maintained Overwatch — an open-source Scala project that parsed Spark event logs into queryable Delta tables. Overwatch provided the metrics that system tables do not.
What Overwatch Had That System Tables Do Not
| Overwatch Gold Table | Key Metrics | System Table Equivalent |
|---|---|---|
| `sparktask_gold` | Per-task: shuffle bytes, spill bytes, GC time, serialization time, executor ID | None |
| `sparkexecutor_gold` | Per-executor: lifetime, removal reason, JVM stats | None |
| `sparkstage_gold` | Per-stage: runtime, accumulator lineage | None |
| `sparkjob_gold` | Per-Spark-job: stage IDs, job-to-notebook lineage | None |
| `jobRunCostPotentialFact` | Cluster utilization %, core-hours, cloud VM + DBU cost | Partial (`billing.usage` has DBU cost but not cloud VM cost) |
| `notebookCommands_gold` | Per-command: execution time, estimated cost | None |
Overwatch is now officially deprecated. The GitHub repository has 134 open issues and no active development. Databricks directs all users to system tables — but as this post documents, system tables do not cover what Overwatch provided.
The gap is being filled by third-party vendors (Unravel Data, Sync Computing, Datadog) at additional cost. But the native platform observability has regressed.
The Full Gap Matrix
| Metric | node_timeline | billing.usage | query.history | lakeflow.* | Event Logs | Spark UI |
|---|---|---|---|---|---|---|
| GC time per executor | No | No | No | No | Yes | Yes |
| Peak JVM heap per executor | No | No | No | No | Yes | Yes |
| Per-task shuffle read/write | No | No | Aggregate read only | No | Yes | Yes |
| Per-stage disk spill | No | No | Aggregate only | No | Yes | Yes |
| CPU time vs wall time per task | No | No | No | No | Yes | Yes |
| Peak execution memory | No | No | No | No | Yes | Yes |
| Task duration distribution (skew) | No | No | No | No | Yes | Yes |
| Executor count per job | No | No | No | No | Yes | Yes |
| OS CPU/memory per node | Yes | No | No | No | No | No |
| DBU cost per hour | No | Yes | No | No | No | No |
| Query I/O (bytes, rows, files) | No | No | Yes | No | No | Yes |
| Job duration breakdown | No | No | No | Yes | No | No |
| Delta table file count | No | No | No | No | No | No |
| Delta table health | No | No | No | No | No | No |
The pattern is clear: system tables cover platform operations (billing, job scheduling, node hardware). Event logs and Spark UI cover engine performance (GC, shuffle, spill, skew, memory). Delta table health falls through both.
How Cazpian Handles This
On Cazpian, the observability gap described above does not exist.
Every Metric, Collected Automatically
When a Spark job or SQL query completes on Cazpian, every metric covered in this post is immediately available — no event log parsing, no custom SparkListener, no per-table DESCRIBE DETAIL iteration:
- GC time per executor and per stage
- Peak JVM heap memory against allocated memory
- Per-task shuffle read/write with local vs remote breakdown
- Per-stage disk spill — which stage, which operator, how much
- CPU time vs wall time per task — I/O-bound vs compute-bound classification
- Task duration distribution with automatic skew detection
- Delta/Iceberg table health — file count, file size distribution, partition skew, compaction status
- Exact cost per query and per job — not an hourly approximation
AI-Driven Recommendations
Cazpian's AI analyzes the collected metrics and surfaces actionable recommendations:
- Right-sizing: Specific executor memory and core count changes based on actual peak JVM heap usage — not OS-level `mem_used_percent`
- Skew remediation: Identifies skewed stages from task duration distributions and recommends salting, AQE tuning, or repartitioning
- Spill elimination: Traces disk spill to the responsible stage and suggests the minimum memory increase
- Table health: Identifies tables needing compaction, detects small file accumulation, and flags partition skew — across the entire catalog, not table by table
Full Visibility, Not a Black Box
On Databricks serverless, you lose node_timeline, Spark UI, event logs, and executor logs. The infrastructure is a black box.
On Cazpian, you get the same immediate startup and managed infrastructure — but with full visibility into what the infrastructure is doing. Instance types, executor counts, JVM metrics, and per-task breakdowns are all visible. You do not trade observability for convenience.
What You Can Do Today on Databricks
If you are on Databricks and need better observability, here are the available options. Note that most of these only work on classic all-purpose or job compute — serverless compute blocks event logs, Spark configs, and init scripts entirely.
1. Enable Spark Event Logs (Classic/Job Compute Only)
The most impactful single change — but only available on classic all-purpose and job clusters. Serverless compute does not generate Spark event logs. Configure cluster log delivery to write event logs to cloud storage, then parse them:
```
spark.eventLog.enabled true
spark.eventLog.dir s3a://my-bucket/spark-event-logs/
spark.eventLog.logStageExecutorMetrics true
```
This gives you access to every metric in the "absent from system tables" list. You need a parser (PySpark or a History Server) to make the data queryable. If you are on serverless, this option is not available — you are limited to system.query.history aggregates.
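A parser can be surprisingly small. This sketch extracts GC time and disk spill from `SparkListenerTaskEnd` records; the field names ("Event", "Task Metrics", "JVM GC Time", "Executor Run Time", "Disk Bytes Spilled") follow Spark's JSON event-log protocol, but verify them against the logs your Spark version actually emits:

```python
# Sketch: summarize GC and spill from Spark event-log lines (one JSON
# object per line). The sample line below is synthetic.
import json

def summarize_tasks(lines):
    gc_ms = run_ms = spill_bytes = 0
    for line in lines:
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        m = event.get("Task Metrics") or {}
        gc_ms += m.get("JVM GC Time", 0)
        run_ms += m.get("Executor Run Time", 0)
        spill_bytes += m.get("Disk Bytes Spilled", 0)
    gc_pct = 100.0 * gc_ms / run_ms if run_ms else 0.0
    return {"gc_percent": gc_pct, "disk_spill_bytes": spill_bytes}

sample = [json.dumps({
    "Event": "SparkListenerTaskEnd",
    "Task Metrics": {"JVM GC Time": 150, "Executor Run Time": 1000,
                     "Disk Bytes Spilled": 4096},
})]
print(summarize_tasks(sample))  # {'gc_percent': 15.0, 'disk_spill_bytes': 4096}
```

In a real pipeline you would read the log files from the cloud storage path configured above and persist the per-task rows to a Delta table, giving you a queryable version of exactly the metrics the gap matrix marks "Event Logs: Yes".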
2. Build a Delta Table Health Pipeline
Iterate over tables in information_schema.TABLES, run DESCRIBE DETAIL on each, and persist results to a metrics table:
```python
from pyspark.sql.functions import current_timestamp

tables = spark.sql("""
    SELECT table_catalog, table_schema, table_name
    FROM system.information_schema.tables
    WHERE data_source_format = 'DELTA'
      AND table_type = 'MANAGED'
""").collect()

results = []
for t in tables:
    fqn = f"{t.table_catalog}.{t.table_schema}.{t.table_name}"
    try:
        detail = spark.sql(f"DESCRIBE DETAIL {fqn}").first()
        results.append((fqn, detail.numFiles, detail.sizeInBytes))
    except Exception:
        pass  # table may have been dropped or be inaccessible

health_df = spark.createDataFrame(
    results, ["table_name", "num_files", "size_bytes"]
).withColumn("snapshot_time", current_timestamp())

health_df.write.mode("append").saveAsTable("ops.delta_table_health")
```
Schedule this daily. Alert when size_bytes / num_files drops below 128 MB (small file problem) or when num_files exceeds your threshold.
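The alert condition itself is a one-liner. A sketch with illustrative thresholds (the 128 MB small-file cutoff comes from the text; the file-count ceiling is an arbitrary example you would tune per table):

```python
# Sketch: alert predicate over the snapshots written by the pipeline above.

SMALL_FILE_BYTES = 128 * 1024 * 1024  # 128 MB average-file-size floor
MAX_FILES = 10_000                    # illustrative file-count ceiling

def needs_attention(num_files, size_bytes):
    if num_files == 0:
        return False
    avg = size_bytes / num_files
    return avg < SMALL_FILE_BYTES or num_files > MAX_FILES

print(needs_attention(100, 50 * 1024**3))    # False: ~512 MB average file
print(needs_attention(5_000, 50 * 1024**3))  # True: ~10 MB average file
```

Running this over each daily snapshot, rather than the current state only, also lets you catch tables whose file counts are growing steadily even before they cross a threshold.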
3. Use Third-Party Tools
Unravel Data, Sync Computing, and Datadog all provide deeper Databricks observability than native system tables. They fill the gap at additional cost.
4. Accept the Limitations
For teams that primarily run SQL warehouse queries and only need query-level aggregate metrics, system.query.history may be sufficient. The gap mainly affects teams that need engine-level diagnostics — memory right-sizing, shuffle optimization, skew detection, and table health monitoring.
What's Next
This post documented what Databricks system tables contain and what they are missing. For the complete guide to collecting the metrics that system tables do not provide, see:
- Spark Runtime Metrics Collection with DriverPlugin, ExecutorPlugin, and SparkListener — Building custom metrics pipelines that capture what system tables miss.
- Iceberg Metrics Reporting — How Iceberg's ScanReport and CommitReport provide the table health observability that Delta system tables lack.