Energy-metered data science

Data science.
Energy-metered.

Every notebook cell, every model training run, every SQL query — measured in picojoules. See what your data pipeline actually costs.

Get Started Explore Features
The Problem

Your notebook doesn't know
what your model costs

ML training runs are energy black boxes. GPU hours show up on a bill weeks later. Traditional notebooks give you wall-clock time at best — nothing about the actual energy consumed per cell, per epoch, per query. You cannot optimize what you cannot measure.

Invisible GPU Costs

GPU hours are billed in bulk. You have no idea which training run, which hyperparameter sweep, or which data preprocessing step consumed the energy.

📈

No Cell-Level Metering

Traditional notebooks show execution time. They do not show energy. A 2-second cell can consume 100x more energy than another 2-second cell depending on the operation.

🌎

Compliance Blind Spot

CSRD Scope 3 reporting, the ISO/IEC 21031 SCI standard, and NIH data sharing mandates all call for energy or computational provenance accounting. You cannot report what you never measured.
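The "No Cell-Level Metering" point above is easy to demonstrate with nothing but the Python standard library: two cells with identical wall-clock time can do wildly different amounts of work. This sketch contrasts a sleeping cell with a busy-looping one, using CPU time as a rough stand-in for energy drawn (real metering needs hardware counters; the function names are illustrative only).

```python
import time

def idle_cell(seconds=2.0):
    # Mostly waiting: almost no CPU work, so little energy drawn.
    time.sleep(seconds)

def busy_cell(seconds=2.0):
    # Busy-looping: the CPU is saturated for the whole interval.
    end = time.perf_counter() + seconds
    x = 0
    while time.perf_counter() < end:
        x += 1
    return x

for cell in (idle_cell, busy_cell):
    wall0, cpu0 = time.perf_counter(), time.process_time()
    cell()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    # Same wall clock, very different CPU time -- and hence energy.
    print(f"{cell.__name__}: wall {wall:.1f}s, cpu {cpu:.2f}s")
```

Both cells report roughly two seconds of wall time, but the busy cell burns two seconds of CPU while the idle one burns almost none.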

Languages

Your language. Metered.

First-class support for the languages data scientists actually use. Every execution metered in picojoules.

🐍

Python

NumPy, Pandas, PyTorch, scikit-learn

📊

R

tidyverse, ggplot2, Bioconductor

Julia

Flux.jl, DataFrames.jl, DifferentialEquations.jl

🔢

MATLAB

Signal processing, control systems, Simulink

🗃

SQL

JouleDB native, PostgreSQL wire protocol

🔗

GraphQL & Cypher

Knowledge graphs, graph analytics, traversals

Notebooks

Built-in notebook with
energy receipts per cell

Every cell execution produces an energy receipt. CPU time, memory, rows processed, and picojoules consumed — all inline, all automatic.

In [3] Python
import pandas as pd

df = pd.read_csv("sensors.csv")
result = df.groupby("region").agg({
    "temperature": "mean",
    "humidity": "std",
    "pressure": "max",
})
print(result)
14,203 pJ · 847.2 ms · 128.4 MB peak
// Energy receipt — cell [3]: pandas groupby+agg
cpu      847.2 ms · 14,203 pJ
memory   128.4 MB peak alloc
rows     2,847,391 processed
result   12 rows × 3 columns
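Outside this notebook, a crude cell-level receipt can be approximated with the Python standard library alone: wall time, CPU time, and peak allocation via `tracemalloc`. The energy figure itself requires hardware counters, so this sketch reports time and memory only; the receipt layout is modeled on the one above and is not the product's actual API.

```python
import time
import tracemalloc
from contextlib import contextmanager

@contextmanager
def receipt(label):
    """Rough per-block receipt: wall time, CPU time, peak allocation.
    Energy needs hardware counters (e.g. RAPL) and is not measured here."""
    tracemalloc.start()
    wall0, cpu0 = time.perf_counter(), time.process_time()
    try:
        yield
    finally:
        wall = (time.perf_counter() - wall0) * 1000
        cpu = (time.process_time() - cpu0) * 1000
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        print(f"[{label}] wall {wall:.1f} ms · cpu {cpu:.1f} ms "
              f"· {peak / 1e6:.1f} MB peak")

# Toy stand-in for the groupby above: mean temperature per region.
with receipt("groupby+agg"):
    rows = [(i % 12, i * 0.1) for i in range(100_000)]
    by_region = {}
    for region, temp in rows:
        by_region.setdefault(region, []).append(temp)
    result = {r: sum(v) / len(v) for r, v in by_region.items()}
```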
ML Pipeline

Training → Inference → Deploy
Energy at every stage

Know the energy cost of every phase. Compare hyperparameter sweeps by picojoules, not just accuracy. Deploy models with energy budgets attached.

📚
Preprocess
2,410 pJ
🧠
Train
847,200 pJ
🔍
Evaluate
12,800 pJ
🚀
Deploy
187 pJ / req
train.py 847,200 pJ total
import torch

model = TransformerModel(d_model=512, nhead=8)
optimizer = torch.optim.AdamW(model.parameters())

for epoch in range(100):
    optimizer.zero_grad()
    loss = model(batch)
    loss.backward()
    optimizer.step()
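Once every stage carries an energy figure, sweep selection can weigh cost against quality instead of chasing accuracy alone. A minimal sketch of that comparison, ranking runs by picojoules per accuracy point (all run names and numbers are hypothetical, not measurements):

```python
# Hypothetical sweep results: (run id, validation accuracy, energy in pJ).
runs = [
    ("lr=1e-3, d=256", 0.891, 412_000),
    ("lr=1e-3, d=512", 0.902, 847_200),
    ("lr=3e-4, d=512", 0.899, 655_000),
]

def pj_per_point(run):
    # Energy spent per percentage point of accuracy: a run that is
    # marginally less accurate but far cheaper can be the better deploy.
    _, acc, pj = run
    return pj / (acc * 100)

best = min(runs, key=pj_per_point)
for run in sorted(runs, key=pj_per_point):
    name, acc, pj = run
    print(f"{name}: {acc:.1%} · {pj:,} pJ · {pj_per_point(run):,.0f} pJ/point")
```

Here the smallest model wins on energy efficiency despite a 1.1-point accuracy gap; whether that trade is acceptable is a per-deployment decision.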
The Lift

Write in Python.
Migrate to Joule when performance matters.

Keep your existing Python workflows. When a cell becomes a bottleneck, lift it to Joule for up to 75x energy reduction — same logic, same results, fraction of the cost.

analysis.py 75.95x
import pandas as pd

df = pd.read_csv("sensors.csv")
result = df.groupby("region").agg({
    "temperature": "mean",
    "humidity": "std",
    "pressure": "max",
})
print(result)
14,203 pJ · Python + pandas
analysis.joule 1.00x
use data::{Frame, read_csv}

let df = read_csv("sensors.csv")
let result = df
    .group_by("region")
    .agg([
        mean("temperature"),
        std("humidity"),
        max("pressure"),
    ])
187 pJ · Joule native

Same groupby + aggregate. Same 2.8M rows. 75.95x less energy.

SQL & Graph

Query your data natively

SQL, GraphQL, and Cypher cells run directly against JouleDB. No external database needed. Every query metered.

query.sql 42 pJ
SELECT region,
       AVG(temperature)  AS avg_temp,
       STDDEV(humidity)  AS std_hum,
       MAX(pressure)     AS max_pres
FROM sensors
GROUP BY region;
-- Energy: 42 pJ | 11 ms | JouleDB native
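The aggregate itself is plain SQL and runs anywhere. A portable sketch against Python's built-in `sqlite3` (with invented sample rows, no energy metering, and the spread derived from AVG of squares since SQLite has no STDDEV aggregate):

```python
import math
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sensors (region TEXT, temperature REAL, "
            "humidity REAL, pressure REAL)")
con.executemany(
    "INSERT INTO sensors VALUES (?, ?, ?, ?)",
    [("north", 12.0, 0.60, 1011.0), ("north", 14.0, 0.70, 1013.0),
     ("south", 25.0, 0.40, 1009.0), ("south", 27.0, 0.50, 1015.0)],
)

# SQLite lacks a STDDEV aggregate, so compute the population variance
# as AVG(x*x) - AVG(x)^2 and take the square root afterwards.
rows = con.execute("""
    SELECT region,
           AVG(temperature) AS avg_temp,
           AVG(humidity * humidity) - AVG(humidity) * AVG(humidity)
                            AS var_hum,
           MAX(pressure)    AS max_pres
    FROM sensors
    GROUP BY region
    ORDER BY region
""").fetchall()

for region, avg_temp, var_hum, max_pres in rows:
    print(region, avg_temp, math.sqrt(var_hum), max_pres)
```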
Built For

From the lab to the enterprise

🔬

Research

Reproducible experiments with energy provenance

🧬

Biotech

Genomics, proteomics, drug discovery pipelines

💰

Finance

Quant modeling, risk analysis, market data

🌍

Climate Science

Earth systems modeling, satellite data, emissions tracking

🏥

Healthcare AI

Medical imaging, clinical NLP, patient data

National Labs

HPC workloads, simulation data, energy budgets

Compliance

Report what you measure

Energy metering at the cell level gives you the data you need for regulatory compliance — automatically.

ISO/IEC 21031 SCI

Software Carbon Intensity scoring per notebook, per pipeline, per deployment. Automatic SCI reports from energy receipts.
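The SCI score defined by ISO/IEC 21031 is SCI = (E × I + M) / R: operational energy E (kWh) times grid carbon intensity I (gCO2e/kWh), plus embodied emissions M, per functional unit R. Per-cell energy receipts supply the E term. A minimal sketch of the calculation, with all input numbers hypothetical:

```python
def sci_score(energy_kwh, intensity_g_per_kwh, embodied_g, functional_units):
    """Software Carbon Intensity: SCI = (E * I + M) / R,
    operational plus embodied gCO2e per functional unit
    (e.g. per request, per training run)."""
    return (energy_kwh * intensity_g_per_kwh + embodied_g) / functional_units

# Illustrative numbers only: a pipeline that drew 1.2 kWh on a grid at
# 400 gCO2e/kWh, amortizing 300 g of embodied emissions over 10k requests.
score = sci_score(energy_kwh=1.2, intensity_g_per_kwh=400.0,
                  embodied_g=300.0, functional_units=10_000)
print(f"{score:.3f} gCO2e per request")
```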

CSRD Scope 3

EU Corporate Sustainability Reporting Directive requires downstream compute energy disclosure. Data gives you the numbers.

NIH Data Sharing

NIH 2023 data sharing policy requires computational reproducibility. Energy receipts provide provenance beyond timestamps.

Get Started

One command. Every cell metered.

Install Data and start seeing the energy cost of your data science work.

Terminal
$ curl -fsSL https://data-lang.dev/install.sh | sh
Installation Guide All Features