About LakeLogic
Contract-driven data engineering for the modern lakehouse.
What is LakeLogic?
LakeLogic is an open-source Python framework that lets data teams define, enforce, and monitor data contracts across every layer of the data stack — from raw ingestion to curated gold tables. Write your schema once in YAML, and LakeLogic validates, transforms, and materialises your data using the engine that fits: Polars, Pandas, DuckDB, Spark, Snowflake, or BigQuery.
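As a rough illustration of the contract-first idea, a YAML contract might look like the sketch below. The field names (contract, schema, quality, on_fail) are hypothetical and chosen for illustration; consult the project documentation for the actual contract format.

```yaml
# Hypothetical contract sketch -- field names are illustrative,
# not the exact LakeLogic schema.
contract: orders
version: 1
schema:
  columns:
    - name: order_id
      type: string
      nullable: false
    - name: amount
      type: decimal(10, 2)
quality:
  - rule: amount >= 0
    on_fail: quarantine
engine: duckdb
```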
Why we built it
Modern data teams manage hundreds of datasets across multiple engines, clouds, and formats. Without a shared contract layer, quality checks are ad-hoc, schema drift goes unnoticed, and bad data reaches production. LakeLogic gives every pipeline a single source of truth: a declarative contract that travels with the data.
Core principles
- Contract-first — One YAML contract defines schema, quality rules, transformations, and lineage.
- Engine-agnostic — Runs the same contract on Polars, Spark, DuckDB, Snowflake, or BigQuery.
- Batteries included — pip install lakelogic gives you everything: engines, connectors, notifications.
- Quarantine-by-default — Bad rows are isolated, not dropped. Reprocess them when upstream fixes land.
- Open source — Apache-2.0 licensed. No vendor lock-in.
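The quarantine-by-default principle above can be sketched in plain Python. This is an illustration of the pattern only, not the LakeLogic API; the validate_rows helper and its rule format are hypothetical.

```python
# Illustration of quarantine-by-default: rows that fail a quality rule
# are set aside for later reprocessing instead of being dropped.
# Plain-Python sketch of the pattern, not the LakeLogic API.

def validate_rows(rows, rule):
    """Split rows into (valid, quarantined) using a predicate rule."""
    valid, quarantined = [], []
    for row in rows:
        # Route each row based on whether it satisfies the rule.
        (valid if rule(row) else quarantined).append(row)
    return valid, quarantined

rows = [
    {"order_id": "a1", "amount": 42.0},
    {"order_id": "a2", "amount": -5.0},  # violates amount >= 0
]
valid, quarantined = validate_rows(rows, lambda r: r["amount"] >= 0)
# valid keeps the passing row; quarantined holds the failing row
# until an upstream fix lands and it can be reprocessed.
```

Because failing rows are retained rather than discarded, a later run can replay the quarantined set once the upstream data is corrected.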
Project links
Contact
Reach the team at hello@lakelogic.org or open an issue on GitHub.