About LakeLogic
Contract-driven data engineering for the modern lakehouse.
What is LakeLogic?
LakeLogic is an open-source Python framework that lets data teams define, enforce, and monitor data contracts across every layer of the data stack — from raw ingestion to curated gold tables. Write your schema once in YAML, and LakeLogic validates, transforms, and materialises your data using the engine that fits: Polars, Pandas, DuckDB, Spark, Snowflake, or BigQuery.
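As a rough illustration of the contract-first idea, a YAML contract might look like the sketch below. The field names (contract, schema, quality, on_fail) are hypothetical and chosen for illustration; consult the project documentation for the actual contract format.

```yaml
# Hypothetical contract sketch -- field names are illustrative,
# not the exact LakeLogic schema.
contract: orders
version: 1
schema:
  columns:
    - name: order_id
      type: string
      nullable: false
    - name: amount
      type: decimal(10, 2)
quality:
  - rule: amount >= 0
    on_fail: quarantine
engine: duckdb
```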
Why we built it
Modern data teams manage hundreds of datasets across multiple engines, clouds, and formats. Without a shared contract layer, quality checks are ad-hoc, schema drift goes unnoticed, and bad data reaches production. LakeLogic gives every pipeline a single source of truth: a declarative contract that travels with the data.
Core principles
- Contract-first — One YAML contract defines schema, quality rules, transformations, and lineage.
- Engine-agnostic — Runs the same contract on Polars, Spark, DuckDB, Snowflake, or BigQuery.
- Batteries included — pip install lakelogic gives you everything: engines, connectors, notifications.
- Quarantine-by-default — Bad rows are isolated, not dropped. Reprocess them when upstream fixes land.
- Open source — Apache-2.0 licensed. No vendor lock-in.
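The quarantine-by-default principle above can be sketched in plain Python. This is an illustration of the pattern only, not the LakeLogic API; the validate_rows helper and its rule format are hypothetical.

```python
# Illustration of quarantine-by-default: rows that fail a quality rule
# are set aside for later reprocessing instead of being dropped.
# Plain-Python sketch of the pattern, not the LakeLogic API.

def validate_rows(rows, rule):
    """Split rows into (valid, quarantined) using a predicate rule."""
    valid, quarantined = [], []
    for row in rows:
        # Route each row based on whether it satisfies the rule.
        (valid if rule(row) else quarantined).append(row)
    return valid, quarantined

rows = [
    {"order_id": "a1", "amount": 42.0},
    {"order_id": "a2", "amount": -5.0},  # violates amount >= 0
]
valid, quarantined = validate_rows(rows, lambda r: r["amount"] >= 0)
# valid keeps the passing row; quarantined holds the failing row
# until an upstream fix lands and it can be reprocessed.
```

Because failing rows are retained rather than discarded, a later run can replay the quarantined set once the upstream data is corrected.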
Project links
Contact
Reach the team at hello@lakelogic.org or open an issue on GitHub.