Find Data Lakehouse Developer Leads on GitHub

Find Apache Iceberg, Delta Lake, Apache Hudi, and dbt engineers who are evaluating data lakehouse tooling by tracking GitHub stars, keyword signals, and competitor repos.

Published: May 9, 2026 · Updated: May 9, 2026 · 8 min read

What Is a Data Lakehouse Engineer?

Data lakehouse engineers build and maintain the storage, catalog, and compute layers of modern data platforms. They work with open table formats (Apache Iceberg, Delta Lake, Apache Hudi), catalog services (Unity Catalog, Project Nessie, Apache Polaris), compute engines (Spark, Flink, Trino, DuckDB), and transformation tools (dbt, SQLMesh). They typically control or strongly influence data infrastructure purchases, and many of them evaluate new tooling in the open on GitHub.

GitHub Signals That Identify Data Lakehouse Engineers

  • New stars on apache/iceberg, delta-io/delta, apache/hudi: table format evaluators
  • Stars on unitycatalog/unitycatalog, projectnessie/nessie, apache/polaris: catalog evaluators
  • Stars on trinodb/trino, apache/flink, apache/spark: compute engine users
  • Stars on dbt-labs/dbt-core, TobikoData/sqlmesh, SDF-Labs/sdf: transformation engineers
  • Issues or PRs mentioning "Iceberg REST catalog", "table format migration", or "partition evolution"
  • Keyword mentions: "data lakehouse", "open table format", "ACID transactions", "time travel queries", "schema evolution"
  • Stars on apache/iceberg-python (PyIceberg), apache/iceberg-go: language-specific SDK evaluators
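
To see what a "new star" signal looks like at the API level, here is a minimal Python sketch against GitHub's public REST API. It assumes the stargazers endpoint (`GET /repos/{owner}/{repo}/stargazers`) with the `application/vnd.github.star+json` media type, which adds a `starred_at` timestamp to each entry. The helper names `stargazers_request` and `new_stargazers` are hypothetical, and pagination, rate limiting, and retries are deliberately left out.

```python
import urllib.request
from datetime import datetime, timezone
from typing import Optional

GITHUB_API = "https://api.github.com"


def stargazers_request(owner: str, repo: str, page: int = 1,
                       token: Optional[str] = None) -> urllib.request.Request:
    """Build a request for one page (up to 100 entries) of a repo's stargazers."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/stargazers?per_page=100&page={page}"
    headers = {
        # This media type makes GitHub add a `starred_at` timestamp to each entry.
        "Accept": "application/vnd.github.star+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:  # optional personal access token for higher rate limits
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def new_stargazers(payload: list, since: datetime) -> list:
    """Return logins of users whose star is at or after `since`."""
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat naive datetimes as UTC
    logins = []
    for entry in payload:
        # Timestamps are ISO 8601 strings ending in "Z" (UTC).
        starred_at = datetime.fromisoformat(entry["starred_at"].replace("Z", "+00:00"))
        if starred_at >= since:
            logins.append(entry["user"]["login"])
    return logins
```

To run it against live data, pass the request to `urllib.request.urlopen`, decode the body with `json.load`, and filter the result with `new_stargazers(payload, since=...)`. Persisting the time of your last poll as `since` is one simple way to surface only new evaluators on each run.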

Key Repos to Track for Lakehouse Signal Capture

Add these repos to your GitLeads tracked repositories to capture data lakehouse signals continuously:

  • apache/iceberg: the primary Apache Iceberg repo; stars signal format adoption
  • delta-io/delta: Delta Lake core; Databricks ecosystem signal
  • apache/hudi: Hudi format; AWS ecosystem signal
  • unitycatalog/unitycatalog: Unity Catalog OSS, open-sourced by Databricks
  • projectnessie/nessie: Git-for-data catalog (Dremio ecosystem)
  • apache/polaris (incubating): Snowflake-contributed Iceberg REST catalog
  • dbt-labs/dbt-core: the dominant transformation layer
  • TobikoData/sqlmesh: dbt alternative; its evaluators tend to be tech-forward data teams
  • apache/gravitino: federated metadata lake, originally developed by Datastrato
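
One way to turn the repo list above into lead segments is a plain lookup table. The Python sketch below is hypothetical: the layer labels and the "two or more layers" rule are assumptions for illustration, not a description of how GitLeads scores leads.

```python
from collections import Counter

# Hypothetical mapping from each tracked repo to the layer its stars point at.
TRACKED_REPOS = {
    "apache/iceberg": "table_format",
    "delta-io/delta": "table_format",
    "apache/hudi": "table_format",
    "unitycatalog/unitycatalog": "catalog",
    "projectnessie/nessie": "catalog",
    "apache/polaris": "catalog",
    "apache/gravitino": "catalog",
    "dbt-labs/dbt-core": "transformation",
    "TobikoData/sqlmesh": "transformation",
}
# GitHub owner/repo names are case-insensitive, so compare in lowercase.
_LOOKUP = {name.lower(): layer for name, layer in TRACKED_REPOS.items()}


def segment_developer(starred: list) -> dict:
    """Count a developer's stars on tracked repos, grouped by layer."""
    layers = (_LOOKUP.get(repo.lower()) for repo in starred)
    return dict(Counter(layer for layer in layers if layer is not None))


def is_multi_layer(starred: list) -> bool:
    """Assumed rule: stars in two or more layers suggest a platform-wide evaluation."""
    return len(segment_developer(starred)) >= 2
```

A developer who stars both apache/iceberg and dbt-labs/dbt-core lands in two layers, which this sketch treats as a stronger signal than two table-format stars alone; tune the rule to match what your own pipeline data shows.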

Keyword Signals to Monitor in GitHub Issues and Code

  • "iceberg REST catalog" OR "iceberg catalog" — platform integration signal
  • "table format migration" OR "migrate to iceberg" — active migration project
  • "partition evolution" OR "schema evolution" — power user signal
  • "data lakehouse" OR "open lakehouse" — architecture evaluation
  • "ACID transactions" OR "merge-on-read" OR "copy-on-write" — format decision signal
  • "Unity Catalog" OR "HMS" OR "Glue catalog" — catalog evaluation signal
  • "dbt incremental" OR "dbt model" — active dbt engineering
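
These keyword groups map directly onto quoted-phrase OR queries, and the same list can tag issue or code text you have already collected. A minimal Python sketch, assuming hypothetical category names and helper functions: `search_query` builds only the keyword part of a search string (any GitHub search qualifiers are left to you), and `match_signals` matches whole phrases case-insensitively, so "HMS" does not fire inside a longer token.

```python
import re

# Keyword groups from the list above; the category names are hypothetical labels.
KEYWORD_SIGNALS = {
    "catalog_integration": ["iceberg REST catalog", "iceberg catalog"],
    "migration": ["table format migration", "migrate to iceberg"],
    "power_user": ["partition evolution", "schema evolution"],
    "architecture": ["data lakehouse", "open lakehouse"],
    "format_decision": ["ACID transactions", "merge-on-read", "copy-on-write"],
    "catalog_evaluation": ["Unity Catalog", "HMS", "Glue catalog"],
    "dbt": ["dbt incremental", "dbt model"],
}


def search_query(category: str) -> str:
    """Join a category's keywords into a quoted-phrase OR query."""
    return " OR ".join(f'"{kw}"' for kw in KEYWORD_SIGNALS[category])


def match_signals(text: str) -> set:
    """Return every category with at least one whole-phrase, case-insensitive hit."""
    hits = set()
    for category, keywords in KEYWORD_SIGNALS.items():
        for kw in keywords:
            # Lookarounds stop matches inside longer words (e.g. "HMS" in "HMSX").
            pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.add(category)
                break
    return hits
```

Because matching is whole-phrase, "dbt model" will not match "dbt models"; add plural forms to the keyword list if you want them counted.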

What Data Lakehouse Engineers Buy

This audience controls or strongly influences decisions in:

  • Managed Iceberg table and catalog services (Snowflake Open Catalog, AWS Glue Data Catalog's Iceberg support, and Databricks, which acquired Tabular in 2024)
  • Lakehouse query engines (Starburst Enterprise, Starburst Galaxy, Dremio Cloud)
  • Data catalog platforms (Atlan, DataHub, Alation, Collibra)
  • dbt Cloud: the managed version of the dbt-core they already use
  • ETL/ELT pipelines (Airbyte, Fivetran, dltHub)
  • Table maintenance and storage optimization tools (Iceberg compaction, Delta Lake OPTIMIZE services)
  • Data observability platforms (Monte Carlo, Elementary, Bigeye)

Routing Lakehouse Leads to Your Sales Stack

  • Iceberg repo star + company email from data/cloud domain → HubSpot deal + data team AE
  • Unity Catalog or Nessie keyword → Salesforce account match; check whether the company is an existing enterprise account
  • dbt-core star with public email → Clay enrichment + Smartlead sequence for dbt Cloud pitch
  • High-follower data engineer → Slack alert for DevRel partnership outreach
  • SQLMesh or SDF star → early-adopter signal; fast-track to founder sales call
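
The routing rules above can be written as one small rule function. This Python sketch is hypothetical throughout: the `Lead` fields, the action labels, the free-mail domain list standing in for "company email from a data/cloud domain", and the 1,000-follower cutoff for "high-follower" are all assumptions. The function only returns action labels; it does not call HubSpot, Salesforce, Clay, Smartlead, or Slack.

```python
from dataclasses import dataclass, field
from typing import Optional

FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "hotmail.com", "yahoo.com", "proton.me"}  # assumed list
HIGH_FOLLOWER_THRESHOLD = 1000  # assumed cutoff for a "high-follower" engineer


@dataclass
class Lead:
    login: str
    starred: set = field(default_factory=set)   # "owner/repo" names
    keywords: set = field(default_factory=set)  # keyword signals seen in issues or code
    email: Optional[str] = None                 # public email, if any
    followers: int = 0


def is_company_email(email: Optional[str]) -> bool:
    """Rough stand-in for 'company email': any address not on a free-mail domain."""
    if not email or "@" not in email:
        return False
    return email.rsplit("@", 1)[1].lower() not in FREE_MAIL_DOMAINS


def route(lead: Lead) -> list:
    """Return hypothetical action labels for every routing rule the lead matches."""
    starred = {repo.lower() for repo in lead.starred}
    keywords = {kw.lower() for kw in lead.keywords}
    actions = []
    if "apache/iceberg" in starred and is_company_email(lead.email):
        actions.append("hubspot_deal")
    if keywords & {"unity catalog", "nessie"}:
        actions.append("salesforce_account_match")
    if "dbt-labs/dbt-core" in starred and lead.email:
        actions.append("clay_enrich_then_smartlead")
    if lead.followers >= HIGH_FOLLOWER_THRESHOLD:
        actions.append("slack_devrel_alert")
    if starred & {"tobikodata/sqlmesh", "sdf-labs/sdf"}:
        actions.append("founder_sales_call")
    return actions
```

Returning every matching action, rather than stopping at the first, lets you decide downstream whether one lead should fan out to several tools or be sent only to the highest-priority destination.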

GitLeads finds data lakehouse engineers evaluating Iceberg, Delta Lake, dbt, and catalog tools on GitHub and pushes enriched profiles into HubSpot, Salesforce, Slack, Clay, and 12+ other tools. GitLeads does not send email itself. Start free at [gitleads.app](https://gitleads.app). Related: [find Kafka developer leads](/blog/find-kafka-developer-leads), [find Postgres developer leads](/blog/find-postgres-developer-leads), [github signals for data analytics companies](/blog/github-signals-for-data-analytics-companies).
