Data-platform programmes, not BI projects


Most organisations call this "data analysis," but the failure mode is rarely the analysis itself. It's that the underlying data platform was never built to support repeatable, governed, observable analytics. We don't run BI projects. We run data-platform programmes — warehouse migrations, lakehouse adoption, governance, lineage — that give analysts a foundation worth their time.

For Australian engagements, data lakes, warehouses and analytics pipelines default to AWS Sydney (ap-southeast-2) so that personal and operational data stays onshore. We work to the Privacy Act 1988 and the Australian Privacy Principles, build to the OAIC Notifiable Data Breaches scheme, and for fintech and superannuation clients we align our controls with APRA CPS 234 — covering access management, encryption in transit and at rest, audit logging and incident response. Customer data is never used to train third-party models, and our Melbourne presence keeps 4.5–5 hours of AEST overlap with our engineering centre.

01

Platform migration & modernisation

The honest framing of a legacy data estate: spreadsheets that became systems of record, on-prem warehouses that can't keep up with cloud workloads, and nightly batch ETL that means today's decisions are made on yesterday's numbers.

The stack we standardise on

  • ELT over ETL via dbt — transformations versioned alongside application code, tested, peer-reviewed, and deployed through CI.
  • Lakehouse architectures — Apache Iceberg or Delta Lake to separate storage from compute and keep historical data queryable without warehouse lock-in.
  • Orchestration — Apache Airflow or AWS Glue / Step Functions for scheduled and event-driven pipelines.
  • Warehouse layer — Snowflake, Databricks or BigQuery where customer choice constrains; otherwise sized to workload and cost profile.

When to migrate — and when not to

Migrate when reporting cycles exceed business decision cycles, when data engineers spend more time fixing pipelines than building new ones, or when the warehouse bill is growing faster than the data. Don't migrate when the existing platform works and the team can keep up — replatforming a healthy estate is an expensive way to feel modern.

Our agricultural data engineering programme is a real example of a lakehouse-style pipeline shipped to production — Airflow orchestration, PostgreSQL as the operational store, Sentinel-2 SCL cloud masking and NDVI/EVI computation, fuzzy matching with human confirmation in the loop, and explicit handling of sync-time versus collection-time timestamps so analysts know which dimension they are querying.
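To make the cloud-masking and vegetation-index step concrete, here is a minimal sketch in plain Python. It is not the production code from that programme: the function names, the sample reflectance values and the choice of which SCL classes to mask are assumptions for illustration. The NDVI and EVI formulas and the Sentinel-2 SCL class numbers are the standard published ones.

```python
from typing import List, Optional

# Sentinel-2 L2A Scene Classification (SCL) classes treated as unusable here:
# 0 no data, 1 saturated/defective, 3 cloud shadow, 8 and 9 cloud
# (medium/high probability), 10 thin cirrus.
# Which classes to mask is a project decision; this set is an assumption.
MASKED_SCL = {0, 1, 3, 8, 9, 10}


def ndvi(nir: float, red: float) -> Optional[float]:
    """NDVI = (NIR - Red) / (NIR + Red); None when the denominator is zero."""
    denom = nir + red
    return None if denom == 0 else (nir - red) / denom


def evi(nir: float, red: float, blue: float) -> Optional[float]:
    """EVI = 2.5 * (NIR - Red) / (NIR + 6*Red - 7.5*Blue + 1)."""
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    return None if denom == 0 else 2.5 * (nir - red) / denom


def masked_ndvi(nir_band: List[float], red_band: List[float],
                scl_band: List[int]) -> List[Optional[float]]:
    """Per-pixel NDVI, with None wherever the SCL class is masked."""
    return [
        None if scl in MASKED_SCL else ndvi(nir, red)
        for nir, red, scl in zip(nir_band, red_band, scl_band)
    ]


# Hypothetical surface-reflectance values (0..1) for three pixels.
nir = [0.50, 0.40, 0.30]
red = [0.10, 0.20, 0.30]
scl = [4, 9, 5]  # vegetation, high-probability cloud, not vegetated

result = masked_ndvi(nir, red, scl)
# The cloudy pixel (index 1) comes back as None instead of a misleading index value.
```

Returning None for masked pixels, rather than zero, keeps "no valid observation" distinct from "no vegetation" when analysts aggregate downstream.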

02

Governance, quality & data contracts

This is where most data-platform programmes fail — not in the build, but in the operation. Reactive quality (someone notices a number is wrong, you investigate, you patch) is a tax on the analyst team. Proactive quality is a different posture.

From reactive to proactive

Data contracts at the producer boundary, Great Expectations or Soda checks at every transformation step, and dbt tests on every model move quality from "things broke and we fixed them" to "the build fails when a contract is broken, before the bad data lands." The cost of detection drops by an order of magnitude when it shifts left.
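As a rough illustration of "the build fails when a contract is broken", the sketch below checks a batch against a contract before anything is loaded. It is plain Python and deliberately does not use the Great Expectations, Soda or dbt APIs. The `orders` contract, its field names and the `enforce` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    name: str
    type_: type
    nullable: bool = False


# Hypothetical contract for an "orders" feed; field names are illustrative.
ORDERS_CONTRACT = [
    Field("order_id", str),
    Field("amount_cents", int),
    Field("coupon_code", str, nullable=True),
]


class ContractViolation(Exception):
    """Raised when a batch breaks the producer contract."""


def validate(record: Dict, contract: List[Field]) -> List[str]:
    """Return human-readable violations for one record (empty list if valid)."""
    errors = []
    for f in contract:
        if f.name not in record:
            errors.append(f"missing field: {f.name}")
            continue
        value = record[f.name]
        if value is None:
            if not f.nullable:
                errors.append(f"null in non-nullable field: {f.name}")
        elif not isinstance(value, f.type_):
            errors.append(
                f"{f.name}: expected {f.type_.__name__}, got {type(value).__name__}"
            )
    return errors


def enforce(batch: List[Dict], contract: List[Field]) -> List[Dict]:
    """Raise before anything is loaded if any record breaks the contract."""
    problems = [(i, e) for i, r in enumerate(batch) for e in validate(r, contract)]
    if problems:
        raise ContractViolation(problems)
    return batch
```

Called as the first task in a pipeline run, `enforce` turns a broken contract into a failed run rather than a bad table.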

The governance stack

  • dbt for tests-as-code — schema tests, referential integrity, freshness, custom assertions versioned in Git.
  • Great Expectations or Soda for declarative quality checks that run inside the pipeline and fail loudly.
  • Schema registry — Confluent Schema Registry or AWS Glue Data Catalog for contract enforcement between producers and consumers.
  • PII tokenisation at ingest so personally identifying fields never reach the analytics layer in the clear.
  • Audit trails — every transformation captured, for the kind of regulator review APRA CPS 234 or the OAIC NDB scheme assume.
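The PII tokenisation item in the list above can be sketched as follows. This is one possible technique, keyed hashing with HMAC-SHA256 from the Python standard library, and not necessarily what a given engagement uses: vault-based reversible tokenisation is a common alternative. The field list and the key value are hypothetical, and in practice the key would come from a secrets manager rather than source code.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical set of fields treated as personally identifying.
PII_FIELDS = {"email", "phone"}


def tokenise(value: str, key: bytes) -> str:
    """Deterministic keyed token: same value and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenise_record(record: Dict, key: bytes) -> Dict:
    """Replace PII fields with tokens at ingest; leave other fields untouched."""
    return {
        k: tokenise(v, key) if k in PII_FIELDS and v is not None else v
        for k, v in record.items()
    }


# Illustrative only: a real key is loaded from a secrets manager.
demo_key = b"example-key-not-for-production"
raw = {"customer_id": 42, "email": "jane@example.com", "phone": None}
safe = tokenise_record(raw, demo_key)
```

Because the token is deterministic for a given key, analysts can still join and count on the tokenised column without ever seeing the underlying value.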

Master-data quality is its own discipline. On the EUDR / sustainability commodity importer programme, supplier records flow through standardisation, deduplication and validation layers because the regulatory exposure of a wrong supplier record at the 30 December 2026 EUDR cut-over is too high to leave to spreadsheet hygiene.
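A minimal sketch of the standardisation and deduplication layers, using only the Python standard library. The legal-suffix list, the thresholds and the function names are assumptions for illustration, not the rules used on that programme. Pairs that fall between the two thresholds are routed to a person rather than merged automatically.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "pty", "inc", "gmbh", "llc", "co"}


def standardise(name: str) -> str:
    """Lower-case, strip punctuation, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the standardised names."""
    return SequenceMatcher(None, standardise(a), standardise(b)).ratio()


def classify_pair(a: str, b: str, auto: float = 0.95, review: float = 0.75) -> str:
    """Thresholds are illustrative; tune them against labelled pairs."""
    score = similarity(a, b)
    if score >= auto:
        return "duplicate"
    if score >= review:
        return "needs_review"  # send to a human for confirmation
    return "distinct"


print(classify_pair("Acme Coffee Pty Ltd", "ACME Coffee Limited"))
```

Validation, for example checking required regulatory fields on each supplier record, would sit after this step and is not shown here.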

03

Observability & lineage

Most data teams have monitoring on the pipeline (did the DAG run?) and none on the data (did the data look right?). Modern data observability changes this.
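The difference between "did the DAG run?" and "did the data look right?" can be shown with two small checks on the data itself. This is a sketch in plain Python, not the API of any observability product. The function names, the 24-hour lag and the 50% tolerance are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def check_freshness(last_loaded_at: datetime, now: datetime,
                    max_lag: timedelta) -> bool:
    """True if the newest row arrived within the allowed lag."""
    return (now - last_loaded_at) <= max_lag


def check_volume(row_count: int, recent_counts: List[int],
                 tolerance: float = 0.5) -> bool:
    """True if today's row count is within +/- tolerance of the recent average."""
    if not recent_counts:
        raise ValueError("need at least one historical count for a baseline")
    baseline = sum(recent_counts) / len(recent_counts)
    return abs(row_count - baseline) <= tolerance * baseline


# Hypothetical values for one table on one day.
now = datetime(2025, 1, 2, 6, 0, tzinfo=timezone.utc)
last_loaded = datetime(2025, 1, 1, 23, 0, tzinfo=timezone.utc)

fresh = check_freshness(last_loaded, now, max_lag=timedelta(hours=24))
volume_ok = check_volume(980, recent_counts=[1000, 1100, 900])
```

A pipeline can finish successfully and still fail both checks, which is exactly the gap that pipeline-only monitoring leaves open.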

The observability stack

  • Data observability — Monte Carlo, Soda or Lightup watching freshness, volume, distribution and schema drift on every critical table.
  • OpenLineage for end-to-end lineage from source system through warehouse to BI tool, captured automatically from dbt, Airflow and Spark jobs.
  • Column-level lineage, so that when an analyst asks "what is this number?" you can answer in minutes, not days, and impact analysis on a schema change takes minutes too.
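Impact analysis over column-level lineage is, at its core, a graph traversal. The sketch below assumes the lineage has already been collected into a simple mapping from each column to the columns derived from it. The column names and the mapping are hypothetical, and no lineage tool's API is used here.

```python
from collections import deque
from typing import Dict, List

# Hypothetical column-level lineage: each column maps to the columns derived from it.
LINEAGE = {
    "raw.orders.amount": ["stg.orders.amount_aud"],
    "stg.orders.amount_aud": ["mart.revenue.daily_total", "mart.revenue.avg_order"],
    "mart.revenue.daily_total": ["bi.dashboard.revenue_tile"],
}


def downstream(column: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Breadth-first list of every column affected by a change to `column`."""
    seen, order = set(), []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream(column: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Trace a column back to its sources by walking the inverted graph."""
    inverted: Dict[str, List[str]] = {}
    for parent, children in lineage.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return downstream(column, inverted)


impacted = downstream("raw.orders.amount", LINEAGE)
sources = upstream("bi.dashboard.revenue_tile", LINEAGE)
```

`downstream` answers "what breaks if this column changes?" and `upstream` answers "what is this number?", which are the two questions the bullet above describes.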

Our TR Capital portfolio management programme is a working example: trade reconciliation and corporate-action processing where every record has provenance, every transformation is traceable, and every reconciliation break has a documented lineage path back to the source feed. The Odoo / EUDR programme follows the same pattern: supplier master data flowing through transformation, validation and reporting layers, all observable, all lineage-tracked, because opaque data platforms and regulator submissions don't coexist.

How we scope these programmes

Data-platform programmes look more expensive on paper than BI projects. The cost difference disappears within twelve months because the analyst team stops debugging the platform and starts producing analysis — the work the platform was built to enable in the first place.

We scope these engagements honestly, including saying when an existing platform is fine and the issue is upstream in product, process or instrumentation. If the answer is "you don't need a new warehouse, you need three data contracts and a lineage tool," we'll tell you that.