Combining Data Sources Without the Pain: How Milo’s Query Engine Works

Most analytics teams hit the same wall: data lives everywhere. Customer records in Postgres, financials in a spreadsheet, behavioral data in a data warehouse. Joining them means either long engineering cycles or fragile ad-hoc scripts. The result? Analysts spend more time moving and cleaning data than analyzing it.

Why Combining Data Sources Is Hard

Traditional BI tools typically assume a single clean source. The moment you need to cross-analyze Postgres tables with spreadsheet data or mix data warehouse facts with operational logs, you end up writing glue code. That often means:

  • Exporting CSVs and manually joining them
  • Building one-off ETL pipelines
  • Writing Python scripts for transformations and statistical cleanup

These approaches are slow, brittle, and distract from the actual business question.

Enter Milo’s Query Engine

To solve this, we built a multi-source query engine directly into Milo. It lets you run a single SQL-like query across heterogeneous data sources, without standing up a separate ETL pipeline.

Here’s how it works under the hood:

  • Adapted SQL dialect: We extended SQL to allow joins and filters across sources (Postgres, spreadsheets, warehouses).
  • DuckDB foundation: The engine is built on DuckDB, a high-performance in-process analytical database. It parses queries and prunes them to read only the minimum data required.
  • Containerized execution: Queries run inside the same container that executes Milo’s Python code, ensuring consistency, isolation, and performance.

The result is fast, efficient, persistent queries across multiple data sources, without writing Python unless you want to.
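
To make the idea concrete, here is a minimal sketch of what a cross-source join looks like when built on DuckDB's Python API. This is not Milo's actual dialect or engine internals; the connection string, table names, and file path are hypothetical, and it assumes DuckDB's postgres extension is available.

```python
# Illustrative sketch only: Milo's engine and SQL dialect are internal, but the
# underlying idea can be reproduced with DuckDB's Python API. Connection
# details, table names, and file paths below are hypothetical.
import duckdb

con = duckdb.connect()  # in-process DuckDB session

# Attach a Postgres database via DuckDB's postgres extension.
con.execute("INSTALL postgres;")
con.execute("LOAD postgres;")
con.execute(
    "ATTACH 'dbname=appdb user=analyst host=localhost' AS pg (TYPE postgres);"
)

# Join operational customer records in Postgres with a finance spreadsheet
# exported to CSV. DuckDB pushes filters down and reads only needed columns.
result = con.execute(
    """
    SELECT c.customer_id,
           c.region,
           SUM(f.invoice_amount) AS total_invoiced
    FROM pg.public.customers AS c
    JOIN read_csv_auto('finance_2024.csv') AS f
      ON f.customer_id = c.customer_id
    WHERE c.region = 'EMEA'
    GROUP BY c.customer_id, c.region
    ORDER BY total_invoiced DESC
    """
).fetchdf()

print(result.head())
```

In Milo, none of this plumbing is written by the analyst: the engine handles source attachment and query execution inside its own environment, so the analyst only describes the question.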

Three Ways Milo Gets Data

The new query engine is part of a broader toolkit. Milo dynamically decides the best approach depending on the problem:

  1. Direct SQL queries
  • Runs native SQL against a single database.
  • Fastest option for same-source data without transformations.
  2. Python code generation
  • When data is messy or external APIs are needed, Milo writes Python.
  • This enables data cleaning, enrichment, statistical analysis, and custom transformations.
  3. The Query Engine
  • The sweet spot for cross-source joins.
  • Lets you query Postgres + spreadsheets + warehouse tables together in one statement.
  • No Python boilerplate, no ad-hoc pipelines.

By switching intelligently between these three modes, Milo balances speed, flexibility, and reliability.
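
For a sense of what the second mode covers, here is a minimal, hypothetical sketch of the kind of cleanup-and-summary code it handles. The file name, columns, and rules are illustrative assumptions, not output generated by Milo.

```python
# Minimal sketch of the kind of work the Python mode handles; the file name,
# columns, and cleanup rules are hypothetical.
import pandas as pd

# Load a messy export: inconsistent casing, missing values, duplicate rows.
orders = pd.read_csv("orders_export.csv")

# Basic cleanup: normalize text columns, drop duplicates, fill gaps.
orders["region"] = orders["region"].str.strip().str.upper()
orders = orders.drop_duplicates(subset=["order_id"])
orders["discount"] = orders["discount"].fillna(0.0)

# Light enrichment and summary statistics before analysis continues.
orders["net_revenue"] = orders["gross_revenue"] - orders["discount"]
summary = orders.groupby("region")["net_revenue"].agg(["count", "mean", "sum"])

print(summary)
```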

Why This Matters

For data teams and business users, the implications are big:

  • Less engineering overhead: Analysts no longer need to stitch sources with custom code.
  • Faster insights: DuckDB’s in-process, vectorized execution minimizes latency by scanning only the data each query needs.
  • Cleaner workflows: Use Python only when data requires heavy cleaning or statistical modeling.
  • Scalable governance: All queries run in Milo’s controlled environment, making them auditable and repeatable.

The Future of Multi-Source Analytics

Modern businesses rarely have a single source of truth. The ability to query across sources efficiently, without building fragile pipelines, is becoming a baseline requirement.

Milo’s query engine makes multi-source analytics as simple as writing a prompt. Combine Postgres with spreadsheets, or join CRM data from your warehouse with operational logs, all in a single request. When data is clean, Milo uses SQL. When it’s messy, it switches to Python. And when you need cross-source joins, the new query engine takes over.

It’s the shortest path from data everywhere to answers anywhere.

Bring Generative BI to Your Team

If you found this article useful, imagine what Milo could do for your business. Our team will walk you through a personalized demo.
