Why I'm Building Crawl: The Migration Intelligence Gap Nobody Fills

March 14, 2026 · 3 min read

12 years at Informatica taught me that every data migration fails at the same step: understanding what you have. AI is finally good enough to change that.

The problem I kept seeing

I spent 12 years at Informatica in various roles, working on Fortune 500 data integration projects across APAC, MENA, and Europe. What's funny is that by the end, as a Development Architect, I got to fix the bugs I had encountered while deploying the software as a Consultant. I was always part of the MDM vertical (data management, data enrichment, data stewardship, golden customer records, survivorship rules, fuzzy match rules), but I worked closely with the PowerCenter ETL teams and saw firsthand the problems they had to deal with in data migration.

Designing landing tables for data ingestion into MDM, we had a choice: make them look like the source system, or like the target MDM data model. We chose the target — because you don't want ETL logic split across two systems. Better to do all the heavy transformation in PowerCenter, keep the MDM-side passthrough simple, and have one place to look. The right architectural call — but it also meant all the transformation logic lived in PowerCenter, often undocumented. Good luck changing it once data was flowing.

What I saw over and over was the same wall. Customers running Oracle databases with years of stored procedures, custom code being migrated into PowerCenter, business logic buried in ETL jobs that nobody had documented. The wall wasn't the technical translation — tools existed for that. The wall was earlier: understanding what you have.

Thousands of stored procedures encoding business rules nobody documented. ETL jobs with logic baked into transformation steps that predate anyone on the current team. Views referencing tables that were dropped two years ago. The tribal knowledge lived in the SQL, and nobody had time to read it all.

Every migration I saw started the same way: consultants spending weeks manually reading stored procedures, writing documentation that was out of date before the migration work even began. It was the most expensive, least scalable part of every engagement.

Why now

When I left Informatica in 2023, AI wasn't strong enough to do this reliably. You could get an LLM to summarize a single procedure, but it couldn't synthesize across hundreds of objects — finding contradictions between procedures, detecting dead code, scoring migration risk based on vendor-specific syntax patterns.

That's changed. Models can now hold enough context to cross-reference a procedure against its dependencies. They can identify that sp_calculate_customer_churn uses a different LTV formula than sp_calculate_ltv. They can flag that a DATEADD call uses vendor-specific syntax that won't survive a platform change. They can do this at scale, across an entire codebase of stored procedures, and produce cogent summaries that a project manager can actually use to make triage decisions.
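To make the vendor-syntax point concrete, here is a deliberately tiny sketch of what flagging non-portable calls could look like. This is an illustration only, not Crawl's actual implementation: the function names, the lookup table, and the regex approach are all hypothetical (the real tool uses AST parsing plus LLM analysis, not a keyword list).

```python
import re

# Toy lookup of functions that are vendor-specific and won't survive a
# platform change unmodified. Illustrative subset only, not Crawl's real list.
VENDOR_FUNCS = {
    "DATEADD": "SQL Server / Snowflake",
    "NVL": "Oracle",
    "DECODE": "Oracle",
    "GETDATE": "SQL Server",
}

def flag_vendor_syntax(sql: str) -> list[tuple[str, str]]:
    """Return (function, vendor) pairs found in a procedure body."""
    hits = []
    for func, vendor in VENDOR_FUNCS.items():
        # Match the function name followed by an opening parenthesis.
        if re.search(rf"\b{func}\s*\(", sql, re.IGNORECASE):
            hits.append((func, vendor))
    return hits

proc = "SELECT DATEADD(day, -30, GETDATE()) FROM churn_scores WHERE score > 0.5"
print(flag_vendor_syntax(proc))
# [('DATEADD', 'SQL Server / Snowflake'), ('GETDATE', 'SQL Server')]
```

A keyword scan like this is where older tooling stopped; the change is that an LLM can now explain *what the flagged logic does* and whether an equivalent exists on the target platform.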

The gap hasn't closed — the tools to fill it have finally arrived.

What Crawl does

Crawl is the Step 0 layer: pre-migration intelligence that runs before you pick up any conversion tool. It connects to your database (read-only, catalog-only — it reads procedure source code, never your data), extracts business rules using hybrid AST + LLM analysis, and produces triage reports that tell you:

  • What do we have? — Auto-generated business rule summaries
  • Is it still alive? — Dead code detection, contradiction flagging
  • What should we migrate first? — Triage by criticality, complexity, risk
  • What breaks if we move? — Vendor-specific logic that won't survive a platform change
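As a minimal sketch of the "Is it still alive?" check, consider flagging objects whose referenced tables no longer exist in the catalog. Everything here is hypothetical scaffolding for illustration — in the real tool, reference extraction comes from the AST pass, not hand-built sets.

```python
# Flag database objects that reference tables missing from the catalog.
# Assumes table references were already extracted upstream (illustrative only).

def find_dead_references(catalog_tables: set, object_refs: dict) -> dict:
    """Map each object to the set of referenced tables missing from the catalog."""
    return {obj: refs - catalog_tables
            for obj, refs in object_refs.items()
            if refs - catalog_tables}

catalog = {"customers", "orders"}
refs = {
    "v_customer_orders": {"customers", "orders"},
    "v_legacy_report": {"customers", "archived_sales"},  # archived_sales was dropped
}
print(find_dead_references(catalog, refs))
# {'v_legacy_report': {'archived_sales'}}
```

A view like v_legacy_report is exactly the "dropped two years ago" case: it still sits in the catalog, but it can never run, so it should be triaged out before anyone spends conversion effort on it.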

It's open-source (Apache 2.0), vendor-neutral, and works with any LLM provider — OpenRouter by default, or point it at a local model if your code can't leave the building.

The SQL Explainer: a free taste

To validate that the core analysis works, I built a free SQL explainer tool. Paste a stored procedure, get a structured breakdown of what it does — business rules, risk flags, complexity score. It's Crawl's analysis engine running on a single procedure instead of an entire database.

Try it. Then imagine that across your whole data stack.

What's next

Crawl is in early development. Oracle Data Integrator (ODI) is the first supported source; Informatica PowerCenter is in progress; Snowflake, SQL Server, Oracle PL/SQL, and Postgres are planned. If you've ever stared down a migration backlog and wished someone had documented the stored procedures before the original team left, this is for you.

Augustin Chan is CTO & Founder of Digital Rain Technologies, building production AI systems including 8-Bit Oracle. Previously Development Architect at Informatica for 12 years. BS Cognitive Science (Computation), UC San Diego.