
The new need for AI infrastructure 

How LLMs killed the hackathon star

Matt Wensing
Technical Director, Engineering

At first, it was heavenly. The hackathon, a sanctioned space to develop new things without all of the red tape, meeting interruptions, and approval workflows that destroy deep work, had been joined in bliss with the most powerful technology to grace us with its presence in decades: large language models (LLMs).

And the first time these two got together, there were fireworks. Ideas that used to die as sparks in a Slack channel came to life overnight: summarize all the things! Generate that custom report! Add chat to it!

Concepts were proofed, levels of effort were slashed, and roadmaps were paved. Hackathons were happy in a way we hadn't seen since QR codes found bar menus.

But it wasn't long into the second soirée between these two that something just felt—off. Yes, engineers were still using LLMs, but they'd always add a note or comment. Something about privacy, costs, or exceptions. Something about testing. Or, "Before this can go to production …" knowing deep down that whatever came after those dots was, shall we say, unlikely.

By the third time, the winning hack would still emerge, demonstrating a huge, overdue “unlock in customer value,” but the engineers would go away with some bottled concern, like the eight-year-old who's simultaneously smitten by Santa but unable to escape the thought that their house doesn't have a chimney, so how the heck is this really going to work?

Hard things and the UI waterline

This cycle of building up hope and tasting a new future, only to re-enter the realities of the present, has only accelerated. Rather than vanquishing these challenges, AI has made previously hard things easy—and historically easy things hard.

The "seemingly hard things" line marks the tipping point between low and high levels of effort features, as illustrated by Randall Munroe:

Tasks illustrated by Randall Munroe

Ironically, this joke was published pre-LLMs. The 80/20 solution for "check whether the photo is of a bird" is now also "gimme a few hours," and "a research team and five years" is now simply "let's wait for GPT-5."

And it's exactly this shift in level of effort that's responsible for the Cambrian explosion of sparkle-and-wand features, "AI" finding its way onto (almost) every SaaS marketing page, and AI startups capturing 46% of the $209 billion raised in 2024.

But another countervailing force is at play. As much as LLMs make previously impossible, long-awaited, and highly visible hackathon-ripe features easy, they make other features more difficult. Worse, these features are the invisible, below the UI waterline, taken for granted, downright boring kind: testing, search, compute costs, privacy, age-restricted content, not-becoming-a-physics-professor:

Chat powered by ChatGPT

In all but the enterprise case, these features aren't headliners; they are merely the shoulders we sit on at the hackathon chicken fight—sturdy, reliable, must-have, common sense. They aren't trivial, but for many businesses these features are solved enough that they rarely add to scope. Instead, they lurk in the back of customers’ minds and dwell in the bottom right of your product manager's Kano model as “the undelightful.”

The crash

The fact that we've muddied the waters hits us post-hackathon, when we weigh the true cost of shipping and maintaining that magical, AI-centric feature, spring-loaded with unknown edge cases and dragons, against the deterministic, well-understood feature that already has a solid and clean foundation.

With a tinge of disappointment and a dose of relief, we accept that the effort required is greater than we anticipated. And then, to confirm our bias towards the status quo, the Apple Pie questions come: "How confident are we that customers will even use this? Were they really asking for this? Which customers? What data do we have from prospects?" We pause for a moment, shake off the dream, and ship the non-AI thing that was already on the roadmap.

As we navigate from expectations to reality, it’s the hidden complexities of the platform layer that sabotage and sink our plans.

Product and platform expectations vs reality illustration

Becoming AI native

The root causes of slow shipping velocity are legion. But in the case of AI, we find that the true culprit is the hidden cost of this missing infrastructure layer. If we want our product, design, and engineering teams to be productive in this new world of LLMs, we need to equip them with components they can effectively take for granted. We have to grow out of the illusion that "it's just an API call" and directly address the challenges that are ignored during bursts of inspiration but reveal themselves as landmines on the road to production.

At Customer.io, we envision this infrastructure as two layers: core, a short, relatively stable list of heavier efforts, and catalog, a longer, more dynamic (discover-along-the-way) list of lighter efforts.

Both lists are works-in-progress, not gospel (see also Unix philosophy Rule #16).

Core

The foundational context for a productive and safe use of AI.

  • Workflows: A way to define a chain of activities or calls, preferably durable and resilient (a minimal sketch follows this list).
  • Testing: A way to detect regression, measure improvements, and understand differences in LLM output.
  • Internal search: A way to find and retrieve relevant data or content to enrich LLM context.
  • Packaging: A way to deploy new and novel call chains as reusable functions (tools) for later use.
  • Privacy and security: A way to ensure that AI/LLM usage respects user privacy and handles data safely.
  • Logging: A way to store, retrieve, and inspect prior usage.
  • Prompt library: A searchable, versioned library of prompts at the user and account level.
  • Human-in-the-loop: A way to describe a workflow step that requires an action to continue.
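
To make a few of these concrete, here's a minimal sketch, in TypeScript with entirely hypothetical names (not Customer.io's actual implementation), of how the Workflows, Logging, and Human-in-the-loop pieces might fit together: a chain of named steps, each one logged, with a gate that pauses execution until a person approves.

```typescript
// Hypothetical workflow primitive: a chain of named steps, each logged,
// with an explicit human-in-the-loop gate. A production version would
// persist this state in a durable workflow engine instead of memory.

type StepResult =
  | { status: "ok"; output: unknown }
  | { status: "awaiting_human"; reason: string };

interface WorkflowStep {
  name: string;
  run: (input: unknown) => Promise<StepResult>;
}

interface LogEntry {
  step: string;
  startedAt: Date;
  result: StepResult;
}

// Run steps in order, logging each one, and stop when a step asks for
// human review; a durable engine would resume from this point later.
async function runWorkflow(steps: WorkflowStep[], input: unknown): Promise<LogEntry[]> {
  const log: LogEntry[] = [];
  let current = input;

  for (const step of steps) {
    const startedAt = new Date();
    const result = await step.run(current);
    log.push({ step: step.name, startedAt, result });

    if (result.status === "awaiting_human") {
      break; // pause here until a human acts
    }
    current = result.output;
  }
  return log;
}

// Example: summarize feedback, then require approval before anything ships.
const steps: WorkflowStep[] = [
  {
    name: "summarize",
    run: async (input) => ({ status: "ok", output: `summary of: ${String(input)}` }),
  },
  {
    name: "human-review",
    run: async () => ({ status: "awaiting_human", reason: "Review the summary before sending" }),
  },
];

runWorkflow(steps, "raw customer feedback").then((log) => console.log(log));
```

The point isn't this particular shape; it's that product teams get to call something like runWorkflow and take durability, logging, and approvals for granted instead of reinventing them inside every hack.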

Catalog

A standard library of the most commonly reused functions in AI-centric workflows and agents:

  • LLM calls: The API call to an LLM of the user or developer's choice, with developer-friendly ergonomics.
    • Data in: Beyond the prompt, make it easy to provide the LLM with images and content.
    • Data out: Reliably getting a response matching a schema (see the sketch after this list).
  • Browser: A headless browser for navigating the web with Puppeteer support.
  • Request: A robust, simplified method to make an HTTP request that is optionally authenticated.
  • Screenshotting: A way to generate an image of a web page or resource.
  • Static content storage: A way to conveniently store generated images and video.
  • Image generation: An easy way to generate an image that meets certain criteria.
  • Web search: A way to search the web for relevant content and links with clean results.
  • Text generation: A way to compose text and inject data with Liquid, useful for prompts.
  • Object: A way to compose and use an object in the workflow.
  • Storage: A way to store an object for use elsewhere or later.
  • Agent: A way to define an agent as a list of API endpoints (i.e. tools).
  • Parsing: An easy way to extract nodes from a JSON object or DOM.
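
In practice, the trickiest catalog item is often "Data out." Here's a hedged sketch of one common pattern: ask the model for JSON, validate it against a schema, and retry once with the validation error fed back. The callLLM stub and triageFeedback function are hypothetical stand-ins (swap in your provider's SDK); zod handles validation here, but any schema library or hand-rolled type guard would do.

```typescript
import { z } from "zod";

// Schema the model's reply must match.
const TriageResult = z.object({
  sentiment: z.enum(["positive", "neutral", "negative"]),
  summary: z.string(),
});
type TriageResult = z.infer<typeof TriageResult>;

// Hypothetical stand-in for your actual LLM client; returns a canned reply
// here so the sketch runs end to end.
async function callLLM(prompt: string): Promise<string> {
  return JSON.stringify({ sentiment: "positive", summary: "Loves the new report builder." });
}

// Ask for JSON, validate it, and retry once with the error fed back so the
// model can correct itself.
async function triageFeedback(feedback: string): Promise<TriageResult> {
  let prompt = `Return only JSON with keys "sentiment" (positive | neutral | negative) and "summary" for this feedback:\n${feedback}`;

  for (let attempt = 0; attempt < 2; attempt++) {
    const raw = await callLLM(prompt);
    try {
      const parsed = TriageResult.safeParse(JSON.parse(raw));
      if (parsed.success) return parsed.data;
      prompt += `\nYour previous reply failed validation: ${parsed.error.message}. Try again.`;
    } catch {
      prompt += "\nYour previous reply was not valid JSON. Try again.";
    }
  }
  throw new Error("Could not get a schema-conforming response from the LLM");
}

triageFeedback("The new report builder saved me an hour a week!").then(console.log);
```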

Getting inspired again

If LLMs and hackathons turn out to be gas and matches, is it over? Was it just (too much) fun while it lasted?

Far from it! But until we cross the threshold of being AI native, we should challenge ourselves to remember the purpose of these jam sessions: to temporarily remove whatever bottleneck is preventing us from shipping. While that bottleneck may previously have been a lack of focus or brainstorming time, too much red tape, or a lack of confidence that an idea is technically feasible, with LLMs it's very likely something else now. Something a little deeper and perhaps a little more boring—unless you're an engineer's engineer?

When you invest in infrastructure, you’re improving the tools in the hands of the people who serve your customers. Ultimately, that's an investment in quality.

And who isn't excited about that?

Matt Wensing is a software entrepreneur who's been building on the web since 1998. Currently helping to drive innovation at Customer.io, he previously founded Summit, a low-code startup, and Riskpulse (now Everstream), a supply chain analytics company acquired in 2019. Matt loves to write, has spoken at conferences such as the Business of Software and SXSW, and enjoys a full family life in Austin, TX.
