52 Products in 52 Weeks

Building Nattr: When one AI isn't enough

We built Nattr, a platform where six different AI models collaborate in real-time conversations. Here's what we learned bringing Claude, GPT-4, Gemini, Grok, DeepSeek, and Qwen together.
December 5, 2025 · 8 min read

The problem: AI Tunnel Vision

You've probably noticed this: ask Claude a question, you get one perspective. Ask ChatGPT the same thing, completely different answer. Which one's right? Often, they both are—just approaching it differently.

The frustration comes when you're making important decisions. You want to cross-validate, get multiple expert opinions, see different angles. But that means copying your question across multiple browser tabs, managing separate conversations, and manually synthesising the responses. It's exhausting.

What if all those AI models could just... talk to each other? And to you. In the same conversation.

What we built

Nattr is a multi-AI chat platform where you can have Claude, GPT-4, Gemini, Grok, DeepSeek, and Qwen all in the same room. You type @all and six different AI perspectives respond to your question in real-time.

Two features make this genuinely useful:

@ Mentions that actually work
Type @ and you get a dropdown of all available AIs. Want Claude's opinion? @claude. Want everyone? @all. Want to pit GPT-4 against Gemini on a technical question? Mention both. The AIs only respond when called, so conversations stay focused.
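Under the hood, mention resolution can be as simple as matching handles against a known list. A minimal sketch, with an illustrative list and regex rather than Nattr's actual code:

```typescript
// Illustrative mention resolution: return the IDs of the AIs that should
// respond to a given message. If nothing is mentioned, nobody answers.
const AVAILABLE_AIS = ["claude", "gpt-4", "gemini", "grok", "deepseek", "qwen"];

function resolveMentions(message: string): string[] {
  if (/@all\b/i.test(message)) return AVAILABLE_AIS;
  return AVAILABLE_AIS.filter((ai) => new RegExp(`@${ai}\\b`, "i").test(message));
}
```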

Web search across all models
Toggle the 🌐 button and every AI can search the web before responding. Claude and GPT-4 use native tool calling. The others? We built smart detection that triggers searches when you ask about "latest", "current", or paste URLs. All powered by Tavily's API, giving AIs access to current information instead of just training data.

The result? Ask "what's the latest in AI developments?" and you get six expert analyses, all citing current sources, in seconds.

The technical journey

We built this with Next.js 15, Prisma, and Supabase. The interesting part wasn't the stack—it was making six completely different AI APIs play nicely together.

The adapter pattern challenge
Each AI provider has its own SDK, its own quirks, its own way of handling streaming. Claude uses Anthropic's SDK. OpenAI has its own. Google's Gemini? Different again. We built a unified adapter layer that translates between them all, so the rest of the app just says "get me a response" without caring which AI it's talking to.
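As a sketch of the shape (the interface and names are ours for illustration, not Nattr's actual code), each provider implements one small contract and the rest of the app only ever talks to that:

```typescript
// Illustrative adapter contract: every provider module wraps its own SDK,
// streaming quirks and error handling behind the same interface.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface AIAdapter {
  readonly id: string; // "claude", "gpt-4", "gemini", ...
  streamResponse(
    messages: ChatMessage[],
    options?: { webSearch?: boolean }
  ): AsyncIterable<string>; // yields text chunks as they arrive
}

// The rest of the app just asks the registry for a response.
const adapters = new Map<string, AIAdapter>();

async function getResponse(providerId: string, messages: ChatMessage[]): Promise<string> {
  const adapter = adapters.get(providerId);
  if (!adapter) throw new Error(`Unknown AI provider: ${providerId}`);
  let text = "";
  for await (const chunk of adapter.streamResponse(messages)) text += chunk;
  return text;
}
```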

The real headache was tool calling. Claude and GPT-4 have native support for tools (functions the AI can call). You define a "web_search" tool, and the AI decides when to use it. Beautiful. But Gemini, Grok, DeepSeek, and Qwen? No native tool support.

Our solution: hybrid approach. For models with tool support, we use it. For others, we detect keywords ("latest", "news", URLs) and inject search results directly into the conversation context. Same capability, different implementation. The user never knows the difference.
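A hedged sketch of that routing logic; supportsTools, shouldSearch and runWebSearch stand in for real helpers, and the message shapes are simplified:

```typescript
// Sketch of the hybrid approach: native tools where the model supports them,
// otherwise search up front and inject the results into the context.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Assumed helpers, not real Nattr code:
declare function supportsTools(providerId: string): boolean;
declare function shouldSearch(message: string): boolean;
declare function runWebSearch(query: string): Promise<string>;

async function buildMessages(
  providerId: string,
  history: ChatMessage[],
  userMessage: string
): Promise<ChatMessage[]> {
  const turn: ChatMessage = { role: "user", content: userMessage };

  // Claude / GPT-4: pass the tool definition and let the model decide.
  if (supportsTools(providerId)) return [...history, turn];

  // Gemini / Grok / DeepSeek / Qwen: search first, then prepend the results.
  if (shouldSearch(userMessage)) {
    const results = await runWebSearch(userMessage);
    return [
      { role: "system", content: `Relevant web search results:\n${results}` },
      ...history,
      turn,
    ];
  }
  return [...history, turn];
}
```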

The tool_use block problem
Here's a specific technical challenge that cost us a day: Claude's tool_use blocks break when stored in conversation history. You'd ask Claude to search, it would work perfectly, then the next message would fail because it saw its own tool_use block and got confused.

Solution? Native tools for the first message only. After that, switch to context injection for subsequent searches. Same feature, different implementation based on conversation state. Ugly under the hood, seamless for users.
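In code terms it amounts to a small state check, roughly like this (names are illustrative):

```typescript
// Illustrative: only the first assistant turn in a room gets native tools;
// later turns fall back to context injection, so stored history never
// contains tool_use blocks that the next request would choke on.
function useNativeTools(providerId: string, priorAssistantTurns: number): boolean {
  const hasNativeTools = providerId === "claude" || providerId === "gpt-4";
  return hasNativeTools && priorAssistantTurns === 0;
}
```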

Real-time without the complexity

We needed real-time updates—you can't have a six-way AI conversation that requires page refreshes. But Vercel's serverless functions don't do WebSockets well.

The architecture: Socket.io server on Render (Oregon region) handling WebSocket connections, Next.js API routes on Vercel for everything else. When an AI responds, it hits the API route, which pushes to Socket.io, which broadcasts to all connected clients. Polling as a fallback for environments that block WebSockets.
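The Socket.io side is mostly stock. A minimal sketch of the relay on Render (the /broadcast endpoint and event names are our illustration, not Nattr's exact wiring); the Vercel API route simply POSTs to it after persisting a message:

```typescript
// Minimal Socket.io relay: clients join a room, and an HTTP endpoint lets the
// Next.js API routes push new messages that get broadcast to that room.
import express from "express";
import { createServer } from "http";
import { Server } from "socket.io";

const app = express();
app.use(express.json());

const httpServer = createServer(app);
const io = new Server(httpServer, { cors: { origin: "*" } });

io.on("connection", (socket) => {
  socket.on("join", (roomId: string) => socket.join(roomId));
});

// Called by the Vercel API route whenever an AI (or human) message lands.
app.post("/broadcast", (req, res) => {
  const { roomId, message } = req.body;
  io.to(roomId).emit("new-message", message);
  res.sendStatus(204);
});

const port = Number(process.env.PORT) || 3001;
httpServer.listen(port);
```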

Total build time for the real-time infrastructure? About two days. Standing on the shoulders of giants (Socket.io is mature and battle-tested) meant we could focus on the unique parts—the multi-AI orchestration.

Database design that actually scaled

We knew from the start this needed to handle complexity: multiple rooms, multiple participants (humans AND AIs), flexible conversation modes, credit tracking. Eight database tables, all connected through Prisma.

The clever bit: participants are polymorphic. Humans and AIs both exist in the same Participants table. Both can send messages. Both show up in the same conversation. This made features like "Code Reviewer Claude" + "Creative Claude" + "Debug Claude" (same model, different personas) trivial to implement.
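In TypeScript terms the shape looks roughly like this (field names are illustrative; the real definitions live in the Prisma schema):

```typescript
// Illustrative participant/message shapes: humans and AIs share one table,
// so personas are just extra AI participants pointing at the same model.
type Participant = {
  id: string;
  roomId: string;
  kind: "human" | "ai";
  displayName: string;   // "Alice", "Code Reviewer Claude", "Debug Claude", ...
  userId?: string;       // set when kind === "human"
  model?: string;        // set when kind === "ai", e.g. "claude"
  persona?: string;      // optional persona prompt for AI participants
};

type Message = {
  id: string;
  roomId: string;
  participantId: string; // humans and AIs send messages through the same path
  content: string;
  createdAt: Date;
};
```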

We're on Supabase's transaction pooler with PgBouncer, which handles serverless connection limits beautifully. No hand-rolled connection pooling logic needed.
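For reference, the wiring is just environment configuration plus a standard shared client; something along these lines (hostnames and ports are illustrative; check your own Supabase project settings):

```typescript
// Illustrative setup: DATABASE_URL points at Supabase's transaction pooler
// (PgBouncer, typically port 6543, with ?pgbouncer=true so Prisma skips
// prepared statements); a direct URL is used only for migrations.
import { PrismaClient } from "@prisma/client";

// Reuse a single client across hot reloads / serverless invocations.
const globalForPrisma = globalThis as unknown as { prisma?: PrismaClient };

export const prisma = globalForPrisma.prisma ?? new PrismaClient();

if (process.env.NODE_ENV !== "production") globalForPrisma.prisma = prisma;
```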

What Went Well

The Adapter Pattern Held Up
We thought abstracting six different AI APIs might be premature optimisation. It wasn't. Every time we added a new provider (most recently Qwen), it took under an hour. Write the adapter, plug it in, done. The pattern proved its worth immediately.

Users Understood @ Mentions Instantly
No tutorial needed. People who've used Slack or Discord just... got it. Type @, see the dropdown, mention who you want. The UI pattern borrowed from familiar tools meant zero learning curve.

Personas Unlocked Unexpected Use Cases
We added eight predefined personas (Code Reviewer, Creative Writer, Research Assistant, etc.) thinking it'd be a nice extra feature. Users started creating rooms with the SAME model five times, each with a different persona. "Give me five different expert opinions from Claude" became a common pattern we hadn't anticipated.

What Was Harder Than Expected

Getting AIs to Actually Use Web Search
Even when we gave Claude the perfect tool definition, it would often skip searching and just answer from training data. The fix? Extremely explicit prompts. Instead of "You have access to web search", we went with "ALWAYS search before answering questions about current events, latest developments, or specific URLs. DO NOT rely on training data for time-sensitive information."

Subtlety doesn't work with LLMs. Caps lock does.

Streaming 300-Second Responses on Vercel
Vercel's default timeout is 10 seconds on the free tier. We needed 300 seconds for AI responses (some models are SLOW). This required upgrading to Vercel Pro and explicitly setting execution timeouts. Small detail, big impact on user experience. Nobody wants a response cut off mid-sentence.
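On Next.js this ends up being a one-line route segment config, roughly like this (the path is illustrative, and the 300-second ceiling needs a paid Vercel plan):

```typescript
// app/api/chat/route.ts (illustrative path): allow long-running AI responses.
// maxDuration is Next.js route segment config, read by Vercel at deploy time.
export const maxDuration = 300;

export async function POST(_req: Request) {
  // ...the long-running, streaming AI call happens here (elided)
  return new Response("ok");
}
```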

The Context Window Trade-off
We initially loaded full conversation history for every AI request. With a 1,000-message conversation, that meant sending 1,000 messages to the AI every single time. Slow. Expensive. Unnecessary.

We now limit context to the last 50 messages. Turns out that's plenty for most conversations. The trade-off: occasionally the AI "forgets" something from way earlier in the thread. The benefit: responses in seconds instead of timeouts.
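The change itself is tiny. A hedged Prisma sketch (model and field names are illustrative), assuming a shared prisma client:

```typescript
// Illustrative: fetch only the most recent messages for the AI's context
// window instead of the full conversation history.
import { prisma } from "@/lib/prisma"; // assumed shared PrismaClient instance

const CONTEXT_LIMIT = 50;

export async function loadContext(roomId: string) {
  const recent = await prisma.message.findMany({
    where: { roomId },
    orderBy: { createdAt: "desc" },
    take: CONTEXT_LIMIT,
  });
  return recent.reverse(); // oldest-first, the order the AI expects
}
```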

What We'd Do Differently

Start with Fewer AI Providers
Six providers on day one was ambitious. If we were doing this again, we'd launch with Claude and GPT-4, prove the concept, THEN add the others. The adapter pattern meant adding providers was easy, but the initial integration overhead was significant.

Build the Credit System Later
We spent two days building a comprehensive credit tracking system—per-token pricing, transaction history, different rates for input vs output tokens. All necessary eventually, but not for MVP. Should've launched with a simple "unlimited for early users" model and added monetisation once we had traction.

Invest in Better Error Messages Earlier
When an AI call fails (rate limits, API errors, timeouts), we initially just showed "Something went wrong". Useless. We should've built detailed error handling from the start: "Anthropic API rate limit hit—try again in 60 seconds" vs "OpenAI timeout—the response took too long".

Specificity matters. Generic errors frustrate users.
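Even a small mapping like this (status codes and copy are illustrative, not the exact messages Nattr shows) would have saved users a lot of confusion:

```typescript
// Illustrative error mapping: turn raw provider failures into messages that
// tell the user what happened and what to do next.
type ProviderError = { status?: number; code?: string };

function describeAIError(provider: string, err: ProviderError): string {
  if (err.status === 429) {
    return `${provider} rate limit hit. Try again in about a minute.`;
  }
  if (err.status === 504 || err.code === "ETIMEDOUT") {
    return `${provider} timed out. The response took too long; try a shorter prompt.`;
  }
  return `${provider} returned an unexpected error. Please try again.`;
}
```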

The Precode Approach in Action

This is exactly how we run MVP Sprints for clients: solve one problem really well, get it live, learn from real users.

Nattr could have been a full-featured team collaboration platform with file uploads, video calls, integrations with Slack, etc. Instead, it's a focused tool that does one thing—multi-AI conversations—better than anything else.

Two weeks from concept to production. No feature bloat. No six-month roadmaps. Just ship, learn, iterate.

The technical decisions mirror what we recommend to clients:

  • Leverage existing tools (Socket.io, not hand-rolled WebSockets)
  • Use managed services (Supabase, not self-hosted PostgreSQL)
  • Build abstractions that scale (adapter pattern for AI providers)
  • Cut scope ruthlessly (50-message context, not unlimited history)

Every sprint teaches us something new about rapid product development. This one taught us that even complex technical problems (six different AI APIs, real-time streaming, token accounting) can be solved in days, not months, if you're willing to make pragmatic trade-offs.

Try It Yourself

Nattr is live at nattr.ai. Free tier gives you 500 credits—enough to experiment with multi-AI conversations and see if it clicks for your workflow.

If you're building something and stuck in "planning mode", wondering how to get an MVP out quickly: this is how we do it. Two weeks. Real product. Real users. Real learning.

We run MVP Sprints for clients who want to move this fast. No agency overhead, no endless discovery phases. Just experienced builders who ship. If that sounds interesting, get in touch.

For now, go ask six AIs the same question and see what happens. You might be surprised by how different their perspectives are.