
What I Learned Building an AI Trivia Platform

7 min read
Learning · AI · OpenAI · TypeScript

The Idea That Wouldn't Let Go

I love trivia games, but they all have the same problem: finite question banks. Play enough and you memorize answers. I kept thinking: what if the questions were generated by AI, infinite and unique?

That question led to building the AI Trivia Platform—a game where OpenAI's GPT models generate trivia questions on-demand. What started as curiosity became a deep dive into practical AI integration, prompt engineering, and the challenges of making probabilistic systems feel reliable.

Prompting: The New Programming

If you'd told me a year ago that I'd spend days iterating on prompts like they were code, I'd have been skeptical. But prompt engineering is real, and it's fascinating.

My first prompt was embarrassingly simple:

Generate a trivia question about science.

The results? Inconsistent format, varying difficulty, sometimes vague or trivial. I learned that specificity matters enormously. LLMs are capable, but they need clear instructions.

The evolved prompt:

Generate a multiple-choice trivia question with these requirements:

Category: {category}
Difficulty: {difficulty}

Format:
- Question: Clear, specific question about {category}
- 4 answer choices labeled A, B, C, D
- Exactly one correct answer
- Three plausible but incorrect distractors
- Difficulty: {difficulty_description}

Rules:
- No trick questions or wordplay
- Verify factual accuracy
- Make distractors believable but clearly wrong
- Avoid obvious patterns (e.g., "C" is always correct)

Output as JSON:
{
  "question": "...",
  "choices": ["A: ...", "B: ...", "C: ...", "D: ..."],
  "correct": "B",
  "explanation": "..."
}

This structured prompt increased quality dramatically—from 60% usable questions to 95%. The difference was specificity and format constraints.
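 
To make that concrete, here's roughly how the template gets filled in and sent to the API; a minimal sketch rather than the production code, assuming the official openai npm package (PROMPT_TEMPLATE and buildPrompt are placeholder names):

import OpenAI from 'openai'

declare const PROMPT_TEMPLATE: string // the template above, stored as a string

const openai = new OpenAI() // reads OPENAI_API_KEY from the environment

// Placeholder helper: substitutes {category} and {difficulty} into the template
// ({difficulty_description} substitution omitted here for brevity).
function buildPrompt(category: string, difficulty: string): string {
  return PROMPT_TEMPLATE
    .replace(/{category}/g, category)
    .replace(/{difficulty}/g, difficulty)
}

async function generateQuestionText(category: string, difficulty: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo', // illustrative; see the cost notes below
    temperature: 0.7,
    messages: [
      { role: 'system', content: 'You generate trivia questions as strict JSON.' },
      { role: 'user', content: buildPrompt(category, difficulty) },
    ],
  })
  // Raw text only; it still needs parsing and validation downstream.
  return completion.choices[0].message.content ?? ''
}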

The Temperature Knob

I discovered that OpenAI's temperature parameter is like a creativity dial. Low temperature (0.3) gives consistent, focused responses. High temperature (1.0) gives creative, varied responses.

For trivia, I needed variety without chaos. Temperature 0.7 was the sweet spot—questions varied but stayed factually grounded. Too low and questions felt repetitive. Too high and facts became suspect.

I learned to adjust temperature by use case:

  • Factual categories (science, history): lower temperature for accuracy
  • Creative categories (entertainment, pop culture): higher temperature for variety

This nuance wasn't in tutorials—I discovered it through experimentation and user feedback.
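 
In code that ended up as nothing more exotic than a lookup table (the values here are illustrative, not the exact production numbers):

// Lower temperature for factual categories, higher for creative ones.
const CATEGORY_TEMPERATURE: Record<string, number> = {
  science: 0.5,
  history: 0.5,
  entertainment: 0.9,
  'pop culture': 0.9,
}

function temperatureFor(category: string): number {
  // 0.7 stays the general-purpose sweet spot for anything unlisted.
  return CATEGORY_TEMPERATURE[category.toLowerCase()] ?? 0.7
}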

Quality Control: The 95% Problem

AI-generated content is usually good, but "usually" isn't enough for production. That 5% of bad questions—factually wrong, ambiguous, or poorly worded—erodes user trust quickly.

Multi-stage validation helped:

  1. Format validation: Does the response match the expected JSON structure? Missing fields → reject
  2. Consistency check: Is the marked correct answer actually in the choices? Mismatch → reject
  3. Factuality check (heuristic): Does the explanation make sense? Are multiple choice options clearly distinct? Ambiguous → reject
  4. User feedback: Report button lets users flag bad questions. Flagged questions → manual review

This pipeline caught most issues, but I learned that AI quality control needs human oversight. I review reported questions weekly, identifying patterns in mistakes. This feedback improves prompts iteratively.
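 
The format stage is handled by the Zod schema shown later; the consistency stage is small enough to sketch here, assuming the "A: ..." prefixes from the prompt's output format:

// Stage 2: the marked correct answer must refer to one of the four choices.
function isConsistent(question: { choices: string[]; correct: string }): boolean {
  if (question.choices.length !== 4) return false
  const labels = question.choices.map((choice) => choice.split(':')[0].trim())
  return labels.includes(question.correct)
}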

Cost Optimization: The $500 Lesson

My first month's OpenAI bill was $500. For a hobby project, that was unsustainable. I needed to optimize.

Caching was obvious in hindsight: Why generate "History, Medium difficulty" questions every time? Pre-generate and cache them. I built a question pool:

  • Generate 100 questions per category/difficulty combination
  • Store in database
  • Serve from pool
  • Regenerate when pool depletes

This reduced API calls by 70%. Users still got variety—100 questions per combination is plenty—but costs plummeted.
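 
Here's a minimal in-memory sketch of that flow (the real pool lives in the database, and generateBatch stands in for the batched OpenAI call described below):

import type { TriviaQuestion } from './types' // hypothetical path for the interface shown later

// Stand-in for the batched generation call sketched in the batching section below.
declare function generateBatch(category: string, difficulty: string, count: number): Promise<TriviaQuestion[]>

const pool = new Map<string, TriviaQuestion[]>()
const POOL_TARGET = 100
const LOW_WATER_MARK = 20

async function nextQuestion(category: string, difficulty: string): Promise<TriviaQuestion> {
  const key = `${category}:${difficulty}`
  let questions = pool.get(key)
  if (!questions || questions.length === 0) {
    questions = await generateBatch(category, difficulty, POOL_TARGET)
    pool.set(key, questions)
  } else if (questions.length < LOW_WATER_MARK) {
    // Refill in the background so the player never waits on the API.
    const backlog = questions
    void generateBatch(category, difficulty, POOL_TARGET - questions.length)
      .then((fresh) => backlog.push(...fresh))
      .catch(() => { /* log and retry later */ })
  }
  return questions.pop()! // non-empty here: either freshly filled or above zero
}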

Smart generation: Don't generate randomly—generate based on demand. Popular categories (entertainment) get more pre-generated questions. Obscure categories (philosophy) generate on-demand. This balances variety with cost.

Model selection: GPT-4 generates amazing questions but costs 10x more than GPT-3.5. I A/B tested—users couldn't tell the difference for trivia. GPT-3.5 was plenty good. Switching saved 80%.

Batch processing: Generating one question per API call is inefficient. I batch requests—generating 10 questions per call. OpenAI charges per token, and batching amortizes the shared instructions across all ten questions instead of repeating them for every call.
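 
A sketch of the batched call, reusing names from earlier snippets (openai, temperatureFor) plus the QuestionSchema shown in the next section; buildBatchPrompt is a hypothetical variant of the prompt that asks for a JSON array:

import { z } from 'zod'

// Hypothetical prompt variant that asks for `count` questions as a JSON array.
declare function buildBatchPrompt(category: string, difficulty: string, count: number): string

async function generateBatch(category: string, difficulty: string, count = 10) {
  const BatchSchema = z.array(QuestionSchema).length(count)
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    temperature: temperatureFor(category),
    messages: [{ role: 'user', content: buildBatchPrompt(category, difficulty, count) }],
  })
  let data: unknown
  try {
    data = JSON.parse(completion.choices[0].message.content ?? '[]')
  } catch {
    return [] // not even valid JSON; count it as a failed batch
  }
  const parsed = BatchSchema.safeParse(data)
  // If the structure is off, discard the whole batch rather than salvage it.
  return parsed.success ? parsed.data : []
}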

Result: $500/month → $50/month with better user experience (lower latency from caching).

TypeScript: Type Safety for AI Outputs

TypeScript proved invaluable for handling AI responses. LLMs return text—you need to parse and validate.

Define expected structures:

interface TriviaQuestion {
  question: string
  choices: [string, string, string, string]
  correct: 'A' | 'B' | 'C' | 'D'
  explanation: string
  category: string
  difficulty: 'easy' | 'medium' | 'hard'
}

Validate responses with Zod:

import { z } from 'zod'

const QuestionSchema = z.object({
  question: z.string().min(10),
  choices: z.array(z.string()).length(4),
  correct: z.enum(['A', 'B', 'C', 'D']),
  explanation: z.string().min(20),
  // ...
})

// `response` here is the model's output after JSON.parse
const parsed = QuestionSchema.safeParse(response)
if (!parsed.success) {
  // Handle invalid response
}

This caught malformed responses immediately. TypeScript's type narrowing made working with validated data pleasant—no runtime surprises.

Latency and User Experience

Waiting 3 seconds for question generation kills flow. Even with caching, some requests generate on-demand. I learned to hide latency creatively:

Prefetching: While user answers question 5, prefetch question 6 in the background. By the time they click "Next," the question is ready.

Optimistic UI: Show loading animation immediately. Users tolerate brief waits if they feel progress—spinning icons, progress bars, engaging messages ("Generating your question...").

Fallback to cache: If generation takes >2 seconds, serve from cache instead. Speed beats perfection for UX.

Progressive disclosure: Don't wait for the entire question to generate—stream the response if possible (though OpenAI's API doesn't support streaming for structured outputs well yet).
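 
The cache fallback, for instance, is just a race against a timer. A sketch with hypothetical helpers (generateOnDemand, serveFromCache, addToPool):

import type { TriviaQuestion } from './types' // hypothetical path for the interface shown earlier

declare function generateOnDemand(category: string, difficulty: string): Promise<TriviaQuestion>
declare function serveFromCache(category: string, difficulty: string): Promise<TriviaQuestion>
declare function addToPool(question: TriviaQuestion): void

function timeout(ms: number): Promise<'timeout'> {
  return new Promise((resolve) => setTimeout(() => resolve('timeout'), ms))
}

async function questionWithFallback(category: string, difficulty: string): Promise<TriviaQuestion> {
  const generation = generateOnDemand(category, difficulty)
  const winner = await Promise.race([generation, timeout(2000)])
  if (winner === 'timeout') {
    // Don't waste the slow generation: stash it in the pool once it resolves.
    void generation.then(addToPool).catch(() => { /* log */ })
    return serveFromCache(category, difficulty)
  }
  return winner
}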

User Engagement: What I Didn't Expect

Users want variety, not novelty. I thought AI-generated questions would be the main appeal. Users actually cared more about smooth gameplay, fair difficulty, and leaderboards. AI was the enabling technology, not the feature.

Difficulty calibration is hard. What's "medium" difficulty? Subjective! Some users found my "easy" questions hard. I added a quick feedback prompt ("Too easy? Too hard?") and adjusted generation based on the aggregate responses.

Explanation matters. After answering, showing why the answer is correct (with AI-generated explanation) was hugely appreciated. Users learn, not just play. This was a prompt addition that paid engagement dividends.

Debugging Probabilistic Systems

Traditional debugging: same input, same output. AI debugging: same input, different outputs. This is jarring.

Logging everything became essential. Store:

  • Prompt used
  • Temperature setting
  • Generated response
  • Validation result
  • User feedback

This audit trail let me trace problems. When users reported a bad question, I could see exactly what prompt generated it and iterate.
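 
A sketch of what each log record looks like; the field names are illustrative rather than the exact schema:

interface GenerationLog {
  prompt: string                       // the fully rendered prompt sent to the API
  temperature: number
  rawResponse: string                  // exactly what the model returned
  validation: 'accepted' | 'rejected'
  rejectionReason?: string             // which pipeline stage rejected it, if any
  userReports: number                  // incremented when players flag the question
  createdAt: Date
}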

Seed parameters: OpenAI's seed parameter (best-effort deterministic output for a fixed seed) helped reproduce issues. Testing with seeds gives much more consistent behavior for test cases.
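 
In practice that's one extra parameter on the call; a fragment from an async test helper, reusing openai and buildPrompt from the earlier sketch (and since seeded output is best-effort, tests still need to tolerate occasional variation):

const completion = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  temperature: 0.7,
  seed: 42, // fixed seed for reproducing a reported question
  messages: [{ role: 'user', content: buildPrompt('Science', 'medium') }],
})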

A/B testing: Changes to prompts affect quality. I couldn't trust intuition—I had to test. Generate 100 questions with old prompt, 100 with new prompt, compare quality metrics (user ratings, rejection rates).

Lessons That Stuck

AI is a tool, not magic. It augments creativity but needs guardrails. Prompt engineering, validation, and caching turn probabilistic outputs into reliable products.

User experience trumps technology. Users don't care that questions are AI-generated—they care that the game is fun, fair, and fast. Technology serves user needs, not the reverse.

Cost optimization is product design. Caching and batching weren't just cost-saving—they improved latency. Constraints spark creativity.

Iteration beats perfection. My first prompt was mediocre. My tenth was good. My thirtieth was great. I learned to iterate quickly, gather feedback, and improve incrementally.

What's Next

This project sparked ideas I'm still exploring:

  • Adaptive difficulty: AI adjusts question difficulty based on user performance in real-time
  • Multiplayer with AI moderation: AI generates questions for live multiplayer games
  • Voice interaction: Users answer verbally, AI interprets speech (multimodal)

AI opens creative possibilities impossible with static content. The challenge is harnessing probability into reliability.

Closing Thoughts

Building the AI Trivia Platform taught me that working with AI is software engineering. It's not prompt-and-pray—it's architecting systems that handle probabilistic components reliably.

It reinforced that integration is where value lies. OpenAI's API is powerful, but value comes from thoughtful integration: prompting, validation, caching, UX design. These layers transform raw AI capability into user delight.

Most importantly, it showed that AI makes new things possible. Infinite trivia questions were impractical before. AI made it trivial. What else becomes possible when we think creatively about AI integration?

If you're building with AI, embrace iteration. Test rigorously. Optimize for cost and latency. And remember: users care about their experience, not your technology. Make something useful, then make it delightful.

View the project to see the implementation and play the game!
