For AI Agents & Developers: Navigate the landscape of LLM costs and optimize your spending
Last Updated: October 30, 2025 (Pricing changes frequently - verify current rates)
Critical distinction: a free consumer chat app is not the same as a free developer API.
Example: You can use ChatGPT for free in your browser, but the OpenAI API requires payment once the one-time $5 credit is used.
| Provider | Free Tier? | What’s Included | How to Access | Credit Card Required? | Expires? |
|---|---|---|---|---|---|
| OpenAI API | ⚠️ $5 credit only | $5 one-time credit | API key | Yes | After $5 used |
| Anthropic API | ❌ No | None | API key | Yes | N/A |
| Google Gemini | ✅ Yes | 1500 RPD free | Google AI Studio | No | Never |
| OpenRouter | ✅ Yes | Free flagship models (DeepSeek R1, Qwen) | API key | No | Daily reset |
| Groq | ✅ Yes | 14K-30K RPD (Llama 3.3 70B, Mixtral) | API key | No | Never |
| Hugging Face | ✅ Yes | 300 req/hour (thousands of models) | API key | No | Never |
| Together.ai | ✅ $25 credits | $25 free credits for new users | API key | Yes | After credits used |
| Ollama | ✅ Yes | Unlimited | Local install | No | Never |
Legend: ✅ = genuinely free tier, ⚠️ = one-time credit only, ❌ = no free tier.
NEW! Several providers now offer free access to flagship-quality models that rival GPT-4 and Claude Sonnet. Here’s how to access them:
OpenRouter provides free access to several high-quality models with daily limits:
| Model | Quality Score | Context Window | Rate Limits | Best For |
|---|---|---|---|---|
| DeepSeek R1 | 90/100 | 64K | Daily limit (resets) | Complex reasoning, coding |
| DeepSeek R1 Qwen3 8B | 85/100 | 32K | Daily limit (resets) | General tasks, fast inference |
| Qwen 32B | 88/100 | 32K | Daily limit (resets) | Multilingual, coding |
Setup:
```python
from tta_dev_primitives.integrations import OpenRouterPrimitive

# Use DeepSeek R1 for free
deepseek = OpenRouterPrimitive(
    model="deepseek/deepseek-r1:free",
    api_key="your-openrouter-key"  # Free tier, no credit card
)

# Performance on par with OpenAI o1, but free
result = await deepseek.execute(context, {
    "prompt": "Explain quantum computing in simple terms"
})
```
Key Benefits:
Rate Limits:
Google AI Studio provides free access to Gemini Pro and Flash models - flagship-quality models with generous limits:
| Model | Quality Score | Context Window | Free Tier Limits | Paid Tier Cost |
|---|---|---|---|---|
| Gemini 2.5 Pro | 89/100 | 2M tokens | Free of charge | $1.25/$10.00 per 1M tokens |
| Gemini 2.5 Flash | 85/100 | 1M tokens | Free of charge | $0.30/$2.50 per 1M tokens |
| Gemini 2.5 Flash-Lite | 82/100 | 1M tokens | Free of charge | $0.10/$0.40 per 1M tokens |
Setup:
```python
from tta_dev_primitives.integrations import GoogleAIStudioPrimitive

# Free Gemini Pro access via AI Studio
gemini_pro = GoogleAIStudioPrimitive(
    model="gemini-2.5-pro",
    api_key="your-google-ai-studio-key"  # Free tier, no credit card
)

# Free Gemini Flash for faster responses
gemini_flash = GoogleAIStudioPrimitive(
    model="gemini-2.5-flash",
    api_key="your-google-ai-studio-key"
)
```
Key Benefits:
Rate Limits (Free Tier):
⚠️ Important: Google AI Studio vs Vertex AI
Groq provides free access to several models with ultra-fast inference speeds:
| Model | Quality Score | Speed | Free Tier Limits | Best For |
|---|---|---|---|---|
| Llama 3.3 70B | 87/100 | 300+ tokens/sec | 14,400 RPD | General tasks, coding |
| Llama 3.1 8B | 82/100 | 500+ tokens/sec | 30,000 RPD | Fast responses, simple tasks |
| Mixtral 8x7B | 85/100 | 400+ tokens/sec | 14,400 RPD | Multilingual, reasoning |
Setup:
```python
from tta_dev_primitives.integrations import GroqPrimitive

# Ultra-fast inference with Llama 3.3 70B
groq = GroqPrimitive(
    model="llama-3.3-70b-versatile",
    api_key="your-groq-key"  # Free tier, no credit card
)

# 300+ tokens/second - fastest free LLM API
result = await groq.execute(context, {
    "prompt": "Write a Python function to sort a list"
})
```
Key Benefits:
Rate Limits (Free Tier):
Hugging Face provides free access to thousands of models via their Inference API:
| Tier | Rate Limits | Models Available | Best For |
|---|---|---|---|
| Unregistered | 1 request/hour | All public models | Testing |
| Registered (Free) | 300 requests/hour | All public models | Development |
| Pro ($9/month) | 10,000 requests/hour | All models + priority | Production |
Setup:
```python
from tta_dev_primitives.integrations import HuggingFacePrimitive

# Free access to Llama, Mistral, and more
hf = HuggingFacePrimitive(
    model="meta-llama/Llama-3.3-70B-Instruct",
    api_key="your-hf-token"  # Free tier, no credit card
)

# Access thousands of open-source models
result = await hf.execute(context, {
    "prompt": "Explain machine learning"
})
```
Key Benefits:
Rate Limits (Free Tier):
Together.ai offers $25 in free credits for new users:
| Model | Quality Score | Free Credits | Cost After Credits | Best For |
|---|---|---|---|---|
| Llama 4 Scout | 88/100 | $25 free | $0.20/$0.80 per 1M tokens | General tasks |
| FLUX.1 Schnell | N/A | 3 months free | N/A | Image generation |
Setup:
```python
from tta_dev_primitives.integrations import TogetherAIPrimitive

# $25 in free credits for new users
together = TogetherAIPrimitive(
    model="meta-llama/Llama-4-Scout",
    api_key="your-together-key"  # $25 free credits
)

# Use credits for text or image generation
result = await together.execute(context, {
    "prompt": "Generate a business plan"
})
```
Key Benefits:
Free Credits:
| Provider | Best Free Model | Quality vs GPT-4 | Rate Limits | Credit Card? | Best For |
|---|---|---|---|---|---|
| OpenRouter | DeepSeek R1 | 90% | Daily limits | ❌ No | Complex reasoning |
| Google AI Studio | Gemini 2.5 Pro | 89% | 1500 RPD | ❌ No | Production apps |
| Groq | Llama 3.3 70B | 87% | 14,400 RPD | ❌ No | Ultra-fast inference |
| Hugging Face | Llama 3.3 70B | 87% | 300 req/hour | ❌ No | Model variety |
| Together.ai | Llama 4 Scout | 88% | $25 credits | ✅ Yes | New users |
Quality Scoring:
For Production Apps:
For Development:
Example Workflow:
```python
from tta_dev_primitives.recovery import FallbackPrimitive
from tta_dev_primitives.integrations import (
    GoogleAIStudioPrimitive,
    OpenRouterPrimitive,
    GroqPrimitive
)

# Free flagship model fallback chain
workflow = FallbackPrimitive(
    primary=GoogleAIStudioPrimitive(model="gemini-2.5-pro"),  # Free, flagship
    fallbacks=[
        OpenRouterPrimitive(model="deepseek/deepseek-r1:free"),  # Free, daily limits
        GroqPrimitive(model="llama-3.3-70b-versatile")  # Free, ultra-fast
    ]
)

# High availability from free flagship models alone
result = await workflow.execute(context, input_data)
```
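Under the hood, a fallback chain is just ordered error handling. The sketch below is a minimal pure-Python illustration of the pattern, not TTA.dev's actual `FallbackPrimitive` implementation; the `flaky`/`reliable` providers are stand-ins:

```python
import asyncio

async def with_fallbacks(primary, fallbacks, *args):
    """Try the primary provider; on failure, walk the fallback list in order."""
    last_error = None
    for provider in [primary, *fallbacks]:
        try:
            return await provider(*args)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all providers failed") from last_error

# Stand-in "providers" for demonstration
async def flaky(prompt):
    raise TimeoutError("rate limited")

async def reliable(prompt):
    return f"ok: {prompt}"

print(asyncio.run(with_fallbacks(flaky, [reliable], "hi")))  # ok: hi
```

The real primitive adds observability and typed contexts on top, but the control flow is the same: each provider only runs if everything before it failed.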
Follow these steps to get free flagship model access working immediately:
```bash
cd packages/tta-dev-primitives
uv sync --extra integrations
```
- **Google AI Studio** (Recommended - Best Free Flagship): API keys start with `AIza...`
- **OpenRouter** (DeepSeek R1 - On Par with OpenAI o1): API keys start with `sk-or-...`
- **Groq** (Ultra-Fast Inference): API keys start with `gsk_...`

Create a `.env` file in your project root:
```bash
# Copy the example file
cp .env.example .env

# Edit .env and add your keys
GOOGLE_API_KEY=AIza...your-key-here
OPENROUTER_API_KEY=sk-or-...your-key-here
GROQ_API_KEY=gsk_...your-key-here
```
Run the free flagship models example:
```bash
cd packages/tta-dev-primitives
uv run python examples/free_flagship_models.py
```
Expected Output:
```text
✅ Model: gemini-2.5-pro
📝 Response: [AI-generated response]
📊 Usage: {'prompt_tokens': 10, 'completion_tokens': 50, 'total_tokens': 60}
🎯 Quality: 89/100 (flagship)
💰 Cost: $0.00 (FREE)
```
Google AI Studio:
OpenRouter:
Groq:
Use the fallback chain pattern for high availability:
```python
from tta_dev_primitives.recovery import FallbackPrimitive
from tta_dev_primitives.integrations import (
    GoogleAIStudioPrimitive,
    OpenRouterPrimitive,
    GroqPrimitive
)

# Create fallback chain
llm = FallbackPrimitive(
    primary=GoogleAIStudioPrimitive(model="gemini-2.5-pro"),
    fallbacks=[
        OpenRouterPrimitive(model="deepseek/deepseek-r1:free"),
        GroqPrimitive(model="llama-3.3-70b-versatile")
    ]
)

# Use in your app
response = await llm.execute(context, request)
```
Issue: “Invalid API key”
- Verify the key is set: `echo $GOOGLE_API_KEY`

Issue: “Rate limit exceeded”
Issue: “Model not found”
- Google AI Studio: `gemini-2.5-pro`, `gemini-2.5-flash`
- OpenRouter: `deepseek/deepseek-r1:free`, `qwen/qwen-32b:free`
- Groq: `llama-3.3-70b-versatile`, `llama-3.1-8b-instant`

Issue: “Connection timeout”
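Before digging into any of these issues, it helps to confirm which keys are actually set. A hypothetical pre-flight script (checking only the environment variables used in this guide):

```python
import os

# The three keys the quick-start examples rely on
KEYS = ["GOOGLE_API_KEY", "OPENROUTER_API_KEY", "GROQ_API_KEY"]

def key_status(value):
    """An empty or unset variable reports as MISSING."""
    return "set" if value else "MISSING"

for name in KEYS:
    print(f"{name}: {key_status(os.environ.get(name, ''))}")
```

Run it with `uv run python check_keys.py` (or plain `python`); any `MISSING` line explains an “Invalid API key” error immediately.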
Full Documentation:
While free tiers are great for learning and prototyping, production use cases often require paid models. Here’s when to make the switch:
1. Quality Requirements Exceed Free Tier Capabilities
2. Rate Limits Become a Bottleneck
3. Latency/Reliability Requirements
4. Context Window Requirements
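For point 2, a quick sanity check is to compare your daily request volume against the free-tier limits in the tables above. A hypothetical helper:

```python
# Compare daily request volume against a provider's requests-per-day (RPD) limit
def fits_free_tier(requests_per_day, rpd_limit):
    return requests_per_day <= rpd_limit

print(fits_free_tier(1_000, 1_500))    # True - Gemini's 1500 RPD covers it
print(fits_free_tier(10_000, 1_500))   # False - consider paid or multi-provider
print(fits_free_tier(10_000, 14_400))  # True - Groq's Llama 3.3 70B limit would
```

If your volume fails every free tier you have access to, that is the clearest signal to move to paid models or spread load across providers.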
Use TTA.dev primitives to combine free and paid models intelligently:
```python
from tta_dev_primitives.integrations import OpenAIPrimitive, OllamaPrimitive
from tta_dev_primitives.recovery import FallbackPrimitive

# Start with paid (quality), fall back to free (cost savings)
workflow = FallbackPrimitive(
    primary=OpenAIPrimitive(model="gpt-4o"),  # Paid, high quality
    fallbacks=[
        OllamaPrimitive(model="llama3.2:8b")  # Free, unlimited
    ]
)
```
Cost per 1M tokens (October 2025):
| Model | Provider | Input Cost | Output Cost | Quality Score | Best For |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | $2.50 | $10.00 | 92/100 | Complex reasoning, code generation |
| GPT-4o-mini | OpenAI | $0.15 | $0.60 | 82/100 | General purpose, cost-effective |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | 90/100 | Creative writing, analysis |
| Claude Opus | Anthropic | $15.00 | $75.00 | 88/100 | Highest quality, complex tasks |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 89/100 | Multimodal, cost-effective |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 85/100 | Fast, cheap, good quality |
Cost Calculation Example:
```python
# Example: 10K requests/day, 500 tokens input, 1000 tokens output

# GPT-4o-mini (most cost-effective OpenAI option)
daily_cost = (10000 * 500 / 1_000_000 * 0.15) + (10000 * 1000 / 1_000_000 * 0.60)
# = $0.75 + $6.00 = $6.75/day = $202.50/month

# Gemini 2.5 Flash-Lite (cheapest paid option)
daily_cost = (10000 * 500 / 1_000_000 * 0.10) + (10000 * 1000 / 1_000_000 * 0.40)
# = $0.50 + $4.00 = $4.50/day = $135.00/month

# Claude Sonnet 4.5 (highest quality)
daily_cost = (10000 * 500 / 1_000_000 * 3.00) + (10000 * 1000 / 1_000_000 * 15.00)
# = $15.00 + $150.00 = $165.00/day = $4,950/month
```
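The same arithmetic can be wrapped in a small helper so you can plug in any model's prices from the table. This is a hypothetical utility, not part of TTA.dev:

```python
def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 input_price, output_price, days=30):
    """Estimate monthly spend. Prices are USD per 1M tokens, as in the table above."""
    daily = (requests_per_day * input_tokens / 1_000_000 * input_price
             + requests_per_day * output_tokens / 1_000_000 * output_price)
    return round(daily * days, 2)

# Same workload as above: 10K requests/day, 500 input / 1000 output tokens
print(monthly_cost(10_000, 500, 1_000, 0.15, 0.60))   # 202.5 (GPT-4o-mini)
print(monthly_cost(10_000, 500, 1_000, 3.00, 15.00))  # 4950.0 (Claude Sonnet 4.5)
```

Running the same function across every row of the table makes the quality-versus-cost tradeoff concrete before you commit to a provider.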
💡 Cost Optimization Insight:
Using TTA.dev’s CachePrimitive can reduce costs by 30-40% by avoiding redundant API calls:
```python
from tta_dev_primitives.performance import CachePrimitive
from tta_dev_primitives.integrations import OpenAIPrimitive

# Cache expensive LLM calls
cached_llm = CachePrimitive(
    primitive=OpenAIPrimitive(model="gpt-4o"),
    ttl_seconds=3600,  # 1 hour cache
    max_size=1000
)

# 30-40% of requests hit cache → 30-40% cost reduction
# e.g. $202.50/month drops to $121.50-$141.75/month
```
TTA.dev primitives help you maximize value from paid models:
Use CachePrimitive to avoid redundant API calls for identical inputs.
Use RouterPrimitive to route simple tasks to cheaper models (GPT-4o-mini, Gemini Flash).
Use FallbackPrimitive to start with paid models, fall back to free when budget exceeded.
Use RetryPrimitive to avoid wasting API calls on transient failures.
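To make the caching idea concrete, here is a minimal pure-Python sketch of what a TTL cache in front of an LLM does. This is an illustration of the concept, not TTA.dev's actual `CachePrimitive` implementation, and `expensive_llm` is a stand-in for a paid API call:

```python
import time

class TTLCache:
    """Minimal sketch: memoize responses with a time-to-live and a size cap."""
    def __init__(self, ttl_seconds=3600, max_size=1000):
        self.ttl = ttl_seconds
        self.max_size = max_size
        self.store = {}  # key -> (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]
        return None  # missing or expired

    def put(self, key, value):
        if len(self.store) >= self.max_size:
            self.store.pop(next(iter(self.store)))  # evict oldest insertion
        self.store[key] = (time.time(), value)

calls = 0
def expensive_llm(prompt):  # stand-in for a paid API call
    global calls
    calls += 1
    return f"answer to: {prompt}"

cache = TTLCache(ttl_seconds=60)
for _ in range(3):
    response = cache.get("What is Python?")
    if response is None:
        response = expensive_llm("What is Python?")
        cache.put("What is Python?", response)

print(calls)  # 1 - two of the three requests were served from cache
```

Two of three identical requests never touch the API, which is exactly where the 30-40% cost reduction comes from on repetitive workloads.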
See the comprehensive Cost Optimization Patterns Guide for:
Q: “Can I use ChatGPT for free?”
A: Yes, but there are TWO different things:
They are NOT the same!
```bash
# 1. Sign up at https://platform.openai.com/signup
# 2. Add payment method (required even for $5 credit)
# 3. Get API key from https://platform.openai.com/api-keys
# 4. Set environment variable
export OPENAI_API_KEY="sk-..."
```
What do you get for free with the Anthropic API? Nothing. Anthropic does not offer a free API tier.
Q: “Can I use Claude for free?”
A: Yes, but only the web interface (claude.ai):
They are NOT the same!
```bash
# 1. Sign up at https://console.anthropic.com/
# 2. Add payment method (required)
# 3. Get API key
# 4. Set environment variable
export ANTHROPIC_API_KEY="sk-ant-..."
```
Q: “What’s the difference between Google AI Studio and Vertex AI?”
A: They are COMPLETELY different:
| Feature | Google AI Studio | Vertex AI |
|---|---|---|
| Free Tier | ✅ Yes (1500 RPD) | ❌ No |
| Target | Developers, prototyping | Enterprise, production |
| Setup | Simple API key | GCP project, billing |
| Best For | Testing, learning | Production apps |
Use Google AI Studio for free access!
```bash
# 1. Go to https://aistudio.google.com/
# 2. Sign in with Google account
# 3. Click "Get API key"
# 4. Set environment variable
export GOOGLE_API_KEY="AIza..."
```
Q: “What’s the difference between OpenRouter free tier and BYOK?”
A: Two different things:
| Feature | OpenRouter (Regular) | OpenRouter BYOK |
|---|---|---|
| Who pays provider? | OpenRouter | You (your API keys) |
| Free tier | Limited credits | 1M requests/month |
| Your API keys | Not used | Required |
| Best for | Quick testing | Production with your keys |
BYOK = You bring your own OpenAI/Anthropic/etc. keys, OpenRouter routes for free (up to 1M/month)
```bash
# 1. Sign up at https://openrouter.ai/
# 2. Go to https://openrouter.ai/settings/integrations
# 3. Add your provider API keys (OpenAI, Anthropic, etc.)
# 4. Get OpenRouter API key
# 5. Set environment variable
export OPENROUTER_API_KEY="sk-or-..."
```
Q: “Is Ollama really free?”
A: Yes, but it’s local:
```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Download a model
ollama pull llama3.2

# 3. Run locally (no API key needed)
ollama run llama3.2
```
"""Example: Maximize free tier usage with RouterPrimitive"""
from tta_dev_primitives import RouterPrimitive
from tta_dev_primitives.integrations import (
OllamaPrimitive,
OpenAIPrimitive,
)
from tta_dev_primitives.recovery import FallbackPrimitive
from tta_dev_primitives.performance import CachePrimitive
import os
# Strategy: Free → Paid fallback
free_llm = OllamaPrimitive(model="llama3.2") # Always free
paid_llm = OpenAIPrimitive(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o-mini" # Cheapest paid option
)
# Try free first, fallback to paid
workflow = FallbackPrimitive(
primary=free_llm,
fallbacks=[paid_llm]
)
# Add caching to reduce API calls
cached_workflow = CachePrimitive(
primitive=workflow,
ttl_seconds=3600 # 1 hour cache
)
"""Example: Stay within free tier limits"""
from tta_dev_primitives.recovery import RetryPrimitive
from tta_dev_primitives.integrations import OpenAIPrimitive
import time
# Track usage to stay within free tier
class UsageTracker:
def __init__(self, daily_limit=1500):
self.daily_limit = daily_limit
self.requests_today = 0
self.last_reset = time.time()
def can_make_request(self):
# Reset counter daily
if time.time() - self.last_reset > 86400:
self.requests_today = 0
self.last_reset = time.time()
return self.requests_today < self.daily_limit
def record_request(self):
self.requests_today += 1
# Use with Gemini (1500 RPD free)
tracker = UsageTracker(daily_limit=1500)
async def safe_llm_call(prompt):
if not tracker.can_make_request():
raise Exception("Daily limit reached - use fallback")
tracker.record_request()
# Make API call...
"""Example: Combine multiple free tiers"""
from tta_dev_primitives import RouterPrimitive
# Use different providers for different tasks
router = RouterPrimitive(
routes={
"local": OllamaPrimitive(), # Free, unlimited
"cloud_free": GoogleGeminiPrimitive(), # 1500 RPD free
"paid_backup": OpenAIPrimitive() # $5 credit
},
default_route="local"
)
# Route based on task complexity
def select_route(task):
if task.is_simple:
return "local" # Use free Ollama
elif task.is_urgent:
return "cloud_free" # Use Gemini (faster)
else:
return "paid_backup" # Use OpenAI credit
Key source files:

- `packages/tta-dev-primitives/src/tta_dev_primitives/integrations/openai_primitive.py`
- `packages/tta-dev-primitives/src/tta_dev_primitives/integrations/anthropic_primitive.py`
- `packages/tta-dev-primitives/src/tta_dev_primitives/integrations/ollama_primitive.py`
- `packages/tta-dev-primitives/src/tta_dev_primitives/performance/cache.py`
- `packages/tta-dev-primitives/src/tta_dev_primitives/core/routing.py`
- `packages/tta-dev-primitives/src/tta_dev_primitives/recovery/fallback.py`

Last Updated: October 30, 2025 | For: AI Agents & Developers (all skill levels) | Maintained by: TTA.dev Team
⚠️ Important: Free tier limits change frequently. Always verify current limits on provider websites before relying on this information for production use.