Together AI democratizes access to powerful open-source language models by providing blazing-fast inference infrastructure optimized for models like LLaMA, Mistral, and Code Llama. For teams wanting to avoid vendor lock-in or cut API costs by as much as 10x, Together AI is a game-changer.
The Together AI Philosophy
Together AI believes in open source. Rather than building proprietary models, they optimize inference for the best open-source models available. This approach provides freedom: swap models, fine-tune, or self-host at any time.
Supported Models
Foundation Models
LLaMA 2 and LLaMA 3: Meta’s powerful open models
- LLaMA 2 in 7B, 13B, and 70B parameter versions; LLaMA 3 in 8B and 70B
- Excellent general-purpose capabilities
- Strong coding abilities
- Competitive with proprietary models
Mixtral 8x7B: Sparse mixture-of-experts model
- Quality competitive with much larger dense models
- Efficient token generation
- Good for specialized tasks
- Open-source and available for fine-tuning
Mistral 7B: Compact but powerful
- Fastest inference
- Lowest cost
- Surprisingly capable
- Ideal for high-volume applications
Code Llama: Specialized for programming
- Code generation and explanation
- Multi-language support
- Great for developer tools
- Open weights for fine-tuning
Specialized Models
- Summarization: Instruction-tuned models for document summarization
- Legal: Fine-tuned for legal document analysis
- Medical: Specialized for healthcare applications
- Multilingual: Excellent non-English capabilities
Why Together AI Wins on Speed
Hardware Optimization
Together invested in custom inference optimization:
- Optimized for batch inference
- Reduced latency (2-5x faster than alternatives)
- Higher throughput per GPU
- Efficient memory usage
Performance Numbers
| Task | Together AI | Alternative |
|---|---|---|
| Latency | 50ms | 200ms |
| Tokens/sec | 150+ | 30-50 |
| Cost/1M tokens | $0.50 | $2.00 |
| Availability | 99.9% | 99.5% |
Real-World Impact
Scenario: Generate product descriptions for 100,000 items
- OpenAI: 3 hours, $500 cost
- Together AI: 45 minutes, $50 cost
- Speed improvement: 4x faster
- Cost savings: 90% reduction
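The arithmetic behind that scenario is easy to verify; a quick sketch using only the figures quoted above:

```python
# Back-of-the-envelope check of the product-description scenario.
openai_hours, openai_cost = 3.0, 500.0       # 100,000 items via OpenAI
together_minutes, together_cost = 45.0, 50.0  # same workload via Together AI

speedup = (openai_hours * 60) / together_minutes
savings = 1 - together_cost / openai_cost

print(f"Speedup: {speedup:.0f}x")      # 4x
print(f"Cost savings: {savings:.0%}")  # 90%
```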
Cost Efficiency
Transparent Pricing
No surprises, no hidden fees:
- LLaMA 2 7B: $0.08 per 1M input tokens
- LLaMA 2 70B: $0.70 per 1M input tokens
- Mistral 7B: $0.06 per 1M input tokens
Input tokens cost the same as output tokens (unlike some competitors, which charge more for output).
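Because input and output are billed at the same rate, cost estimation is a single multiplication. A hypothetical helper using the per-million-token prices listed above (the authoritative numbers live on together.ai's pricing page):

```python
# Prices per 1M tokens, taken from the list above.
PRICE_PER_1M = {
    "llama-2-7b": 0.08,
    "llama-2-70b": 0.70,
    "mistral-7b": 0.06,
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens are billed at the same rate."""
    rate = PRICE_PER_1M[model]
    return (input_tokens + output_tokens) / 1_000_000 * rate

# 2M input + 1M output tokens on LLaMA 2 70B:
print(f"${estimate_cost('llama-2-70b', 2_000_000, 1_000_000):.2f}")  # $2.10
```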
Free Trial
- $5 free credits
- Test models before committing
- No credit card required
Volume Discounts
- 10%+ discount for $1000+/month
- Higher discounts for committed volumes
- Custom pricing for enterprise
API Integration
Easy Implementation
```python
import together  # legacy Together Python SDK (`pip install together`)

together.api_key = "YOUR_API_KEY"  # or set the TOGETHER_API_KEY env var

response = together.Complete.create(
    prompt="Complete this: The future of AI is",
    model="togethercomputer/llama-2-70b-chat",
    max_tokens=256,
    temperature=0.7,
)

# The completion text lives under output -> choices
print(response['output']['choices'][0]['text'])
```
Language Support
- Python (most complete)
- JavaScript/Node.js
- Java
- Go
- REST API for any language
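Since any language can hit the REST API, here is a minimal sketch using Python's standard-library urllib. It assumes the OpenAI-compatible /v1/completions endpoint and a TOGETHER_API_KEY environment variable; check Together's API reference for the current path and payload schema:

```python
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/completions"

def build_request(prompt: str, model: str, max_tokens: int = 256):
    """Assemble an authenticated completion request (not yet sent)."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    headers = {
        "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers
    )

req = build_request("The future of AI is", "togethercomputer/llama-2-70b-chat")
# resp = urllib.request.urlopen(req)  # uncomment to actually call the API
```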
Streaming Support
For real-time token generation (chatbots, interactive apps):
```python
import together

# Streaming interface of the legacy Together SDK; the exact method
# name can vary between SDK versions, so check the current docs.
stream = together.Complete.stream(
    prompt="Write a poem about AI",
    model="togethercomputer/llama-2-70b",
)

# Print each token as soon as it arrives
for token in stream:
    print(token, end='', flush=True)
```
Fine-Tuning on Your Data
Custom Model Training
Together provides fine-tuning services:
- Upload data: Your examples and expected outputs
- Configure training: Set parameters and options
- Train: Together handles infrastructure
- Deploy: Use your fine-tuned model via API
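The upload step expects your examples in a machine-readable file; a sketch of preparing support conversations as JSONL, assuming a single "text" field per example (one common convention; the exact schema Together expects is in their fine-tuning docs):

```python
import json

# Hypothetical prompt/completion pairs drawn from support transcripts.
examples = [
    {"prompt": "Customer: Where is my order?\nAgent:",
     "completion": " Let me look that up for you."},
    {"prompt": "Customer: How do I reset my password?\nAgent:",
     "completion": " Click 'Forgot password' on the login page."},
]

# One JSON object per line, prompt and completion concatenated.
with open("support_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps({"text": ex["prompt"] + ex["completion"]}) + "\n")
```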
Fine-Tuning Benefits
- Lower cost per inference (a fine-tuned smaller model can replace a larger, pricier one)
- Better performance on your task
- Faster response generation
- Domain-specific knowledge
Example: Customer Support
Fine-tune LLaMA 2 on 5,000 support conversations:
- Results: 30% accuracy improvement
- Cost: 40% less per request
- Time to market: 1 week
Real-World Use Cases
Content Generation at Scale
Publishing company needs 1000 article summaries daily:
- Use Mixtral 8x7B for summarization
- 1000 articles cost < $5/day
- OpenAI would cost $50+/day
- Annual savings: $16,500
Customer Chatbot
E-commerce platform with 1M daily queries:
- Deploy LLaMA 2 7B for first-level triage
- Together AI: $50/day
- Alternative APIs: $500-800/day
- Monthly savings: $13,500 to $22,500
Code Generation
Development team using Code Llama for an internal coding assistant:
- Faster than cloud alternatives
- Deploy models you own (no lock-in)
- Fine-tune on your codebase
- Control and privacy
Comparison: Together AI vs Alternatives
| Feature | Together AI | OpenAI | Anthropic |
|---|---|---|---|
| Model Freedom | Excellent | Proprietary | Proprietary |
| Speed | Fastest | Good | Good |
| Cost | Lowest | Higher | Higher |
| Fine-tuning | Available | Limited | Limited |
| Data Privacy | Excellent | Good | Excellent |
| Model Switching | Easy | Difficult | Not possible |
When to Use Together AI
Choose Together AI if:
- You want to avoid vendor lock-in
- Cost per token is critical
- You need fast inference
- You plan to fine-tune models
- You want to use open-source models
- You need inference at scale
Consider alternatives if:
- You need cutting-edge proprietary models
- You need vision/multimodal capabilities
- You require bleeding-edge research models
- You prioritize brand recognition
Getting Started
- Sign up at together.ai (free account)
- Get API key from dashboard
- Browse models in their playground
- Test models with free credits
- Read documentation for your use case
- Start building with Python/JavaScript
Advanced Features
Model Comparison Tool
Compare models side-by-side on your prompts:
- Same input, different models
- See quality and speed tradeoffs
- Find best model for your task
- Make data-driven decisions
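The hosted playground does the comparison for you, but the idea is easy to reproduce in code. A hypothetical harness where `call_model` stands in for whatever client you use (a stub here so the sketch runs offline):

```python
import time

def compare(models, prompt, call_model):
    """Run the same prompt through each model, recording output and latency."""
    results = {}
    for model in models:
        start = time.perf_counter()
        output = call_model(model, prompt)
        results[model] = {"output": output,
                          "latency_s": time.perf_counter() - start}
    return results

# Stub client standing in for a real API call:
fake = lambda model, prompt: f"[{model}] answer to: {prompt}"
report = compare(["mistral-7b", "llama-2-70b"], "Summarize X", fake)
for model, r in report.items():
    print(model, f"{r['latency_s'] * 1000:.1f}ms")
```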
Batch Processing
For non-urgent inference:
- 30% cost reduction
- Ideal for reports, summaries, analysis
- Scheduled execution
- Perfect for overnight processing
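Applying that 30% discount to the real-time prices quoted earlier gives the effective batch rate; a one-line sketch:

```python
BATCH_DISCOUNT = 0.30  # the 30% batch reduction described above

def batch_price(realtime_price_per_1m: float) -> float:
    return realtime_price_per_1m * (1 - BATCH_DISCOUNT)

print(f"${batch_price(0.70):.3f} per 1M tokens")  # LLaMA 2 70B, batched
```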
Conclusion
Together AI represents a fundamental shift in how enterprises think about LLM inference. Instead of being locked into one vendor’s proprietary models, you can leverage the best open-source models available. The combination of superior speed, lower costs, and complete freedom makes Together AI compelling for any organization running inference at scale. For cost-conscious teams, those needing custom fine-tuning, or companies committed to open-source, Together AI is not just an alternative—it’s the smarter choice. Start with their free trial and compare the speed and cost yourself.