LLM Inference

Together AI

February 17, 2026 · 4 min read · Updated: 2026-02-17

Together AI democratizes access to powerful open-source language models by providing fast inference infrastructure optimized for models like LLaMA, Mistral, and Code Llama. For teams that want to avoid vendor lock-in or cut API costs by as much as 90%, Together AI is a game-changer.

The Together AI Philosophy

Together AI believes in open-source. Rather than building proprietary models, they optimize inference for the best open-source models available. This approach provides freedom: swap models, fine-tune, or self-host anytime.

Supported Models

Foundation Models

LLaMA 2 and LLaMA 3: Meta’s powerful open models

  • Sizes from 7B to 70B parameters (LLaMA 2: 7B/13B/70B; LLaMA 3: 8B/70B)
  • Excellent general-purpose capabilities
  • Strong coding abilities
  • Competitive with proprietary models

Mixtral 8x7B: Sparse mixture-of-experts model

  • Expert-level performance
  • Efficient token generation
  • Good for specialized tasks
  • Open-source and available for fine-tuning

Mistral 7B: Compact but powerful

  • Fastest inference
  • Lowest cost
  • Surprisingly capable
  • Ideal for high-volume applications

Code Llama: Specialized for programming

  • Code generation and explanation
  • Multi-language support
  • Great for developer tools
  • Open weights for fine-tuning

Specialized Models

  • Summarization: Aligned models for document summarization
  • Legal: Fine-tuned for legal document analysis
  • Medical: Specialized for healthcare applications
  • Multilingual: Excellent non-English capabilities

Why Together AI Wins on Speed

Hardware Optimization

Together invested in custom inference optimization:

  • Optimized for batch inference
  • Reduced latency (2-5x faster than alternatives)
  • Higher throughput per GPU
  • Efficient memory usage

Performance Numbers

Metric            Together AI    Alternative
Latency           50ms           200ms
Tokens/sec        150+           30-50
Cost/1M tokens    $0.50          $2.00
Availability      99.9%          99.5%

Real-World Impact

Scenario: Generate product descriptions for 100,000 items

  • OpenAI: 3 hours, $500 cost
  • Together AI: 45 minutes, $50 cost
  • Speed improvement: 4x faster
  • Cost savings: 90% reduction
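The arithmetic behind those numbers can be checked in a few lines of Python. All figures below are taken from the illustrative scenario above, not from measured benchmarks:

```python
# Back-of-the-envelope check of the product-description scenario.
# Figures come from the example above, not from real benchmarks.
openai_hours, openai_cost = 3.0, 500.0
together_hours, together_cost = 0.75, 50.0  # 45 minutes

speedup = openai_hours / together_hours
savings_pct = (openai_cost - together_cost) / openai_cost * 100

print(f"Speedup: {speedup:.0f}x")           # -> Speedup: 4x
print(f"Cost savings: {savings_pct:.0f}%")  # -> Cost savings: 90%
```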

Cost Efficiency

Transparent Pricing

No surprises, no hidden fees:

  • LLaMA 2 7B: $0.08 per 1M input tokens
  • LLaMA 2 70B: $0.70 per 1M input tokens
  • Mistral 7B: $0.06 per 1M input tokens

Input tokens cost the same as output tokens (unlike some competitors, which charge more for output).
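With flat per-token pricing, estimating a job's cost is straightforward. A small helper sketch, using the per-1M-token rates listed above (rates change, so verify against the current pricing page):

```python
# Prices in USD per 1M tokens, from the list above (subject to change).
PRICE_PER_1M = {
    "llama-2-7b": 0.08,
    "llama-2-70b": 0.70,
    "mistral-7b": 0.06,
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD; input and output tokens are priced the same."""
    rate = PRICE_PER_1M[model]
    return (input_tokens + output_tokens) / 1_000_000 * rate

# 100,000 requests at ~500 input + 200 output tokens each on Mistral 7B:
print(f"${job_cost('mistral-7b', 100_000 * 500, 100_000 * 200):.2f}")  # -> $4.20
```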

Free Trial

  • $5 free credits
  • Test models before committing
  • No credit card required

Volume Discounts

  • 10%+ discount for $1000+/month
  • Higher discounts for committed volumes
  • Custom pricing for enterprise

API Integration

Easy Implementation

import os
import together

# Legacy Together Python SDK (together < 1.0); newer releases expose a
# Together() client instead, so match this to the SDK version you install.
together.api_key = os.environ["TOGETHER_API_KEY"]

response = together.Complete.create(
    prompt="Complete this: The future of AI is",
    model="togethercomputer/llama-2-70b-chat",
    max_tokens=256,
    temperature=0.7,
)

print(response['output']['choices'][0]['text'])

Language Support

  • Python (most complete)
  • JavaScript/Node.js
  • Java
  • Go
  • REST API for any language
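Because the service also exposes a plain REST API, any language with an HTTP client can call it. A sketch that builds the request in Python (the endpoint path and field names mirror the SDK example above but should be verified against the current API reference):

```python
import json
import os

# Sketch of a raw REST call to the completions endpoint; the URL and
# field names are assumptions -- check the current API reference.
url = "https://api.together.xyz/v1/completions"
headers = {
    "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
    "Content-Type": "application/json",
}
payload = {
    "model": "togethercomputer/llama-2-70b-chat",
    "prompt": "Complete this: The future of AI is",
    "max_tokens": 256,
    "temperature": 0.7,
}

# To actually send it (requires the `requests` package):
# response = requests.post(url, headers=headers, data=json.dumps(payload))
# print(response.json()["choices"][0]["text"])
print(json.dumps(payload, indent=2))
```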

Streaming Support

For real-time token generation (chatbots, interactive apps):

# Legacy SDK streaming; method names vary across SDK versions,
# so check the client reference for the version you install.
stream = together.Complete.create_streaming(
    prompt="Write a poem about AI",
    model="togethercomputer/llama-2-70b",
)

for token in stream:
    print(token, end='', flush=True)

Fine-Tuning on Your Data

Custom Model Training

Together provides fine-tuning services:

  1. Upload data: Your examples and expected outputs
  2. Configure training: Set parameters and options
  3. Train: Together handles infrastructure
  4. Deploy: Use your fine-tuned model via API
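Step 1 mostly means formatting your examples as JSONL, one training example per line. A minimal data-prep sketch (the exact schema Together expects may differ; the `"text"` field and `<human>`/`<bot>` framing here are assumptions, so check the fine-tuning docs):

```python
import json

# Hypothetical support-ticket examples; real data would come from your logs.
examples = [
    {"prompt": "Where is my order?", "completion": "You can track it under My Orders."},
    {"prompt": "How do I reset my password?", "completion": "Go to Settings > Security."},
]

# Write one JSON object per line. The "text" field and turn markers are an
# assumed schema -- verify the required format in the fine-tuning docs.
with open("train.jsonl", "w") as f:
    for ex in examples:
        record = {"text": f"<human>: {ex['prompt']}\n<bot>: {ex['completion']}"}
        f.write(json.dumps(record) + "\n")

print(sum(1 for _ in open("train.jsonl")))  # -> 2 training lines
```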

Fine-Tuning Benefits

  • Lower cost per inference (tokens cost less)
  • Better performance on your task
  • Faster response generation
  • Domain-specific knowledge

Example: Customer Support

Fine-tune LLaMA 2 on 5,000 support conversations:

  • Results: 30% accuracy improvement
  • Cost: 40% less per request
  • Time to market: 1 week
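Whether fine-tuning pays off depends on volume. Taking only the 40% per-request saving from the example above, and assuming a hypothetical one-time training cost and base per-request cost (both numbers below are illustrative assumptions), the break-even point is easy to compute:

```python
# Only the 40% saving comes from the example above; the other
# numbers are illustrative assumptions.
training_cost = 500.0          # assumed one-time fine-tuning cost (USD)
base_cost_per_request = 0.002  # assumed cost per request on the base model
saving_per_request = base_cost_per_request * 0.40  # 40% less per request

break_even_requests = training_cost / saving_per_request
print(f"Break-even after {break_even_requests:,.0f} requests")  # ~625,000
```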

Real-World Use Cases

Content Generation at Scale

Publishing company needs 1000 article summaries daily:

  • Use Mixtral 8x7B for summarization
  • 1000 articles cost < $5/day
  • OpenAI would cost $50+/day
  • Annual savings: $16,500

Customer Chatbot

E-commerce platform with 1M daily queries:

  • Deploy LLaMA 2 7B for first-level triage
  • Together AI: $50/day
  • Alternative APIs: $500-800/day
  • Monthly savings: $15,000+

Code Generation

Development team using Code Llama as a coding assistant:

  • Faster than cloud alternatives
  • Deploy models you own (no lock-in)
  • Fine-tune on your codebase
  • Control and privacy

Comparison: Together AI vs Alternatives

Feature            Together AI    OpenAI         Anthropic
Model Freedom      Excellent      Proprietary    Proprietary
Speed              Fastest        Good           Good
Cost               Lowest         Higher         Higher
Fine-tuning        Available      Limited        Limited
Data Privacy       Excellent      Good           Excellent
Model Switching    Easy           Difficult      Not possible

When to Use Together AI

Choose Together AI if:

  • You want to avoid vendor lock-in
  • Cost per token is critical
  • You need fast inference
  • You plan to fine-tune models
  • You want to use open-source models
  • You need inference at scale

Consider alternatives if:

  • You need cutting-edge proprietary models
  • You need vision/multimodal capabilities
  • You require bleeding-edge research models
  • You prioritize brand recognition

Getting Started

  1. Sign up at together.ai (free account)
  2. Get API key from dashboard
  3. Browse models in their playground
  4. Test models with free credits
  5. Read documentation for your use case
  6. Start building with Python/JavaScript

Advanced Features

Model Comparison Tool

Compare models side-by-side on your prompts:

  • Same input, different models
  • See quality and speed tradeoffs
  • Find best model for your task
  • Make data-driven decisions

Batch Processing

For non-urgent inference:

  • 30% cost reduction
  • Ideal for reports, summaries, analysis
  • Scheduled execution
  • Perfect for overnight processing

Conclusion

Together AI represents a fundamental shift in how enterprises think about LLM inference. Instead of being locked into one vendor’s proprietary models, you can leverage the best open-source models available. The combination of superior speed, lower costs, and complete freedom makes Together AI compelling for any organization running inference at scale. For cost-conscious teams, those needing custom fine-tuning, or companies committed to open-source, Together AI is not just an alternative—it’s the smarter choice. Start with their free trial and compare the speed and cost yourself.