Inference Service

LLM chat completions via AI Gateway

The Inference Service provides access to multiple LLM providers through a unified OpenAI-compatible API.

POST /inference/v1/chat/completions

Send chat messages and receive model completions.

Request

POST https://console.mira.network/inference/v1/chat/completions
Authorization: Bearer mk_inference_YOUR_API_KEY
Content-Type: application/json

Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier (e.g., openai/gpt-4o-mini) |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2, default: 1) |
| max_tokens | number | No | Maximum tokens in response |
| stream | boolean | No | Enable streaming (default: false) |
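
When `stream: true` is set, OpenAI-compatible gateways typically return Server-Sent Events rather than a single JSON body. As a sketch (assuming the OpenAI-style streaming format of `data:` lines carrying chunk objects with `choices[0].delta.content`, terminated by `data: [DONE]` — this document does not specify the chunk shape), the stream can be consumed like this:

```javascript
// Parse one SSE line from a streamed completion. Returns the content delta,
// or null for blank lines, non-data lines, and the [DONE] sentinel.
function parseStreamLine(line) {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice('data: '.length).trim();
  if (payload === '[DONE]') return null;
  const chunk = JSON.parse(payload);
  return chunk.choices?.[0]?.delta?.content ?? null;
}

// Stream a chat completion, invoking onDelta for each content fragment.
async function chatStream(messages, apiKey, onDelta) {
  const response = await fetch('https://console.mira.network/inference/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'openai/gpt-4o-mini', messages, stream: true }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep any partial line for the next chunk
    for (const line of lines) {
      const delta = parseStreamLine(line);
      if (delta !== null) onDelta(delta);
    }
  }
}
```

Buffering the trailing partial line matters because network chunks do not align with SSE line boundaries.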

Message Object

| Field | Type | Description |
| --- | --- | --- |
| role | string | system, user, or assistant |
| content | string | The message content |

Available Models

| Model ID | Provider | Description |
| --- | --- | --- |
| openai/gpt-4o-mini | OpenAI | Fast, cost-effective model |
| openai/gpt-4o | OpenAI | Most capable OpenAI model |
| anthropic/claude-sonnet-4-5 | Anthropic | Claude Sonnet 4.5 |

Example Request

curl -X POST https://console.mira.network/inference/v1/chat/completions \
  -H "Authorization: Bearer mk_inference_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Response Schema

interface ChatCompletionResponse {
  id: string;
  object: 'chat.completion';
  created: number;
  model: string;
  choices: Choice[];
  usage: Usage;
}

interface Choice {
  index: number;
  message: Message;
  finish_reason: 'stop' | 'length' | 'content_filter';
}

interface Message {
  role: 'assistant';
  content: string;
}

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}
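
As a quick illustration of the schema, a helper like the following (the name `summarize` is illustrative, not part of the API) pulls out the fields most callers need:

```javascript
// Extract the assistant's reply and token accounting from a completion
// response matching the schema above.
function summarize(completion) {
  const [choice] = completion.choices;
  return {
    reply: choice.message.content,
    finishReason: choice.finish_reason,
    totalTokens: completion.usage.total_tokens,
  };
}
```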

JavaScript Example

async function chat(messages, apiKey) {
  const response = await fetch('https://console.mira.network/inference/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'openai/gpt-4o-mini',
      messages,
    }),
  });

  // Surface HTTP errors instead of failing later on a missing field.
  if (!response.ok) {
    throw new Error(`Inference request failed: ${response.status} ${await response.text()}`);
  }

  const data = await response.json();
  return data.choices[0].message.content;
}

// Usage
const response = await chat([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hello!' },
], 'mk_inference_YOUR_API_KEY');

console.log(response);

Multi-turn Conversation

Include previous messages to maintain context:

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'My name is Alice.' },
  { role: 'assistant', content: 'Nice to meet you, Alice!' },
  { role: 'user', content: 'What is my name?' },
];

const response = await chat(messages, apiKey);
// "Your name is Alice."
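
One way to manage this in code is to thread a growing message array through successive requests. The sketch below assumes a send function with the same signature as the `chat` helper above (`(messages, apiKey) => Promise<string>`); `converse` is an illustrative name, not part of the API:

```javascript
// Run a multi-turn conversation, appending both sides of each exchange so
// every request carries the full history.
async function converse(sendFn, turns, apiKey, system = 'You are a helpful assistant.') {
  let messages = [{ role: 'system', content: system }];
  const replies = [];
  for (const userText of turns) {
    messages = [...messages, { role: 'user', content: userText }];
    const reply = await sendFn(messages, apiKey);
    messages = [...messages, { role: 'assistant', content: reply }];
    replies.push(reply);
  }
  return { messages, replies };
}
```

Passing the send function in keeps the loop easy to exercise: any `(messages, apiKey) => Promise<string>`, including a stub, works in place of a live call.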

GET /inference/v1/health

Health check endpoint. Does not require authentication.

curl https://console.mira.network/inference/v1/health

Response:

{
  "status": "ok",
  "service": "inference-service",
  "version": "1.0.0"
}