Amarsia
Client Usage

Multimodal

Send text and files together in one request so the assistant can understand everything at once.

Simple idea

The content field is a list. Each item in the list is one thing you send:

  • text message
  • image
  • video
  • audio
  • file URL (such as a link to a PDF)

Structure the API expects

Item type   Required fields       Example
text        text                  { "type": "text", "text": "Summarize this" }
image       mime_type, file_uri   { "type": "image", "mime_type": "image/jpeg", "file_uri": "https://..." }
video       mime_type, file_uri   { "type": "video", "mime_type": "video/mp4", "file_uri": "https://..." }
audio       mime_type, file_uri   { "type": "audio", "mime_type": "audio/mpeg", "file_uri": "https://..." }
url         mime_type, file_uri   { "type": "url", "mime_type": "application/pdf", "file_uri": "https://..." }
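
Checking items before you send them

The required fields listed above can be checked on your side before a request goes out. The following is a minimal sketch, not part of the Amarsia API: REQUIRED_FIELDS and missing_fields are hypothetical names, and treating an empty string as missing is an assumption.

```python
# Required fields per item type, copied from the table above.
# REQUIRED_FIELDS and missing_fields are illustrative names, not Amarsia APIs.
REQUIRED_FIELDS = {
    "text": ("text",),
    "image": ("mime_type", "file_uri"),
    "video": ("mime_type", "file_uri"),
    "audio": ("mime_type", "file_uri"),
    "url": ("mime_type", "file_uri"),
}


def missing_fields(item: dict) -> list[str]:
    """Return the required fields that are absent or empty in one content item."""
    item_type = item.get("type")
    if item_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown item type: {item_type!r}")
    # Assumption: an empty value counts as missing.
    return [name for name in REQUIRED_FIELDS[item_type] if not item.get(name)]
```

An empty list means the item has everything the table asks for. An item with a type that is not in the table raises ValueError instead of being passed along silently.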

Example request

{
  "content": [
    {
      "type": "text",
      "text": "Read this brochure and tell me key offers."
    },
    {
      "type": "image",
      "mime_type": "image/jpeg",
      "file_uri": "https://example.com/brochure-page.jpg"
    },
    {
      "type": "url",
      "mime_type": "application/pdf",
      "file_uri": "https://example.com/price-list.pdf"
    }
  ]
}
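
Building the same request in code

If you assemble the request body in code, small helper functions can keep each item in the right shape. This is a sketch under assumptions: text_item and file_item are hypothetical helpers, not Amarsia client functions, and how you send the body (endpoint, headers, authentication) is not covered here.

```python
import json

# Item types that carry a file, as listed in the structure table.
FILE_TYPES = {"image", "video", "audio", "url"}


def text_item(text: str) -> dict:
    """Build a text item (hypothetical helper)."""
    return {"type": "text", "text": text}


def file_item(item_type: str, mime_type: str, file_uri: str) -> dict:
    """Build an image, video, audio, or url item (hypothetical helper)."""
    if item_type not in FILE_TYPES:
        raise ValueError(f"unsupported file item type: {item_type!r}")
    return {"type": item_type, "mime_type": mime_type, "file_uri": file_uri}


# Same content as the JSON example above.
payload = {
    "content": [
        text_item("Read this brochure and tell me key offers."),
        file_item("image", "image/jpeg", "https://example.com/brochure-page.jpg"),
        file_item("url", "application/pdf", "https://example.com/price-list.pdf"),
    ]
}

# Serialize to the JSON body you would send.
print(json.dumps(payload, indent=2))
```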

Response shape (regular API)

{
  "content": "Here are the key offers...",
  "model": "MODEL_NAME",
  "input_tokens": 120,
  "output_tokens": 80
}
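
Reading the response in code

The response fields above can be read into a small typed object. This is a sketch only: Reply and parse_reply are hypothetical names, the code assumes all four fields are always present, and total_tokens is a convenience added here, not a field the API returns.

```python
import json
from dataclasses import dataclass


@dataclass
class Reply:
    """The four fields shown in the response shape (hypothetical class name)."""
    content: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        # Convenience sum computed locally; not returned by the API.
        return self.input_tokens + self.output_tokens


def parse_reply(raw: str) -> Reply:
    """Parse a JSON response body; raises KeyError if a field is missing."""
    data = json.loads(raw)
    return Reply(
        content=data["content"],
        model=data["model"],
        input_tokens=data["input_tokens"],
        output_tokens=data["output_tokens"],
    )


# Sample body copied from the response shape above.
raw = (
    '{"content": "Here are the key offers...", "model": "MODEL_NAME", '
    '"input_tokens": 120, "output_tokens": 80}'
)
reply = parse_reply(raw)
print(reply.content)
print(reply.total_tokens)  # 200
```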