Multimodal

Simple idea

`content` is a list of input items. Each item has one of these types:

text
image
video
audio
url (a link to a document, such as a PDF)

The `content` shape is the same across `run`, `stream`, `conversation.run`, and `conversation.stream`.
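Because the shape is shared, a request body can be built once and handed to whichever method is needed. The sketch below is illustrative only: `TextItem`, `FileItem`, `ContentItem`, and `buildRequest` are hypothetical names that are not part of the SDK, and the field names are taken from the examples on this page. Putting the text item first simply mirrors those examples.

```typescript
// Hypothetical helper types; field names are copied from this page's examples.
type TextItem = { type: "text"; text: string };
type FileItem = {
  type: "image" | "video" | "audio" | "url";
  mime_type: string;
  file_uri: string;
};
type ContentItem = TextItem | FileItem;

// Build a request body: the text prompt first, then any attachments.
function buildRequest(
  prompt: string,
  attachments: FileItem[] = []
): { content: ContentItem[] } {
  return { content: [{ type: "text", text: prompt }, ...attachments] };
}

const request = buildRequest("Summarize this brochure.", [
  {
    type: "image",
    mime_type: "image/jpeg",
    file_uri: "https://example.com/brochure-page.jpg",
  },
]);

// The same object could then be passed to any of the four methods, e.g.
//   await client.run(request)
//   await client.conversation.run(request)
console.log(JSON.stringify(request, null, 2));
```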

Input types

Item `type`	Required fields	Example
`text`	`text`	`{ "type": "text", "text": "Summarize this" }`
`image`	`mime_type`, `file_uri`	`{ "type": "image", "mime_type": "image/jpeg", "file_uri": "https://..." }`
`video`	`mime_type`, `file_uri`	`{ "type": "video", "mime_type": "video/mp4", "file_uri": "https://..." }`
`audio`	`mime_type`, `file_uri`	`{ "type": "audio", "mime_type": "audio/mpeg", "file_uri": "https://..." }`
`url`	`mime_type`, `file_uri`	`{ "type": "url", "mime_type": "application/pdf", "file_uri": "https://..." }`
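The required fields in the table can be checked before a request is sent. What follows is a minimal sketch, not SDK behavior: `REQUIRED_FIELDS` and `checkItem` are hypothetical names, and the check assumes every required field is a non-empty string, as in the examples above.

```typescript
// Required fields per item type, copied from the table above.
const REQUIRED_FIELDS = new Map<string, readonly string[]>([
  ["text", ["text"]],
  ["image", ["mime_type", "file_uri"]],
  ["video", ["mime_type", "file_uri"]],
  ["audio", ["mime_type", "file_uri"]],
  ["url", ["mime_type", "file_uri"]],
]);

// Hypothetical helper: returns a list of problems found in one content item.
// An empty list means the item has a known type and all required fields.
function checkItem(item: Record<string, unknown>): string[] {
  const type = item["type"];
  const required =
    typeof type === "string" ? REQUIRED_FIELDS.get(type) : undefined;
  if (!required) {
    return [`unknown type: ${String(type)}`];
  }
  // Assumption: required fields must be non-empty strings.
  return required
    .filter((field) => typeof item[field] !== "string" || item[field] === "")
    .map((field) => `missing field: ${field}`);
}

console.log(checkItem({ type: "text", text: "Summarize this" }));
console.log(checkItem({ type: "image", mime_type: "image/jpeg" }));
```

Running a check like this on the client side catches malformed items early, but it does not replace whatever validation the service itself performs.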

SDK example

// One-shot request: a text prompt plus an image and a PDF link.
const result = await client.run({
  content: [
    { type: "text", text: "Summarize this brochure and PDF." },
    {
      type: "image",
      mime_type: "image/jpeg",
      file_uri: "https://example.com/brochure-page.jpg",
    },
    {
      type: "url",
      mime_type: "application/pdf",
      file_uri: "https://example.com/price-list.pdf",
    },
  ],
})

console.log(result.content)

// Streaming request: the content list has the same shape as for run().
const streamed = await client.stream({
  content: [
    { type: "text", text: "Describe this brochure image live." },
    {
      type: "image",
      mime_type: "image/jpeg",
      file_uri: "https://example.com/brochure-page.jpg",
    },
  ],
})

// Conversation request: select a conversation by ID, then send content to it.
client.conversation.start("conv_existing_123")
await client.conversation.run({
  content: [
    { type: "text", text: "Use this PDF to answer follow-up questions." },
    {
      type: "url",
      mime_type: "application/pdf",
      file_uri: "https://example.com/price-list.pdf",
    },
  ],
})

Response shape

{
  "content": "Here are the key offers...",
  "model": "MODEL_NAME",
  "input_tokens": 120,
  "output_tokens": 80
}
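For typed access to these fields, the response can be described with an interface and checked at runtime. This is a sketch under assumptions: `RunResponse`, `isRunResponse`, and `totalTokens` are hypothetical names, and the field types are inferred from the sample above rather than from a published schema.

```typescript
// Field names and types are inferred from the sample response above.
interface RunResponse {
  content: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
}

// Runtime check that an unknown value has the four fields shown above.
function isRunResponse(value: unknown): value is RunResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["content"] === "string" &&
    typeof v["model"] === "string" &&
    typeof v["input_tokens"] === "number" &&
    typeof v["output_tokens"] === "number"
  );
}

// Sum of input and output tokens for one response.
function totalTokens(res: RunResponse): number {
  return res.input_tokens + res.output_tokens;
}

const sample: unknown = JSON.parse(
  '{"content":"Here are the key offers...","model":"MODEL_NAME","input_tokens":120,"output_tokens":80}'
);

if (isRunResponse(sample)) {
  console.log(totalTokens(sample)); // 200
}
```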
