Moondream 3 Preview answers natural-language questions about an image. Pass an image URL and a question ("What is the person doing?", "How many cars are in the lot?", "What does the sign say?") and it returns a concise text answer. It is a compact, efficient vision-language model built for frontier-level visual reasoning — reading text in a scene (OCR), counting and identifying objects, describing what is happening, and grounding answers in fine image detail — while staying fast and inexpensive to run at scale.

## Best for

- Visual question answering: asking free-form questions about a photo, screenshot, chart, or document image
- Reading text inside images (signs, labels, receipts, handwriting) and answering questions about it
- Counting, identifying, and locating objects or people in a scene
- Describing image content for accessibility, moderation triage, or content tagging
- High-volume image understanding where cost-per-call and latency matter

## Choose another model when

- You want to generate or edit an image rather than describe one: use a text-to-image or image-editing model
- You have no image and only need a text answer: use a text-only language model
- You need pixel-precise bounding boxes or segmentation masks as structured output rather than a written answer: use a detection or segmentation model

## Tips

- Ask one clear, specific question per call; specifying the desired answer format ("answer in one short sentence", "reply with just the number") tightens the output.
- Leave `temperature` at its default of 0 for deterministic, factual answers; raise it (up to 1) only when you want more varied phrasing.
- Keep `reasoning` enabled (default) to also receive the model's step-by-step reasoning trace alongside the answer; set it to `false` for just the final answer and lower latency.
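The first tip above can be sketched as a small helper that appends an answer-format instruction to a single question. `buildPrompt` is a hypothetical name used for illustration only; it is not part of any client library, and the exact wording of the format hint is an assumption.

```js
// Hypothetical helper (not part of any client library): combines one
// question with an optional answer-format hint into a single prompt string.
function buildPrompt(question, answerFormat) {
  const q = question.trim();
  if (q.length === 0) {
    throw new Error("question must not be empty");
  }
  const format = (answerFormat ?? "").trim();
  if (format.length === 0) {
    return q; // no format hint supplied: send the question as-is
  }
  // Capitalize the hint and make sure it ends with a period.
  const sentence = format.charAt(0).toUpperCase() + format.slice(1);
  return `${q} ${sentence.endsWith(".") ? sentence : sentence + "."}`;
}

const countPrompt = buildPrompt(
  "How many cars are in the lot?",
  "reply with just the number"
);
console.log(countPrompt);
// prints "How many cars are in the lot? Reply with just the number."
```

The resulting string is meant to be used as the `prompt` value of a request; keeping the format hint separate from the question makes it easy to reuse the same hint across many calls.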

## Advanced Configuration

- `reasoning` (boolean, default `true`): when `true`, the response includes the model's detailed reasoning behind the answer; when `false`, the reasoning trace is omitted and only the answer is returned.
- `temperature` (number 0–1, default `0`): sampling temperature for the answer. `0` is deterministic; higher values increase variety.
- `top_p` (number 0–1): nucleus-sampling probability mass, an alternative way to control answer diversity.
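As a minimal sketch of how these options fit together, the helper below checks the documented ranges on the client side and assembles an `input` object. `buildQueryInput` and `assertUnitInterval` are hypothetical names, and the `example.com` URL is a placeholder. Whether the service itself rejects out-of-range values is not stated here, so the local validation is an assumption rather than a mirror of server behavior.

```js
// Hypothetical validator: the documented range for temperature and top_p is 0–1.
function assertUnitInterval(name, value) {
  if (typeof value !== "number" || Number.isNaN(value) || value < 0 || value > 1) {
    throw new RangeError(`${name} must be a number between 0 and 1`);
  }
}

// Hypothetical helper: builds the `input` object from the documented fields.
// Options left undefined are omitted so the documented defaults apply.
function buildQueryInput({ imageUrl, prompt, reasoning, temperature, topP }) {
  if (typeof imageUrl !== "string" || imageUrl.length === 0) {
    throw new TypeError("imageUrl must be a non-empty string");
  }
  if (typeof prompt !== "string" || prompt.length === 0) {
    throw new TypeError("prompt must be a non-empty string");
  }
  const input = { image_url: imageUrl, prompt };
  if (reasoning !== undefined) {
    if (typeof reasoning !== "boolean") {
      throw new TypeError("reasoning must be a boolean");
    }
    input.reasoning = reasoning;
  }
  if (temperature !== undefined) {
    assertUnitInterval("temperature", temperature);
    input.temperature = temperature;
  }
  if (topP !== undefined) {
    assertUnitInterval("top_p", topP);
    input.top_p = topP;
  }
  return input;
}

// Placeholder image URL; answer only, deterministic sampling.
const exampleInput = buildQueryInput({
  imageUrl: "https://example.com/photo.jpg",
  prompt: "What does the sign say?",
  reasoning: false,
  temperature: 0,
});
console.log(exampleInput);
```

Omitting unset options, rather than filling in defaults locally, keeps the request small and avoids drifting out of sync if the documented defaults change.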

To run via the ModelRunner JavaScript client:

```js
import { modelrunner } from "@modelrunner/client";

const result = await modelrunner.subscribe("moondream/moondream3-preview/query", {
  input: {
    image_url: "https://storage.googleapis.com/falserverless/example_inputs/moondream-3-preview/query_in.jpg",
    prompt: "What is in this image? Answer in one short sentence.",
    reasoning: false,
  },
});
```
