ML.VisionToText

| Name | Mandatory | Description | Default | Type |
|---|---|---|---|---|
| ⬅️ Input | | The input of the shard, if any | | Image |
| Output ➡️ | | The resulting output of the shard | | String |
| Model | No | The Moondream2 model to use. | none | Var(Model) |
| Tokenizer | No | The tokenizer to use. | none | Var(Tokenizer) |
| Prompt | No | The prompt to use for the vision-to-text generation. | none | String |
| Temperature | No | Temperature for text generation (0.0 for deterministic output). | 0.5 | Float |
| TopP | No | Top-p sampling value (0.0-1.0, 0.0 to disable). | 0.9 | Float |
| RepeatPenalty | No | Penalty for repeating tokens (1.0 means no penalty). | 1 | Float |
| MaxTokens | No | Maximum number of tokens to generate. | 512 | Int |
| GPU | No | Whether to use the GPU (if available). | false | Bool |
| Seed | No | The seed to use for the generation. | 42 | Int Var(Int) |

A complete vision-to-text pipeline using the Moondream2 model. It takes an image tensor as input and outputs text generated in response to the prompt.
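To illustrate how the Temperature, TopP, RepeatPenalty, and Seed parameters typically interact during text generation, here is a minimal Python sketch of one sampling step. It is not the shard's actual implementation: the function name `sample_next_token`, its arguments, and the exact repeat-penalty convention (divide positive logits, multiply negative ones) are assumptions chosen for illustration, and the real shard may differ in detail.

```python
import math
import random


def sample_next_token(logits, previous_tokens, temperature=0.5, top_p=0.9,
                      repeat_penalty=1.0, rng=None):
    """Pick the next token id from raw logits (hypothetical helper)."""
    rng = rng if rng is not None else random.Random()
    logits = list(logits)

    # Repeat penalty: make already-generated tokens less likely.
    # 1.0 leaves the logits unchanged (assumed convention, see lead-in).
    if repeat_penalty != 1.0:
        for tok in set(previous_tokens):
            if logits[tok] >= 0:
                logits[tok] /= repeat_penalty
            else:
                logits[tok] *= repeat_penalty

    # Temperature 0.0: deterministic output, always take the best token.
    if temperature <= 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Softmax over temperature-scaled logits.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the smallest set of most likely tokens whose cumulative
    # probability reaches top_p. A value of 0.0 disables the filter here.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if 0.0 < top_p < 1.0:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    else:
        kept = order

    # Draw from the surviving tokens; a seeded rng makes this reproducible.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


rng = random.Random(42)  # plays the role of the Seed parameter
logits = [1.0, 3.0, 2.0]
print(sample_next_token(logits, [], temperature=0.0, rng=rng))  # greedy: index 1
```

In this sketch, lowering the temperature sharpens the distribution toward the most likely token, a smaller top-p value trims the unlikely tail before sampling, and reusing the same seed with the same inputs reproduces the same choice.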