ML.VisionToText
Name | Mandatory | Description | Default | Type |
---|---|---|---|---|
⬅️ Input | | The input of the shard, if any | | Image |
Output ➡️ | | The resulting output of the shard | | String |
Model | No | The Moondream2 model to use. | none | Var(Model) |
Tokenizer | No | The tokenizer to use. | none | Var(Tokenizer) |
Prompt | No | The prompt to use for the vision-to-text generation. | none | String |
Temperature | No | Temperature for text generation (0.0 for deterministic output). | 0.5 | Float |
TopP | No | Top-p sampling value (0.0-1.0, 0.0 to disable). | 0.9 | Float |
RepeatPenalty | No | Penalty for repeating tokens (1.0 means no penalty). | 1 | Float |
MaxTokens | No | Maximum number of tokens to generate. | 512 | Int |
GPU | No | Whether to use the GPU (if available). | false | Bool |
Seed | No | The seed to use for the generation. | 42 | Int Var(Int) |
A complete vision-to-text pipeline using the Moondream2 model. Takes an image tensor as input and outputs text generated in response to the prompt.
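
The Temperature, TopP, RepeatPenalty, and Seed parameters follow the usual conventions of sampling-based text generation. The Python sketch below illustrates how such settings are commonly combined when choosing the next token. It is an illustration of the general technique only, not this shard's implementation: the function name `sample_next_token`, its arguments, and the exact penalty rule (divide positive logits, multiply negative ones) are assumptions made for the example.

```python
import math
import random


def sample_next_token(logits, previous_tokens, rng,
                      temperature=0.5, top_p=0.9, repeat_penalty=1.0):
    """Pick the next token id from raw logits (hypothetical helper)."""
    logits = list(logits)

    # Repeat penalty: make already-generated tokens less likely.
    # A value of 1.0 leaves the logits unchanged.
    if repeat_penalty != 1.0:
        for tok in set(previous_tokens):
            if logits[tok] > 0:
                logits[tok] /= repeat_penalty
            else:
                logits[tok] *= repeat_penalty

    # Temperature 0.0: deterministic output, always take the best token.
    if temperature <= 0.0:
        return max(range(len(logits)), key=logits.__getitem__)

    # Softmax over temperature-scaled logits (shifted for stability).
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [value / total for value in exps]

    # Top-p: keep the smallest set of most likely tokens whose
    # cumulative probability reaches top_p. 0.0 disables the filter.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    if 0.0 < top_p < 1.0:
        kept, cumulative = [], 0.0
        for tok in order:
            kept.append(tok)
            cumulative += probs[tok]
            if cumulative >= top_p:
                break
    else:
        kept = order

    # Draw from the surviving tokens; the seeded rng makes runs repeatable.
    weights = [probs[tok] for tok in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # plays the role of the Seed parameter
    print(sample_next_token([1.0, 3.0, 2.0], previous_tokens=[], rng=rng))
```

Under these assumptions, lowering Temperature or TopP narrows the choice toward the most likely tokens, raising RepeatPenalty above 1.0 discourages loops, and fixing the seed makes a sampled run reproducible.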