Vision
scx vision models support multimodal inputs, allowing users to process both text and images. These models analyze images and generate context-aware text responses. Learn how to query scx vision models using the OpenAI Python client.
Make a query with an image
On scx, a vision model request follows OpenAI's multimodal input format, which accepts both text and image inputs in a structured payload. The call is similar to Text Generation, but it additionally includes an encoded image file, referenced via the image_path variable. A helper function converts this image into a base64 string so that it can be passed alongside the text in the request.
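The helper function can be sketched as follows (the name encode_image is illustrative, not mandated by the API):

```python
import base64


def encode_image(image_path: str) -> str:
    # Read the image bytes and encode them as a base64 ASCII string,
    # suitable for embedding in a data URL in the request payload.
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```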
Step 1
Create a new Python file and copy the code below.
This example uses the Llama-4-Maverick-17B-128E-Instruct model.
Step 2
Use your scx API key and base URL from the API keys and URLs page to replace the string fields "your-scx-api-key" and "https://api.scx.ai/v1" in the construction of the client.
Step 3
Select an image and move it to a suitable path that you can specify in the image_path variable.
Step 4
Specify the prompt to pair with the image in the content portion of the user message.
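The content portion is a list that mixes a text part and an image part; the prompt text below is illustrative, and base64_image is assumed to hold the encoded image from the previous step:

```python
base64_image = "<base64-data>"  # assumed to come from the encoding helper

# The user message's content pairs the text prompt with the image data URL.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what you see in this image."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
            },
        ],
    }
]
```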
Step 5
Run the Python file to receive the text output.