Vision

scx vision models support multimodal inputs, allowing users to process both text and images. These models analyze images and generate context-aware text responses. Learn how to query scx vision models using the OpenAI Python client.

Make a query with an image

On scx, a vision model request follows OpenAI’s multimodal input format, which accepts both text and image inputs in a structured payload. The call is similar to Text Generation but also includes an encoded image file, referenced via the image_path variable. A helper function converts this image into a base64 string so that it can be passed alongside the text in the request.

Step 1

Make a new Python file and copy the code below.

from openai import OpenAI
import base64

client = OpenAI(
    base_url="https://api.scx.ai/v1",
    api_key="your-scx-api-key",
)

# Helper function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# The path to your image
image_path = "sample.JPEG"

# The base64 string of the image
image_base64 = encode_image(image_path)

print(image_base64)

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this image?"},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"}}
            ]
        }
    ]
)

print(response.choices[0].message.content)

Step 2

Replace the placeholder strings "your-scx-api-key" and "https://api.scx.ai/v1" in the construction of the client with your scx API key and base URL from the API keys and URLs page.
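Rather than hard-coding the key in the file, the two values can be read from environment variables. The following is a minimal sketch, not an scx requirement: the variable names SCX_API_KEY and SCX_BASE_URL and the helper load_scx_settings are hypothetical names chosen for illustration, and the fallback base URL is the one shown in Step 1.

```python
import os


def load_scx_settings(env=None):
    """Return keyword arguments for the client, read from environment variables.

    SCX_API_KEY and SCX_BASE_URL are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("SCX_API_KEY")
    if not api_key:
        raise RuntimeError("Set the SCX_API_KEY environment variable first.")
    # Fall back to the base URL from Step 1 when no override is set.
    base_url = env.get("SCX_BASE_URL", "https://api.scx.ai/v1")
    return {"api_key": api_key, "base_url": base_url}


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**load_scx_settings())
```

Keeping the key out of the source file makes it harder to commit it to version control by accident.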

Step 3

Select an image, move it to a suitable location, and set its path in the following lines.

# The path to your image
image_path = "sample.JPEG"
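The request in Step 1 labels the image as image/jpeg in the data URL. If the chosen file is a PNG or another format, the label should presumably match the file. A minimal sketch of one way to do this uses Python's standard mimetypes module to guess the type from the file extension. The helper name image_to_data_url is hypothetical, and which image formats scx accepts is not stated here.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path):
    """Encode an image file as a base64 data URL with a matching MIME type."""
    path = Path(image_path)
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {path.name!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Usage: pass the result as the "url" value of the image_url entry.
#   data_url = image_to_data_url("sample.JPEG")
```

The guess relies only on the extension, so a misnamed file would still be labeled incorrectly.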

Step 4

Check that the text prompt in the content portion of the user message suits your image, and edit it if needed.
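To make the prompt and image pairing easier to change, the user message can be built by a small function. This is only a sketch of the structure used in Step 1; build_user_message is a hypothetical helper name, and the checks it performs are assumptions for illustration rather than rules enforced by scx.

```python
def build_user_message(prompt, data_url):
    """Return a user message that pairs a text prompt with one image."""
    if not prompt or not prompt.strip():
        raise ValueError("The text prompt must not be empty.")
    if not data_url.startswith("data:image/"):
        raise ValueError("Expected a base64 data URL that starts with 'data:image/'.")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Usage: pass the message in the messages list of the request.
#   messages = [build_user_message("What is happening in this image?", data_url)]
```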

Step 5

Run the Python file to receive the text output.
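If the script will be reused, the request and the extraction of the reply text can be wrapped in one function. The sketch below assumes a client object with the same chat.completions.create interface used in Step 1; ask_vision_model is a hypothetical name, and the empty-choices check is a defensive assumption rather than documented scx behavior.

```python
def ask_vision_model(client, model, messages):
    """Send the messages to the model and return the text of the first reply."""
    response = client.chat.completions.create(model=model, messages=messages)
    if not response.choices:
        raise RuntimeError("The response contained no choices.")
    return response.choices[0].message.content


# Usage, with the client and messages from the earlier steps:
#   text = ask_vision_model(client, "Llama-4-Maverick-17B-128E-Instruct", messages)
#   print(text)
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object, without sending a real request.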