Spring AI supports text-to-image generation, building on existing model providers. The code needed is short, only about five lines, but it's still worth writing down.
DALL-E 3#
This is the text-to-image model released by OpenAI, and probably the most commonly used one. Compared with more specialized tools such as Stable Diffusion and Midjourney, its key advantage is simplicity: you don't need to craft prompt keywords, you just describe the image in natural language.
Since the OpenAI dependency and configuration were already set up earlier, nothing needs to change here.
First, inject an `OpenAiImageModel` object (Spring Boot auto-configures it once the OpenAI starter is on the classpath).

```java
private final OpenAiImageModel openAiImageModel;
```
Then, just copy the official example here.

```java
ImageResponse response = openAiImageModel.call(
        new ImagePrompt("A light cream colored mini golden doodle",
                OpenAiImageOptions.builder()
                        .withQuality("hd")
                        .withN(4)
                        .withHeight(1024)
                        .withWidth(1024)
                        .build())
);
```
Here are the main parameters available when building the options:

| Parameter | Explanation |
|---|---|
| Model | Model to use; defaults to DALL_E_3 |
| Quality | Quality of the generated image (`hd` or `standard`); only supported by dall-e-3 |
| N | Number of images to generate; note that dall-e-3 only supports 1 |
| Width | Width of the generated image. For dall-e-2, the size must be one of 256x256, 512x512, or 1024x1024. For dall-e-3, it must be one of 1024x1024, 1792x1024, or 1024x1792 |
| Height | Height of the generated image (same size constraints as Width) |
| Style | Style of the generated image, either `vivid` or `natural`. Vivid leans toward hyper-real, dramatic images; natural produces more natural-looking ones. Only supported by dall-e-3 |
Finally, expose an endpoint that returns the image.
```java
/**
 * Call OpenAI's dall-e-3 to generate an image
 * @param message Prompt words
 * @return HTML embedding the generated image
 */
@GetMapping(value = "/openImage", produces = "text/html")
public String openImage(@RequestParam String message) {
    ImageResponse imageResponse = openAiImageModel.call(new ImagePrompt(message,
            OpenAiImageOptions.builder()
                    .withModel(OpenAiImageApi.DEFAULT_IMAGE_MODEL)
                    .withQuality("hd")
                    .withN(1)
                    .withWidth(1024)
                    .withHeight(1024)
                    .build())
    );
    String url = imageResponse.getResult().getOutput().getUrl();
    System.err.println(url);
    return "<img src='" + url + "'/>";
}
```
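One caveat with returning raw HTML like this: the URL is interpolated into the markup as-is. If you want to guard against quotes or other special characters in the value, a tiny escaping helper avoids broken markup. This is my own illustrative code, not part of Spring:

```java
// Illustrative helper: escape characters that are special inside an HTML attribute.
// '&' must be replaced first so the other entities are not double-escaped.
public class HtmlEscape {
    public static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }
}
```

With it, the return line becomes `return "<img src='" + HtmlEscape.escapeAttr(url) + "'/>";`.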
OpenAI provides two ways to return the image: a URL pointing to the image (the default), or the BASE64-encoded image data.
Stability AI#
At first, I assumed Stability AI could call a local Stable Diffusion instance for image generation. After trying it, I found that's not the case: Stability AI is an online image-generation platform run by the company Stability, which is not the same thing as running Stable Diffusion locally.
First, register an account on the Stability AI official website to get some free credits. After registering, copy the API key.
Then, add the dependency.

```xml
<!-- stability dependency -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-stability-ai-spring-boot-starter</artifactId>
</dependency>
```
And configure it.

```yaml
spring:
  ai:
    stabilityai:
      api-key: sk-xxx
```
Then, call it in the same way.
```java
/**
 * Call Stability AI to generate an image
 * @param message Prompt words
 * @return HTML embedding the generated image
 */
@GetMapping(value = "/sdImage", produces = "text/html")
public String sdImage(@RequestParam String message) {
    ImageResponse imageResponse = stabilityAiImageModel.call(
            new ImagePrompt(message,
                    StabilityAiImageOptions.builder()
                            .withStylePreset("cinematic")
                            .withN(1)
                            .withHeight(512)
                            .withWidth(768)
                            .build())
    );
    // Stability AI returns the image as base64, so wrap it in a data URL
    String b64Json = imageResponse.getResult().getOutput().getB64Json();
    String mimeType = "image/png";
    String dataUrl = "data:" + mimeType + ";base64," + b64Json;
    return "<img src='" + dataUrl + "' alt='Image'/>";
}
```
Please note that:
- For Stability AI, use English prompts; Chinese prompts may not work.
- Stability AI only returns the image as base64; the returned URL is null.
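Since Stability AI hands back base64 rather than a URL, you may want to persist the image to disk instead of (or in addition to) inlining it in the response. A small sketch using only the JDK; the class and method names are my own:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Illustrative helper: decode a base64 image payload and write it to disk
public class ImageSaver {
    public static Path saveB64(String b64Json, Path target) throws IOException {
        byte[] bytes = Base64.getDecoder().decode(b64Json);
        return Files.write(target, bytes);
    }
}
```

Calling `ImageSaver.saveB64(b64Json, Path.of("out.png"))` in the endpoint above would keep a copy of each generated image.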
Zhipu#
I also tried another AI image-generation model, Zhipu, but found that the free credits granted on registration only cover the dialogue models, not the text-to-image model. On top of that, trying the latest model for free requires real-name authentication... which feels a bit unreasonable, so I gave up.