Image Prompt Program (Image Program Code)

Time: 2025-04-04 03:49:16 Source: 仰不愧天网 Author: 探索 Views: 540


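Notice: This article is reposted from a post on Su Yang's blog, which may be reposted as long as the original link is credited. Original link: https://soulteary.com/2023/04/05/eighty-lines-of-code-to-implement-the-open-source-midjourney-and-stable-diffusion-spell-drawing-tool.html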

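This is a Prompt tool, built with Docker and about eighty lines of Python code, that works like Describe, Midjourney's official image-analysis feature. With it, you no longer have to scratch your head over writing Prompt descriptions when playing with models such as Midjourney and Stable Diffusion.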

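▍About the "spell" drawing tool

The tool comes in two versions, supporting CPU and GPU inference respectively. If you have a graphics card with more than 8GB of VRAM, you can happily use all of its features. If you only have a CPU, you can still use the CPU version of the application to be lazy.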

GitHub: https://github.com/soulteary/docker-prompt-generator

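(Figure: the "spell" drawing tool project)

The tool was developed and then open-sourced by Su Yang. His original motivation had a single goal: "convenient and fast", commonly known as "lazy". In his own words: "Last night, while playing with Midjourney, I was scratching my head trying to come up with a Prompt. As a lazy person, an idea struck me: could a model generate the Prompt for me? I type in some keywords or sentences and let a program complete the full Prompt for me (in plain terms: text-to-text)."

Right around then, Midjourney officially released a new feature, "describe", which parses an image into several different Prompt texts and supports continuing on to image generation (in plain terms: image-to-text, then text-to-image).

(Figure: Midjourney's official "image-to-text" feature, describe)

This feature gives most people, even those outside the industry, more to play with, and it is very friendly to lazy users. But Midjourney will certainly not open-source it, so Su Yang had a flash of inspiration, and that is where the idea for this tool came from.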

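▍Using the "drawing spell generator"

To get up and running with the tool quickly, we first need to set up the environment.

Application and Docker environment preparation

For a deep learning environment based on Docker and Nvidia's official base container, see "Docker-based Deep Learning Environment (Getting Started)" by Su Yang:

https://soulteary.com/2023/03/22/docker-based-deep-learning-environment-getting-started.html

A CPU-only setup works too; see "Playing the Stable Diffusion Model on MacBook Devices with M1 and M2 Chips" by Su Yang:

https://soulteary.com/2022/12/10/play-the-stable-diffusion-model-on-macbook-devices-with-m1-and-m2-chips.html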

Once the Docker environment is ready, we can carry on. Pick any suitable directory and use git clone, or download the Zip archive, to get the code of the "Docker Prompt Generator" project onto your machine.

    git clone https://github.com/soulteary/docker-prompt-generator.git
    # or
    curl -sL -o docker-prompt-generator.zip https://github.com/soulteary/docker-prompt-generator/archive/refs/heads/main.zip

Next, enter the project directory and use Nvidia's official PyTorch Docker base image to build the base environment. Compared with pulling a prebuilt image from DockerHub, building it yourself will save a lot of time.

Run the following commands in the project directory to build the model application:

    # Build the base image
    docker build -t soulteary/prompt-generator:base . -f docker/Dockerfile.base

    # Build the CPU application
    docker build -t soulteary/prompt-generator:cpu . -f docker/Dockerfile.cpu

    # Build the GPU application
    docker build -t soulteary/prompt-generator:gpu . -f docker/Dockerfile.gpu

Then, depending on your hardware, run one of the commands below to start a model application with a Web UI.

    # Run the CPU image
    # (on a host without the NVIDIA container runtime, --gpus all will likely fail and can be dropped)
    docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -p 7860:7860 soulteary/prompt-generator:cpu

    # Run the GPU image
    docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -p 7860:7860 soulteary/prompt-generator:gpu

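Enter the IP address of the host running the container in a browser (port 7860, as published by the commands above), and we can start using the tool.

Using the tool

The tool is very simple to use. It has two modes: "generate a description from an image" and "generate a description from text". I took an image that a model had generated earlier, fed it to the program, and clicked the button to get a text description of the image.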
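Before opening the browser, it can help to confirm that the published port is actually reachable, since the container needs some time to load its models. The snippet below is a minimal sketch using only the Python standard library. The helper name `wait_for_port` and its timeout values are illustrative assumptions, not part of the project.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves that something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For a container started on the local machine, calling `wait_for_port("127.0.0.1", 7860)` would be the natural use. Replace the address with the host's IP when the container runs elsewhere.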

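(Figure: parsing an image into descriptive text)

We can use this text directly in Midjourney or Stable Diffusion to keep generating images, or use "generate from text" to expand the content so that it better suits an application like Midjourney.

To demonstrate the tool's Chinese translation and continuation abilities, let us write a simple Chinese description of our own: "一只小鸟立梢头，一轮明月当空照，一片黄叶铺枝头" (a bird perched on a treetop, a bright moon shining in the sky, a yellow leaf resting on the branch).

(Figure: generating image "spells" (descriptions) from Chinese input)

As you can see, a lot of different text is generated from our input. To check whether the text matches the original meaning, we can paste it into Midjourney and test.

(Figure: images generated from the two texts above)

Because the model is random, getting better results still takes more tuning of the description. Even so, the descriptions the tool produces from an image do appear to be usable out of the box, and the text generated from our few casual words also produced images that met the requirements.

(Figure: the relatively good results from this experiment)

▍Implementing the model application features

Below are the tool's implementation process and the thinking behind it. If you want to learn how to use open-source model projects to build your own AI container application, or just want to do so quickly, read on.

Application feature design

Before "getting started", we need to clarify the feature design and think about which technologies will support each feature.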

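In my everyday use of Stable Diffusion and Midjourney, three scenarios often give me a headache:

1. I only have some keywords and have to use my imagination to string them together before feeding them to the model. If the description is not good enough, or the keywords are only loosely related, the generated image will not be particularly good.
2. I have an image and want the model to do derivative work around its content, for example its composition, certain elements, or its mood, rather than simply swapping elements in the image.
3. I am more used to writing descriptions in Chinese than in English, but getting good images from current models requires English. Constantly relying on translation tools and switching between applications or web pages is rather tedious.

For the first problem, we can use GPT-2, the elder's elder of the recently viral GPT-4. It is actually enough for quickly continuing a piece of content (a sentence or a few keywords). Compared with GPT-3 / GPT-4, it needs neither a network connection nor payment, the model files are cheap and generous, and it runs on a CPU.

For the second problem, we can use the CLIP neural network model released by OpenAI in 2021, together with BLIP from Salesforce, to extract the most suitable descriptive text from an image for use in new AIGC image generation tasks. With a little tuning, about 6 to 8GB of VRAM is enough to run the models for this part.

For the third problem, we can use the OPUS MT model open-sourced by the University of Helsinki to translate Chinese into English. This lets us be lazy one step further, and also works around the fact that the two kinds of models above do not accept Chinese input.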
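The three pieces compose naturally: translation feeds the text generator, and the image interrogator produces text that can optionally be fed to the generator too. The sketch below only illustrates that wiring. `PromptToolkit` and its field names are hypothetical, and the three callables stand in for the models discussed above rather than reproducing the project's code.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PromptToolkit:
    """Hypothetical wiring of the three stages; each field can be any callable."""
    translate: Callable[[str], str]        # Chinese -> English (e.g. OPUS MT)
    continue_text: Callable[[str], str]    # English -> expanded Prompt (e.g. GPT-2)
    describe_image: Callable[[Any], str]   # image -> English description (e.g. CLIP + BLIP)

    def from_text(self, text: str, needs_translation: bool = True) -> str:
        """Text-to-text: translate first when the input is not English, then expand."""
        english = self.translate(text) if needs_translation else text
        return self.continue_text(english)

    def from_image(self, image: Any, expand: bool = False) -> str:
        """Image-to-text: describe the image, optionally expanding the description."""
        description = self.describe_image(image)
        return self.continue_text(description) if expand else description
```

Keeping the stages as plain callables means the CPU build can supply only the first two, while the GPU build can add the third.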

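Because the models in the first two scenarios do not support Chinese, and I am too lazy to type English to play with images, let us solve the third problem first to make the whole workflow smoother.

Translating Chinese Prompts into English Prompts

To implement the first lazy feature, automatically generating English from the Chinese content a user types in, we need a Chinese-English translation model.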

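The University of Helsinki's open-source group has published its pretrained models on the HuggingFace community: Helsinki-NLP/opus-mt-zh-en. With about fifteen lines of simple Python code, we can download the model files and implement automatic conversion of Chinese into suitable English.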

In the example below, once the program finishes running, it prints the translation of a famous line from Naruto, "青春不能回头，所以青春没有终点":

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    import torch

    # Load the Chinese-to-English model and its tokenizer (downloaded on first use)
    model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en").eval()
    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

    def translate(text):
        # Inference only, so gradients are not needed
        with torch.no_grad():
            encoded = tokenizer([text], return_tensors="pt")
            sequences = model.generate(**encoded)
            return tokenizer.batch_decode(sequences, skip_special_tokens=True)[0]

    text = "青春不能回头，所以青春没有终点。 ——《火影忍者》"
    print(text, translate(text))

Save the code above as translate.py and run python translate.py. After the model finishes downloading, we get a result like this:

    青春不能回头，所以青春没有终点 Youth can't turn back, so there's no end to youth.

Doesn't it look pretty good? This code is saved in the project at soulteary/docker-prompt-generator/app/translate.py.
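Translation models of this kind accept only a limited input length, so a long description may be easier to handle when it is split into sentences first. The sketch below is one possible approach using only the standard library. `split_sentences` and `translate_long` are hypothetical helpers, and `translate_fn` is assumed to behave like the `translate(text)` function above.

```python
import re
from typing import Callable, List

# Split right after Chinese or ASCII sentence-ending punctuation, keeping the punctuation.
# The ASCII period is left out on purpose so decimals and URLs are not broken up.
_SENTENCE_END = re.compile(r"(?<=[。！？；!?;])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty pieces and surrounding whitespace."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def translate_long(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate sentence by sentence and join the results with spaces."""
    return " ".join(translate_fn(sentence) for sentence in split_sentences(text))
```

Translating sentence by sentence loses context across sentences, so this is a trade-off rather than a strict improvement.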

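Next, let us implement the Prompt "free refill" (logical continuation) feature.

Implementing MidJourney Prompt continuation

Continuing to generate content from some given content is the signature skill of generative models, such as the GPT model series behind ChatGPT, which everyone is by now all too familiar with.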

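The author also found a GPT-2 model fine-tuned on 250,000 MidJourney prompts by an "overseas big sister" who left Google to start a company: succinctly/text2image-prompt-generator. A quick try showed good results, so let us use it for this part.

As above, with a simple program of fewer than 30 lines, we can have the model download automatically and call it to generate new Prompt content suited to Midjourney or Stable Diffusion from our input (the translation of the hot-blooded line above):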

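    from transformers import pipeline, set_seed
    import random
    import re

    text_pipe = pipeline("text-generation", model="succinctly/text2image-prompt-generator")

    def text_generate(prompt):
        # A fresh random seed per call gives different continuations each time
        seed = random.randint(100, 1000000)
        set_seed(seed)

        result = ""
        # Retry up to 6 times until at least one usable candidate survives filtering
        for _ in range(6):
            sequences = text_pipe(prompt, max_length=random.randint(60, 90), num_return_sequences=8)
            lines = []
            for sequence in sequences:
                line = sequence["generated_text"].strip()
                # Drop candidates that barely extend the input or end in dangling punctuation
                if line != prompt and len(line) > (len(prompt) + 4) and not line.endswith((":", "-", "—")):
                    lines.append(line)

            result = "\n".join(lines)
            # Strip tokens containing a dot, such as URLs or file names
            result = re.sub(r"[^ ]+\.[^ ]+", "", result)
            # The replaced characters were lost in extraction; "<" and ">" are assumed
            result = result.replace("<", "").replace(">", "")
            if result != "":
                return result
        return result

    prompt = "Youth can't turn back, so there's no end to youth."
    print(prompt, text_generate(prompt))

Save the code above as text-generation.py and run python text-generation.py. After a short wait we get something like the following: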

    # Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
    Youth can't turn back, so there's no end to youth. Youth can't turn back, so there's no end to youth. Young, handsome, confident, lonely boy sitting on his can't turn back, so there's no end to youth. Where old yang waits, young man on the streets of Bangkok::10 film poster::10 photorealism, postprocessing, low angle::10 Trending on artstation::8 —ar 47:82
    Youth can't turn back, so there's no end to youth. By Karel Thole and Mike Mignola --ar 2:3
    Youth can't turn back, so there's no end to youth. And there is a bright hope about a future where there will be time.

The content looks decent. Let us paste it straight into Midjourney to test. We get results like the following.

(Figure: images generated in Midjourney from our generated Prompt)

That looks like a pass. This code is in the project at soulteary/docker-prompt-generator/app/text-generation.py.
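The post-processing inside `text_generate` can be separated from the model call, which makes it easy to test without downloading anything. The following is a sketch of that filtering step as read from the code above: candidates that merely repeat the input, add fewer than five characters, or end in a dangling `:`, `-` or `—` are dropped, and tokens containing a dot are stripped. The function name `clean_candidates` is hypothetical, and the angle-bracket removal is an assumption about characters that did not survive in the listing.

```python
import re
from typing import Iterable

# Any run of non-space characters that contains a dot, e.g. a URL or a file name.
_DOTTED_TOKEN = re.compile(r"[^ ]+\.[^ ]+")


def clean_candidates(prompt: str, candidates: Iterable[str]) -> str:
    """Filter generated candidates and return the survivors joined by newlines."""
    kept = []
    for candidate in candidates:
        line = candidate.strip()
        if line == prompt:
            continue  # nothing was added to the input
        if len(line) <= len(prompt) + 4:
            continue  # too little was added to be useful
        if line.endswith((":", "-", "—")):
            continue  # generation stopped mid-parameter
        kept.append(line)
    result = _DOTTED_TOKEN.sub("", "\n".join(kept))
    # Assumption: stray angle brackets are removed as well.
    return result.replace("<", "").replace(">", "")
```

One thing to keep in mind: `[^ ]` also matches a line break, so a sentence-ending period directly followed by a newline is treated as part of a dotted token and removed together with the first word of the next line. That may explain why some lines in generated output appear to run together.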

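With these two features done, let us implement generating a Prompt description from the content of an image.

Implementing Prompt generation from images

The two features above can be handled with a CPU alone, and content generation is quite efficient. Generating a Prompt from an image quickly, however, needs a graphics card.

That said, in my experiments it needs only about 6 to 8GB of VRAM to run, which is fairly economical (if you have no graphics card, you can use a cloud server instead: buy a pay-as-you-go instance and destroy it when you are done). Here too, we implement a short piece of Python code, fewer than 30 lines, that downloads the models, loads the application, downloads an image, and converts the image into a Prompt: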

    import shutil

    import requests
    import torch
    from PIL import Image
    from clip_interrogator import Config, Interrogator

    config = Config()
    # Use the GPU when available; on CPU, offload BLIP to save memory
    config.device = "cuda" if torch.cuda.is_available() else "cpu"
    config.blip_offload = not torch.cuda.is_available()
    config.chunk_size = 2048
    config.flavor_intermediate_count = 512
    config.blip_num_beams = 64
    config.clip_model_name = "ViT-H-14/laion2b_s32b_b79k"
    ci = Interrogator(config)

    def get_prompt_from_image(image):
        return ci.interrogate(image.convert("RGB"))

    # Download the sample image to ./image.jpg
    r = requests.get("https://pic1.zhimg.com/v2-6e056c49362bff9af1eb39ce530ac0c6_1440w.jpg?source=d16d100b", stream=True)
    if r.status_code == 200:
        with open("./image.jpg", "wb") as f:
            r.raw.decode_content = True
            shutil.copyfileobj(r.raw, f)

    print(get_prompt_from_image(Image.open("./image.jpg")))

The image in the code is the cover image of the previous article in my column (also generated with Midjourney). Save the content above as clip.py and run python clip.py. After a short wait we get a result like the following:

    # WARNING:root:Pytorch pre-release version 1.14.0a0+410ce96 - assuming intent to test it
    Loading BLIP model...
    load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
    Loading CLIP model...
    Loaded CLIP model and data in 8.29 seconds.
    100%|██████████| 55/55 [00:00<00:00, 316.23it/s]
    Flavor chain:  38%|███▏      | 12/32 [00:04<00:07, 2.74it/s]
    100%|██████████| 55/55 [00:00<00:00, 441.49it/s]
    100%|██████████| 6/6 [00:00<00:00, 346.74it/s]
    100%|██████████| 50/50 [00:00<00:00, 457.84it/s]
    a robot with a speech bubble on a blue background, highly detailed hyper real retro, artificial intelligence!!, toy photography, by Emma Andijewska, markings on robot, computer generated, blueish, delete, small gadget, animated, blue body, in retro colors

Judging from the result, the description is fairly accurate. This code is in the project at soulteary/docker-prompt-generator/app/clip.py.
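The interrogator returns a single comma-separated string: a caption followed by style phrases. When only part of it is wanted, for example to shorten the text before pasting it into another tool, the string can be split and trimmed. This is a minimal sketch, assuming the output format shown above. `split_phrases` and `trim_prompt` are hypothetical helpers, not part of clip_interrogator.

```python
from typing import List


def split_phrases(prompt: str) -> List[str]:
    """Split a comma-separated prompt into phrases, dropping blanks and case-insensitive duplicates."""
    seen = set()
    phrases = []
    for raw in prompt.split(","):
        phrase = raw.strip()
        key = phrase.lower()
        if phrase and key not in seen:
            seen.add(key)
            phrases.append(phrase)
    return phrases


def trim_prompt(prompt: str, max_phrases: int) -> str:
    """Keep the first max_phrases phrases; the caption comes first, so it is kept."""
    return ", ".join(split_phrases(prompt)[:max_phrases])
```

Since the caption is the first phrase, trimming from the end removes style hints before it touches the description of the subject.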

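All right, at this point the three main features are implemented. Next, we use Docker and Gradio to build the Web UI and a model container application that runs with a single command.

Building the AI application container with Docker

Next, let us build the container for the AI application and write the related code.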

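As mentioned earlier, we will implement two versions of the application, supporting CPU and GPU respectively for fast AI model inference. Because the latter is backward compatible with the former, we first build a base image that contains the first two features and runs on a CPU alone.

Building the application container image that only needs a CPU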

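Combining the code above, the Dockerfile is not hard to write:

    FROM nvcr.io/nvidia/pytorch:22.12-py3
    LABEL org.opencontainers.image.authors="soulteary@gmail.com"

    RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \
        pip install transformers sentencepiece sacremoses && \
        pip install gradio

    WORKDIR /app

    # Pre-download the models at build time so the image ships with them.
    # The start of this heredoc was lost in extraction; the import line is reconstructed.
    RUN cat > /get-models.py <<EOF
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
    AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-zh-en')
    AutoTokenizer.from_pretrained('Helsinki-NLP/opus-mt-zh-en')
    pipeline('text-generation', model='succinctly/text2image-prompt-generator')
    EOF

    RUN python /get-models.py && \
        rm -rf /get-models.py

Save the content above as Dockerfile.base, then run docker build -t soulteary/prompt-generator:base . -f Dockerfile.base. After a short wait, the base application image containing the model files is ready.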

    [+] Building 189.5s (7/8)
     => [internal] load .dockerignore                                        0.0s
     => => transferring context: 2B                                          0.0s
     => [internal] load build definition from Dockerfile.base                0.0s
     => => transferring dockerfile: 692B                                     0.0s
     => [internal] load metadata for nvcr.io/nvidia/pytorch:22.12-py3        0.0s
     => [1/5] FROM nvcr.io/nvidia/pytorch:22.12-py3                          0.0s
     => CACHED [2/5] RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && pip install transformers sentencepiece sacremoses && pip install gradio  0.0s
     => CACHED [3/5] WORKDIR /app                                            0.0s
     => CACHED [4/5] RUN cat > /get-models.py <                              0.0s
     => [5/5] RUN python /get-models.py && rm -rf /get-models.py           189.4s
     => => # Downloading (…)olve/main/source.spm: 100%|██████████| 805k/805k [00:06<00:00, 130kB/s]
     => => # Downloading (…)olve/main/target.spm: 100%|██████████| 807k/807k [00:01<00:00, 440kB/s]
     => => # Downloading (…)olve/main/vocab.json: 100%|██████████| 1.62M/1.62M [00:01<00:00, 1.21MB/s]
     => => # Downloading (…)lve/main/config.json: 100%|██████████| 907/907 [00:00<00:00, 499kB/s]
     => => # Downloading pytorch_model.bin: 100%|██████████| 665M/665M [00:11<00:00, 57.2MB/s]
     => => # Downloading (…)okenizer_config.json: 100%|██████████| 255/255 [00:00<00:00, 81.9kB/s]

On my side the build takes about 5 minutes, so you can get up from your chair, move around, and listen to a song to relax for a while.

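Once the image is built, use the command below to enter a Docker image that contains the models and the PyTorch environment. In this image we can freely use the models for the first two features:

    docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -p 7860:7860 soulteary/prompt-generator:base bash

With the environment ready, let us go on to implement a simple Web UI for the lazy feature described above: have the model generate, from the Chinese content we type in, a Prompt that can draw high-quality images: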

    import random
    import re

    import gradio as gr
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline, set_seed

    # Chinese-to-English translation model
    model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en").eval()
    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

    def translate(text):
        with torch.no_grad():
            encoded = tokenizer([text], return_tensors="pt")
            sequences = model.generate(**encoded)
            return tokenizer.batch_decode(sequences, skip_special_tokens=True)[0]

    # Prompt continuation model
    text_pipe = pipeline("text-generation", model="succinctly/text2image-prompt-generator")

    def text_generate(text):
        seed = random.randint(100, 1000000)
        set_seed(seed)
        text_in_english = translate(text)

        result = ""
        # Retry up to 6 times until at least one usable candidate survives filtering
        for _ in range(6):
            sequences = text_pipe(text_in_english, max_length=random.randint(60, 90), num_return_sequences=8)
            lines = []
            for sequence in sequences:
                line = sequence["generated_text"].strip()
                if line != text_in_english and len(line) > (len(text_in_english) + 4) and not line.endswith((":", "-", "—")):
                    lines.append(line)

            result = "\n".join(lines)
            result = re.sub(r"[^ ]+\.[^ ]+", "", result)
            # The replaced characters were lost in extraction; "<" and ">" are assumed
            result = result.replace("<", "").replace(">", "")
            if result != "":
                return result
        return result

    with gr.Blocks() as block:
        with gr.Column():
            with gr.Tab("文本生成"):  # "Text generation"
                input_box = gr.Textbox(lines=6, label="你的想法", placeholder="在此输入内容...")  # "Your idea" / "Type here..."
                output_box = gr.Textbox(lines=6, label="生成的 Prompt")  # "Generated Prompt"
                submit_btn = gr.Button("快给我编")  # "Make it up for me"

        submit_btn.click(fn=text_generate, inputs=input_box, outputs=output_box)

    # queue() already enables queuing, so the redundant enable_queue=True argument
    # (rejected by newer Gradio releases) is left out of launch()
    block.queue(max_size=64).launch(show_api=False, debug=True, share=False, server_name="0.0.0.0")

In the container, create a file named webui.cpu.py and run python webui.cpu.py. You will see log output like the following:

    Running on local URL: http://0.0.0.0:7860
    To create a public link, set `share=True` in `launch()`.

Then open the IP address of the device hosting the container in a browser (if it runs on your own machine, visit http://127.0.0.1:7860) to reach the Web service.

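(Figure: type in anything and it will keep making things up for you)

Type some content into the input box, click the "快给我编" ("Make it up for me") button, and you get a pile of Prompt content written by the model.

With the "text-to-text" feature done, let us implement the "image-to-text" feature.

Building the application container image that needs a GPU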

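Building on the above, the container environment needed for the GPU features is not hard either:

    FROM soulteary/prompt-generator:base
    LABEL org.opencontainers.image.authors="soulteary@gmail.com"

    RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \
        pip install clip_interrogator git+https://github.com/pharmapsychotic/BLIP.git@lib#egg=blip

    # Pre-download the CLIP and BLIP models at build time.
    # The start of this heredoc was lost in extraction; its first lines are
    # reconstructed from the clip.py listing earlier in the article.
    RUN cat > /get-models.py <<EOF
    from clip_interrogator import Config, Interrogator
    import torch
    config = Config()
    config.device = 'cuda' if torch.cuda.is_available() else 'cpu'
    config.blip_offload = False if torch.cuda.is_available() else True
    config.chunk_size = 2048
    config.flavor_intermediate_count = 512
    config.blip_num_beams = 64
    config.clip_model_name = "ViT-H-14/laion2b_s32b_b79k"
    ci = Interrogator(config)
    EOF

    RUN python /get-models.py && \
        rm -rf /get-models.py

Save the content above as Dockerfile.gpu, then run docker build -t soulteary/prompt-generator:gpu . -f Dockerfile.gpu to build the image.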

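Wait patiently for the build to finish, then use the command below to enter a Docker image that contains all three models and the PyTorch environment:

    docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -p 7860:7860 soulteary/prompt-generator:gpu bash

Next, let us write a Python program that can call all three models: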