What Are the Basic Nodes in ComfyUI?

ComfyUI is a node-based user interface for Stable Diffusion, widely popular for its customizability and flexibility. At its core, you build image-generation workflows by connecting functional nodes. Here are some of the most commonly used nodes, explained in plain language to help you get started quickly!

1. Load Checkpoint

Function: This is the foundational node of a workflow, used to load Stable Diffusion models (such as SD 1.5, SDXL, etc.). It loads three parts of the model simultaneously: U-Net (generation core), CLIP (text understanding), and VAE (image encoding/decoding).
Plain Explanation: It’s like preparing brushes and paints for a painter; this node loads the “drawing brain”.
Common Usage: Select a .ckpt or .safetensors file to start the entire generation process.
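For reference, here is what this node looks like in ComfyUI's API-format JSON (the format produced by "Save (API Format)"), written as a Python dict. This is a minimal sketch: the node id "4" is arbitrary and the filename is a placeholder for a model in models/checkpoints.

```python
# A Load Checkpoint node in API-format JSON (Python dict syntax).
load_checkpoint = {
    "4": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"},  # placeholder
    }
}
# Outputs by index: 0 = MODEL, 1 = CLIP, 2 = VAE. Other nodes reference
# them as ["4", 0], ["4", 1], ["4", 2].
```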

2. CLIP Text Encode

Function: Converts your text prompts into “digital language” (embedding vectors) that the model can understand, divided into positive prompts and negative prompts.
Plain Explanation: Equivalent to translating your description for the painter, such as “draw a cat” or “don’t make it too blurry”.
Common Usage: Connect it to the CLIP output of Load Checkpoint. You typically use two of these nodes: one for the content you want, one for the content to avoid.
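Continuing the same API-format sketch, a positive/negative pair sharing the checkpoint's CLIP output might look like this (the prompt text is just an example):

```python
# Two CLIPTextEncode nodes; both consume the CLIP output ["4", 1].
prompts = {
    "6": {  # positive prompt
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "a cat in the sunlight", "clip": ["4", 1]},
    },
    "7": {  # negative prompt
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "blurry, low quality", "clip": ["4", 1]},
    },
}
```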

3. KSampler

Function: Controls the process of generating images from noise. You can choose different sampling methods (such as Euler, DPM++) and steps.
Plain Explanation: Like a painter deciding how many strokes to use to finish a work. More steps mean more detail but slower, fewer steps mean faster but rougher.
Common Usage: Feed it the MODEL output, the positive/negative conditioning from CLIP Text Encode, and a latent image; adjust steps (20-50 is common) and CFG (guidance scale, 7-12 is common).
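In the same sketch, a KSampler entry wiring together the model, both prompts, and a latent (node ids follow from the snippets in this answer; parameter values are illustrative):

```python
# KSampler: denoises the latent under the guidance of the conditioning.
ksampler = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["4", 0],         # MODEL from Load Checkpoint
            "positive": ["6", 0],      # positive conditioning
            "negative": ["7", 0],      # negative conditioning
            "latent_image": ["5", 0],  # from Empty Latent Image (below)
            "seed": 42,
            "steps": 30,
            "cfg": 8.0,
            "sampler_name": "euler",
            "scheduler": "normal",
            "denoise": 1.0,            # 1.0 = start from pure noise
        },
    }
}
```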

4. VAE Decode

Function: Decodes the “latent image” (compressed data) generated by KSampler into the final pixel image.
Plain Explanation: Equivalent to turning the painter’s draft into a finished painting.
Common Usage: Connect the output of KSampler and the VAE output of Load Checkpoint to generate a visible image.
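In the sketch, decoding takes the sampler's latent and the checkpoint's VAE:

```python
# VAEDecode: latent -> pixel image.
vae_decode = {
    "8": {
        "class_type": "VAEDecode",
        "inputs": {"samples": ["3", 0], "vae": ["4", 2]},
    }
}
```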

5. Save Image

Function: Saves the generated image to the hard drive.
Plain Explanation: Like framing the finished painting and archiving it.
Common Usage: Connect the output of VAE Decode and set a filename prefix; images are written into ComfyUI's output directory.
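As an API-format entry (the prefix is up to you):

```python
# SaveImage: writes PNGs named <prefix>_00001_.png etc. into output/.
save_image = {
    "9": {
        "class_type": "SaveImage",
        "inputs": {"images": ["8", 0], "filename_prefix": "cat"},
    }
}
```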

6. Empty Latent Image

Function: Generates a blank “canvas” (noise in latent space), specifying image dimensions (such as 512x512 or 1024x1024).
Plain Explanation: Giving the painter a blank sheet of paper to start drawing from scratch.
Common Usage: As the input starting point for KSampler, dimensions should match model requirements (512x512 for SD 1.5, 1024x1024 for SDXL).
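This is the node "5" referenced by the KSampler snippet above:

```python
# EmptyLatentImage: a blank 1024x1024 latent, sized for SDXL.
empty_latent = {
    "5": {
        "class_type": "EmptyLatentImage",
        "inputs": {"width": 1024, "height": 1024, "batch_size": 1},
    }
}
```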

7. Preview Image

Function: Displays the generated image directly on the interface without saving it.
Plain Explanation: Letting the painter show you the finished product first; save it if you like it.
Common Usage: Connect the output of VAE Decode for easy workflow debugging.
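Its API-format entry is minimal; it only consumes images:

```python
# PreviewImage: shows the result in the UI instead of writing to output/.
preview = {
    "9": {
        "class_type": "PreviewImage",
        "inputs": {"images": ["8", 0]},
    }
}
```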

8. Conditioning (Set Area)

Function: Adds area restrictions to prompts, such as “draw a cat on the left, draw a dog on the right”.
Plain Explanation: Telling the painter what to draw in which part of the canvas.
Common Usage: Combined with CLIP Text Encode, used for local control of generated content.
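A sketch restricting the positive prompt from node "6" to the left half of a 1024x1024 canvas (all values in pixels, chosen for illustration):

```python
# ConditioningSetArea: confine a prompt to a rectangular region.
area = {
    "10": {
        "class_type": "ConditioningSetArea",
        "inputs": {
            "conditioning": ["6", 0],
            "width": 512, "height": 1024,  # region size
            "x": 0, "y": 0,                # top-left corner of the region
            "strength": 1.0,
        },
    }
}
```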

9. LoRA Loader

Function: Loads LoRA models to fine-tune the style or features of the main model (such as anime style, specific characters).
Plain Explanation: Adding a “style filter” to the painter to make the drawing more personalized.
Common Usage: Connect the MODEL and CLIP outputs of Load Checkpoint, adjust LoRA strength (usually 0.5-1.0), and route the loader's outputs onward in place of the checkpoint's.
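Sketched in API format; the LoRA filename is a placeholder for a file in models/loras:

```python
# LoraLoader: patches both the MODEL and the CLIP encoder.
lora = {
    "11": {
        "class_type": "LoraLoader",
        "inputs": {
            "model": ["4", 0],
            "clip": ["4", 1],
            "lora_name": "my_style.safetensors",  # placeholder
            "strength_model": 0.8,
            "strength_clip": 0.8,
        },
    }
}
# KSampler would then take its model from ["11", 0], and the
# CLIP Text Encode nodes their clip from ["11", 1].
```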

10. ControlNet

Function: Controls the structure of the generated image through additional reference images (such as line art, edge maps, pose maps).
Plain Explanation: Giving the painter a sketch to follow for details.
Common Usage: Requires a ControlNet model file loaded by a ControlNet loader node; an Apply ControlNet node then merges the reference image into the conditioning that feeds KSampler.
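A sketch of that two-node pattern, assuming a Canny ControlNet file (placeholder name) and a hypothetical Load Image node with id "13" supplying the edge map:

```python
# ControlNetLoader loads the weights; ControlNetApply injects the
# reference image into the positive conditioning before KSampler.
controlnet = {
    "12": {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "control_canny.safetensors"},  # placeholder
    },
    "14": {
        "class_type": "ControlNetApply",
        "inputs": {
            "conditioning": ["6", 0],  # positive prompt
            "control_net": ["12", 0],
            "image": ["13", 0],        # hypothetical LoadImage node
            "strength": 1.0,
        },
    },
}
# KSampler's "positive" input would then be ["14", 0].
```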

11. VAE Encode

Function: Encodes a normal image into a representation in latent space, used for image-to-image (img2img) generation.
Plain Explanation: Handing an old painting to the painter and asking them to modify it.
Common Usage: Load an existing image, encode it, feed the latent to KSampler, and lower the denoise value so the original is only partially repainted.
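A sketch assuming a hypothetical Load Image node with id "15" provides the source picture:

```python
# VAEEncode: pixels -> latent, the starting point for img2img.
vae_encode = {
    "16": {
        "class_type": "VAEEncode",
        "inputs": {"pixels": ["15", 0], "vae": ["4", 2]},
    }
}
# In KSampler, set "latent_image": ["16", 0] and lower "denoise"
# (e.g. 0.6) so the original composition survives.
```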

12. Upscale Model

Function: Loads super-resolution models (such as ESRGAN, SwinIR) to upscale images.
Plain Explanation: Giving the painting a magnifying glass to make it clearer.
Common Usage: Pair the loaded upscaler with an image-upscale node and feed it the decoded image to boost resolution.
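The usual two-node pattern, with a placeholder model name from models/upscale_models:

```python
# UpscaleModelLoader loads the weights; ImageUpscaleWithModel applies
# them to a decoded image (here the VAE Decode output ["8", 0]).
upscale = {
    "17": {
        "class_type": "UpscaleModelLoader",
        "inputs": {"model_name": "4x-UltraSharp.pth"},  # placeholder
    },
    "18": {
        "class_type": "ImageUpscaleWithModel",
        "inputs": {"upscale_model": ["17", 0], "image": ["8", 0]},
    },
}
```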

Example of a Simple Workflow

A basic text-to-image workflow might look like this:

  1. Load Checkpoint: Load SDXL model.
  2. CLIP Text Encode: Input “A cat in the sunlight”.
  3. Empty Latent Image: Set a 1024x1024 canvas.
  4. KSampler: Generate using 30 steps and the Euler sampler.
  5. VAE Decode: Decode the result into an image.
  6. Preview Image: Take a look.
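Putting the pieces together, here is a minimal runnable sketch that assembles this workflow in API-format JSON and queues it on a locally running ComfyUI instance (default address 127.0.0.1:8188). It uses Save Image rather than Preview Image so the result persists when run headlessly; the checkpoint filename is a placeholder.

```python
import json
import urllib.request

# Text-to-image workflow in ComfyUI's API-format JSON.
workflow = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},  # placeholder
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a cat in the sunlight", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}},
    "3": {"class_type": "KSampler",
          "inputs": {"model": ["4", 0], "positive": ["6", 0],
                     "negative": ["7", 0], "latent_image": ["5", 0],
                     "seed": 42, "steps": 30, "cfg": 8.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["8", 0], "filename_prefix": "cat"}},
}

# Queue the workflow via the HTTP API; the response contains a prompt_id.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```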

Summary

These components are the “basic toolbox” of ComfyUI. Mastering them allows you to build simple generation flows. As your needs increase, you might use more advanced nodes, such as:

  • Latent Upscale
  • Inpaint
  • AnimateDiff (Animation generation)

The charm of ComfyUI lies in its modular design, allowing you to freely combine these nodes as needed. If you are a beginner, it is recommended to start with the default workflow and gradually try adding features like LoRA or ControlNet. If you have any specific components you want to know more about, feel free to ask!