Available ComfyUI Nodes
This guide covers the available nodes and requirements for creating real-time video pipelines using ComfyUI with Livepeer.
Required Video Input/Output Nodes
These nodes are required for creating real-time video pipelines.
ComfyStream
- Input:
- Video stream URL or device ID
- Optional configuration parameters
- Output:
- RGB frame tensor (3, H, W)
- Frame metadata (timestamp, index)
- Performance Requirements:
- Frame processing time: < 5ms
- VRAM usage: < 500MB
- Buffer size: ≤ 2 frames
- Supported formats: RTMP, WebRTC, V4L2
- Best Practices:
- Set fixed frame rate
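The buffer-size requirement above keeps latency low by dropping stale frames instead of queuing them. A minimal sketch of that policy (hypothetical helper, not part of the ComfyStream API):

```python
from collections import deque

# Bounded frame buffer matching the "buffer size <= 2 frames" requirement:
# when a third frame arrives, the oldest is evicted automatically.
class FrameBuffer:
    def __init__(self, max_frames=2):
        self._frames = deque(maxlen=max_frames)

    def push(self, frame):
        self._frames.append(frame)

    def latest(self):
        return self._frames[-1] if self._frames else None

buf = FrameBuffer()
for i in range(5):          # simulate 5 incoming frames
    buf.push({"index": i})  # frame metadata: index (plus timestamp in practice)

print(len(buf._frames), buf.latest()["index"])  # only the 2 newest frames remain
```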
Inference Nodes
Nodes for analyzing video frames in real time. They can be used for tasks like object detection, segmentation, and depth estimation.
Depth Anything TensorRT
- Input: RGB frame (3, H, W)
- Output: Depth map (1, H, W)
- Performance Requirements:
- Inference time: < 20ms
- VRAM usage: 2GB
- Batch size: 1
- Best Practices:
- Place early in workflow
- Cache results for static scenes
- Use lowest viable resolution
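The "cache results for static scenes" practice means skipping inference while the incoming frame is unchanged. A minimal sketch under that assumption (hypothetical wrapper, not the node's API; the lambda stands in for the real model call):

```python
import hashlib

# Skip depth inference when the incoming frame is byte-identical to the last one.
def frame_digest(frame_bytes):
    return hashlib.md5(frame_bytes).hexdigest()

class DepthCache:
    def __init__(self, infer_fn):
        self.infer_fn = infer_fn      # the (expensive) depth model call
        self.last_digest = None
        self.last_depth = None
        self.calls = 0                # counts actual model invocations

    def depth_for(self, frame_bytes):
        digest = frame_digest(frame_bytes)
        if digest != self.last_digest:   # scene changed: run inference
            self.last_depth = self.infer_fn(frame_bytes)
            self.last_digest = digest
            self.calls += 1
        return self.last_depth

cache = DepthCache(infer_fn=lambda fb: len(fb))  # stand-in for the model
static = b"same-frame"
for _ in range(10):
    cache.depth_for(static)        # static scene: the model runs only once
cache.depth_for(b"new-frame")      # scene change: the model runs again
print(cache.calls)
```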
- Navigate to the custom_nodes directory in your ComfyUI workspace
- Clone the repository
- Install the node
- Download the TensorRT onnx file and build the engine
- Copy the TensorRT engine file to the ComfyUI models directory
Segment Anything 2
Useful for tracking an object given positive and negative prompt coordinates. It can be combined with Florence2 and other object detection nodes.
- Input: RGB frame (3, H, W)
- Output: Segmentation mask (1, H, W)
- Performance Requirements:
- Inference time: < 30ms
- VRAM usage: 3GB
- Batch size: 1
- Best Practices:
- Cache static masks
- Use mask erosion for stability
- Implement confidence thresholding
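"Use mask erosion for stability" means trimming the mask boundary, where segmentation flickers most between frames. A minimal sketch with a 3x3 binary erosion on a nested-list mask (illustrative only; real pipelines would use an image library):

```python
# 3x3 binary erosion: a pixel survives only if its full 3x3 neighbourhood
# lies inside the mask, which trims one pixel from every edge.
def erode(mask):
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ))
    return out

mask = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
eroded = erode(mask)
print(sum(map(sum, eroded)))  # far fewer pixels survive than in the input
```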
- Navigate to the custom_nodes directory in your ComfyUI workspace
- Clone the repository
- Install requirements
For Windows, ensure the prerequisites are installed by following the Windows section in Install Nodes in ComfyUI
Florence 2
- Input: RGB frame (3, H, W)
- Output: Feature vector (1, 512)
- Performance Requirements:
- Inference time: < 15ms
- VRAM usage: 1GB
- Batch size: 1
- Best Practices:
- Cache embeddings for known references
- Use cosine similarity for matching
- Implement feature vector normalization
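The last two practices go together: after normalizing feature vectors to unit length, cosine similarity reduces to a dot product and ignores magnitude. A minimal sketch with short toy vectors in place of the 512-dimensional embeddings:

```python
import math

# Normalize to unit length, then match by cosine similarity (dot product).
def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

reference = [1.0, 2.0, 2.0]   # cached embedding for a known reference
candidate = [2.0, 4.0, 4.0]   # same direction, different magnitude
unrelated = [2.0, -2.0, 1.0]  # orthogonal to the reference
print(round(cosine(reference, candidate), 3))  # magnitude does not matter
print(round(cosine(reference, unrelated), 3))
```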
- Navigate to the custom_nodes directory in your ComfyUI workspace
- Clone the repository
- Install requirements
Download example workflows here
Generative and Control Nodes
Nodes for applying visual effects and controlling video content in real time.
ComfyUI Real-Time Nodes
A suite of nodes for real-time video processing and control. Some examples of nodes included are:
- Value: FloatControl, IntControl and StringControl
- Sequence: FloatSequence, IntSequence and StringSequence
- Motion: MotionController, IntegerMotionController, ROINode
- Utility: FPSMonitor, QuickShapeMask, DTypeConverter, FastWebcamCapture
- VAE: TAESDVaeEncode, TAESDVaeDecode
Refer to the README.md for more details.
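To illustrate the kind of utility these nodes provide, here is a minimal sketch of what an FPSMonitor-style node measures: throughput over a sliding window of recent frame timestamps (hypothetical implementation, not the node's actual code):

```python
import time
from collections import deque

# Rolling FPS over the last `window` frame timestamps.
class FPSMonitor:
    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        self.stamps.append(time.monotonic() if now is None else now)

    def fps(self):
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0

mon = FPSMonitor()
for i in range(10):
    mon.tick(now=i / 30.0)  # simulate a steady 30 fps feed
print(round(mon.fps(), 1))
```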
- Navigate to the custom_nodes directory in your ComfyUI workspace
- Clone the repository and install requirements
Download example workflows here
StreamDiffusion
This node is useful for styling videos with a diffusion effect. It can be combined with other nodes to create masked effects.
- Input: RGB frame (3, H, W)
- Output: Stylized RGB frame (3, H, W)
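A masked effect of the kind described above amounts to blending the stylized frame into the original only where a mask is set. A minimal per-pixel sketch, with frames flattened to lists of pixel values for simplicity (illustrative only):

```python
# Blend the stylized frame into the original wherever the mask is set,
# scaled by an overall effect strength; unmasked pixels pass through.
def masked_blend(original, stylized, mask, strength=1.0):
    return [
        o + (s - o) * m * strength  # per-pixel linear interpolation
        for o, s, m in zip(original, stylized, mask)
    ]

original = [10, 10, 10, 10]
stylized = [90, 90, 90, 90]
mask     = [0,  1,  1,  0]   # e.g. a subject mask from Segment Anything 2
print(masked_blend(original, stylized, mask))       # full effect inside the mask
print(masked_blend(original, stylized, mask, 0.5))  # half-strength effect
```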
- Navigate to the custom_nodes directory in your ComfyUI workspace
- Clone the repository
- Install requirements
For Windows, ensure the prerequisites are installed by following the Windows section in Install Nodes in ComfyUI
Download example workflows here
NVIDIA TensorRT
The NVIDIA TensorRT plugin optimizes stable diffusion performance by generating a static TensorRT engine based on static StableDiffusion parameters.
Refer to the README for install and usage instructions.
LivePortraitKJ
- Input:
- Source image (3, H, W)
- Driving frame (3, H, W)
- Output: Animated frame (3, H, W)
- Performance Requirements:
- Inference time: < 50ms
- VRAM usage: 4GB
- Batch size: 1
- Best Practices:
- Pre-process source images
- Implement motion smoothing
- Cache facial landmarks
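"Implement motion smoothing" is commonly done with an exponential moving average over the tracked landmark coordinates, which damps frame-to-frame jitter. A minimal sketch of that idea (hypothetical helper, not part of the node):

```python
# Exponential moving average over landmark coordinates.
# alpha near 0 favours the smoothed history; alpha near 1 favours new data.
def smooth(prev, current, alpha=0.3):
    return [(1 - alpha) * p + alpha * c for p, c in zip(prev, current)]

landmarks = [100.0, 200.0]   # previous smoothed (x, y)
noisy =     [110.0, 190.0]   # jittery new detection
smoothed = smooth(landmarks, noisy)
print(smoothed)  # pulled only partway toward the noisy detection
```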
- Navigate to the custom_nodes directory in your ComfyUI workspace
- Clone the repository
- Install requirements
ComfyUI Diffusers
- Input:
- Conditioning tensor
- Latent tensor
- Output: Generated frame (3, H, W)
- Performance Requirements:
- Inference time: < 50ms
- VRAM usage: 4GB
- Maximum steps: 20
- Best Practices:
- Use TensorRT optimization
- Implement denoising strength control
- Cache conditioning tensors
ComfyUI-load-image-from-url Node
- Input: Image URL
- Output: RGB image tensor (3, H, W)
- Best Practices:
- Load reference images once at startup
- Cache downloaded images locally
Supporting Nodes
These nodes provide inputs, prompts, and other supporting functions for the video pipeline.
K Sampler
- Input:
- Latent tensor
- Conditioning
- Output: Sampled latent
- Performance Requirements:
- Maximum steps: 20
- VRAM usage: 2GB
- Scheduler: euler_ancestral
- Best Practices:
- Use adaptive step sizing
- Cache conditioning tensors
Prompt Control
- Input: Text prompts
- Output: Conditioning tensors
- Performance Requirements:
- Processing time: < 5ms
- VRAM usage: minimal
- Best Practices:
- Cache common prompts
- Use consistent style tokens
- Implement prompt weighting
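"Cache common prompts" means memoising the text-encoding step so a repeated prompt never re-runs the encoder. A minimal sketch using `functools.lru_cache`; the encoder body is a toy stand-in for the real model:

```python
from functools import lru_cache

calls = {"n": 0}  # counts actual encoder runs

@lru_cache(maxsize=128)
def encode_prompt(prompt):
    calls["n"] += 1                           # stands in for an expensive encoder run
    return tuple(ord(c) % 7 for c in prompt)  # toy "conditioning tensor"

encode_prompt("a watercolor city at dusk")
encode_prompt("a watercolor city at dusk")    # identical prompt: served from cache
encode_prompt("a neon cyberpunk street")
print(calls["n"])  # the encoder ran once per unique prompt
```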
VAE
- Input: Latent tensor
- Output: RGB frame
- Performance Requirements:
- Inference time: < 10ms
- VRAM usage: 1GB
- Tile size: 512
- Best Practices:
- Use tiling for large frames
- Implement half-precision
- Cache common latents
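"Use tiling for large frames" means decoding in tiles no larger than the 512-pixel tile size above, with a small overlap to hide seams. A minimal sketch that just computes the tile origins (hypothetical helper; the overlap value is an illustrative choice):

```python
# Split a frame into overlapping tiles of at most `tile` pixels per side,
# snapping the last row/column of tiles to the frame edge for full coverage.
def tile_coords(width, height, tile=512, overlap=32):
    step = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, step)) or [0]
    ys = list(range(0, max(height - tile, 0) + 1, step)) or [0]
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]

coords = tile_coords(1280, 720)
print(len(coords))  # number of 512x512 tiles covering a 720p frame
```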
IPAdapter
- Input:
- Reference image
- Target tensor
- Output: Conditioned tensor
- Performance Requirements:
- Inference time: < 20ms
- VRAM usage: 2GB
- Reference resolution: ≤ 512x512
- Best Practices:
- Cache reference embeddings
- Use consistent weights
- Implement cross-attention
Cache Nodes
- Input: Any tensor
- Output: Cached tensor
- Performance Requirements:
- Access time: < 1ms
- Maximum size: 2GB
- Cache type: GPU
- Best Practices:
- Implement LRU eviction
- Monitor cache pressure
- Clear on scene changes
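"Implement LRU eviction" under a byte budget can be sketched with an `OrderedDict`: reads move an entry to the back, and eviction pops from the front until the cache fits. A hypothetical illustration, not the node's actual implementation (strings stand in for tensors):

```python
from collections import OrderedDict

# Byte-budgeted LRU cache: least-recently-used entries are evicted first.
class TensorCache:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.items = OrderedDict()  # key -> (tensor, size_bytes)

    def put(self, key, tensor, size_bytes):
        if key in self.items:
            self.used -= self.items.pop(key)[1]
        self.items[key] = (tensor, size_bytes)
        self.used += size_bytes
        while self.used > self.max_bytes:        # evict oldest entries first
            _, (_, evicted) = self.items.popitem(last=False)
            self.used -= evicted

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)              # mark as recently used
        return self.items[key][0]

cache = TensorCache(max_bytes=100)
cache.put("a", "tensor-a", 40)
cache.put("b", "tensor-b", 40)
cache.get("a")                  # "a" is now the most recently used entry
cache.put("c", "tensor-c", 40)  # over budget: evicts "b", the LRU entry
print(sorted(cache.items))
```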
ControlNet
- Input:
- Control signal
- Target tensor
- Output: Controlled tensor
- Performance Requirements:
- Inference time: < 30ms
- VRAM usage: 2GB
- Resolution: ≤ 512
- Best Practices:
- Use adaptive conditioning
- Implement strength scheduling
- Cache control signals
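"Implement strength scheduling" means varying the control strength over time instead of holding it fixed, e.g. ramping it down over the frames after a scene cut. A minimal sketch of one such schedule (the ramp shape and values are illustrative assumptions):

```python
# Linear ramp of control strength from `start` to `end` over `ramp_frames`
# frames, holding at `end` afterwards.
def strength_at(frame_idx, start=1.0, end=0.6, ramp_frames=30):
    if frame_idx >= ramp_frames:
        return end
    t = frame_idx / ramp_frames
    return start + (end - start) * t

print(strength_at(0), strength_at(15), strength_at(100))
```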
Default Nodes
All default nodes that ship with ComfyUI are available. The list below is subject to change.