Compare commits

...

63 Commits

Author SHA1 Message Date
comfyanonymous
b07f116dea Bump ComfyUI version to v0.3.18 2025-02-26 21:19:14 -05:00
comfyanonymous
714f728820 Add to README that the Wan model is supported. 2025-02-26 20:48:50 -05:00
comfyanonymous
92d8d15300 Readme changes.
Instructions shouldn't recommend running comfyui with --listen
2025-02-26 20:47:08 -05:00
BiologicalExplosion
89253e9fe5 Support Cambricon MLU (#6964)
Co-authored-by: huzhan <huzhan@cambricon.com>
2025-02-26 20:45:13 -05:00
comfyanonymous
3ea3bc8546 Fix wan issues when prompt length is long. 2025-02-26 20:34:02 -05:00
comfyanonymous
8e69e2ddfd Bump ComfyUI version to v0.3.17 2025-02-26 17:59:10 -05:00
comfyanonymous
0270a0b41c Reduce artifacts on Wan by doing the patch embedding in fp32. 2025-02-26 16:59:26 -05:00
comfyanonymous
26c7baf789 Bump ComfyUI version to v0.3.16 2025-02-26 14:30:32 -05:00
comfyanonymous
c37f15f98e Add fast preview support for Wan models. 2025-02-26 08:56:23 -05:00
comfyanonymous
4bca7367f3 Don't try to use clip_fea on t2v model. 2025-02-26 08:38:09 -05:00
comfyanonymous
b6fefe686b Better wan memory estimation. 2025-02-26 07:51:22 -05:00
comfyanonymous
fa62287f1f More code reuse in wan.
Fix bug when changing the compute dtype on wan.
2025-02-26 05:22:29 -05:00
comfyanonymous
0844998db3 Slightly better wan i2v mask implementation. 2025-02-26 03:49:50 -05:00
comfyanonymous
4ced06b879 WIP support for Wan I2V model. 2025-02-26 01:49:43 -05:00
comfyanonymous
cb06e9669b Wan seems to work with fp16. 2025-02-25 21:37:12 -05:00
comfyanonymous
0c32f82298 Fix missing frames in SaveWEBM node. 2025-02-25 20:21:03 -05:00
Yoland Yan
189da3726d Update README.md (#6960) 2025-02-25 17:17:18 -08:00
comfyanonymous
9a66bb972d Make wan work with all latent resolutions.
Cleanup some code.
2025-02-25 19:56:04 -05:00
comfyanonymous
ea0f939df3 Fix issue with wan and other attention implementations. 2025-02-25 19:13:39 -05:00
comfyanonymous
f37551c1d2 Change wan rope implementation to the flux one.
Should be more compatible.
2025-02-25 19:11:14 -05:00
comfyanonymous
63023011b9 WIP support for Wan t2v model. 2025-02-25 17:20:35 -05:00
comfyanonymous
f40076096e Cleanup some lumina te code. 2025-02-25 04:10:26 -05:00
comfyanonymous
96d891cb94 Speedup on some models by not upcasting bfloat16 to float32 on mac. 2025-02-24 05:41:32 -05:00
Robin Huang
4553891bbd Update installation documentation to include desktop + cli. (#6899)
* Update installation documentation.

* Add portable to description.

* Move cli further down.
2025-02-23 19:13:39 -05:00
comfyanonymous
ace899e71a Prioritize fp16 compute when using allow_fp16_accumulation 2025-02-23 04:45:54 -05:00
comfyanonymous
aff16532d4 Remove some useless code. 2025-02-22 04:45:14 -05:00
comfyanonymous
b50ab153f9 Bump ComfyUI version to v0.3.15 2025-02-21 20:28:28 -05:00
comfyanonymous
072db3bea6 Assume the mac black image bug won't be fixed before v16. 2025-02-21 20:24:07 -05:00
comfyanonymous
a6deca6d9a Latest mac still has the black image bug. 2025-02-21 20:14:30 -05:00
comfyanonymous
41c30e92e7 Let all model memory be offloaded on nvidia. 2025-02-21 06:32:21 -05:00
filtered
f579a740dd Update frontend release schedule in README. (#6908)
Changes release schedule from weekly to fortnightly.
2025-02-21 05:58:12 -05:00
Robin Huang
d37272532c Add discord channel to support section. (#6900) 2025-02-20 18:26:16 -05:00
comfyanonymous
12da6ef581 Apparently directml supports fp16. 2025-02-20 09:30:24 -05:00
Robin Huang
29d4384a75 Normalize extra_model_config.yaml paths to prevent duplicates. (#6885)
* Normalize extra_model_config.yaml paths before adding.

* Fix tests.

* Fix tests.
2025-02-20 07:09:45 -05:00
Silver
c5be423d6b Fix link pointing to non-existing docs (#6891)
* Fix link pointing to non-existing docs

The current link points to a path that no longer exists.
I changed it to point to the correct path for custom node datatypes.

* Update node_typing.py
2025-02-20 07:07:07 -05:00
Dr.Lt.Data
b4d3652d88 fixed: crash caused by outdated incompatible aiohttp dependency (#6841)
https://github.com/comfyanonymous/ComfyUI/issues/6038#issuecomment-2661776795
https://github.com/comfyanonymous/ComfyUI/issues/5814#issue-2700816845
2025-02-19 07:15:36 -05:00
maedtb
5715be2ca9 Fix Hunyuan unet config detection for some models. (#6877)
The change to support 32 channel hunyuan models is missing the `key_prefix` on the key.

This addresses a complaint in the comments of acc152b674.
2025-02-19 07:14:45 -05:00
comfyanonymous
0d4d9222c6 Add early experimental SaveWEBM node to save .webm files.
The frontend part isn't done yet, so there is no video preview on the node
and no way to drag the webm onto the interface to load the workflow yet.

This uses a new dependency: PyAV.
2025-02-19 07:12:15 -05:00
bymyself
afc85cdeb6 Add Load Image Output node (#6790)
* add LoadImageOutput node

* add route for input/output/temp files

* update node_typing.py

* use literal type for image_folder field

* mark node as beta
2025-02-18 17:53:01 -05:00
Jukka Seppänen
acc152b674 Support loading and using SkyReels-V1-Hunyuan-I2V (#6862)
* Support SkyReels-V1-Hunyuan-I2V

* VAE scaling

* Fix T2V

oops

* Proper latent scaling
2025-02-18 17:06:54 -05:00
comfyanonymous
b07258cef2 Fix typo.
Let me know if this slows things down on 2000 series and below.
2025-02-18 07:28:33 -05:00
comfyanonymous
31e54b7052 Improve AMD arch detection. 2025-02-17 04:53:40 -05:00
comfyanonymous
8c0bae50c3 bf16 manual cast works on old AMD. 2025-02-17 04:42:40 -05:00
comfyanonymous
530412cb9d Refactor torch version checks to be more future proof. 2025-02-17 04:36:45 -05:00
Zhong-Yu Li
61c8c70c6e support system prompt and cfg renorm in Lumina2 (#6795)
* support system prompt and cfg renorm in Lumina2

* fix issues with the ruff style check
2025-02-16 18:15:43 -05:00
Comfy Org PR Bot
d0399f4343 Update frontend to v1.9.18 (#6828)
Co-authored-by: huchenlei <20929282+huchenlei@users.noreply.github.com>
2025-02-16 11:45:47 -05:00
comfyanonymous
e2919d38b4 Disable bf16 on AMD GPUs that don't support it. 2025-02-16 05:46:10 -05:00
Terry Jia
93c8607d51 remove light_intensity and fov from load3d (#6742) 2025-02-15 15:34:36 -05:00
Comfy Org PR Bot
b3d6ae15b3 Update frontend to v1.9.17 (#6814)
Co-authored-by: huchenlei <20929282+huchenlei@users.noreply.github.com>
2025-02-15 04:32:47 -05:00
comfyanonymous
2e21122aab Add a node to set the model compute dtype for debugging. 2025-02-15 04:15:37 -05:00
comfyanonymous
1cd6cd6080 Disable pytorch attention in VAE for AMD. 2025-02-14 05:42:14 -05:00
comfyanonymous
d7b4bf21a2 Auto enable mem efficient attention on gfx1100 on pytorch nightly 2.7
I'm not sure which arches are supported yet. If you see improvements in
memory usage while using --use-pytorch-cross-attention on your AMD GPU let
me know and I will add it to the list.
2025-02-14 04:18:14 -05:00
Robin Huang
042a905c37 Open yaml files with utf-8 encoding for extra_model_paths.yaml (#6807)
* Using utf-8 encoding for yaml files.

* Fix test assertion.
2025-02-13 20:39:04 -05:00
comfyanonymous
019c7029ea Add a way to set a different compute dtype for the model at runtime.
Currently only works for diffusion models.
2025-02-13 20:34:03 -05:00
comfyanonymous
8773ccf74d Better memory estimation for ROCm cards that support mem efficient attention.
There is no way to check if the card actually supports it, so it is assumed
to if you use --use-pytorch-cross-attention.
2025-02-13 08:32:36 -05:00
comfyanonymous
1d5d6586f3 Fix ruff. 2025-02-12 06:49:16 -05:00
zhoufan2956
35740259de mix_ascend_bf16_infer_err (#6794) 2025-02-12 06:48:11 -05:00
comfyanonymous
ab888e1e0b Add add_weight_wrapper function to model patcher.
Functions can now easily be added to wrap/modify model weights.
2025-02-12 05:55:35 -05:00
comfyanonymous
d9f0fcdb0c Cleanup. 2025-02-11 17:17:03 -05:00
HishamC
b124256817 Fix for running via DirectML (#6542)
* Fix for running via DirectML

Fix DirectML empty image generation issue with Flux1. Add a CPU fallback for unsupported paths. Verified the model works on AMD GPUs.

* fix formatting

* update causal mask calculation
2025-02-11 17:11:32 -05:00
comfyanonymous
af4b7c91be Make --force-fp16 actually force the diffusion model to be fp16. 2025-02-11 08:33:09 -05:00
bananasss00
e57d2282d1 Fix incorrect Content-Type for WebP images (#6752) 2025-02-11 04:48:35 -05:00
comfyanonymous
4027466c80 Make lumina model work with any latent resolution. 2025-02-10 00:24:20 -05:00
73 changed files with 12476 additions and 8491 deletions

View File

@@ -1,7 +1,7 @@
<div align="center">
# ComfyUI
**The most powerful and modular diffusion model GUI and backend.**
**The most powerful and modular visual AI engine and application.**
[![Website][website-shield]][website-url]
@@ -31,10 +31,24 @@
![ComfyUI Screenshot](https://github.com/user-attachments/assets/7ccaf2c1-9b72-41ae-9a89-5688c94b7abe)
</div>
This ui will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface. For some workflow examples and see what ComfyUI can do you can check out:
### [ComfyUI Examples](https://comfyanonymous.github.io/ComfyUI_examples/)
ComfyUI lets you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface. Available on Windows, Linux, and macOS.
## Get Started
#### [Desktop Application](https://www.comfy.org/download)
- The easiest way to get started.
- Available on Windows & macOS.
#### [Windows Portable Package](#installing)
- Get the latest commits and completely portable.
- Available on Windows.
#### [Manual Install](#manual-install-windows-linux)
Supports all operating systems and GPU types (NVIDIA, AMD, Intel, Apple Silicon, Ascend).
## [Examples](https://comfyanonymous.github.io/ComfyUI_examples/)
See what ComfyUI can do with the [example workflows](https://comfyanonymous.github.io/ComfyUI_examples/).
### [Installing ComfyUI](#installing)
## Features
- Nodes/graph/flowchart interface to experiment and create complex Stable Diffusion workflows without needing to code anything.
@@ -54,6 +68,7 @@ This ui will let you design and execute advanced stable diffusion pipelines usin
- [LTX-Video](https://comfyanonymous.github.io/ComfyUI_examples/ltxv/)
- [Hunyuan Video](https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_video/)
- [Nvidia Cosmos](https://comfyanonymous.github.io/ComfyUI_examples/cosmos/)
- [Wan 2.1](https://comfyanonymous.github.io/ComfyUI_examples/wan/)
- [Stable Audio](https://comfyanonymous.github.io/ComfyUI_examples/audio/)
- Asynchronous Queue system
- Many optimizations: Only re-executes the parts of the workflow that changes between executions.
@@ -121,7 +136,7 @@ Workflow examples can be found on the [Examples page](https://comfyanonymous.git
# Installing
## Windows
## Windows Portable
There is a portable standalone build for Windows that should work for running on Nvidia GPUs or for running on your CPU only on the [releases page](https://github.com/comfyanonymous/ComfyUI/releases).
@@ -141,6 +156,15 @@ See the [Config file](extra_model_paths.yaml.example) to set the search paths fo
To run it on services like paperspace, kaggle or colab you can use my [Jupyter Notebook](notebooks/comfyui_colab.ipynb)
## [comfy-cli](https://docs.comfy.org/comfy-cli/getting-started)
You can install and start ComfyUI using comfy-cli:
```bash
pip install comfy-cli
comfy install
```
## Manual Install (Windows, Linux)
python 3.13 is supported but using 3.12 is recommended because some custom nodes and their dependencies might not support it yet.
@@ -237,6 +261,13 @@ For models compatible with Ascend Extension for PyTorch (torch_npu). To get star
3. Next, install the necessary packages for torch-npu by adhering to the platform-specific instructions on the [Installation](https://ascend.github.io/docs/sources/pytorch/install.html#pytorch) page.
4. Finally, adhere to the [ComfyUI manual installation](#manual-install-windows-linux) guide for Linux. Once all components are installed, you can run ComfyUI as described earlier.
#### Cambricon MLUs
For models compatible with Cambricon Extension for PyTorch (torch_mlu). Here's a step-by-step guide tailored to your platform and installation method:
1. Install the Cambricon CNToolkit by adhering to the platform-specific instructions on the [Installation](https://www.cambricon.com/docs/sdk_1.15.0/cntoolkit_3.7.2/cntoolkit_install_3.7.2/index.html) page.
2. Next, install PyTorch (torch_mlu) by following the instructions on the [Installation](https://www.cambricon.com/docs/sdk_1.15.0/cambricon_pytorch_1.17.0/user_guide_1.9/index.html) page.
3. Launch ComfyUI by running `python main.py`
# Running
@@ -293,6 +324,8 @@ Use `--tls-keyfile key.pem --tls-certfile cert.pem` to enable TLS/SSL, the app w
## Support and dev channel
[Discord](https://comfy.org/discord): Try the #help or #feedback channels.
[Matrix space: #comfyui_space:matrix.org](https://app.element.io/#/room/%23comfyui_space%3Amatrix.org) (it's like discord but open source).
See also: [https://www.comfy.org/](https://www.comfy.org/)
@@ -309,7 +342,7 @@ For any bugs, issues, or feature requests related to the frontend, please use th
The new frontend is now the default for ComfyUI. However, please note:
1. The frontend in the main ComfyUI repository is updated weekly.
1. The frontend in the main ComfyUI repository is updated fortnightly.
2. Daily releases are available in the separate frontend repository.
To use the most up-to-date frontend version:
@@ -326,7 +359,7 @@ To use the most up-to-date frontend version:
--front-end-version Comfy-Org/ComfyUI_frontend@1.2.2
```
This approach allows you to easily switch between the stable weekly release and the cutting-edge daily updates, or even specific versions for testing purposes.
This approach allows you to easily switch between the stable fortnightly release and the cutting-edge daily updates, or even specific versions for testing purposes.
### Accessing the Legacy Frontend

View File

@@ -1,8 +1,9 @@
from aiohttp import web
from typing import Optional
from folder_paths import folder_names_and_paths
from folder_paths import folder_names_and_paths, get_directory_by_type
from api_server.services.terminal_service import TerminalService
import app.logger
import os
class InternalRoutes:
'''
@@ -50,6 +51,20 @@ class InternalRoutes:
response[key] = folder_names_and_paths[key][0]
return web.json_response(response)
@self.routes.get('/files/{directory_type}')
async def get_files(request: web.Request) -> web.Response:
directory_type = request.match_info['directory_type']
if directory_type not in ("output", "input", "temp"):
return web.json_response({"error": "Invalid directory type"}, status=400)
directory = get_directory_by_type(directory_type)
sorted_files = sorted(
(entry for entry in os.scandir(directory) if entry.is_file()),
key=lambda entry: -entry.stat().st_mtime
)
return web.json_response([entry.name for entry in sorted_files], status=200)
def get_app(self):
if self._app is None:
self._app = web.Application()
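The new `/files/{directory_type}` route returns the file names in the requested directory sorted by modification time, newest first. A minimal request against it might look like the sketch below; the `/internal` prefix and the `127.0.0.1:8188` address are assumptions about a default local install, not part of the diff.

```python
import json
import urllib.request

# Minimal sketch: query the new files endpoint on a local ComfyUI instance.
# The /internal prefix and the host/port are assumptions, not part of the diff.
url = "http://127.0.0.1:8188/internal/files/output"
with urllib.request.urlopen(url) as resp:
    files = json.load(resp)  # file names in the output directory, newest first

print(files[:5])
# Directory types other than "output", "input" or "temp" return a 400 error.
```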

View File

@@ -191,3 +191,6 @@ if args.windows_standalone_build:
if args.disable_auto_launch:
args.auto_launch = False
if args.force_fp16:
args.fp16_unet = True

View File

@@ -104,7 +104,8 @@ class CLIPTextModel_(torch.nn.Module):
mask = 1.0 - attention_mask.to(x.dtype).reshape((attention_mask.shape[0], 1, -1, attention_mask.shape[-1])).expand(attention_mask.shape[0], 1, attention_mask.shape[-1], attention_mask.shape[-1])
mask = mask.masked_fill(mask.to(torch.bool), -torch.finfo(x.dtype).max)
causal_mask = torch.empty(x.shape[1], x.shape[1], dtype=x.dtype, device=x.device).fill_(-torch.finfo(x.dtype).max).triu_(1)
causal_mask = torch.full((x.shape[1], x.shape[1]), -torch.finfo(x.dtype).max, dtype=x.dtype, device=x.device).triu_(1)
if mask is not None:
mask += causal_mask
else:
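For context, a short stand-alone sketch of the two mask constructions in this hunk: both produce the same upper-triangular causal mask, the `torch.full` form simply builds it in one call instead of filling an uninitialized tensor.

```python
import torch

# Stand-alone sketch (not ComfyUI code): both constructions give the same causal mask.
n, dtype = 4, torch.float16
neg_max = -torch.finfo(dtype).max
old = torch.empty(n, n, dtype=dtype).fill_(neg_max).triu_(1)
new = torch.full((n, n), neg_max, dtype=dtype).triu_(1)
assert torch.equal(old, new)  # -max above the diagonal, 0 on and below it
```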

View File

@@ -66,13 +66,26 @@ class IO(StrEnum):
b = frozenset(value.split(","))
return not (b.issubset(a) or a.issubset(b))
class RemoteInputOptions(TypedDict):
route: str
"""The route to the remote source."""
refresh_button: bool
"""Specifies whether to show a refresh button in the UI below the widget."""
control_after_refresh: Literal["first", "last"]
"""Specifies the control after the refresh button is clicked. If "first", the first item will be automatically selected, and so on."""
timeout: int
"""The maximum amount of time to wait for a response from the remote source in milliseconds."""
max_retries: int
"""The maximum number of retries before aborting the request."""
refresh: int
"""The TTL of the remote input's value in milliseconds. Specifies the interval at which the remote input's value is refreshed."""
class InputTypeOptions(TypedDict):
"""Provides type hinting for the return type of the INPUT_TYPES node function.
Due to IDE limitations with unions, for now all options are available for all types (e.g. `label_on` is hinted even when the type is not `IO.BOOLEAN`).
Comfy Docs: https://docs.comfy.org/essentials/custom_node_datatypes
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/datatypes
"""
default: bool | str | float | int | list | tuple
@@ -113,6 +126,14 @@ class InputTypeOptions(TypedDict):
# defaultVal: str
dynamicPrompts: bool
"""Causes the front-end to evaluate dynamic prompts (``STRING``)"""
# class InputTypeCombo(InputTypeOptions):
image_upload: bool
"""Specifies whether the input should have an image upload button and image preview attached to it. Requires that the input's name is `image`."""
image_folder: Literal["input", "output", "temp"]
"""Specifies which folder to get preview images from if the input has the ``image_upload`` flag.
"""
remote: RemoteInputOptions
"""Specifies the configuration for a remote input."""
class HiddenInputTypeDict(TypedDict):
@@ -133,7 +154,7 @@ class HiddenInputTypeDict(TypedDict):
class InputTypeDict(TypedDict):
"""Provides type hinting for node INPUT_TYPES.
Comfy Docs: https://docs.comfy.org/essentials/custom_node_more_on_inputs
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/more_on_inputs
"""
required: dict[str, tuple[IO, InputTypeOptions]]
@@ -143,14 +164,14 @@ class InputTypeDict(TypedDict):
hidden: HiddenInputTypeDict
"""Offers advanced functionality and server-client communication.
Comfy Docs: https://docs.comfy.org/essentials/custom_node_more_on_inputs#hidden-inputs
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/more_on_inputs#hidden-inputs
"""
class ComfyNodeABC(ABC):
"""Abstract base class for Comfy nodes. Includes the names and expected types of attributes.
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/server_overview
"""
DESCRIPTION: str
@@ -167,7 +188,7 @@ class ComfyNodeABC(ABC):
CATEGORY: str
"""The category of the node, as per the "Add Node" menu.
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#category
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/server_overview#category
"""
EXPERIMENTAL: bool
"""Flags a node as experimental, informing users that it may change or not work as expected."""
@@ -181,9 +202,9 @@ class ComfyNodeABC(ABC):
* Must include the ``required`` key, which describes all inputs that must be connected for the node to execute.
* The ``optional`` key can be added to describe inputs which do not need to be connected.
* The ``hidden`` key offers some advanced functionality. More info at: https://docs.comfy.org/essentials/custom_node_more_on_inputs#hidden-inputs
* The ``hidden`` key offers some advanced functionality. More info at: https://docs.comfy.org/custom-nodes/backend/more_on_inputs#hidden-inputs
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#input-types
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/server_overview#input-types
"""
return {"required": {}}
@@ -198,7 +219,7 @@ class ComfyNodeABC(ABC):
By default, a node is not considered an output. Set ``OUTPUT_NODE = True`` to specify that it is.
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#output-node
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/server_overview#output-node
"""
INPUT_IS_LIST: bool
"""A flag indicating if this node implements the additional code necessary to deal with OUTPUT_IS_LIST nodes.
@@ -209,7 +230,7 @@ class ComfyNodeABC(ABC):
A node can also override the default input behaviour and receive the whole list in a single call. This is done by setting a class attribute `INPUT_IS_LIST` to ``True``.
Comfy Docs: https://docs.comfy.org/essentials/custom_node_lists#list-processing
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/lists#list-processing
"""
OUTPUT_IS_LIST: tuple[bool]
"""A tuple indicating which node outputs are lists, but will be connected to nodes that expect individual items.
@@ -227,7 +248,7 @@ class ComfyNodeABC(ABC):
the node should provide a class attribute `OUTPUT_IS_LIST`, which is a ``tuple[bool]``, of the same length as `RETURN_TYPES`,
specifying which outputs should be so treated.
Comfy Docs: https://docs.comfy.org/essentials/custom_node_lists#list-processing
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/lists#list-processing
"""
RETURN_TYPES: tuple[IO]
@@ -237,19 +258,19 @@ class ComfyNodeABC(ABC):
RETURN_TYPES = (IO.INT, "INT", "CUSTOM_TYPE")
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#return-types
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/server_overview#return-types
"""
RETURN_NAMES: tuple[str]
"""The output slot names for each item in `RETURN_TYPES`, e.g. ``RETURN_NAMES = ("count", "filter_string")``
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#return-names
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/server_overview#return-names
"""
OUTPUT_TOOLTIPS: tuple[str]
"""A tuple of strings to use as tooltips for node outputs, one for each item in `RETURN_TYPES`."""
FUNCTION: str
"""The name of the function to execute as a literal string, e.g. `FUNCTION = "execute"`
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#function
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/server_overview#function
"""
@@ -267,7 +288,7 @@ class CheckLazyMixin:
Params should match the nodes execution ``FUNCTION`` (self, and all inputs by name).
Will be executed repeatedly until it returns an empty list, or all requested items were already evaluated (and sent as params).
Comfy Docs: https://docs.comfy.org/essentials/custom_node_lazy_evaluation#defining-check-lazy-status
Comfy Docs: https://docs.comfy.org/custom-nodes/backend/lazy_evaluation#defining-check-lazy-status
"""
need = [name for name in kwargs if kwargs[name] is None]
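As an illustration of the options documented above, a custom node's `INPUT_TYPES` could combine `image_upload`, `image_folder`, and `remote`. The sketch below is hypothetical: the field names come from the diff, but the route and values are examples, not a node shipped with ComfyUI.

```python
# Hypothetical INPUT_TYPES using the options added in this diff; the route and
# default values are illustrative only.
def INPUT_TYPES():
    return {
        "required": {
            "image": ("COMBO", {
                "image_upload": True,        # show an upload button and image preview
                "image_folder": "output",    # preview images come from the output folder
                "remote": {                  # fill the combo from a server route
                    "route": "/internal/files/output",
                    "refresh_button": True,
                    "control_after_refresh": "first",
                },
            }),
        }
    }
```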

View File

@@ -407,3 +407,52 @@ class Cosmos1CV8x8x8(LatentFormat):
]
latent_rgb_factors_bias = [-0.1223, -0.1889, -0.1976]
class Wan21(LatentFormat):
latent_channels = 16
latent_dimensions = 3
latent_rgb_factors = [
[-0.1299, -0.1692, 0.2932],
[ 0.0671, 0.0406, 0.0442],
[ 0.3568, 0.2548, 0.1747],
[ 0.0372, 0.2344, 0.1420],
[ 0.0313, 0.0189, -0.0328],
[ 0.0296, -0.0956, -0.0665],
[-0.3477, -0.4059, -0.2925],
[ 0.0166, 0.1902, 0.1975],
[-0.0412, 0.0267, -0.1364],
[-0.1293, 0.0740, 0.1636],
[ 0.0680, 0.3019, 0.1128],
[ 0.0032, 0.0581, 0.0639],
[-0.1251, 0.0927, 0.1699],
[ 0.0060, -0.0633, 0.0005],
[ 0.3477, 0.2275, 0.2950],
[ 0.1984, 0.0913, 0.1861]
]
latent_rgb_factors_bias = [-0.1835, -0.0868, -0.3360]
def __init__(self):
self.scale_factor = 1.0
self.latents_mean = torch.tensor([
-0.7571, -0.7089, -0.9113, 0.1075, -0.1745, 0.9653, -0.1517, 1.5508,
0.4134, -0.0715, 0.5517, -0.3632, -0.1922, -0.9497, 0.2503, -0.2921
]).view(1, self.latent_channels, 1, 1, 1)
self.latents_std = torch.tensor([
2.8184, 1.4541, 2.3275, 2.6558, 1.2196, 1.7708, 2.6052, 2.0743,
3.2687, 2.1526, 2.8652, 1.5579, 1.6382, 1.1253, 2.8251, 1.9160
]).view(1, self.latent_channels, 1, 1, 1)
self.taesd_decoder_name = None #TODO
def process_in(self, latent):
latents_mean = self.latents_mean.to(latent.device, latent.dtype)
latents_std = self.latents_std.to(latent.device, latent.dtype)
return (latent - latents_mean) * self.scale_factor / latents_std
def process_out(self, latent):
latents_mean = self.latents_mean.to(latent.device, latent.dtype)
latents_std = self.latents_std.to(latent.device, latent.dtype)
return latent * latents_std / self.scale_factor + latents_mean
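The `process_in`/`process_out` pair above is a per-channel standardization and its inverse. A small stand-alone check of that round trip (using random statistics rather than the Wan 2.1 values listed above):

```python
import torch

# Stand-alone sketch of the Wan21 normalization round trip; mean/std here are random
# placeholders, not the Wan 2.1 statistics from the class above.
latent_channels, scale_factor = 16, 1.0
mean = torch.randn(1, latent_channels, 1, 1, 1)
std = torch.rand(1, latent_channels, 1, 1, 1) + 0.5
latent = torch.randn(2, latent_channels, 4, 8, 8)

processed = (latent - mean) * scale_factor / std     # process_in
restored = processed * std / scale_factor + mean     # process_out
assert torch.allclose(restored, latent, atol=1e-5)
```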

View File

@@ -22,7 +22,7 @@ def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor, mask=None) -> Tensor:
def rope(pos: Tensor, dim: int, theta: int) -> Tensor:
assert dim % 2 == 0
if comfy.model_management.is_device_mps(pos.device) or comfy.model_management.is_intel_xpu():
if comfy.model_management.is_device_mps(pos.device) or comfy.model_management.is_intel_xpu() or comfy.model_management.is_directml_enabled():
device = torch.device("cpu")
else:
device = pos.device

View File

@@ -310,7 +310,7 @@ class HunyuanVideo(nn.Module):
shape[i] = shape[i] // self.patch_size[i]
img = img.reshape([img.shape[0]] + shape + [self.out_channels] + self.patch_size)
img = img.permute(0, 4, 1, 5, 2, 6, 3, 7)
img = img.reshape(initial_shape)
img = img.reshape(initial_shape[0], self.out_channels, initial_shape[2], initial_shape[3], initial_shape[4])
return img
def forward(self, x, timestep, context, y, guidance=None, attention_mask=None, control=None, transformer_options={}, **kwargs):

View File

@@ -6,6 +6,7 @@ from typing import List, Optional, Tuple
import torch
import torch.nn as nn
import torch.nn.functional as F
import comfy.ldm.common_dit
from comfy.ldm.modules.diffusionmodules.mmdit import TimestepEmbedder, RMSNorm
from comfy.ldm.modules.attention import optimized_attention_masked
@@ -594,6 +595,8 @@ class NextDiT(nn.Module):
t = 1.0 - timesteps
cap_feats = context
cap_mask = attention_mask
bs, c, h, w = x.shape
x = comfy.ldm.common_dit.pad_to_patch_size(x, (self.patch_size, self.patch_size))
"""
Forward pass of NextDiT.
t: (N,) tensor of diffusion timesteps
@@ -613,7 +616,7 @@ class NextDiT(nn.Module):
x = layer(x, mask, freqs_cis, adaln_input)
x = self.final_layer(x, adaln_input)
x = self.unpatchify(x, img_size, cap_size, return_tensor=x_is_tensor)
x = self.unpatchify(x, img_size, cap_size, return_tensor=x_is_tensor)[:,:,:h,:w]
return -x
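The pattern in this hunk is: pad the latent so H and W are multiples of the patch size, run the model, then crop the output back to the original resolution. A stand-alone sketch of that pad-then-crop logic follows; the helper is my own illustration, not the `comfy.ldm.common_dit.pad_to_patch_size` implementation, whose exact signature may differ.

```python
import torch
import torch.nn.functional as F

# Stand-alone pad-then-crop sketch; illustrative only.
def pad_to_multiple(x, patch=2):
    h, w = x.shape[-2:]
    return F.pad(x, (0, (patch - w % patch) % patch, 0, (patch - h % patch) % patch))

x = torch.randn(1, 4, 37, 61)      # latent whose spatial size is not a multiple of 2
h, w = x.shape[-2:]
out = pad_to_multiple(x)           # stand-in for running the transformer on padded input
out = out[:, :, :h, :w]            # crop back to the original resolution, as in the diff
assert out.shape == x.shape
```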

View File

@@ -30,38 +30,24 @@ ops = comfy.ops.disable_weight_init
FORCE_UPCAST_ATTENTION_DTYPE = model_management.force_upcast_attention_dtype()
def get_attn_precision(attn_precision):
def get_attn_precision(attn_precision, current_dtype):
if args.dont_upcast_attention:
return None
if FORCE_UPCAST_ATTENTION_DTYPE is not None:
return FORCE_UPCAST_ATTENTION_DTYPE
if FORCE_UPCAST_ATTENTION_DTYPE is not None and current_dtype in FORCE_UPCAST_ATTENTION_DTYPE:
return FORCE_UPCAST_ATTENTION_DTYPE[current_dtype]
return attn_precision
def exists(val):
return val is not None
def uniq(arr):
return{el: True for el in arr}.keys()
def default(val, d):
if exists(val):
return val
return d
def max_neg_value(t):
return -torch.finfo(t.dtype).max
def init_(tensor):
dim = tensor.shape[-1]
std = 1 / math.sqrt(dim)
tensor.uniform_(-std, std)
return tensor
# feedforward
class GEGLU(nn.Module):
def __init__(self, dim_in, dim_out, dtype=None, device=None, operations=ops):
@@ -96,7 +82,7 @@ def Normalize(in_channels, dtype=None, device=None):
return torch.nn.GroupNorm(num_groups=32, num_channels=in_channels, eps=1e-6, affine=True, dtype=dtype, device=device)
def attention_basic(q, k, v, heads, mask=None, attn_precision=None, skip_reshape=False, skip_output_reshape=False):
attn_precision = get_attn_precision(attn_precision)
attn_precision = get_attn_precision(attn_precision, q.dtype)
if skip_reshape:
b, _, _, dim_head = q.shape
@@ -165,7 +151,7 @@ def attention_basic(q, k, v, heads, mask=None, attn_precision=None, skip_reshape
def attention_sub_quad(query, key, value, heads, mask=None, attn_precision=None, skip_reshape=False, skip_output_reshape=False):
attn_precision = get_attn_precision(attn_precision)
attn_precision = get_attn_precision(attn_precision, query.dtype)
if skip_reshape:
b, _, _, dim_head = query.shape
@@ -235,7 +221,7 @@ def attention_sub_quad(query, key, value, heads, mask=None, attn_precision=None,
return hidden_states
def attention_split(q, k, v, heads, mask=None, attn_precision=None, skip_reshape=False, skip_output_reshape=False):
attn_precision = get_attn_precision(attn_precision)
attn_precision = get_attn_precision(attn_precision, q.dtype)
if skip_reshape:
b, _, _, dim_head = q.shape
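With this change, `FORCE_UPCAST_ATTENTION_DTYPE` is treated as a mapping keyed by the current compute dtype rather than a single dtype. A stand-alone sketch of the lookup; the example table is an assumption, the real one comes from `model_management.force_upcast_attention_dtype()` and depends on the platform.

```python
import torch

# Example table only; the real one is returned by force_upcast_attention_dtype().
FORCE_UPCAST_ATTENTION_DTYPE = {torch.float16: torch.float32}

def get_attn_precision(attn_precision, current_dtype):
    if FORCE_UPCAST_ATTENTION_DTYPE is not None and current_dtype in FORCE_UPCAST_ATTENTION_DTYPE:
        return FORCE_UPCAST_ATTENTION_DTYPE[current_dtype]
    return attn_precision

assert get_attn_precision(None, torch.float16) == torch.float32  # fp16 gets upcast
assert get_attn_precision(None, torch.bfloat16) is None          # bf16 is left as-is
```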

View File

@@ -297,7 +297,7 @@ def vae_attention():
if model_management.xformers_enabled_vae():
logging.info("Using xformers attention in VAE")
return xformers_attention
elif model_management.pytorch_attention_enabled():
elif model_management.pytorch_attention_enabled_vae():
logging.info("Using pytorch attention in VAE")
return pytorch_attention
else:

485
comfy/ldm/wan/model.py Normal file
View File

@@ -0,0 +1,485 @@
# original version: https://github.com/Wan-Video/Wan2.1/blob/main/wan/modules/model.py
# Copyright 2024-2025 The Alibaba Wan Team Authors. All rights reserved.
import math
import torch
import torch.nn as nn
from einops import repeat
from comfy.ldm.modules.attention import optimized_attention
from comfy.ldm.flux.layers import EmbedND
from comfy.ldm.flux.math import apply_rope
from comfy.ldm.modules.diffusionmodules.mmdit import RMSNorm
import comfy.ldm.common_dit
import comfy.model_management
def sinusoidal_embedding_1d(dim, position):
# preprocess
assert dim % 2 == 0
half = dim // 2
position = position.type(torch.float32)
# calculation
sinusoid = torch.outer(
position, torch.pow(10000, -torch.arange(half).to(position).div(half)))
x = torch.cat([torch.cos(sinusoid), torch.sin(sinusoid)], dim=1)
return x
class WanSelfAttention(nn.Module):
def __init__(self,
dim,
num_heads,
window_size=(-1, -1),
qk_norm=True,
eps=1e-6, operation_settings={}):
assert dim % num_heads == 0
super().__init__()
self.dim = dim
self.num_heads = num_heads
self.head_dim = dim // num_heads
self.window_size = window_size
self.qk_norm = qk_norm
self.eps = eps
# layers
self.q = operation_settings.get("operations").Linear(dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.k = operation_settings.get("operations").Linear(dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.v = operation_settings.get("operations").Linear(dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.o = operation_settings.get("operations").Linear(dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.norm_q = RMSNorm(dim, eps=eps, elementwise_affine=True, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")) if qk_norm else nn.Identity()
self.norm_k = RMSNorm(dim, eps=eps, elementwise_affine=True, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")) if qk_norm else nn.Identity()
def forward(self, x, freqs):
r"""
Args:
x(Tensor): Shape [B, L, num_heads, C / num_heads]
freqs(Tensor): Rope freqs, shape [1024, C / num_heads / 2]
"""
b, s, n, d = *x.shape[:2], self.num_heads, self.head_dim
# query, key, value function
def qkv_fn(x):
q = self.norm_q(self.q(x)).view(b, s, n, d)
k = self.norm_k(self.k(x)).view(b, s, n, d)
v = self.v(x).view(b, s, n * d)
return q, k, v
q, k, v = qkv_fn(x)
q, k = apply_rope(q, k, freqs)
x = optimized_attention(
q.view(b, s, n * d),
k.view(b, s, n * d),
v,
heads=self.num_heads,
)
x = self.o(x)
return x
class WanT2VCrossAttention(WanSelfAttention):
def forward(self, x, context):
r"""
Args:
x(Tensor): Shape [B, L1, C]
context(Tensor): Shape [B, L2, C]
"""
# compute query, key, value
q = self.norm_q(self.q(x))
k = self.norm_k(self.k(context))
v = self.v(context)
# compute attention
x = optimized_attention(q, k, v, heads=self.num_heads)
x = self.o(x)
return x
class WanI2VCrossAttention(WanSelfAttention):
def __init__(self,
dim,
num_heads,
window_size=(-1, -1),
qk_norm=True,
eps=1e-6, operation_settings={}):
super().__init__(dim, num_heads, window_size, qk_norm, eps, operation_settings=operation_settings)
self.k_img = operation_settings.get("operations").Linear(dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.v_img = operation_settings.get("operations").Linear(dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
# self.alpha = nn.Parameter(torch.zeros((1, )))
self.norm_k_img = RMSNorm(dim, eps=eps, elementwise_affine=True, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")) if qk_norm else nn.Identity()
def forward(self, x, context):
r"""
Args:
x(Tensor): Shape [B, L1, C]
context(Tensor): Shape [B, L2, C]
"""
context_img = context[:, :257]
context = context[:, 257:]
# compute query, key, value
q = self.norm_q(self.q(x))
k = self.norm_k(self.k(context))
v = self.v(context)
k_img = self.norm_k_img(self.k_img(context_img))
v_img = self.v_img(context_img)
img_x = optimized_attention(q, k_img, v_img, heads=self.num_heads)
# compute attention
x = optimized_attention(q, k, v, heads=self.num_heads)
# output
x = x + img_x
x = self.o(x)
return x
WAN_CROSSATTENTION_CLASSES = {
't2v_cross_attn': WanT2VCrossAttention,
'i2v_cross_attn': WanI2VCrossAttention,
}
class WanAttentionBlock(nn.Module):
def __init__(self,
cross_attn_type,
dim,
ffn_dim,
num_heads,
window_size=(-1, -1),
qk_norm=True,
cross_attn_norm=False,
eps=1e-6, operation_settings={}):
super().__init__()
self.dim = dim
self.ffn_dim = ffn_dim
self.num_heads = num_heads
self.window_size = window_size
self.qk_norm = qk_norm
self.cross_attn_norm = cross_attn_norm
self.eps = eps
# layers
self.norm1 = operation_settings.get("operations").LayerNorm(dim, eps, elementwise_affine=False, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.self_attn = WanSelfAttention(dim, num_heads, window_size, qk_norm,
eps, operation_settings=operation_settings)
self.norm3 = operation_settings.get("operations").LayerNorm(
dim, eps,
elementwise_affine=True, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")) if cross_attn_norm else nn.Identity()
self.cross_attn = WAN_CROSSATTENTION_CLASSES[cross_attn_type](dim,
num_heads,
(-1, -1),
qk_norm,
eps, operation_settings=operation_settings)
self.norm2 = operation_settings.get("operations").LayerNorm(dim, eps, elementwise_affine=False, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.ffn = nn.Sequential(
operation_settings.get("operations").Linear(dim, ffn_dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")), nn.GELU(approximate='tanh'),
operation_settings.get("operations").Linear(ffn_dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")))
# modulation
self.modulation = nn.Parameter(torch.empty(1, 6, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")))
def forward(
self,
x,
e,
freqs,
context,
):
r"""
Args:
x(Tensor): Shape [B, L, C]
e(Tensor): Shape [B, 6, C]
freqs(Tensor): Rope freqs, shape [1024, C / num_heads / 2]
"""
# assert e.dtype == torch.float32
e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device) + e).chunk(6, dim=1)
# assert e[0].dtype == torch.float32
# self-attention
y = self.self_attn(
self.norm1(x) * (1 + e[1]) + e[0],
freqs)
x = x + y * e[2]
# cross-attention & ffn function
def cross_attn_ffn(x, context, e):
x = x + self.cross_attn(self.norm3(x), context)
y = self.ffn(self.norm2(x) * (1 + e[4]) + e[3])
x = x + y * e[5]
return x
x = cross_attn_ffn(x, context, e)
return x
class Head(nn.Module):
def __init__(self, dim, out_dim, patch_size, eps=1e-6, operation_settings={}):
super().__init__()
self.dim = dim
self.out_dim = out_dim
self.patch_size = patch_size
self.eps = eps
# layers
out_dim = math.prod(patch_size) * out_dim
self.norm = operation_settings.get("operations").LayerNorm(dim, eps, elementwise_affine=False, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.head = operation_settings.get("operations").Linear(dim, out_dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
# modulation
self.modulation = nn.Parameter(torch.empty(1, 2, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")))
def forward(self, x, e):
r"""
Args:
x(Tensor): Shape [B, L1, C]
e(Tensor): Shape [B, C]
"""
# assert e.dtype == torch.float32
e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device) + e.unsqueeze(1)).chunk(2, dim=1)
x = (self.head(self.norm(x) * (1 + e[1]) + e[0]))
return x
class MLPProj(torch.nn.Module):
def __init__(self, in_dim, out_dim, operation_settings={}):
super().__init__()
self.proj = torch.nn.Sequential(
operation_settings.get("operations").LayerNorm(in_dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")), operation_settings.get("operations").Linear(in_dim, in_dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")),
torch.nn.GELU(), operation_settings.get("operations").Linear(in_dim, out_dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")),
operation_settings.get("operations").LayerNorm(out_dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")))
def forward(self, image_embeds):
clip_extra_context_tokens = self.proj(image_embeds)
return clip_extra_context_tokens
class WanModel(torch.nn.Module):
r"""
Wan diffusion backbone supporting both text-to-video and image-to-video.
"""
def __init__(self,
model_type='t2v',
patch_size=(1, 2, 2),
text_len=512,
in_dim=16,
dim=2048,
ffn_dim=8192,
freq_dim=256,
text_dim=4096,
out_dim=16,
num_heads=16,
num_layers=32,
window_size=(-1, -1),
qk_norm=True,
cross_attn_norm=True,
eps=1e-6,
image_model=None,
device=None,
dtype=None,
operations=None,
):
r"""
Initialize the diffusion model backbone.
Args:
model_type (`str`, *optional*, defaults to 't2v'):
Model variant - 't2v' (text-to-video) or 'i2v' (image-to-video)
patch_size (`tuple`, *optional*, defaults to (1, 2, 2)):
3D patch dimensions for video embedding (t_patch, h_patch, w_patch)
text_len (`int`, *optional*, defaults to 512):
Fixed length for text embeddings
in_dim (`int`, *optional*, defaults to 16):
Input video channels (C_in)
dim (`int`, *optional*, defaults to 2048):
Hidden dimension of the transformer
ffn_dim (`int`, *optional*, defaults to 8192):
Intermediate dimension in feed-forward network
freq_dim (`int`, *optional*, defaults to 256):
Dimension for sinusoidal time embeddings
text_dim (`int`, *optional*, defaults to 4096):
Input dimension for text embeddings
out_dim (`int`, *optional*, defaults to 16):
Output video channels (C_out)
num_heads (`int`, *optional*, defaults to 16):
Number of attention heads
num_layers (`int`, *optional*, defaults to 32):
Number of transformer blocks
window_size (`tuple`, *optional*, defaults to (-1, -1)):
Window size for local attention (-1 indicates global attention)
qk_norm (`bool`, *optional*, defaults to True):
Enable query/key normalization
cross_attn_norm (`bool`, *optional*, defaults to False):
Enable cross-attention normalization
eps (`float`, *optional*, defaults to 1e-6):
Epsilon value for normalization layers
"""
super().__init__()
self.dtype = dtype
operation_settings = {"operations": operations, "device": device, "dtype": dtype}
assert model_type in ['t2v', 'i2v']
self.model_type = model_type
self.patch_size = patch_size
self.text_len = text_len
self.in_dim = in_dim
self.dim = dim
self.ffn_dim = ffn_dim
self.freq_dim = freq_dim
self.text_dim = text_dim
self.out_dim = out_dim
self.num_heads = num_heads
self.num_layers = num_layers
self.window_size = window_size
self.qk_norm = qk_norm
self.cross_attn_norm = cross_attn_norm
self.eps = eps
# embeddings
self.patch_embedding = operations.Conv3d(
in_dim, dim, kernel_size=patch_size, stride=patch_size, device=operation_settings.get("device"), dtype=torch.float32)
self.text_embedding = nn.Sequential(
operations.Linear(text_dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")), nn.GELU(approximate='tanh'),
operations.Linear(dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")))
self.time_embedding = nn.Sequential(
operations.Linear(freq_dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")), nn.SiLU(), operations.Linear(dim, dim, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")))
self.time_projection = nn.Sequential(nn.SiLU(), operations.Linear(dim, dim * 6, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")))
# blocks
cross_attn_type = 't2v_cross_attn' if model_type == 't2v' else 'i2v_cross_attn'
self.blocks = nn.ModuleList([
WanAttentionBlock(cross_attn_type, dim, ffn_dim, num_heads,
window_size, qk_norm, cross_attn_norm, eps, operation_settings=operation_settings)
for _ in range(num_layers)
])
# head
self.head = Head(dim, out_dim, patch_size, eps, operation_settings=operation_settings)
d = dim // num_heads
self.rope_embedder = EmbedND(dim=d, theta=10000.0, axes_dim=[d - 4 * (d // 6), 2 * (d // 6), 2 * (d // 6)])
if model_type == 'i2v':
self.img_emb = MLPProj(1280, dim, operation_settings=operation_settings)
else:
self.img_emb = None
def forward_orig(
self,
x,
t,
context,
clip_fea=None,
freqs=None,
):
r"""
Forward pass through the diffusion model
Args:
x (Tensor):
List of input video tensors with shape [B, C_in, F, H, W]
t (Tensor):
Diffusion timesteps tensor of shape [B]
context (List[Tensor]):
List of text embeddings each with shape [B, L, C]
seq_len (`int`):
Maximum sequence length for positional encoding
clip_fea (Tensor, *optional*):
CLIP image features for image-to-video mode
y (List[Tensor], *optional*):
Conditional video inputs for image-to-video mode, same shape as x
Returns:
List[Tensor]:
List of denoised video tensors with original input shapes [C_out, F, H / 8, W / 8]
"""
# embeddings
x = self.patch_embedding(x.float()).to(x.dtype)
grid_sizes = x.shape[2:]
x = x.flatten(2).transpose(1, 2)
# time embeddings
e = self.time_embedding(
sinusoidal_embedding_1d(self.freq_dim, t).to(dtype=x[0].dtype))
e0 = self.time_projection(e).unflatten(1, (6, self.dim))
# context
context = self.text_embedding(context)
if clip_fea is not None and self.img_emb is not None:
context_clip = self.img_emb(clip_fea) # bs x 257 x dim
context = torch.concat([context_clip, context], dim=1)
# arguments
kwargs = dict(
e=e0,
freqs=freqs,
context=context)
for block in self.blocks:
x = block(x, **kwargs)
# head
x = self.head(x, e)
# unpatchify
x = self.unpatchify(x, grid_sizes)
return x
# return [u.float() for u in x]
def forward(self, x, timestep, context, clip_fea=None, **kwargs):
bs, c, t, h, w = x.shape
x = comfy.ldm.common_dit.pad_to_patch_size(x, self.patch_size)
patch_size = self.patch_size
t_len = ((t + (patch_size[0] // 2)) // patch_size[0])
h_len = ((h + (patch_size[1] // 2)) // patch_size[1])
w_len = ((w + (patch_size[2] // 2)) // patch_size[2])
img_ids = torch.zeros((t_len, h_len, w_len, 3), device=x.device, dtype=x.dtype)
img_ids[:, :, :, 0] = img_ids[:, :, :, 0] + torch.linspace(0, t_len - 1, steps=t_len, device=x.device, dtype=x.dtype).reshape(-1, 1, 1)
img_ids[:, :, :, 1] = img_ids[:, :, :, 1] + torch.linspace(0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype).reshape(1, -1, 1)
img_ids[:, :, :, 2] = img_ids[:, :, :, 2] + torch.linspace(0, w_len - 1, steps=w_len, device=x.device, dtype=x.dtype).reshape(1, 1, -1)
img_ids = repeat(img_ids, "t h w c -> b (t h w) c", b=bs)
freqs = self.rope_embedder(img_ids).movedim(1, 2)
return self.forward_orig(x, timestep, context, clip_fea=clip_fea, freqs=freqs)[:, :, :t, :h, :w]
def unpatchify(self, x, grid_sizes):
r"""
Reconstruct video tensors from patch embeddings.
Args:
x (List[Tensor]):
List of patchified features, each with shape [L, C_out * prod(patch_size)]
grid_sizes (Tensor):
Original spatial-temporal grid dimensions before patching,
shape [B, 3] (3 dimensions correspond to F_patches, H_patches, W_patches)
Returns:
List[Tensor]:
Reconstructed video tensors with shape [L, C_out, F, H / 8, W / 8]
"""
c = self.out_dim
u = x
b = u.shape[0]
u = u[:, :math.prod(grid_sizes)].view(b, *grid_sizes, *self.patch_size, c)
u = torch.einsum('bfhwpqrc->bcfphqwr', u)
u = u.reshape(b, c, *[i * j for i, j in zip(grid_sizes, self.patch_size)])
return u
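A stand-alone sketch of the sinusoidal timestep embedding used above (same math as `sinusoidal_embedding_1d`, reproduced here so it runs on its own):

```python
import torch

# Same math as sinusoidal_embedding_1d in the file above, reproduced stand-alone.
def sinusoidal_embedding_1d(dim, position):
    assert dim % 2 == 0
    half = dim // 2
    position = position.float()
    freqs = torch.pow(10000, -torch.arange(half).float() / half)
    sinusoid = torch.outer(position, freqs)
    return torch.cat([torch.cos(sinusoid), torch.sin(sinusoid)], dim=1)

emb = sinusoidal_embedding_1d(256, torch.tensor([0.0, 0.5, 1.0]))
print(emb.shape)  # torch.Size([3, 256]): one freq_dim-sized vector per timestep
```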

567
comfy/ldm/wan/vae.py Normal file
View File

@@ -0,0 +1,567 @@
# original version: https://github.com/Wan-Video/Wan2.1/blob/main/wan/modules/vae.py
# Copyright 2024-2025 The Alibaba Wan Team Authors. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
from comfy.ldm.modules.diffusionmodules.model import vae_attention
import comfy.ops
ops = comfy.ops.disable_weight_init
CACHE_T = 2
class CausalConv3d(ops.Conv3d):
"""
Causal 3d convolution.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._padding = (self.padding[2], self.padding[2], self.padding[1],
self.padding[1], 2 * self.padding[0], 0)
self.padding = (0, 0, 0)
def forward(self, x, cache_x=None):
padding = list(self._padding)
if cache_x is not None and self._padding[4] > 0:
cache_x = cache_x.to(x.device)
x = torch.cat([cache_x, x], dim=2)
padding[4] -= cache_x.shape[2]
x = F.pad(x, padding)
return super().forward(x)
class RMS_norm(nn.Module):
def __init__(self, dim, channel_first=True, images=True, bias=False):
super().__init__()
broadcastable_dims = (1, 1, 1) if not images else (1, 1)
shape = (dim, *broadcastable_dims) if channel_first else (dim,)
self.channel_first = channel_first
self.scale = dim**0.5
self.gamma = nn.Parameter(torch.ones(shape))
self.bias = nn.Parameter(torch.zeros(shape)) if bias else None
def forward(self, x):
return F.normalize(
x, dim=(1 if self.channel_first else -1)) * self.scale * self.gamma.to(x) + (self.bias.to(x) if self.bias is not None else 0)
class Upsample(nn.Upsample):
def forward(self, x):
"""
Fix bfloat16 support for nearest neighbor interpolation.
"""
return super().forward(x.float()).type_as(x)
class Resample(nn.Module):
def __init__(self, dim, mode):
assert mode in ('none', 'upsample2d', 'upsample3d', 'downsample2d',
'downsample3d')
super().__init__()
self.dim = dim
self.mode = mode
# layers
if mode == 'upsample2d':
self.resample = nn.Sequential(
Upsample(scale_factor=(2., 2.), mode='nearest-exact'),
ops.Conv2d(dim, dim // 2, 3, padding=1))
elif mode == 'upsample3d':
self.resample = nn.Sequential(
Upsample(scale_factor=(2., 2.), mode='nearest-exact'),
ops.Conv2d(dim, dim // 2, 3, padding=1))
self.time_conv = CausalConv3d(
dim, dim * 2, (3, 1, 1), padding=(1, 0, 0))
elif mode == 'downsample2d':
self.resample = nn.Sequential(
nn.ZeroPad2d((0, 1, 0, 1)),
ops.Conv2d(dim, dim, 3, stride=(2, 2)))
elif mode == 'downsample3d':
self.resample = nn.Sequential(
nn.ZeroPad2d((0, 1, 0, 1)),
ops.Conv2d(dim, dim, 3, stride=(2, 2)))
self.time_conv = CausalConv3d(
dim, dim, (3, 1, 1), stride=(2, 1, 1), padding=(0, 0, 0))
else:
self.resample = nn.Identity()
def forward(self, x, feat_cache=None, feat_idx=[0]):
b, c, t, h, w = x.size()
if self.mode == 'upsample3d':
if feat_cache is not None:
idx = feat_idx[0]
if feat_cache[idx] is None:
feat_cache[idx] = 'Rep'
feat_idx[0] += 1
else:
cache_x = x[:, :, -CACHE_T:, :, :].clone()
if cache_x.shape[2] < 2 and feat_cache[
idx] is not None and feat_cache[idx] != 'Rep':
# cache last frame of last two chunk
cache_x = torch.cat([
feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
cache_x.device), cache_x
],
dim=2)
if cache_x.shape[2] < 2 and feat_cache[
idx] is not None and feat_cache[idx] == 'Rep':
cache_x = torch.cat([
torch.zeros_like(cache_x).to(cache_x.device),
cache_x
],
dim=2)
if feat_cache[idx] == 'Rep':
x = self.time_conv(x)
else:
x = self.time_conv(x, feat_cache[idx])
feat_cache[idx] = cache_x
feat_idx[0] += 1
x = x.reshape(b, 2, c, t, h, w)
x = torch.stack((x[:, 0, :, :, :, :], x[:, 1, :, :, :, :]),
3)
x = x.reshape(b, c, t * 2, h, w)
t = x.shape[2]
x = rearrange(x, 'b c t h w -> (b t) c h w')
x = self.resample(x)
x = rearrange(x, '(b t) c h w -> b c t h w', t=t)
if self.mode == 'downsample3d':
if feat_cache is not None:
idx = feat_idx[0]
if feat_cache[idx] is None:
feat_cache[idx] = x.clone()
feat_idx[0] += 1
else:
cache_x = x[:, :, -1:, :, :].clone()
# if cache_x.shape[2] < 2 and feat_cache[idx] is not None and feat_cache[idx]!='Rep':
# # cache last frame of last two chunk
# cache_x = torch.cat([feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(cache_x.device), cache_x], dim=2)
x = self.time_conv(
torch.cat([feat_cache[idx][:, :, -1:, :, :], x], 2))
feat_cache[idx] = cache_x
feat_idx[0] += 1
return x
def init_weight(self, conv):
conv_weight = conv.weight
nn.init.zeros_(conv_weight)
c1, c2, t, h, w = conv_weight.size()
one_matrix = torch.eye(c1, c2)
init_matrix = one_matrix
nn.init.zeros_(conv_weight)
#conv_weight.data[:,:,-1,1,1] = init_matrix * 0.5
conv_weight.data[:, :, 1, 0, 0] = init_matrix #* 0.5
conv.weight.data.copy_(conv_weight)
nn.init.zeros_(conv.bias.data)
def init_weight2(self, conv):
conv_weight = conv.weight.data
nn.init.zeros_(conv_weight)
c1, c2, t, h, w = conv_weight.size()
init_matrix = torch.eye(c1 // 2, c2)
#init_matrix = repeat(init_matrix, 'o ... -> (o 2) ...').permute(1,0,2).contiguous().reshape(c1,c2)
conv_weight[:c1 // 2, :, -1, 0, 0] = init_matrix
conv_weight[c1 // 2:, :, -1, 0, 0] = init_matrix
conv.weight.data.copy_(conv_weight)
nn.init.zeros_(conv.bias.data)
class ResidualBlock(nn.Module):
def __init__(self, in_dim, out_dim, dropout=0.0):
super().__init__()
self.in_dim = in_dim
self.out_dim = out_dim
# layers
self.residual = nn.Sequential(
RMS_norm(in_dim, images=False), nn.SiLU(),
CausalConv3d(in_dim, out_dim, 3, padding=1),
RMS_norm(out_dim, images=False), nn.SiLU(), nn.Dropout(dropout),
CausalConv3d(out_dim, out_dim, 3, padding=1))
self.shortcut = CausalConv3d(in_dim, out_dim, 1) \
if in_dim != out_dim else nn.Identity()
def forward(self, x, feat_cache=None, feat_idx=[0]):
h = self.shortcut(x)
for layer in self.residual:
if isinstance(layer, CausalConv3d) and feat_cache is not None:
idx = feat_idx[0]
cache_x = x[:, :, -CACHE_T:, :, :].clone()
if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
# cache last frame of last two chunk
cache_x = torch.cat([
feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
cache_x.device), cache_x
],
dim=2)
x = layer(x, feat_cache[idx])
feat_cache[idx] = cache_x
feat_idx[0] += 1
else:
x = layer(x)
return x + h
class AttentionBlock(nn.Module):
"""
Causal self-attention with a single head.
"""
def __init__(self, dim):
super().__init__()
self.dim = dim
# layers
self.norm = RMS_norm(dim)
self.to_qkv = ops.Conv2d(dim, dim * 3, 1)
self.proj = ops.Conv2d(dim, dim, 1)
self.optimized_attention = vae_attention()
def forward(self, x):
identity = x
b, c, t, h, w = x.size()
x = rearrange(x, 'b c t h w -> (b t) c h w')
x = self.norm(x)
# compute query, key, value
q, k, v = self.to_qkv(x).chunk(3, dim=1)
x = self.optimized_attention(q, k, v)
# output
x = self.proj(x)
x = rearrange(x, '(b t) c h w-> b c t h w', t=t)
return x + identity
class Encoder3d(nn.Module):
def __init__(self,
dim=128,
z_dim=4,
dim_mult=[1, 2, 4, 4],
num_res_blocks=2,
attn_scales=[],
temperal_downsample=[True, True, False],
dropout=0.0):
super().__init__()
self.dim = dim
self.z_dim = z_dim
self.dim_mult = dim_mult
self.num_res_blocks = num_res_blocks
self.attn_scales = attn_scales
self.temperal_downsample = temperal_downsample
# dimensions
dims = [dim * u for u in [1] + dim_mult]
scale = 1.0
# init block
self.conv1 = CausalConv3d(3, dims[0], 3, padding=1)
# downsample blocks
downsamples = []
for i, (in_dim, out_dim) in enumerate(zip(dims[:-1], dims[1:])):
# residual (+attention) blocks
for _ in range(num_res_blocks):
downsamples.append(ResidualBlock(in_dim, out_dim, dropout))
if scale in attn_scales:
downsamples.append(AttentionBlock(out_dim))
in_dim = out_dim
# downsample block
if i != len(dim_mult) - 1:
mode = 'downsample3d' if temperal_downsample[
i] else 'downsample2d'
downsamples.append(Resample(out_dim, mode=mode))
scale /= 2.0
self.downsamples = nn.Sequential(*downsamples)
# middle blocks
self.middle = nn.Sequential(
ResidualBlock(out_dim, out_dim, dropout), AttentionBlock(out_dim),
ResidualBlock(out_dim, out_dim, dropout))
# output blocks
self.head = nn.Sequential(
RMS_norm(out_dim, images=False), nn.SiLU(),
CausalConv3d(out_dim, z_dim, 3, padding=1))
def forward(self, x, feat_cache=None, feat_idx=[0]):
if feat_cache is not None:
idx = feat_idx[0]
cache_x = x[:, :, -CACHE_T:, :, :].clone()
if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
# cache last frame of last two chunk
cache_x = torch.cat([
feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
cache_x.device), cache_x
],
dim=2)
x = self.conv1(x, feat_cache[idx])
feat_cache[idx] = cache_x
feat_idx[0] += 1
else:
x = self.conv1(x)
## downsamples
for layer in self.downsamples:
if feat_cache is not None:
x = layer(x, feat_cache, feat_idx)
else:
x = layer(x)
## middle
for layer in self.middle:
if isinstance(layer, ResidualBlock) and feat_cache is not None:
x = layer(x, feat_cache, feat_idx)
else:
x = layer(x)
## head
for layer in self.head:
if isinstance(layer, CausalConv3d) and feat_cache is not None:
idx = feat_idx[0]
cache_x = x[:, :, -CACHE_T:, :, :].clone()
if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
# cache last frame of last two chunk
cache_x = torch.cat([
feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
cache_x.device), cache_x
],
dim=2)
x = layer(x, feat_cache[idx])
feat_cache[idx] = cache_x
feat_idx[0] += 1
else:
x = layer(x)
return x
class Decoder3d(nn.Module):
def __init__(self,
dim=128,
z_dim=4,
dim_mult=[1, 2, 4, 4],
num_res_blocks=2,
attn_scales=[],
temperal_upsample=[False, True, True],
dropout=0.0):
super().__init__()
self.dim = dim
self.z_dim = z_dim
self.dim_mult = dim_mult
self.num_res_blocks = num_res_blocks
self.attn_scales = attn_scales
self.temperal_upsample = temperal_upsample
# dimensions
dims = [dim * u for u in [dim_mult[-1]] + dim_mult[::-1]]
scale = 1.0 / 2**(len(dim_mult) - 2)
# init block
self.conv1 = CausalConv3d(z_dim, dims[0], 3, padding=1)
# middle blocks
self.middle = nn.Sequential(
ResidualBlock(dims[0], dims[0], dropout), AttentionBlock(dims[0]),
ResidualBlock(dims[0], dims[0], dropout))
# upsample blocks
upsamples = []
for i, (in_dim, out_dim) in enumerate(zip(dims[:-1], dims[1:])):
# residual (+attention) blocks
if i == 1 or i == 2 or i == 3:
in_dim = in_dim // 2
for _ in range(num_res_blocks + 1):
upsamples.append(ResidualBlock(in_dim, out_dim, dropout))
if scale in attn_scales:
upsamples.append(AttentionBlock(out_dim))
in_dim = out_dim
# upsample block
if i != len(dim_mult) - 1:
mode = 'upsample3d' if temperal_upsample[i] else 'upsample2d'
upsamples.append(Resample(out_dim, mode=mode))
scale *= 2.0
self.upsamples = nn.Sequential(*upsamples)
# output blocks
self.head = nn.Sequential(
RMS_norm(out_dim, images=False), nn.SiLU(),
CausalConv3d(out_dim, 3, 3, padding=1))
def forward(self, x, feat_cache=None, feat_idx=[0]):
## conv1
if feat_cache is not None:
idx = feat_idx[0]
cache_x = x[:, :, -CACHE_T:, :, :].clone()
if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
# cache last frame of last two chunk
cache_x = torch.cat([
feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
cache_x.device), cache_x
],
dim=2)
x = self.conv1(x, feat_cache[idx])
feat_cache[idx] = cache_x
feat_idx[0] += 1
else:
x = self.conv1(x)
## middle
for layer in self.middle:
if isinstance(layer, ResidualBlock) and feat_cache is not None:
x = layer(x, feat_cache, feat_idx)
else:
x = layer(x)
## upsamples
for layer in self.upsamples:
if feat_cache is not None:
x = layer(x, feat_cache, feat_idx)
else:
x = layer(x)
## head
for layer in self.head:
if isinstance(layer, CausalConv3d) and feat_cache is not None:
idx = feat_idx[0]
cache_x = x[:, :, -CACHE_T:, :, :].clone()
if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
# cache last frame of last two chunk
cache_x = torch.cat([
feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
cache_x.device), cache_x
],
dim=2)
x = layer(x, feat_cache[idx])
feat_cache[idx] = cache_x
feat_idx[0] += 1
else:
x = layer(x)
return x
def count_conv3d(model):
count = 0
for m in model.modules():
if isinstance(m, CausalConv3d):
count += 1
return count
class WanVAE(nn.Module):
def __init__(self,
dim=128,
z_dim=4,
dim_mult=[1, 2, 4, 4],
num_res_blocks=2,
attn_scales=[],
temperal_downsample=[True, True, False],
dropout=0.0):
super().__init__()
self.dim = dim
self.z_dim = z_dim
self.dim_mult = dim_mult
self.num_res_blocks = num_res_blocks
self.attn_scales = attn_scales
self.temperal_downsample = temperal_downsample
self.temperal_upsample = temperal_downsample[::-1]
# modules
self.encoder = Encoder3d(dim, z_dim * 2, dim_mult, num_res_blocks,
attn_scales, self.temperal_downsample, dropout)
self.conv1 = CausalConv3d(z_dim * 2, z_dim * 2, 1)
self.conv2 = CausalConv3d(z_dim, z_dim, 1)
self.decoder = Decoder3d(dim, z_dim, dim_mult, num_res_blocks,
attn_scales, self.temperal_upsample, dropout)
def forward(self, x):
mu, log_var = self.encode(x)
z = self.reparameterize(mu, log_var)
x_recon = self.decode(z)
return x_recon, mu, log_var
def encode(self, x):
self.clear_cache()
## cache
t = x.shape[2]
iter_ = 1 + (t - 1) // 4
## split the input x along the time axis into chunks of 1, 4, 4, 4, ... frames
for i in range(iter_):
self._enc_conv_idx = [0]
if i == 0:
out = self.encoder(
x[:, :, :1, :, :],
feat_cache=self._enc_feat_map,
feat_idx=self._enc_conv_idx)
else:
out_ = self.encoder(
x[:, :, 1 + 4 * (i - 1):1 + 4 * i, :, :],
feat_cache=self._enc_feat_map,
feat_idx=self._enc_conv_idx)
out = torch.cat([out, out_], 2)
mu, log_var = self.conv1(out).chunk(2, dim=1)
self.clear_cache()
return mu
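# Note: encode() walks the clip in temporal chunks of 1, 4, 4, 4, ... frames, so t pixel frames
# produce 1 + (t - 1) // 4 latent frames (e.g. t=1 -> 1, t=17 -> 5, t=81 -> 21).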
def decode(self, z):
self.clear_cache()
# z: [b,c,t,h,w]
iter_ = z.shape[2]
x = self.conv2(z)
for i in range(iter_):
self._conv_idx = [0]
if i == 0:
out = self.decoder(
x[:, :, i:i + 1, :, :],
feat_cache=self._feat_map,
feat_idx=self._conv_idx)
else:
out_ = self.decoder(
x[:, :, i:i + 1, :, :],
feat_cache=self._feat_map,
feat_idx=self._conv_idx)
out = torch.cat([out, out_], 2)
self.clear_cache()
return out
def reparameterize(self, mu, log_var):
std = torch.exp(0.5 * log_var)
eps = torch.randn_like(std)
return eps * std + mu
def sample(self, imgs, deterministic=False):
mu, log_var = self.encode(imgs)
if deterministic:
return mu
std = torch.exp(0.5 * log_var.clamp(-30.0, 20.0))
return mu + std * torch.randn_like(std)
def clear_cache(self):
self._conv_num = count_conv3d(self.decoder)
self._conv_idx = [0]
self._feat_map = [None] * self._conv_num
#cache encode
self._enc_conv_num = count_conv3d(self.encoder)
self._enc_conv_idx = [0]
self._enc_feat_map = [None] * self._enc_conv_num
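# --- illustrative usage sketch, not part of this file -----------------------------------------
# Rough round-trip shapes with the configuration comfy/sd.py passes further down (dim=96,
# z_dim=16, temperal_downsample=[False, True, True]); the shapes in the comments are what the
# chunking math above implies, not a measured run:
def _wan_vae_shape_example():
    vae = WanVAE(dim=96, z_dim=16, dim_mult=[1, 2, 4, 4], num_res_blocks=2,
                 attn_scales=[], temperal_downsample=[False, True, True], dropout=0.0)
    video = torch.zeros(1, 3, 17, 64, 64)  # [B, 3, T, H, W]
    mu = vae.encode(video)                 # expected [1, 16, 5, 8, 8]: T -> 1+(T-1)//4, H and W -> /8
    recon = vae.decode(mu)                 # expected [1, 3, 17, 64, 64]: 5 latents -> 5*4-3 frames
    # the common 81-frame 832x480 clip would map to a [1, 16, 21, 60, 104] latent the same way
    return mu.shape, recon.shape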


@@ -35,6 +35,7 @@ import comfy.ldm.lightricks.model
import comfy.ldm.hunyuan_video.model
import comfy.ldm.cosmos.model
import comfy.ldm.lumina.model
import comfy.ldm.wan.model
import comfy.model_management
import comfy.patcher_extension
@@ -871,6 +872,15 @@ class HunyuanVideo(BaseModel):
if cross_attn is not None:
out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
image = kwargs.get("concat_latent_image", None)
noise = kwargs.get("noise", None)
if image is not None:
padding_shape = (noise.shape[0], 16, noise.shape[2] - 1, noise.shape[3], noise.shape[4])
latent_padding = torch.zeros(padding_shape, device=noise.device, dtype=noise.dtype)
image_latents = torch.cat([image.to(noise), latent_padding], dim=2)
out['c_concat'] = comfy.conds.CONDNoiseShape(self.process_latent_in(image_latents))
guidance = kwargs.get("guidance", 6.0)
if guidance is not None:
out['guidance'] = comfy.conds.CONDRegular(torch.FloatTensor([guidance]))
@@ -918,3 +928,47 @@ class Lumina2(BaseModel):
if cross_attn is not None:
out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
return out
class WAN21(BaseModel):
def __init__(self, model_config, model_type=ModelType.FLOW, image_to_video=False, device=None):
super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.wan.model.WanModel)
self.image_to_video = image_to_video
def concat_cond(self, **kwargs):
if not self.image_to_video:
return None
image = kwargs.get("concat_latent_image", None)
noise = kwargs.get("noise", None)
device = kwargs["device"]
if image is None:
image = torch.zeros_like(noise)
image = utils.common_upscale(image.to(device), noise.shape[-1], noise.shape[-2], "bilinear", "center")
image = self.process_latent_in(image)
image = utils.resize_to_batch_size(image, noise.shape[0])
mask = kwargs.get("concat_mask", kwargs.get("denoise_mask", None))
if mask is None:
mask = torch.zeros_like(noise)[:, :4]
else:
mask = 1.0 - torch.mean(mask, dim=1, keepdim=True)
mask = utils.common_upscale(mask.to(device), noise.shape[-1], noise.shape[-2], "bilinear", "center")
if mask.shape[-3] < noise.shape[-3]:
mask = torch.nn.functional.pad(mask, (0, 0, 0, 0, 0, noise.shape[-3] - mask.shape[-3]), mode='constant', value=0)
mask = mask.repeat(1, 4, 1, 1, 1)
mask = utils.resize_to_batch_size(mask, noise.shape[0])
return torch.cat((mask, image), dim=1)
def extra_conds(self, **kwargs):
out = super().extra_conds(**kwargs)
cross_attn = kwargs.get("cross_attn", None)
if cross_attn is not None:
out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
clip_vision_output = kwargs.get("clip_vision_output", None)
if clip_vision_output is not None:
out['clip_fea'] = comfy.conds.CONDRegular(clip_vision_output.penultimate_hidden_states)
return out
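# Illustrative note (not part of the diff): for the image-to-video variant, concat_cond() stacks a
# 4-channel mask on top of the 16-channel reference latent, so together with the 16 noise channels
# the diffusion model sees 16 + 4 + 16 = 36 input channels; this lines up with the in_dim that
# model_detection.py reads from patch_embedding.weight below. Shape sketch with assumed sizes:
#     noise  [B, 16, T, H, W]
#     mask   [B,  4, T, H, W]   (denoise/concat mask, repeated to 4 channels)
#     image  [B, 16, T, H, W]   (encoded start image, zeros when absent)
#     c_concat = cat(mask, image) -> [B, 20, T, H, W]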


@@ -136,7 +136,7 @@ def detect_unet_config(state_dict, key_prefix):
if '{}txt_in.individual_token_refiner.blocks.0.norm1.weight'.format(key_prefix) in state_dict_keys: #Hunyuan Video
dit_config = {}
dit_config["image_model"] = "hunyuan_video"
dit_config["in_channels"] = 16
dit_config["in_channels"] = state_dict['{}img_in.proj.weight'.format(key_prefix)].shape[1] #SkyReels img2video has 32 input channels
dit_config["patch_size"] = [1, 2, 2]
dit_config["out_channels"] = 16
dit_config["vec_in_dim"] = 768
@@ -299,6 +299,27 @@ def detect_unet_config(state_dict, key_prefix):
dit_config["axes_lens"] = [300, 512, 512]
return dit_config
if '{}head.modulation'.format(key_prefix) in state_dict_keys: # Wan 2.1
dit_config = {}
dit_config["image_model"] = "wan2.1"
dim = state_dict['{}head.modulation'.format(key_prefix)].shape[-1]
dit_config["dim"] = dim
dit_config["num_heads"] = dim // 128
dit_config["ffn_dim"] = state_dict['{}blocks.0.ffn.0.weight'.format(key_prefix)].shape[0]
dit_config["num_layers"] = count_blocks(state_dict_keys, '{}blocks.'.format(key_prefix) + '{}.')
dit_config["patch_size"] = (1, 2, 2)
dit_config["freq_dim"] = 256
dit_config["window_size"] = (-1, -1)
dit_config["qk_norm"] = True
dit_config["cross_attn_norm"] = True
dit_config["eps"] = 1e-6
dit_config["in_dim"] = state_dict['{}patch_embedding.weight'.format(key_prefix)].shape[1]
if '{}img_emb.proj.0.bias'.format(key_prefix) in state_dict_keys:
dit_config["model_type"] = "i2v"
else:
dit_config["model_type"] = "t2v"
return dit_config
if '{}input_blocks.0.0.weight'.format(key_prefix) not in state_dict_keys:
return None
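# Illustrative note (not part of the diff): everything in the Wan 2.1 branch above is derived from
# tensor shapes, so one branch covers all released model sizes. For a hypothetical small t2v
# checkpoint the detected config might look like (values shown for illustration only):
#     {"image_model": "wan2.1", "dim": 1536, "num_heads": 12, "ffn_dim": 8960, "num_layers": 30,
#      "patch_size": (1, 2, 2), "freq_dim": 256, "window_size": (-1, -1), "qk_norm": True,
#      "cross_attn_norm": True, "eps": 1e-6, "in_dim": 16, "model_type": "t2v"}
# An i2v checkpoint additionally carries img_emb.* weights and a larger in_dim for the
# concatenated mask + image latent.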


@@ -50,7 +50,9 @@ xpu_available = False
torch_version = ""
try:
torch_version = torch.version.__version__
xpu_available = (int(torch_version[0]) < 2 or (int(torch_version[0]) == 2 and int(torch_version[2]) <= 4)) and torch.xpu.is_available()
temp = torch_version.split(".")
torch_version_numeric = (int(temp[0]), int(temp[1]))
xpu_available = (torch_version_numeric[0] < 2 or (torch_version_numeric[0] == 2 and torch_version_numeric[1] <= 4)) and torch.xpu.is_available()
except:
pass
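# Illustrative note (not part of the diff): indexing the version string character by character
# (torch_version[0], torch_version[2]) breaks as soon as a component has two digits, e.g. "2.10.0".
# Parsing into an integer tuple first keeps the comparisons correct:
#     "2.4.1+cu121".split(".")[:2] -> ("2", "4")  -> torch_version_numeric == (2, 4)
#     "2.10.0".split(".")[:2]      -> ("2", "10") -> torch_version_numeric == (2, 10)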
@@ -93,6 +95,13 @@ try:
except:
npu_available = False
try:
import torch_mlu # noqa: F401
_ = torch.mlu.device_count()
mlu_available = torch.mlu.is_available()
except:
mlu_available = False
if args.cpu:
cpu_state = CPUState.CPU
@@ -110,6 +119,12 @@ def is_ascend_npu():
return True
return False
def is_mlu():
global mlu_available
if mlu_available:
return True
return False
def get_torch_device():
global directml_enabled
global cpu_state
@@ -125,6 +140,8 @@ def get_torch_device():
return torch.device("xpu", torch.xpu.current_device())
elif is_ascend_npu():
return torch.device("npu", torch.npu.current_device())
elif is_mlu():
return torch.device("mlu", torch.mlu.current_device())
else:
return torch.device(torch.cuda.current_device())
@@ -151,6 +168,12 @@ def get_total_memory(dev=None, torch_total_too=False):
_, mem_total_npu = torch.npu.mem_get_info(dev)
mem_total_torch = mem_reserved
mem_total = mem_total_npu
elif is_mlu():
stats = torch.mlu.memory_stats(dev)
mem_reserved = stats['reserved_bytes.all.current']
_, mem_total_mlu = torch.mlu.mem_get_info(dev)
mem_total_torch = mem_reserved
mem_total = mem_total_mlu
else:
stats = torch.cuda.memory_stats(dev)
mem_reserved = stats['reserved_bytes.all.current']
@@ -218,7 +241,7 @@ def is_amd():
MIN_WEIGHT_MEMORY_RATIO = 0.4
if is_nvidia():
MIN_WEIGHT_MEMORY_RATIO = 0.1
MIN_WEIGHT_MEMORY_RATIO = 0.0
ENABLE_PYTORCH_ATTENTION = False
if args.use_pytorch_cross_attention:
@@ -227,28 +250,44 @@ if args.use_pytorch_cross_attention:
try:
if is_nvidia():
if int(torch_version[0]) >= 2:
if torch_version_numeric[0] >= 2:
if ENABLE_PYTORCH_ATTENTION == False and args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
ENABLE_PYTORCH_ATTENTION = True
if is_intel_xpu() or is_ascend_npu():
if is_intel_xpu() or is_ascend_npu() or is_mlu():
if args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
ENABLE_PYTORCH_ATTENTION = True
except:
pass
try:
if is_amd():
arch = torch.cuda.get_device_properties(get_torch_device()).gcnArchName
logging.info("AMD arch: {}".format(arch))
if args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
if torch_version_numeric[0] >= 2 and torch_version_numeric[1] >= 7: # works on 2.6 but doesn't actually seem to improve much
if any((a in arch) for a in ["gfx1100", "gfx1101"]): # TODO: more arches
ENABLE_PYTORCH_ATTENTION = True
except:
pass
if ENABLE_PYTORCH_ATTENTION:
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)
PRIORITIZE_FP16 = False # TODO: remove and replace with something that shows exactly which dtype is faster than the other
try:
if is_nvidia() and args.fast:
torch.backends.cuda.matmul.allow_fp16_accumulation = True
PRIORITIZE_FP16 = True # TODO: limit to cards where it actually boosts performance
except:
pass
try:
if int(torch_version[0]) == 2 and int(torch_version[2]) >= 5:
if torch_version_numeric[0] == 2 and torch_version_numeric[1] >= 5:
torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
except:
logging.warning("Warning, could not set allow_fp16_bf16_reduction_math_sdp")
@@ -262,15 +301,10 @@ elif args.highvram or args.gpu_only:
vram_state = VRAMState.HIGH_VRAM
FORCE_FP32 = False
FORCE_FP16 = False
if args.force_fp32:
logging.info("Forcing FP32, if this improves things please report it.")
FORCE_FP32 = True
if args.force_fp16:
logging.info("Forcing FP16.")
FORCE_FP16 = True
if lowvram_available:
if set_vram_to in (VRAMState.LOW_VRAM, VRAMState.NO_VRAM):
vram_state = set_vram_to
@@ -303,6 +337,8 @@ def get_torch_device_name(device):
return "{} {}".format(device, torch.xpu.get_device_name(device))
elif is_ascend_npu():
return "{} {}".format(device, torch.npu.get_device_name(device))
elif is_mlu():
return "{} {}".format(device, torch.mlu.get_device_name(device))
else:
return "CUDA {}: {}".format(device, torch.cuda.get_device_name(device))
@@ -671,6 +707,10 @@ def unet_dtype(device=None, model_params=0, supported_dtypes=[torch.float16, tor
if model_params * 2 > free_model_memory:
return fp8_dtype
if PRIORITIZE_FP16:
if torch.float16 in supported_dtypes and should_use_fp16(device=device, model_params=model_params):
return torch.float16
for dt in supported_dtypes:
if dt == torch.float16 and should_use_fp16(device=device, model_params=model_params):
if torch.float16 in supported_dtypes:
@@ -888,6 +928,8 @@ def xformers_enabled():
return False
if is_ascend_npu():
return False
if is_mlu():
return False
if directml_enabled:
return False
return XFORMERS_IS_AVAILABLE
@@ -904,6 +946,11 @@ def pytorch_attention_enabled():
global ENABLE_PYTORCH_ATTENTION
return ENABLE_PYTORCH_ATTENTION
def pytorch_attention_enabled_vae():
if is_amd():
return False # enabling pytorch attention on AMD currently causes crash when doing high res
return pytorch_attention_enabled()
def pytorch_attention_flash_attention():
global ENABLE_PYTORCH_ATTENTION
if ENABLE_PYTORCH_ATTENTION:
@@ -914,6 +961,10 @@ def pytorch_attention_flash_attention():
return True
if is_ascend_npu():
return True
if is_mlu():
return True
if is_amd():
return True #if you have pytorch attention enabled on AMD it probably supports at least mem efficient attention
return False
def mac_version():
@@ -926,11 +977,11 @@ def force_upcast_attention_dtype():
upcast = args.force_upcast_attention
macos_version = mac_version()
if macos_version is not None and ((14, 5) <= macos_version <= (15, 2)): # black image bug on recent versions of macOS
if macos_version is not None and ((14, 5) <= macos_version < (16,)): # black image bug on recent versions of macOS
upcast = True
if upcast:
return torch.float32
return {torch.float16: torch.float32}
else:
return None
@@ -960,6 +1011,13 @@ def get_free_memory(dev=None, torch_free_too=False):
mem_free_npu, _ = torch.npu.mem_get_info(dev)
mem_free_torch = mem_reserved - mem_active
mem_free_total = mem_free_npu + mem_free_torch
elif is_mlu():
stats = torch.mlu.memory_stats(dev)
mem_active = stats['active_bytes.all.current']
mem_reserved = stats['reserved_bytes.all.current']
mem_free_mlu, _ = torch.mlu.mem_get_info(dev)
mem_free_torch = mem_reserved - mem_active
mem_free_total = mem_free_mlu + mem_free_torch
else:
stats = torch.cuda.memory_stats(dev)
mem_active = stats['active_bytes.all.current']
@@ -996,21 +1054,26 @@ def is_device_mps(device):
def is_device_cuda(device):
return is_device_type(device, 'cuda')
def should_use_fp16(device=None, model_params=0, prioritize_performance=True, manual_cast=False):
def is_directml_enabled():
global directml_enabled
if directml_enabled:
return True
return False
def should_use_fp16(device=None, model_params=0, prioritize_performance=True, manual_cast=False):
if device is not None:
if is_device_cpu(device):
return False
if FORCE_FP16:
if args.force_fp16:
return True
if FORCE_FP32:
return False
if directml_enabled:
return False
if is_directml_enabled():
return True
if (device is not None and is_device_mps(device)) or mps_mode():
return True
@@ -1024,6 +1087,9 @@ def should_use_fp16(device=None, model_params=0, prioritize_performance=True, ma
if is_ascend_npu():
return True
if is_mlu():
return True
if torch.version.hip:
return True
@@ -1081,13 +1147,28 @@ def should_use_bf16(device=None, model_params=0, prioritize_performance=True, ma
if is_intel_xpu():
return True
if is_ascend_npu():
return True
if is_amd():
arch = torch.cuda.get_device_properties(device).gcnArchName
if any((a in arch) for a in ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]): # RDNA2 and older don't support bf16
if manual_cast:
return True
return False
props = torch.cuda.get_device_properties(device)
if is_mlu():
if props.major > 3:
return True
if props.major >= 8:
return True
bf16_works = torch.cuda.is_bf16_supported()
if bf16_works or manual_cast:
if bf16_works and manual_cast:
free_model_memory = maximum_vram_for_weights(device)
if (not prioritize_performance) or model_params * 4 > free_model_memory:
return True
@@ -1106,11 +1187,11 @@ def supports_fp8_compute(device=None):
if props.minor < 9:
return False
if int(torch_version[0]) < 2 or (int(torch_version[0]) == 2 and int(torch_version[2]) < 3):
if torch_version_numeric[0] < 2 or (torch_version_numeric[0] == 2 and torch_version_numeric[1] < 3):
return False
if WINDOWS:
if (int(torch_version[0]) == 2 and int(torch_version[2]) < 4):
if (torch_version_numeric[0] == 2 and torch_version_numeric[1] < 4):
return False
return True


@@ -96,8 +96,28 @@ def wipe_lowvram_weight(m):
if hasattr(m, "prev_comfy_cast_weights"):
m.comfy_cast_weights = m.prev_comfy_cast_weights
del m.prev_comfy_cast_weights
m.weight_function = None
m.bias_function = None
if hasattr(m, "weight_function"):
m.weight_function = []
if hasattr(m, "bias_function"):
m.bias_function = []
def move_weight_functions(m, device):
if device is None:
return 0
memory = 0
if hasattr(m, "weight_function"):
for f in m.weight_function:
if hasattr(f, "move_to"):
memory += f.move_to(device=device)
if hasattr(m, "bias_function"):
for f in m.bias_function:
if hasattr(f, "move_to"):
memory += f.move_to(device=device)
return memory
class LowVramPatch:
def __init__(self, key, patches):
@@ -192,11 +212,13 @@ class ModelPatcher:
self.backup = {}
self.object_patches = {}
self.object_patches_backup = {}
self.weight_wrapper_patches = {}
self.model_options = {"transformer_options":{}}
self.model_size()
self.load_device = load_device
self.offload_device = offload_device
self.weight_inplace_update = weight_inplace_update
self.force_cast_weights = False
self.patches_uuid = uuid.uuid4()
self.parent = None
@@ -250,11 +272,14 @@ class ModelPatcher:
n.patches_uuid = self.patches_uuid
n.object_patches = self.object_patches.copy()
n.weight_wrapper_patches = self.weight_wrapper_patches.copy()
n.model_options = copy.deepcopy(self.model_options)
n.backup = self.backup
n.object_patches_backup = self.object_patches_backup
n.parent = self
n.force_cast_weights = self.force_cast_weights
# attachments
n.attachments = {}
for k in self.attachments:
@@ -402,6 +427,16 @@ class ModelPatcher:
def add_object_patch(self, name, obj):
self.object_patches[name] = obj
def set_model_compute_dtype(self, dtype):
self.add_object_patch("manual_cast_dtype", dtype)
if dtype is not None:
self.force_cast_weights = True
self.patches_uuid = uuid.uuid4() #TODO: optimize by preventing a full model reload for this
def add_weight_wrapper(self, name, function):
self.weight_wrapper_patches[name] = self.weight_wrapper_patches.get(name, []) + [function]
self.patches_uuid = uuid.uuid4()
def get_model_object(self, name: str) -> torch.nn.Module:
"""Retrieves a nested attribute from an object using dot notation considering
object patches.
@@ -566,6 +601,9 @@ class ModelPatcher:
lowvram_weight = False
weight_key = "{}.weight".format(n)
bias_key = "{}.bias".format(n)
if not full_load and hasattr(m, "comfy_cast_weights"):
if mem_counter + module_mem >= lowvram_model_memory:
lowvram_weight = True
@@ -573,34 +611,46 @@ class ModelPatcher:
if hasattr(m, "prev_comfy_cast_weights"): #Already lowvramed
continue
weight_key = "{}.weight".format(n)
bias_key = "{}.bias".format(n)
cast_weight = self.force_cast_weights
if lowvram_weight:
if hasattr(m, "comfy_cast_weights"):
m.weight_function = []
m.bias_function = []
if weight_key in self.patches:
if force_patch_weights:
self.patch_weight_to_device(weight_key)
else:
m.weight_function = LowVramPatch(weight_key, self.patches)
m.weight_function = [LowVramPatch(weight_key, self.patches)]
patch_counter += 1
if bias_key in self.patches:
if force_patch_weights:
self.patch_weight_to_device(bias_key)
else:
m.bias_function = LowVramPatch(bias_key, self.patches)
m.bias_function = [LowVramPatch(bias_key, self.patches)]
patch_counter += 1
m.prev_comfy_cast_weights = m.comfy_cast_weights
m.comfy_cast_weights = True
cast_weight = True
else:
if hasattr(m, "comfy_cast_weights"):
if m.comfy_cast_weights:
wipe_lowvram_weight(m)
wipe_lowvram_weight(m)
if full_load or mem_counter + module_mem < lowvram_model_memory:
mem_counter += module_mem
load_completely.append((module_mem, n, m, params))
if cast_weight and hasattr(m, "comfy_cast_weights"):
m.prev_comfy_cast_weights = m.comfy_cast_weights
m.comfy_cast_weights = True
if weight_key in self.weight_wrapper_patches:
m.weight_function.extend(self.weight_wrapper_patches[weight_key])
if bias_key in self.weight_wrapper_patches:
m.bias_function.extend(self.weight_wrapper_patches[bias_key])
mem_counter += move_weight_functions(m, device_to)
load_completely.sort(reverse=True)
for x in load_completely:
n = x[1]
@@ -662,6 +712,7 @@ class ModelPatcher:
self.unpatch_hooks()
if self.model.model_lowvram:
for m in self.model.modules():
move_weight_functions(m, device_to)
wipe_lowvram_weight(m)
self.model.model_lowvram = False
@@ -728,15 +779,19 @@ class ModelPatcher:
weight_key = "{}.weight".format(n)
bias_key = "{}.bias".format(n)
if move_weight:
cast_weight = self.force_cast_weights
m.to(device_to)
module_mem += move_weight_functions(m, device_to)
if lowvram_possible:
if weight_key in self.patches:
m.weight_function = LowVramPatch(weight_key, self.patches)
m.weight_function.append(LowVramPatch(weight_key, self.patches))
patch_counter += 1
if bias_key in self.patches:
m.bias_function = LowVramPatch(bias_key, self.patches)
m.bias_function.append(LowVramPatch(bias_key, self.patches))
patch_counter += 1
cast_weight = True
if cast_weight:
m.prev_comfy_cast_weights = m.comfy_cast_weights
m.comfy_cast_weights = True
m.comfy_patched_weights = False


@@ -38,21 +38,23 @@ def cast_bias_weight(s, input=None, dtype=None, device=None, bias_dtype=None):
bias = None
non_blocking = comfy.model_management.device_supports_non_blocking(device)
if s.bias is not None:
has_function = s.bias_function is not None
has_function = len(s.bias_function) > 0
bias = comfy.model_management.cast_to(s.bias, bias_dtype, device, non_blocking=non_blocking, copy=has_function)
if has_function:
bias = s.bias_function(bias)
for f in s.bias_function:
bias = f(bias)
has_function = s.weight_function is not None
has_function = len(s.weight_function) > 0
weight = comfy.model_management.cast_to(s.weight, dtype, device, non_blocking=non_blocking, copy=has_function)
if has_function:
weight = s.weight_function(weight)
for f in s.weight_function:
weight = f(weight)
return weight, bias
class CastWeightBiasOp:
comfy_cast_weights = False
weight_function = None
bias_function = None
weight_function = []
bias_function = []
class disable_weight_init:
class Linear(torch.nn.Linear, CastWeightBiasOp):
@@ -64,7 +66,7 @@ class disable_weight_init:
return torch.nn.functional.linear(input, weight, bias)
def forward(self, *args, **kwargs):
if self.comfy_cast_weights:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
return super().forward(*args, **kwargs)
@@ -78,7 +80,7 @@ class disable_weight_init:
return self._conv_forward(input, weight, bias)
def forward(self, *args, **kwargs):
if self.comfy_cast_weights:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
return super().forward(*args, **kwargs)
@@ -92,7 +94,7 @@ class disable_weight_init:
return self._conv_forward(input, weight, bias)
def forward(self, *args, **kwargs):
if self.comfy_cast_weights:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
return super().forward(*args, **kwargs)
@@ -106,7 +108,7 @@ class disable_weight_init:
return self._conv_forward(input, weight, bias)
def forward(self, *args, **kwargs):
if self.comfy_cast_weights:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
return super().forward(*args, **kwargs)
@@ -120,12 +122,11 @@ class disable_weight_init:
return torch.nn.functional.group_norm(input, self.num_groups, weight, bias, self.eps)
def forward(self, *args, **kwargs):
if self.comfy_cast_weights:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
return super().forward(*args, **kwargs)
class LayerNorm(torch.nn.LayerNorm, CastWeightBiasOp):
def reset_parameters(self):
return None
@@ -139,7 +140,7 @@ class disable_weight_init:
return torch.nn.functional.layer_norm(input, self.normalized_shape, weight, bias, self.eps)
def forward(self, *args, **kwargs):
if self.comfy_cast_weights:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
return super().forward(*args, **kwargs)
@@ -160,7 +161,7 @@ class disable_weight_init:
output_padding, self.groups, self.dilation)
def forward(self, *args, **kwargs):
if self.comfy_cast_weights:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
return super().forward(*args, **kwargs)
@@ -181,7 +182,7 @@ class disable_weight_init:
output_padding, self.groups, self.dilation)
def forward(self, *args, **kwargs):
if self.comfy_cast_weights:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
return super().forward(*args, **kwargs)
@@ -199,7 +200,7 @@ class disable_weight_init:
return torch.nn.functional.embedding(input, weight, self.padding_idx, self.max_norm, self.norm_type, self.scale_grad_by_freq, self.sparse).to(dtype=output_dtype)
def forward(self, *args, **kwargs):
if self.comfy_cast_weights:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
if "out_dtype" in kwargs:


@@ -12,6 +12,7 @@ from .ldm.audio.autoencoder import AudioOobleckVAE
import comfy.ldm.genmo.vae.model
import comfy.ldm.lightricks.vae.causal_video_autoencoder
import comfy.ldm.cosmos.vae
import comfy.ldm.wan.vae
import yaml
import math
@@ -37,6 +38,7 @@ import comfy.text_encoders.lt
import comfy.text_encoders.hunyuan_video
import comfy.text_encoders.cosmos
import comfy.text_encoders.lumina2
import comfy.text_encoders.wan
import comfy.model_patcher
import comfy.lora
@@ -392,6 +394,18 @@ class VAE:
self.memory_used_decode = lambda shape, dtype: (50 * shape[2] * shape[3] * shape[4] * (8 * 8 * 8)) * model_management.dtype_size(dtype)
self.memory_used_encode = lambda shape, dtype: (50 * (round((shape[2] + 7) / 8) * 8) * shape[3] * shape[4]) * model_management.dtype_size(dtype)
self.working_dtypes = [torch.bfloat16, torch.float32]
elif "decoder.middle.0.residual.0.gamma" in sd:
self.upscale_ratio = (lambda a: max(0, a * 4 - 3), 8, 8)
self.upscale_index_formula = (4, 8, 8)
self.downscale_ratio = (lambda a: max(0, math.floor((a + 3) / 4)), 8, 8)
self.downscale_index_formula = (4, 8, 8)
self.latent_dim = 3
self.latent_channels = 16
ddconfig = {"dim": 96, "z_dim": self.latent_channels, "dim_mult": [1, 2, 4, 4], "num_res_blocks": 2, "attn_scales": [], "temperal_downsample": [False, True, True], "dropout": 0.0}
self.first_stage_model = comfy.ldm.wan.vae.WanVAE(**ddconfig)
self.working_dtypes = [torch.bfloat16, torch.float16, torch.float32]
self.memory_used_encode = lambda shape, dtype: 6000 * shape[3] * shape[4] * model_management.dtype_size(dtype)
self.memory_used_decode = lambda shape, dtype: 7000 * shape[3] * shape[4] * (8 * 8) * model_management.dtype_size(dtype)
else:
logging.warning("WARNING: No VAE weights detected, VAE not initialized.")
self.first_stage_model = None
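# Illustrative note (not part of the diff): the Wan branch above maps latent frames to pixel frames
# with max(0, t * 4 - 3) and back with floor((t + 3) / 4), i.e. 8x spatial and 4x temporal
# compression with a single-frame special case:
#     21 latent frames -> 81 pixel frames, 81 -> 21 on the way back, and 1 <-> 1 for a still image.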
@@ -659,6 +673,7 @@ class CLIPType(Enum):
PIXART = 10
COSMOS = 11
LUMINA2 = 12
WAN = 13
def load_clip(ckpt_paths, embedding_directory=None, clip_type=CLIPType.STABLE_DIFFUSION, model_options={}):
@@ -763,6 +778,10 @@ def load_text_encoder_state_dicts(state_dicts=[], embedding_directory=None, clip
elif clip_type == CLIPType.PIXART:
clip_target.clip = comfy.text_encoders.pixart_t5.pixart_te(**t5xxl_detect(clip_data))
clip_target.tokenizer = comfy.text_encoders.pixart_t5.PixArtTokenizer
elif clip_type == CLIPType.WAN:
clip_target.clip = comfy.text_encoders.wan.te(**t5xxl_detect(clip_data))
clip_target.tokenizer = comfy.text_encoders.wan.WanT5Tokenizer
tokenizer_data["spiece_model"] = clip_data[0].get("spiece_model", None)
else: #CLIPType.MOCHI
clip_target.clip = comfy.text_encoders.genmo.mochi_te(**t5xxl_detect(clip_data))
clip_target.tokenizer = comfy.text_encoders.genmo.MochiT5Tokenizer


@@ -16,6 +16,7 @@ import comfy.text_encoders.lt
import comfy.text_encoders.hunyuan_video
import comfy.text_encoders.cosmos
import comfy.text_encoders.lumina2
import comfy.text_encoders.wan
from . import supported_models_base
from . import latent_formats
@@ -895,6 +896,49 @@ class Lumina2(supported_models_base.BASE):
hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}gemma2_2b.transformer.".format(pref))
return supported_models_base.ClipTarget(comfy.text_encoders.lumina2.LuminaTokenizer, comfy.text_encoders.lumina2.te(**hunyuan_detect))
models = [Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideo, CosmosT2V, CosmosI2V, Lumina2]
class WAN21_T2V(supported_models_base.BASE):
unet_config = {
"image_model": "wan2.1",
"model_type": "t2v",
}
sampling_settings = {
"shift": 8.0,
}
unet_extra_config = {}
latent_format = latent_formats.Wan21
memory_usage_factor = 1.0
supported_inference_dtypes = [torch.bfloat16, torch.float16, torch.float32]
vae_key_prefix = ["vae."]
text_encoder_key_prefix = ["text_encoders."]
def __init__(self, unet_config):
super().__init__(unet_config)
self.memory_usage_factor = self.unet_config.get("dim", 2000) / 2000
def get_model(self, state_dict, prefix="", device=None):
out = model_base.WAN21(self, device=device)
return out
def clip_target(self, state_dict={}):
pref = self.text_encoder_key_prefix[0]
t5_detect = comfy.text_encoders.sd3_clip.t5_xxl_detect(state_dict, "{}umt5xxl.transformer.".format(pref))
return supported_models_base.ClipTarget(comfy.text_encoders.wan.WanT5Tokenizer, comfy.text_encoders.wan.te(**t5_detect))
class WAN21_I2V(WAN21_T2V):
unet_config = {
"image_model": "wan2.1",
"model_type": "i2v",
}
def get_model(self, state_dict, prefix="", device=None):
out = model_base.WAN21(self, image_to_video=True, device=device)
return out
models = [Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideo, CosmosT2V, CosmosI2V, Lumina2, WAN21_T2V, WAN21_I2V]
models += [SVD_img2vid]


@@ -19,11 +19,6 @@ class LuminaTokenizer(sd1_clip.SD1Tokenizer):
class Gemma2_2BModel(sd1_clip.SDClipModel):
def __init__(self, device="cpu", layer="hidden", layer_idx=-2, dtype=None, attention_mask=True, model_options={}):
llama_scaled_fp8 = model_options.get("llama_scaled_fp8", None)
if llama_scaled_fp8 is not None:
model_options = model_options.copy()
model_options["scaled_fp8"] = llama_scaled_fp8
super().__init__(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config={}, dtype=dtype, special_tokens={"start": 2, "pad": 0}, layer_norm_hidden_state=False, model_class=comfy.text_encoders.llama.Gemma2_2B, enable_attention_masks=attention_mask, return_attention_masks=attention_mask, model_options=model_options)
@@ -35,10 +30,10 @@ class LuminaModel(sd1_clip.SD1ClipModel):
def te(dtype_llama=None, llama_scaled_fp8=None):
class LuminaTEModel_(LuminaModel):
def __init__(self, device="cpu", dtype=None, model_options={}):
if llama_scaled_fp8 is not None and "llama_scaled_fp8" not in model_options:
if llama_scaled_fp8 is not None and "scaled_fp8" not in model_options:
model_options = model_options.copy()
model_options["llama_scaled_fp8"] = llama_scaled_fp8
if dtype_llama is not None:
dtype = dtype_llama
model_options["scaled_fp8"] = llama_scaled_fp8
if dtype_llama is not None:
dtype = dtype_llama
super().__init__(device=device, dtype=dtype, model_options=model_options)
return LuminaTEModel_


@@ -0,0 +1,22 @@
{
"d_ff": 10240,
"d_kv": 64,
"d_model": 4096,
"decoder_start_token_id": 0,
"dropout_rate": 0.1,
"eos_token_id": 1,
"dense_act_fn": "gelu_pytorch_tanh",
"initializer_factor": 1.0,
"is_encoder_decoder": true,
"is_gated_act": true,
"layer_norm_epsilon": 1e-06,
"model_type": "umt5",
"num_decoder_layers": 24,
"num_heads": 64,
"num_layers": 24,
"output_past": true,
"pad_token_id": 0,
"relative_attention_num_buckets": 32,
"tie_word_embeddings": false,
"vocab_size": 256384
}


@@ -0,0 +1,37 @@
from comfy import sd1_clip
from .spiece_tokenizer import SPieceTokenizer
import comfy.text_encoders.t5
import os
class UMT5XXlModel(sd1_clip.SDClipModel):
def __init__(self, device="cpu", layer="last", layer_idx=None, dtype=None, model_options={}):
textmodel_json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "umt5_config_xxl.json")
super().__init__(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config=textmodel_json_config, dtype=dtype, special_tokens={"end": 1, "pad": 0}, model_class=comfy.text_encoders.t5.T5, enable_attention_masks=True, zero_out_masked=True, model_options=model_options)
class UMT5XXlTokenizer(sd1_clip.SDTokenizer):
def __init__(self, embedding_directory=None, tokenizer_data={}):
tokenizer = tokenizer_data.get("spiece_model", None)
super().__init__(tokenizer, pad_with_end=False, embedding_size=4096, embedding_key='umt5xxl', tokenizer_class=SPieceTokenizer, has_start_token=False, pad_to_max_length=False, max_length=99999999, min_length=512, pad_token=0)
def state_dict(self):
return {"spiece_model": self.tokenizer.serialize_model()}
class WanT5Tokenizer(sd1_clip.SD1Tokenizer):
def __init__(self, embedding_directory=None, tokenizer_data={}):
super().__init__(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data, clip_name="umt5xxl", tokenizer=UMT5XXlTokenizer)
class WanT5Model(sd1_clip.SD1ClipModel):
def __init__(self, device="cpu", dtype=None, model_options={}, **kwargs):
super().__init__(device=device, dtype=dtype, model_options=model_options, name="umt5xxl", clip_model=UMT5XXlModel, **kwargs)
def te(dtype_t5=None, t5xxl_scaled_fp8=None):
class WanTEModel(WanT5Model):
def __init__(self, device="cpu", dtype=None, model_options={}):
if t5xxl_scaled_fp8 is not None and "scaled_fp8" not in model_options:
model_options = model_options.copy()
model_options["scaled_fp8"] = t5xxl_scaled_fp8
if dtype_t5 is not None:
dtype = dtype_t5
super().__init__(device=device, dtype=dtype, model_options=model_options)
return WanTEModel
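# --- illustrative usage sketch, not part of this file -----------------------------------------
# te() is a small factory: it bakes an optional dtype and scaled-fp8 flag into a class that the
# loader instantiates later with just device/model_options. Roughly how comfy/sd.py uses it for
# CLIPType.WAN (argument values here are assumptions for illustration):
#     WanTE = te(dtype_t5=None, t5xxl_scaled_fp8=None)        # returns a class, not an instance
#     clip_model = WanTE(device="cpu", model_options={})
#     tokenizer = WanT5Tokenizer(tokenizer_data={"spiece_model": spiece_bytes})  # spiece_bytes assumed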


@@ -20,9 +20,7 @@ class Load3D():
"width": ("INT", {"default": 1024, "min": 1, "max": 4096, "step": 1}),
"height": ("INT", {"default": 1024, "min": 1, "max": 4096, "step": 1}),
"material": (["original", "normal", "wireframe", "depth"],),
"light_intensity": ("INT", {"default": 10, "min": 1, "max": 20, "step": 1}),
"up_direction": (["original", "-x", "+x", "-y", "+y", "-z", "+z"],),
"fov": ("INT", {"default": 75, "min": 10, "max": 150, "step": 1}),
}}
RETURN_TYPES = ("IMAGE", "MASK", "STRING")
@@ -34,22 +32,14 @@ class Load3D():
CATEGORY = "3d"
def process(self, model_file, image, **kwargs):
if isinstance(image, dict):
image_path = folder_paths.get_annotated_filepath(image['image'])
mask_path = folder_paths.get_annotated_filepath(image['mask'])
image_path = folder_paths.get_annotated_filepath(image['image'])
mask_path = folder_paths.get_annotated_filepath(image['mask'])
load_image_node = nodes.LoadImage()
output_image, ignore_mask = load_image_node.load_image(image=image_path)
ignore_image, output_mask = load_image_node.load_image(image=mask_path)
load_image_node = nodes.LoadImage()
output_image, ignore_mask = load_image_node.load_image(image=image_path)
ignore_image, output_mask = load_image_node.load_image(image=mask_path)
return output_image, output_mask, model_file,
else:
# fallback for when the input is not a dict, which can happen while the frontend code is not yet compatible with core;
# this double-check can be removed once the frontend changes are merged into core
image_path = folder_paths.get_annotated_filepath(image)
load_image_node = nodes.LoadImage()
output_image, output_mask = load_image_node.load_image(image=image_path)
return output_image, output_mask, model_file,
return output_image, output_mask, model_file,
class Load3DAnimation():
@classmethod
@@ -66,9 +56,7 @@ class Load3DAnimation():
"width": ("INT", {"default": 1024, "min": 1, "max": 4096, "step": 1}),
"height": ("INT", {"default": 1024, "min": 1, "max": 4096, "step": 1}),
"material": (["original", "normal", "wireframe", "depth"],),
"light_intensity": ("INT", {"default": 10, "min": 1, "max": 20, "step": 1}),
"up_direction": (["original", "-x", "+x", "-y", "+y", "-z", "+z"],),
"fov": ("INT", {"default": 75, "min": 10, "max": 150, "step": 1}),
}}
RETURN_TYPES = ("IMAGE", "MASK", "STRING")
@@ -80,20 +68,14 @@ class Load3DAnimation():
CATEGORY = "3d"
def process(self, model_file, image, **kwargs):
if isinstance(image, dict):
image_path = folder_paths.get_annotated_filepath(image['image'])
mask_path = folder_paths.get_annotated_filepath(image['mask'])
image_path = folder_paths.get_annotated_filepath(image['image'])
mask_path = folder_paths.get_annotated_filepath(image['mask'])
load_image_node = nodes.LoadImage()
output_image, ignore_mask = load_image_node.load_image(image=image_path)
ignore_image, output_mask = load_image_node.load_image(image=mask_path)
load_image_node = nodes.LoadImage()
output_image, ignore_mask = load_image_node.load_image(image=image_path)
ignore_image, output_mask = load_image_node.load_image(image=mask_path)
return output_image, output_mask, model_file,
else:
image_path = folder_paths.get_annotated_filepath(image)
load_image_node = nodes.LoadImage()
output_image, output_mask = load_image_node.load_image(image=image_path)
return output_image, output_mask, model_file,
return output_image, output_mask, model_file,
class Preview3D():
@classmethod
@@ -101,9 +83,7 @@ class Preview3D():
return {"required": {
"model_file": ("STRING", {"default": "", "multiline": False}),
"material": (["original", "normal", "wireframe", "depth"],),
"light_intensity": ("INT", {"default": 10, "min": 1, "max": 20, "step": 1}),
"up_direction": (["original", "-x", "+x", "-y", "+y", "-z", "+z"],),
"fov": ("INT", {"default": 75, "min": 10, "max": 150, "step": 1}),
}}
OUTPUT_NODE = True
@@ -123,9 +103,7 @@ class Preview3DAnimation():
return {"required": {
"model_file": ("STRING", {"default": "", "multiline": False}),
"material": (["original", "normal", "wireframe", "depth"],),
"light_intensity": ("INT", {"default": 10, "min": 1, "max": 20, "step": 1}),
"up_direction": (["original", "-x", "+x", "-y", "+y", "-z", "+z"],),
"fov": ("INT", {"default": 75, "min": 10, "max": 150, "step": 1}),
}}
OUTPUT_NODE = True


@@ -0,0 +1,104 @@
from comfy.comfy_types import IO, ComfyNodeABC, InputTypeDict
import torch
class RenormCFG:
@classmethod
def INPUT_TYPES(s):
return {"required": { "model": ("MODEL",),
"cfg_trunc": ("FLOAT", {"default": 100, "min": 0.0, "max": 100.0, "step": 0.01}),
"renorm_cfg": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 100.0, "step": 0.01}),
}}
RETURN_TYPES = ("MODEL",)
FUNCTION = "patch"
CATEGORY = "advanced/model"
def patch(self, model, cfg_trunc, renorm_cfg):
def renorm_cfg_func(args):
cond_denoised = args["cond_denoised"]
uncond_denoised = args["uncond_denoised"]
cond_scale = args["cond_scale"]
timestep = args["timestep"]
x_orig = args["input"]
in_channels = model.model.diffusion_model.in_channels
if timestep[0] < cfg_trunc:
cond_eps, uncond_eps = cond_denoised[:, :in_channels], uncond_denoised[:, :in_channels]
cond_rest, _ = cond_denoised[:, in_channels:], uncond_denoised[:, in_channels:]
half_eps = uncond_eps + cond_scale * (cond_eps - uncond_eps)
half_rest = cond_rest
if float(renorm_cfg) > 0.0:
ori_pos_norm = torch.linalg.vector_norm(cond_eps
, dim=tuple(range(1, len(cond_eps.shape))), keepdim=True
)
max_new_norm = ori_pos_norm * float(renorm_cfg)
new_pos_norm = torch.linalg.vector_norm(
half_eps, dim=tuple(range(1, len(half_eps.shape))), keepdim=True
)
if new_pos_norm >= max_new_norm:
half_eps = half_eps * (max_new_norm / new_pos_norm)
else:
cond_eps, uncond_eps = cond_denoised[:, :in_channels], uncond_denoised[:, :in_channels]
cond_rest, _ = cond_denoised[:, in_channels:], uncond_denoised[:, in_channels:]
half_eps = cond_eps
half_rest = cond_rest
cfg_result = torch.cat([half_eps, half_rest], dim=1)
# cfg_result = uncond_denoised + (cond_denoised - uncond_denoised) * cond_scale
return x_orig - cfg_result
m = model.clone()
m.set_model_sampler_cfg_function(renorm_cfg_func)
return (m, )
class CLIPTextEncodeLumina2(ComfyNodeABC):
SYSTEM_PROMPT = {
"superior": "You are an assistant designed to generate superior images with the superior "\
"degree of image-text alignment based on textual prompts or user prompts.",
"alignment": "You are an assistant designed to generate high-quality images with the "\
"highest degree of image-text alignment based on textual prompts."
}
SYSTEM_PROMPT_TIP = "Lumina2 provide two types of system prompts:" \
"Superior: You are an assistant designed to generate superior images with the superior "\
"degree of image-text alignment based on textual prompts or user prompts. "\
"Alignment: You are an assistant designed to generate high-quality images with the highest "\
"degree of image-text alignment based on textual prompts."
@classmethod
def INPUT_TYPES(s) -> InputTypeDict:
return {
"required": {
"system_prompt": (list(CLIPTextEncodeLumina2.SYSTEM_PROMPT.keys()), {"tooltip": CLIPTextEncodeLumina2.SYSTEM_PROMPT_TIP}),
"user_prompt": (IO.STRING, {"multiline": True, "dynamicPrompts": True, "tooltip": "The text to be encoded."}),
"clip": (IO.CLIP, {"tooltip": "The CLIP model used for encoding the text."})
}
}
RETURN_TYPES = (IO.CONDITIONING,)
OUTPUT_TOOLTIPS = ("A conditioning containing the embedded text used to guide the diffusion model.",)
FUNCTION = "encode"
CATEGORY = "conditioning"
DESCRIPTION = "Encodes a system prompt and a user prompt using a CLIP model into an embedding that can be used to guide the diffusion model towards generating specific images."
def encode(self, clip, user_prompt, system_prompt):
if clip is None:
raise RuntimeError("ERROR: clip input is invalid: None\n\nIf the clip is from a checkpoint loader node your checkpoint does not contain a valid clip or text encoder model.")
system_prompt = CLIPTextEncodeLumina2.SYSTEM_PROMPT[system_prompt]
prompt = f'{system_prompt} <Prompt Start> {user_prompt}'
tokens = clip.tokenize(prompt)
return (clip.encode_from_tokens_scheduled(tokens), )
NODE_CLASS_MAPPINGS = {
"CLIPTextEncodeLumina2": CLIPTextEncodeLumina2,
"RenormCFG": RenormCFG
}
NODE_DISPLAY_NAME_MAPPINGS = {
"CLIPTextEncodeLumina2": "CLIP Text Encode for Lumina2",
}


@@ -3,6 +3,8 @@ import comfy.model_sampling
import comfy.latent_formats
import nodes
import torch
import node_helpers
class LCM(comfy.model_sampling.EPS):
def calculate_denoised(self, sigma, model_output, model_input):
@@ -294,6 +296,24 @@ class RescaleCFG:
m.set_model_sampler_cfg_function(rescale_cfg)
return (m, )
class ModelComputeDtype:
@classmethod
def INPUT_TYPES(s):
return {"required": { "model": ("MODEL",),
"dtype": (["default", "fp32", "fp16", "bf16"],),
}}
RETURN_TYPES = ("MODEL",)
FUNCTION = "patch"
CATEGORY = "advanced/debug/model"
def patch(self, model, dtype):
m = model.clone()
m.set_model_compute_dtype(node_helpers.string_to_torch_dtype(dtype))
return (m, )
NODE_CLASS_MAPPINGS = {
"ModelSamplingDiscrete": ModelSamplingDiscrete,
"ModelSamplingContinuousEDM": ModelSamplingContinuousEDM,
@@ -303,4 +323,5 @@ NODE_CLASS_MAPPINGS = {
"ModelSamplingAuraFlow": ModelSamplingAuraFlow,
"ModelSamplingFlux": ModelSamplingFlux,
"RescaleCFG": RescaleCFG,
"ModelComputeDtype": ModelComputeDtype,
}


@@ -0,0 +1,76 @@
import os
import av
import torch
import folder_paths
import json
from fractions import Fraction
class SaveWEBM:
def __init__(self):
self.output_dir = folder_paths.get_output_directory()
self.type = "output"
self.prefix_append = ""
@classmethod
def INPUT_TYPES(s):
return {"required":
{"images": ("IMAGE", ),
"filename_prefix": ("STRING", {"default": "ComfyUI"}),
"codec": (["vp9", "av1"],),
"fps": ("FLOAT", {"default": 24.0, "min": 0.01, "max": 1000.0, "step": 0.01}),
"crf": ("FLOAT", {"default": 32.0, "min": 0, "max": 63.0, "step": 1, "tooltip": "Higher crf means lower quality with a smaller file size, lower crf means higher quality higher filesize."}),
},
"hidden": {"prompt": "PROMPT", "extra_pnginfo": "EXTRA_PNGINFO"},
}
RETURN_TYPES = ()
FUNCTION = "save_images"
OUTPUT_NODE = True
CATEGORY = "image/video"
EXPERIMENTAL = True
def save_images(self, images, codec, fps, filename_prefix, crf, prompt=None, extra_pnginfo=None):
filename_prefix += self.prefix_append
full_output_folder, filename, counter, subfolder, filename_prefix = folder_paths.get_save_image_path(filename_prefix, self.output_dir, images[0].shape[1], images[0].shape[0])
file = f"{filename}_{counter:05}_.webm"
container = av.open(os.path.join(full_output_folder, file), mode="w")
if prompt is not None:
container.metadata["prompt"] = json.dumps(prompt)
if extra_pnginfo is not None:
for x in extra_pnginfo:
container.metadata[x] = json.dumps(extra_pnginfo[x])
codec_map = {"vp9": "libvpx-vp9", "av1": "libaom-av1"}
stream = container.add_stream(codec_map[codec], rate=Fraction(round(fps * 1000), 1000))
stream.width = images.shape[-2]
stream.height = images.shape[-3]
stream.pix_fmt = "yuv420p"
stream.bit_rate = 0
stream.options = {'crf': str(crf)}
for frame in images:
frame = av.VideoFrame.from_ndarray(torch.clamp(frame[..., :3] * 255, min=0, max=255).to(device=torch.device("cpu"), dtype=torch.uint8).numpy(), format="rgb24")
for packet in stream.encode(frame):
container.mux(packet)
container.mux(stream.encode())
container.close()
results = [{
"filename": file,
"subfolder": subfolder,
"type": self.type
}]
return {"ui": {"images": results, "animated": (True,)}} # TODO: frontend side
NODE_CLASS_MAPPINGS = {
"SaveWEBM": SaveWEBM,
}

comfy_extras/nodes_wan.py (new file, 54 lines)

@@ -0,0 +1,54 @@
import nodes
import node_helpers
import torch
import comfy.model_management
import comfy.utils
class WanImageToVideo:
@classmethod
def INPUT_TYPES(s):
return {"required": {"positive": ("CONDITIONING", ),
"negative": ("CONDITIONING", ),
"vae": ("VAE", ),
"width": ("INT", {"default": 1280, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
"height": ("INT", {"default": 720, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
"length": ("INT", {"default": 121, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
"batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
},
"optional": {"clip_vision_output": ("CLIP_VISION_OUTPUT", ),
"start_image": ("IMAGE", ),
}}
RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
RETURN_NAMES = ("positive", "negative", "latent")
FUNCTION = "encode"
CATEGORY = "conditioning/video_models"
def encode(self, positive, negative, vae, width, height, length, batch_size, start_image=None, clip_vision_output=None):
latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
if start_image is not None:
start_image = comfy.utils.common_upscale(start_image[:length].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
image = torch.ones((length, height, width, start_image.shape[-1]), device=start_image.device, dtype=start_image.dtype) * 0.5
image[:start_image.shape[0]] = start_image
concat_latent_image = vae.encode(image[:, :, :, :3])
mask = torch.ones((1, 1, latent.shape[2], concat_latent_image.shape[-2], concat_latent_image.shape[-1]), device=start_image.device, dtype=start_image.dtype)
mask[:, :, :((start_image.shape[0] - 1) // 4) + 1] = 0.0
positive = node_helpers.conditioning_set_values(positive, {"concat_latent_image": concat_latent_image, "concat_mask": mask})
negative = node_helpers.conditioning_set_values(negative, {"concat_latent_image": concat_latent_image, "concat_mask": mask})
if clip_vision_output is not None:
positive = node_helpers.conditioning_set_values(positive, {"clip_vision_output": clip_vision_output})
negative = node_helpers.conditioning_set_values(negative, {"clip_vision_output": clip_vision_output})
out_latent = {}
out_latent["samples"] = latent
return (positive, negative, out_latent)
NODE_CLASS_MAPPINGS = {
"WanImageToVideo": WanImageToVideo,
}
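# Illustrative note (not part of the diff): the empty latent allocated in encode() follows the Wan
# VAE's compression, ((length - 1) // 4) + 1 temporal latents and height/8 x width/8 spatially.
# With the node defaults (width=1280, height=720, length=121) that is a [1, 16, 31, 90, 160] latent;
# length=81 gives 21 temporal latents.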


@@ -1,3 +1,3 @@
# This file is automatically generated by the build process when version is
# updated in pyproject.toml.
__version__ = "0.3.14"
__version__ = "0.3.18"


@@ -1,4 +1,5 @@
import hashlib
import torch
from comfy.cli_args import args
@@ -35,3 +36,11 @@ def hasher():
"sha512": hashlib.sha512
}
return hashfuncs[args.default_hashing_function]
def string_to_torch_dtype(string):
if string == "fp32":
return torch.float32
if string == "fp16":
return torch.float16
if string == "bf16":
return torch.bfloat16


@@ -914,7 +914,7 @@ class CLIPLoader:
@classmethod
def INPUT_TYPES(s):
return {"required": { "clip_name": (folder_paths.get_filename_list("text_encoders"), ),
"type": (["stable_diffusion", "stable_cascade", "sd3", "stable_audio", "mochi", "ltxv", "pixart", "cosmos", "lumina2"], ),
"type": (["stable_diffusion", "stable_cascade", "sd3", "stable_audio", "mochi", "ltxv", "pixart", "cosmos", "lumina2", "wan"], ),
},
"optional": {
"device": (["default", "cpu"], {"advanced": True}),
@@ -924,7 +924,7 @@ class CLIPLoader:
CATEGORY = "advanced/loaders"
DESCRIPTION = "[Recipes]\n\nstable_diffusion: clip-l\nstable_cascade: clip-g\nsd3: t5 / clip-g / clip-l\nstable_audio: t5\nmochi: t5\ncosmos: old t5 xxl\nlumina2: gemma 2 2B"
DESCRIPTION = "[Recipes]\n\nstable_diffusion: clip-l\nstable_cascade: clip-g\nsd3: t5 xxl/ clip-g / clip-l\nstable_audio: t5 base\nmochi: t5 xxl\ncosmos: old t5 xxl\nlumina2: gemma 2 2B\nwan: umt5 xxl"
def load_clip(self, clip_name, type="stable_diffusion", device="default"):
if type == "stable_cascade":
@@ -943,6 +943,8 @@ class CLIPLoader:
clip_type = comfy.sd.CLIPType.COSMOS
elif type == "lumina2":
clip_type = comfy.sd.CLIPType.LUMINA2
elif type == "wan":
clip_type = comfy.sd.CLIPType.WAN
else:
clip_type = comfy.sd.CLIPType.STABLE_DIFFUSION
@@ -1763,6 +1765,36 @@ class LoadImageMask:
return True
class LoadImageOutput(LoadImage):
@classmethod
def INPUT_TYPES(s):
return {
"required": {
"image": ("COMBO", {
"image_upload": True,
"image_folder": "output",
"remote": {
"route": "/internal/files/output",
"refresh_button": True,
"control_after_refresh": "first",
},
}),
}
}
DESCRIPTION = "Load an image from the output folder. When the refresh button is clicked, the node will update the image list and automatically select the first image, allowing for easy iteration."
EXPERIMENTAL = True
FUNCTION = "load_image_output"
def load_image_output(self, image):
return self.load_image(f"{image} [output]")
@classmethod
def VALIDATE_INPUTS(s, image):
return True
class ImageScale:
upscale_methods = ["nearest-exact", "bilinear", "area", "bicubic", "lanczos"]
crop_methods = ["disabled", "center"]
@@ -1949,6 +1981,7 @@ NODE_CLASS_MAPPINGS = {
"PreviewImage": PreviewImage,
"LoadImage": LoadImage,
"LoadImageMask": LoadImageMask,
"LoadImageOutput": LoadImageOutput,
"ImageScale": ImageScale,
"ImageScaleBy": ImageScaleBy,
"ImageInvert": ImageInvert,
@@ -2049,6 +2082,7 @@ NODE_DISPLAY_NAME_MAPPINGS = {
"PreviewImage": "Preview Image",
"LoadImage": "Load Image",
"LoadImageMask": "Load Image (as Mask)",
"LoadImageOutput": "Load Image (from Outputs)",
"ImageScale": "Upscale Image",
"ImageScaleBy": "Upscale Image By",
"ImageUpscaleWithModel": "Upscale Image (using Model)",
@@ -2233,6 +2267,9 @@ def init_builtin_extra_nodes():
"nodes_hooks.py",
"nodes_load_3d.py",
"nodes_cosmos.py",
"nodes_video.py",
"nodes_lumina2.py",
"nodes_wan.py",
]
import_failed = []


@@ -1,6 +1,6 @@
[project]
name = "ComfyUI"
version = "0.3.14"
version = "0.3.18"
readme = "README.md"
license = { file = "LICENSE" }
requires-python = ">=3.9"


@@ -8,7 +8,8 @@ transformers>=4.28.1
tokenizers>=0.13.3
sentencepiece
safetensors>=0.4.2
aiohttp
aiohttp>=3.11.8
yarl>=1.18.0
pyyaml
Pillow
scipy
@@ -19,3 +20,4 @@ psutil
kornia>=0.7.1
spandrel
soundfile
av


@@ -150,7 +150,8 @@ class PromptServer():
PromptServer.instance = self
mimetypes.init()
mimetypes.types_map['.js'] = 'application/javascript; charset=utf-8'
mimetypes.add_type('application/javascript; charset=utf-8', '.js')
mimetypes.add_type('image/webp', '.webp')
self.user_manager = UserManager()
self.model_file_manager = ModelFileManager()


@@ -114,7 +114,7 @@ def test_load_extra_model_paths_expands_userpath(
mock_yaml_safe_load.assert_called_once()
# Check if open was called with the correct file path
mock_file.assert_called_once_with(dummy_yaml_file_name, 'r')
mock_file.assert_called_once_with(dummy_yaml_file_name, 'r', encoding='utf-8')
@patch('builtins.open', new_callable=mock_open)
@@ -145,7 +145,7 @@ def test_load_extra_model_paths_expands_appdata(
else:
expected_base_path = '/Users/TestUser/AppData/Roaming/ComfyUI'
expected_calls = [
('checkpoints', os.path.join(expected_base_path, 'models/checkpoints'), False),
('checkpoints', os.path.normpath(os.path.join(expected_base_path, 'models/checkpoints')), False),
]
assert mock_add_model_folder_path.call_count == len(expected_calls)
@@ -197,8 +197,8 @@ def test_load_extra_path_config_relative_base_path(
load_extra_path_config(dummy_yaml_name)
expected_checkpoints = os.path.abspath(os.path.join(str(tmp_path), sub_folder, "checkpoints"))
expected_some_value = os.path.abspath(os.path.join(str(tmp_path), sub_folder, "some_value"))
expected_checkpoints = os.path.abspath(os.path.join(str(tmp_path), "my_rel_base", "checkpoints"))
expected_some_value = os.path.abspath(os.path.join(str(tmp_path), "my_rel_base", "some_value"))
actual_paths = folder_paths.folder_names_and_paths["checkpoints"][0]
assert len(actual_paths) == 1, "Should have one path added for 'checkpoints'."


@@ -4,7 +4,7 @@ import folder_paths
import logging
def load_extra_path_config(yaml_path):
with open(yaml_path, 'r') as stream:
with open(yaml_path, 'r', encoding='utf-8') as stream:
config = yaml.safe_load(stream)
yaml_dir = os.path.dirname(os.path.abspath(yaml_path))
for c in config:
@@ -29,5 +29,6 @@ def load_extra_path_config(yaml_path):
full_path = os.path.join(base_path, full_path)
elif not os.path.isabs(full_path):
full_path = os.path.abspath(os.path.join(yaml_dir, y))
logging.info("Adding extra search path {} {}".format(x, full_path))
folder_paths.add_model_folder_path(x, full_path, is_default)
normalized_path = os.path.normpath(full_path)
logging.info("Adding extra search path {} {}".format(x, normalized_path))
folder_paths.add_model_folder_path(x, normalized_path, is_default)


@@ -1,4 +1,4 @@
import { d as defineComponent, U as ref, p as onMounted, b4 as isElectron, W as nextTick, b5 as electronAPI, o as openBlock, f as createElementBlock, i as withDirectives, v as vShow, j as unref, b6 as isNativeWindow, m as createBaseVNode, A as renderSlot, ai as normalizeClass } from "./index-DqqhYDnY.js";
import { d as defineComponent, T as ref, p as onMounted, b8 as isElectron, V as nextTick, b9 as electronAPI, o as openBlock, f as createElementBlock, i as withDirectives, v as vShow, j as unref, ba as isNativeWindow, m as createBaseVNode, A as renderSlot, aj as normalizeClass } from "./index-Bv0b06LE.js";
const _hoisted_1 = { class: "flex-grow w-full flex items-center justify-center overflow-auto" };
const _sfc_main = /* @__PURE__ */ defineComponent({
__name: "BaseViewTemplate",
@@ -27,7 +27,7 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
});
return (_ctx, _cache) => {
return openBlock(), createElementBlock("div", {
class: normalizeClass(["font-sans w-screen h-screen flex flex-col pointer-events-auto", [
class: normalizeClass(["font-sans w-screen h-screen flex flex-col", [
props.dark ? "text-neutral-300 bg-neutral-900 dark-theme" : "text-neutral-900 bg-neutral-300"
]])
}, [
@@ -48,4 +48,4 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
export {
_sfc_main as _
};
//# sourceMappingURL=BaseViewTemplate-Cz111_1A.js.map
//# sourceMappingURL=BaseViewTemplate-BTbuZf5t.js.map

web/assets/DesktopStartView-D9r53Bue.js (generated, vendored; new file, 19 lines)

@@ -0,0 +1,19 @@
import { d as defineComponent, o as openBlock, y as createBlock, z as withCtx, k as createVNode, j as unref, bE as script } from "./index-Bv0b06LE.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-BTbuZf5t.js";
const _sfc_main = /* @__PURE__ */ defineComponent({
__name: "DesktopStartView",
setup(__props) {
return (_ctx, _cache) => {
return openBlock(), createBlock(_sfc_main$1, { dark: "" }, {
default: withCtx(() => [
createVNode(unref(script), { class: "m-8 w-48 h-48" })
]),
_: 1
});
};
}
});
export {
_sfc_main as default
};
//# sourceMappingURL=DesktopStartView-D9r53Bue.js.map


@@ -1,22 +0,0 @@
import { d as defineComponent, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, k as createVNode, j as unref, bz as script } from "./index-DqqhYDnY.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-Cz111_1A.js";
const _hoisted_1 = { class: "max-w-screen-sm w-screen p-8" };
const _sfc_main = /* @__PURE__ */ defineComponent({
__name: "DesktopStartView",
setup(__props) {
return (_ctx, _cache) => {
return openBlock(), createBlock(_sfc_main$1, { dark: "" }, {
default: withCtx(() => [
createBaseVNode("div", _hoisted_1, [
createVNode(unref(script), { mode: "indeterminate" })
])
]),
_: 1
});
};
}
});
export {
_sfc_main as default
};
//# sourceMappingURL=DesktopStartView-FKlxS2Lt.js.map

58
web/assets/DesktopUpdateView-C-R0415K.js generated vendored Normal file
View File

@@ -0,0 +1,58 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, T as ref, d8 as onUnmounted, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, j as unref, bg as t, k as createVNode, bE as script, l as script$1, b9 as electronAPI, _ as _export_sfc } from "./index-Bv0b06LE.js";
import { s as script$2 } from "./index-A_bXPJCN.js";
import { _ as _sfc_main$1 } from "./TerminalOutputDrawer-CKr7Br7O.js";
import { _ as _sfc_main$2 } from "./BaseViewTemplate-BTbuZf5t.js";
const _hoisted_1 = { class: "h-screen w-screen grid items-center justify-around overflow-y-auto" };
const _hoisted_2 = { class: "relative m-8 text-center" };
const _hoisted_3 = { class: "download-bg pi-download text-4xl font-bold" };
const _hoisted_4 = { class: "m-8" };
const _sfc_main = /* @__PURE__ */ defineComponent({
__name: "DesktopUpdateView",
setup(__props) {
const electron = electronAPI();
const terminalVisible = ref(false);
const toggleConsoleDrawer = /* @__PURE__ */ __name(() => {
terminalVisible.value = !terminalVisible.value;
}, "toggleConsoleDrawer");
onUnmounted(() => electron.Validation.dispose());
return (_ctx, _cache) => {
return openBlock(), createBlock(_sfc_main$2, { dark: "" }, {
default: withCtx(() => [
createBaseVNode("div", _hoisted_1, [
createBaseVNode("div", _hoisted_2, [
createBaseVNode("h1", _hoisted_3, toDisplayString(unref(t)("desktopUpdate.title")), 1),
createBaseVNode("div", _hoisted_4, [
createBaseVNode("span", null, toDisplayString(unref(t)("desktopUpdate.description")), 1)
]),
createVNode(unref(script), { class: "m-8 w-48 h-48" }),
createVNode(unref(script$1), {
style: { "transform": "translateX(-50%)" },
class: "fixed bottom-0 left-1/2 my-8",
label: unref(t)("maintenance.consoleLogs"),
icon: "pi pi-desktop",
"icon-pos": "left",
severity: "secondary",
onClick: toggleConsoleDrawer
}, null, 8, ["label"]),
createVNode(_sfc_main$1, {
modelValue: terminalVisible.value,
"onUpdate:modelValue": _cache[0] || (_cache[0] = ($event) => terminalVisible.value = $event),
header: unref(t)("g.terminal"),
"default-message": unref(t)("desktopUpdate.terminalDefaultMessage")
}, null, 8, ["modelValue", "header", "default-message"])
])
]),
createVNode(unref(script$2))
]),
_: 1
});
};
}
});
const DesktopUpdateView = /* @__PURE__ */ _export_sfc(_sfc_main, [["__scopeId", "data-v-8d77828d"]]);
export {
DesktopUpdateView as default
};
//# sourceMappingURL=DesktopUpdateView-C-R0415K.js.map

20
web/assets/DesktopUpdateView-CxchaIvw.css generated vendored Normal file
View File

@@ -0,0 +1,20 @@
.download-bg[data-v-8d77828d]::before {
position: absolute;
margin: 0px;
color: var(--p-text-muted-color);
font-family: 'primeicons';
top: -2rem;
right: 2rem;
speak: none;
font-style: normal;
font-weight: normal;
font-variant: normal;
text-transform: none;
line-height: 1;
display: inline-block;
-webkit-font-smoothing: antialiased;
opacity: 0.02;
font-size: min(14rem, 90vw);
z-index: 0
}

View File

@@ -1,7 +1,7 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, l as script, be as useRouter } from "./index-DqqhYDnY.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-Cz111_1A.js";
import { d as defineComponent, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, l as script, bi as useRouter } from "./index-Bv0b06LE.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-BTbuZf5t.js";
const _hoisted_1 = { class: "max-w-screen-sm flex flex-col gap-8 p-8 bg-[url('/assets/images/Git-Logo-White.svg')] bg-no-repeat bg-right-top bg-origin-padding" };
const _hoisted_2 = { class: "mt-24 text-4xl font-bold text-red-500" };
const _hoisted_3 = { class: "space-y-4" };
@@ -55,4 +55,4 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
export {
_sfc_main as default
};
//# sourceMappingURL=DownloadGitView-DVXUne-M.js.map
//# sourceMappingURL=DownloadGitView-PWqK5ke4.js.map

View File

@@ -1,8 +1,8 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, U as ref, dl as FilterMatchMode, dr as useExtensionStore, a as useSettingStore, p as onMounted, c as computed, o as openBlock, y as createBlock, z as withCtx, k as createVNode, dm as SearchBox, j as unref, bj as script, m as createBaseVNode, f as createElementBlock, D as renderList, E as toDisplayString, a7 as createTextVNode, F as Fragment, l as script$1, B as createCommentVNode, a4 as script$3, ax as script$4, bn as script$5, dn as _sfc_main$1 } from "./index-DqqhYDnY.js";
import { g as script$2, h as script$6 } from "./index-BapOFhAR.js";
import "./index-DXE47DZl.js";
import { d as defineComponent, T as ref, dx as FilterMatchMode, dC as useExtensionStore, a as useSettingStore, p as onMounted, c as computed, o as openBlock, y as createBlock, z as withCtx, k as createVNode, dy as SearchBox, j as unref, bn as script, m as createBaseVNode, f as createElementBlock, D as renderList, E as toDisplayString, a8 as createTextVNode, F as Fragment, l as script$1, B as createCommentVNode, a5 as script$3, ay as script$4, br as script$5, dz as _sfc_main$1 } from "./index-Bv0b06LE.js";
import { g as script$2, h as script$6 } from "./index-CgMyWf7n.js";
import "./index-Dzu9WL4p.js";
const _hoisted_1 = { class: "flex justify-end" };
const _sfc_main = /* @__PURE__ */ defineComponent({
__name: "ExtensionPanel",
@@ -179,4 +179,4 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
export {
_sfc_main as default
};
//# sourceMappingURL=ExtensionPanel-iPOrhDVM.js.map
//# sourceMappingURL=ExtensionPanel-Ba57xrmg.js.map

File diff suppressed because it is too large

View File

@@ -1,6 +1,5 @@
.comfy-menu-hamburger[data-v-7ed57d1a] {
pointer-events: auto;
.comfy-menu-hamburger[data-v-82120b51] {
position: fixed;
z-index: 9999;
display: flex;
@@ -41,19 +40,19 @@
z-index: 999;
}
.p-buttongroup-vertical[data-v-cb8f9a1a] {
.p-buttongroup-vertical[data-v-27a9500c] {
display: flex;
flex-direction: column;
border-radius: var(--p-button-border-radius);
overflow: hidden;
border: 1px solid var(--p-panel-border-color);
}
.p-buttongroup-vertical .p-button[data-v-cb8f9a1a] {
.p-buttongroup-vertical .p-button[data-v-27a9500c] {
margin: 0;
border-radius: 0;
}
.node-tooltip[data-v-46859edf] {
.node-tooltip[data-v-f03142eb] {
background: var(--comfy-input-bg);
border-radius: 5px;
box-shadow: 0 0 5px rgba(0, 0, 0, 0.4);
@@ -133,13 +132,11 @@
border-right: 4px solid var(--p-button-text-primary-color);
}
.side-tool-bar-container[data-v-33cac83a] {
.side-tool-bar-container[data-v-04875455] {
display: flex;
flex-direction: column;
align-items: center;
pointer-events: auto;
width: var(--sidebar-width);
height: 100%;
@@ -150,16 +147,16 @@
--sidebar-width: 4rem;
--sidebar-icon-size: 1.5rem;
}
.side-tool-bar-container.small-sidebar[data-v-33cac83a] {
.side-tool-bar-container.small-sidebar[data-v-04875455] {
--sidebar-width: 2.5rem;
--sidebar-icon-size: 1rem;
}
.side-tool-bar-end[data-v-33cac83a] {
.side-tool-bar-end[data-v-04875455] {
align-self: flex-end;
margin-top: auto;
}
.status-indicator[data-v-8d011a31] {
.status-indicator[data-v-fd6ae3af] {
position: absolute;
font-weight: 700;
font-size: 1.5rem;
@@ -221,7 +218,7 @@
border-radius: 0px
}
[data-v-38831d8e] .workflow-tabs {
[data-v-6ab68035] .workflow-tabs {
background-color: var(--comfy-menu-bg);
}
@@ -235,31 +232,36 @@
border-bottom-right-radius: 0;
}
.actionbar[data-v-915e5456] {
.actionbar[data-v-ebd56d51] {
pointer-events: all;
position: fixed;
z-index: 1000;
}
.actionbar.is-docked[data-v-915e5456] {
.actionbar.is-docked[data-v-ebd56d51] {
position: static;
border-style: none;
background-color: transparent;
padding: 0px;
}
.actionbar.is-dragging[data-v-915e5456] {
.actionbar.is-dragging[data-v-ebd56d51] {
-webkit-user-select: none;
-moz-user-select: none;
user-select: none;
}
[data-v-915e5456] .p-panel-content {
[data-v-ebd56d51] .p-panel-content {
padding: 0.25rem;
}
.is-docked[data-v-915e5456] .p-panel-content {
.is-docked[data-v-ebd56d51] .p-panel-content {
padding: 0px;
}
[data-v-915e5456] .p-panel-header {
[data-v-ebd56d51] .p-panel-header {
display: none;
}
.drag-handle[data-v-ebd56d51] {
height: -moz-max-content;
height: max-content;
width: 0.75rem;
}
.top-menubar[data-v-56df69d2] .p-menubar-item-link svg {
display: none;
@@ -275,7 +277,7 @@
border-style: solid;
}
.comfyui-menu[data-v-929e7543] {
.comfyui-menu[data-v-68d3b5b9] {
width: 100vw;
height: var(--comfy-topbar-height);
background: var(--comfy-menu-bg);
@@ -288,19 +290,94 @@
order: 0;
grid-column: 1/-1;
}
.comfyui-menu.dropzone[data-v-929e7543] {
.comfyui-menu.dropzone[data-v-68d3b5b9] {
background: var(--p-highlight-background);
}
.comfyui-menu.dropzone-active[data-v-929e7543] {
.comfyui-menu.dropzone-active[data-v-68d3b5b9] {
background: var(--p-highlight-background-focus);
}
[data-v-929e7543] .p-menubar-item-label {
[data-v-68d3b5b9] .p-menubar-item-label {
line-height: revert;
}
.comfyui-logo[data-v-929e7543] {
.comfyui-logo[data-v-68d3b5b9] {
font-size: 1.2em;
-webkit-user-select: none;
-moz-user-select: none;
user-select: none;
cursor: default;
}
.comfyui-body[data-v-e89d9273] {
grid-template-columns: auto 1fr auto;
grid-template-rows: auto 1fr auto;
}
/**
+------------------+------------------+------------------+
|                                                        |
|                     .comfyui-body-                     |
|                          top                           |
|                    (spans all cols)                    |
|                                                        |
+------------------+------------------+------------------+
|                  |                  |                  |
|  .comfyui-body-  |  #graph-canvas   |  .comfyui-body-  |
|       left       |                  |      right       |
|                  |                  |                  |
|                  |                  |                  |
+------------------+------------------+------------------+
|                                                        |
|                     .comfyui-body-                     |
|                         bottom                         |
|                    (spans all cols)                    |
|                                                        |
+------------------+------------------+------------------+
*/
.comfyui-body-top[data-v-e89d9273] {
order: -5;
/* Span across all columns */
grid-column: 1/-1;
/* Position at the first row */
grid-row: 1;
/* Top menu bar dropdown needs to be above of graph canvas splitter overlay which is z-index: 999 */
/* Top menu bar z-index needs to be higher than bottom menu bar z-index as by default
pysssss's image feed is located at body-bottom, and it can overlap with the queue button, which
is located in body-top. */
z-index: 1001;
display: flex;
flex-direction: column;
}
.comfyui-body-left[data-v-e89d9273] {
order: -4;
/* Position in the first column */
grid-column: 1;
/* Position below the top element */
grid-row: 2;
z-index: 10;
display: flex;
}
.graph-canvas-container[data-v-e89d9273] {
width: 100%;
height: 100%;
order: -3;
grid-column: 2;
grid-row: 2;
position: relative;
overflow: hidden;
}
.comfyui-body-right[data-v-e89d9273] {
order: -2;
z-index: 10;
grid-column: 3;
grid-row: 2;
}
.comfyui-body-bottom[data-v-e89d9273] {
order: 4;
/* Span across all columns */
grid-column: 1/-1;
grid-row: 3;
/* Bottom menu bar dropdown needs to be above of graph canvas splitter overlay which is z-index: 999 */
z-index: 1000;
display: flex;
flex-direction: column;
}

View File

@@ -1,9 +1,9 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, U as ref, bm as useModel, o as openBlock, f as createElementBlock, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, bn as script, bh as script$1, ar as withModifiers, z as withCtx, ab as script$2, K as useI18n, c as computed, ai as normalizeClass, B as createCommentVNode, a4 as script$3, a7 as createTextVNode, b5 as electronAPI, _ as _export_sfc, p as onMounted, r as resolveDirective, bg as script$4, i as withDirectives, bo as script$5, bp as script$6, l as script$7, y as createBlock, bj as script$8, bq as MigrationItems, w as watchEffect, F as Fragment, D as renderList, br as script$9, bs as mergeModels, bt as ValidationState, Y as normalizeI18nKey, O as watch, bu as checkMirrorReachable, bv as _sfc_main$7, bw as mergeValidationStates, bc as t, a$ as script$a, bx as CUDA_TORCH_URL, by as NIGHTLY_CPU_TORCH_URL, be as useRouter, ag as toRaw } from "./index-DqqhYDnY.js";
import { s as script$b, a as script$c, b as script$d, c as script$e, d as script$f } from "./index-BNlqgrYT.js";
import { d as defineComponent, T as ref, bq as useModel, o as openBlock, f as createElementBlock, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, br as script, bl as script$1, as as withModifiers, z as withCtx, ac as script$2, I as useI18n, c as computed, aj as normalizeClass, B as createCommentVNode, a5 as script$3, a8 as createTextVNode, b9 as electronAPI, _ as _export_sfc, p as onMounted, r as resolveDirective, bk as script$4, i as withDirectives, bs as script$5, bt as script$6, l as script$7, y as createBlock, bn as script$8, bu as MigrationItems, w as watchEffect, F as Fragment, D as renderList, bv as script$9, bw as mergeModels, bx as ValidationState, X as normalizeI18nKey, N as watch, by as checkMirrorReachable, bz as _sfc_main$7, bA as isInChina, bB as mergeValidationStates, bg as t, b3 as script$a, bC as CUDA_TORCH_URL, bD as NIGHTLY_CPU_TORCH_URL, bi as useRouter, ah as toRaw } from "./index-Bv0b06LE.js";
import { s as script$b, a as script$c, b as script$d, c as script$e, d as script$f } from "./index-SeIZOWJp.js";
import { P as PYTHON_MIRROR, a as PYPI_MIRROR } from "./uvMirrors-B-HKMf6X.js";
import { _ as _sfc_main$8 } from "./BaseViewTemplate-Cz111_1A.js";
import { _ as _sfc_main$8 } from "./BaseViewTemplate-BTbuZf5t.js";
const _hoisted_1$5 = { class: "flex flex-col gap-6 w-[600px]" };
const _hoisted_2$5 = { class: "flex flex-col gap-4" };
const _hoisted_3$5 = { class: "text-2xl font-semibold text-neutral-100" };
@@ -314,6 +314,7 @@ const _sfc_main$4 = /* @__PURE__ */ defineComponent({
const pathExists = ref(false);
const appData = ref("");
const appPath = ref("");
const inputTouched = ref(false);
const electron = electronAPI();
onMounted(async () => {
const paths = await electron.getSystemPaths();
@@ -355,6 +356,13 @@ const _sfc_main$4 = /* @__PURE__ */ defineComponent({
pathError.value = t2("install.failedToSelectDirectory");
}
}, "browsePath");
const onFocus = /* @__PURE__ */ __name(() => {
if (!inputTouched.value) {
inputTouched.value = true;
return;
}
validatePath(installPath.value);
}, "onFocus");
return (_ctx, _cache) => {
const _directive_tooltip = resolveDirective("tooltip");
return openBlock(), createElementBlock("div", _hoisted_1$3, [
@@ -370,10 +378,16 @@ const _sfc_main$4 = /* @__PURE__ */ defineComponent({
_cache[0] || (_cache[0] = ($event) => installPath.value = $event),
validatePath
],
class: normalizeClass(["w-full", { "p-invalid": pathError.value }])
class: normalizeClass(["w-full", { "p-invalid": pathError.value }]),
onFocus
}, null, 8, ["modelValue", "class"]),
withDirectives(createVNode(unref(script$5), { class: "pi pi-info-circle" }, null, 512), [
[_directive_tooltip, _ctx.$t("install.installLocationTooltip")]
[
_directive_tooltip,
_ctx.$t("install.installLocationTooltip"),
void 0,
{ top: true }
]
])
]),
_: 1
@@ -595,13 +609,12 @@ const _sfc_main$2 = /* @__PURE__ */ defineComponent({
}
});
return (_ctx, _cache) => {
const _component_UrlInput = _sfc_main$7;
return openBlock(), createElementBlock("div", _hoisted_1$1, [
createBaseVNode("div", _hoisted_2$1, [
createBaseVNode("h3", _hoisted_3$1, toDisplayString(_ctx.$t(`settings.${normalizedSettingId.value}.name`)), 1),
createBaseVNode("p", _hoisted_4$1, toDisplayString(_ctx.$t(`settings.${normalizedSettingId.value}.tooltip`)), 1)
]),
createVNode(_component_UrlInput, {
createVNode(_sfc_main$7, {
modelValue: modelValue.value,
"onUpdate:modelValue": _cache[0] || (_cache[0] = ($event) => modelValue.value = $event),
"validate-url-fn": /* @__PURE__ */ __name((mirror) => unref(checkMirrorReachable)(mirror + (_ctx.item.validationPathSuffix ?? "")), "validate-url-fn"),
@@ -653,11 +666,24 @@ const _sfc_main$1 = /* @__PURE__ */ defineComponent({
};
}
}, "getTorchMirrorItem");
const mirrors = computed(() => [
[PYTHON_MIRROR, pythonMirror],
[PYPI_MIRROR, pypiMirror],
[getTorchMirrorItem(__props.device), torchMirror]
]);
const userIsInChina = ref(false);
onMounted(async () => {
userIsInChina.value = await isInChina();
});
const useFallbackMirror = /* @__PURE__ */ __name((mirror) => ({
...mirror,
mirror: mirror.fallbackMirror
}), "useFallbackMirror");
const mirrors = computed(
() => [
[PYTHON_MIRROR, pythonMirror],
[PYPI_MIRROR, pypiMirror],
[getTorchMirrorItem(__props.device), torchMirror]
].map(([item, modelValue]) => [
userIsInChina.value ? useFallbackMirror(item) : item,
modelValue
])
);
const validationStates = ref(
mirrors.value.map(() => ValidationState.IDLE)
);
@@ -942,4 +968,4 @@ const InstallView = /* @__PURE__ */ _export_sfc(_sfc_main, [["__scopeId", "data-
export {
InstallView as default
};
//# sourceMappingURL=InstallView-CVZcZZXJ.js.map
//# sourceMappingURL=InstallView-DW9xwU_F.js.map

8
web/assets/KeybindingPanel-CDYVPYDp.css generated vendored Normal file
View File

@@ -0,0 +1,8 @@
[data-v-8454e24f] .p-datatable-tbody > tr > td {
padding: 0.25rem;
min-height: 2rem
}
[data-v-8454e24f] .p-datatable-row-selected .actions,[data-v-8454e24f] .p-datatable-selectable-row:hover .actions {
visibility: visible
}

View File

@@ -1,8 +0,0 @@
[data-v-2554ab36] .p-datatable-tbody > tr > td {
padding: 0.25rem;
min-height: 2rem
}
[data-v-2554ab36] .p-datatable-row-selected .actions,[data-v-2554ab36] .p-datatable-selectable-row:hover .actions {
visibility: visible
}

View File

@@ -1,9 +1,9 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, c as computed, o as openBlock, f as createElementBlock, F as Fragment, D as renderList, k as createVNode, z as withCtx, a7 as createTextVNode, E as toDisplayString, j as unref, a4 as script, B as createCommentVNode, U as ref, dl as FilterMatchMode, an as useKeybindingStore, L as useCommandStore, K as useI18n, Y as normalizeI18nKey, w as watchEffect, aR as useToast, r as resolveDirective, y as createBlock, dm as SearchBox, m as createBaseVNode, l as script$2, bg as script$4, ar as withModifiers, bj as script$5, ab as script$6, i as withDirectives, dn as _sfc_main$2, dp as KeyComboImpl, dq as KeybindingImpl, _ as _export_sfc } from "./index-DqqhYDnY.js";
import { g as script$1, h as script$3 } from "./index-BapOFhAR.js";
import { u as useKeybindingService } from "./keybindingService-DEgCutrm.js";
import "./index-DXE47DZl.js";
import { d as defineComponent, c as computed, o as openBlock, f as createElementBlock, F as Fragment, D as renderList, k as createVNode, z as withCtx, a8 as createTextVNode, E as toDisplayString, j as unref, a5 as script, B as createCommentVNode, T as ref, dx as FilterMatchMode, ao as useKeybindingStore, J as useCommandStore, I as useI18n, X as normalizeI18nKey, w as watchEffect, aV as useToast, r as resolveDirective, y as createBlock, dy as SearchBox, m as createBaseVNode, l as script$2, bk as script$4, as as withModifiers, bn as script$5, ac as script$6, i as withDirectives, dz as _sfc_main$2, dA as KeyComboImpl, dB as KeybindingImpl, _ as _export_sfc } from "./index-Bv0b06LE.js";
import { g as script$1, h as script$3 } from "./index-CgMyWf7n.js";
import { u as useKeybindingService } from "./keybindingService-DyjX-nxF.js";
import "./index-Dzu9WL4p.js";
const _hoisted_1$1 = {
key: 0,
class: "px-2"
@@ -96,6 +96,16 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
}
__name(removeKeybinding, "removeKeybinding");
function captureKeybinding(event) {
if (!event.shiftKey && !event.altKey && !event.ctrlKey && !event.metaKey) {
switch (event.key) {
case "Escape":
cancelEdit();
return;
case "Enter":
saveKeybinding();
return;
}
}
const keyCombo = KeyComboImpl.fromEvent(event);
newBindingKeyCombo.value = keyCombo;
}
@@ -151,7 +161,7 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
value: commandsData.value,
selection: selectedCommandData.value,
"onUpdate:selection": _cache[1] || (_cache[1] = ($event) => selectedCommandData.value = $event),
"global-filter-fields": ["id"],
"global-filter-fields": ["id", "label"],
filters: filters.value,
selectionMode: "single",
stripedRows: "",
@@ -216,7 +226,7 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
visible: editDialogVisible.value,
"onUpdate:visible": _cache[2] || (_cache[2] = ($event) => editDialogVisible.value = $event),
modal: "",
header: currentEditingCommand.value?.id,
header: currentEditingCommand.value?.label,
onHide: cancelEdit
}, {
footer: withCtx(() => [
@@ -275,8 +285,8 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
};
}
});
const KeybindingPanel = /* @__PURE__ */ _export_sfc(_sfc_main, [["__scopeId", "data-v-2554ab36"]]);
const KeybindingPanel = /* @__PURE__ */ _export_sfc(_sfc_main, [["__scopeId", "data-v-8454e24f"]]);
export {
KeybindingPanel as default
};
//# sourceMappingURL=KeybindingPanel-CeHhC2F4.js.map
//# sourceMappingURL=KeybindingPanel-oavhFdkz.js.map

File diff suppressed because one or more lines are too long

View File

@@ -63,10 +63,10 @@
}
}
[data-v-74b78f7d] .p-tag {
[data-v-dd50a7dd] .p-tag {
--p-tag-gap: 0.375rem;
}
.backspan[data-v-74b78f7d]::before {
.backspan[data-v-dd50a7dd]::before {
position: absolute;
margin: 0px;
color: var(--p-text-muted-color);

View File

@@ -1,7 +1,7 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, K as useI18n, U as ref, p as onMounted, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, a4 as script, a$ as script$1, l as script$2, b5 as electronAPI, _ as _export_sfc } from "./index-DqqhYDnY.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-Cz111_1A.js";
import { d as defineComponent, I as useI18n, T as ref, p as onMounted, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, a5 as script, b3 as script$1, l as script$2, b9 as electronAPI, _ as _export_sfc } from "./index-Bv0b06LE.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-BTbuZf5t.js";
const _hoisted_1 = { class: "comfy-installer grow flex flex-col gap-4 text-neutral-300 max-w-110" };
const _hoisted_2 = { class: "text-2xl font-semibold text-neutral-100" };
const _hoisted_3 = { class: "m-1 text-neutral-300" };
@@ -71,4 +71,4 @@ const ManualConfigurationView = /* @__PURE__ */ _export_sfc(_sfc_main, [["__scop
export {
ManualConfigurationView as default
};
//# sourceMappingURL=ManualConfigurationView-Cz0_f_T-.js.map
//# sourceMappingURL=ManualConfigurationView-DTLyJ3VG.js.map

View File

@@ -1,7 +1,7 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { _ as _sfc_main$1 } from "./BaseViewTemplate-Cz111_1A.js";
import { d as defineComponent, aR as useToast, K as useI18n, U as ref, be as useRouter, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, a7 as createTextVNode, k as createVNode, j as unref, bn as script, l as script$1, b5 as electronAPI } from "./index-DqqhYDnY.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-BTbuZf5t.js";
import { d as defineComponent, aV as useToast, I as useI18n, T as ref, bi as useRouter, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, a8 as createTextVNode, k as createVNode, j as unref, br as script, l as script$1, b9 as electronAPI } from "./index-Bv0b06LE.js";
const _hoisted_1 = { class: "h-full p-8 2xl:p-16 flex flex-col items-center justify-center" };
const _hoisted_2 = { class: "bg-neutral-800 rounded-lg shadow-lg p-6 w-full max-w-[600px] flex flex-col gap-6" };
const _hoisted_3 = { class: "text-3xl font-semibold text-neutral-100" };
@@ -83,4 +83,4 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
export {
_sfc_main as default
};
//# sourceMappingURL=MetricsConsentView-B5NlgqrS.js.map
//# sourceMappingURL=MetricsConsentView-C80fk2cl.js.map

View File

@@ -1,7 +1,7 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, be as useRouter, r as resolveDirective, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, l as script, i as withDirectives, _ as _export_sfc } from "./index-DqqhYDnY.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-Cz111_1A.js";
import { d as defineComponent, bi as useRouter, r as resolveDirective, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, l as script, i as withDirectives, _ as _export_sfc } from "./index-Bv0b06LE.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-BTbuZf5t.js";
const _imports_0 = "" + new URL("images/sad_girl.png", import.meta.url).href;
const _hoisted_1 = { class: "sad-container" };
const _hoisted_2 = { class: "no-drag sad-text flex items-center" };
@@ -83,4 +83,4 @@ const NotSupportedView = /* @__PURE__ */ _export_sfc(_sfc_main, [["__scopeId", "
export {
NotSupportedView as default
};
//# sourceMappingURL=NotSupportedView-BUpntA4x.js.map
//# sourceMappingURL=NotSupportedView-B78ZVR9Z.js.map

View File

@@ -1,7 +1,7 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { o as openBlock, f as createElementBlock, m as createBaseVNode, H as markRaw, d as defineComponent, a as useSettingStore, ae as storeToRefs, O as watch, dy as useCopyToClipboard, K as useI18n, y as createBlock, z as withCtx, j as unref, bj as script, E as toDisplayString, D as renderList, F as Fragment, k as createVNode, l as script$1, B as createCommentVNode, bh as script$2, dz as FormItem, dn as _sfc_main$1, b5 as electronAPI } from "./index-DqqhYDnY.js";
import { u as useServerConfigStore } from "./serverConfigStore-Kb5DJVFt.js";
import { o as openBlock, f as createElementBlock, m as createBaseVNode, H as markRaw, d as defineComponent, a as useSettingStore, af as storeToRefs, N as watch, dJ as useCopyToClipboard, I as useI18n, y as createBlock, z as withCtx, j as unref, bn as script, E as toDisplayString, D as renderList, F as Fragment, k as createVNode, l as script$1, B as createCommentVNode, bl as script$2, dK as FormItem, dz as _sfc_main$1, b9 as electronAPI } from "./index-Bv0b06LE.js";
import { u as useServerConfigStore } from "./serverConfigStore-D2Vr0L0h.js";
const _hoisted_1$1 = {
viewBox: "0 0 24 24",
width: "1.2em",
@@ -153,4 +153,4 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
export {
_sfc_main as default
};
//# sourceMappingURL=ServerConfigPanel-B1lI5M9c.js.map
//# sourceMappingURL=ServerConfigPanel-BYrt6wyr.js.map

View File

@@ -1,7 +1,7 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, K as useI18n, U as ref, bk as ProgressStatus, p as onMounted, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, a7 as createTextVNode, E as toDisplayString, j as unref, f as createElementBlock, B as createCommentVNode, k as createVNode, l as script, i as withDirectives, v as vShow, bl as BaseTerminal, b5 as electronAPI, _ as _export_sfc } from "./index-DqqhYDnY.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-Cz111_1A.js";
import { d as defineComponent, I as useI18n, T as ref, bo as ProgressStatus, p as onMounted, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, a8 as createTextVNode, E as toDisplayString, j as unref, f as createElementBlock, B as createCommentVNode, k as createVNode, l as script, i as withDirectives, v as vShow, bp as BaseTerminal, b9 as electronAPI, _ as _export_sfc } from "./index-Bv0b06LE.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-BTbuZf5t.js";
const _hoisted_1 = { class: "flex flex-col w-full h-full items-center" };
const _hoisted_2 = { class: "text-2xl font-bold" };
const _hoisted_3 = { key: 0 };
@@ -93,8 +93,8 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
};
}
});
const ServerStartView = /* @__PURE__ */ _export_sfc(_sfc_main, [["__scopeId", "data-v-4140d62b"]]);
const ServerStartView = /* @__PURE__ */ _export_sfc(_sfc_main, [["__scopeId", "data-v-e6ba9633"]]);
export {
ServerStartView as default
};
//# sourceMappingURL=ServerStartView-BpH4TXPO.js.map
//# sourceMappingURL=ServerStartView-B7TlHxYo.js.map

View File

@@ -1,5 +1,5 @@
[data-v-4140d62b] .xterm-helper-textarea {
[data-v-e6ba9633] .xterm-helper-textarea {
/* Hide this as it moves all over when uv is running */
display: none;
}

1061
web/assets/TerminalOutputDrawer-CKr7Br7O.js generated vendored Normal file

File diff suppressed because it is too large

View File

@@ -1,7 +1,7 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, aj as useUserStore, be as useRouter, U as ref, c as computed, p as onMounted, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, bf as withKeys, j as unref, bg as script, bh as script$1, bi as script$2, bj as script$3, a7 as createTextVNode, B as createCommentVNode, l as script$4 } from "./index-DqqhYDnY.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-Cz111_1A.js";
import { d as defineComponent, ak as useUserStore, bi as useRouter, T as ref, c as computed, p as onMounted, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, bj as withKeys, j as unref, bk as script, bl as script$1, bm as script$2, bn as script$3, a8 as createTextVNode, B as createCommentVNode, l as script$4 } from "./index-Bv0b06LE.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-BTbuZf5t.js";
const _hoisted_1 = {
id: "comfy-user-selection",
class: "min-w-84 relative rounded-lg bg-[var(--comfy-menu-bg)] p-5 px-10 shadow-lg"
@@ -98,4 +98,4 @@ const _sfc_main = /* @__PURE__ */ defineComponent({
export {
_sfc_main as default
};
//# sourceMappingURL=UserSelectView-wxa07xPk.js.map
//# sourceMappingURL=UserSelectView-C703HOyO.js.map

View File

@@ -1,7 +1,7 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { d as defineComponent, be as useRouter, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, l as script, _ as _export_sfc } from "./index-DqqhYDnY.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-Cz111_1A.js";
import { d as defineComponent, bi as useRouter, o as openBlock, y as createBlock, z as withCtx, m as createBaseVNode, E as toDisplayString, k as createVNode, j as unref, l as script, _ as _export_sfc } from "./index-Bv0b06LE.js";
import { _ as _sfc_main$1 } from "./BaseViewTemplate-BTbuZf5t.js";
const _hoisted_1 = { class: "flex flex-col items-center justify-center gap-8 p-8" };
const _hoisted_2 = { class: "animated-gradient-text text-glow select-none" };
const _sfc_main = /* @__PURE__ */ defineComponent({
@@ -36,4 +36,4 @@ const WelcomeView = /* @__PURE__ */ _export_sfc(_sfc_main, [["__scopeId", "data-
export {
WelcomeView as default
};
//# sourceMappingURL=WelcomeView-BrXELNIm.js.map
//# sourceMappingURL=WelcomeView-DIFvbWc2.js.map

618
web/assets/index-A_bXPJCN.js generated vendored Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -306,6 +306,7 @@
.litegraph .dialog .dialog-footer {
height: 50px;
padding: 10px;
margin: 0;
border-top: 1px solid #1a1a1a;
}
@@ -442,63 +443,6 @@
color: black;
}
.litegraph .subgraph_property {
padding: 4px;
}
.litegraph .subgraph_property:hover {
background-color: #333;
}
.litegraph .subgraph_property.extra {
margin-top: 8px;
}
.litegraph .subgraph_property span.name {
font-size: 1.3em;
padding-left: 4px;
}
.litegraph .subgraph_property span.type {
opacity: 0.5;
margin-right: 20px;
padding-left: 4px;
}
.litegraph .subgraph_property span.label {
display: inline-block;
width: 60px;
padding: 0px 10px;
}
.litegraph .subgraph_property input {
width: 140px;
color: #999;
background-color: #1a1a1a;
border-radius: 4px;
border: 0;
margin-right: 10px;
padding: 4px;
padding-left: 10px;
}
.litegraph .subgraph_property button {
background-color: #1c1c1c;
color: #aaa;
border: 0;
border-radius: 2px;
padding: 4px 10px;
cursor: pointer;
}
.litegraph .subgraph_property.extra {
color: #ccc;
}
.litegraph .subgraph_property.extra input {
background-color: #111;
}
.litegraph .bullet_icon {
margin-left: 10px;
border-radius: 10px;
@@ -661,21 +605,6 @@
.litegraph .dialog .dialog-content {
display: block;
}
.litegraph .dialog .dialog-content .subgraph_property {
padding: 5px;
}
.litegraph .dialog .dialog-footer {
margin: 0;
}
.litegraph .dialog .dialog-footer .subgraph_property {
margin-top: 0;
display: flex;
align-items: center;
padding: 5px;
}
.litegraph .dialog .dialog-footer .subgraph_property .name {
flex: 1;
}
.litegraph .graphdialog {
display: flex;
align-items: center;
@@ -2110,6 +2039,9 @@
.-right-4{
right: -1rem;
}
.bottom-0{
bottom: 0px;
}
.bottom-\[10px\]{
bottom: 10px;
}
@@ -2119,6 +2051,15 @@
.left-0{
left: 0px;
}
.left-1\/2{
left: 50%;
}
.left-12{
left: 3rem;
}
.left-2{
left: 0.5rem;
}
.left-\[-350px\]{
left: -350px;
}
@@ -2128,6 +2069,9 @@
.top-0{
top: 0px;
}
.top-2{
top: 0.5rem;
}
.top-\[50px\]{
top: 50px;
}
@@ -2137,6 +2081,9 @@
.z-10{
z-index: 10;
}
.z-20{
z-index: 20;
}
.z-\[1000\]{
z-index: 1000;
}
@@ -2196,6 +2143,10 @@
margin-top: 1rem;
margin-bottom: 1rem;
}
.my-8{
margin-top: 2rem;
margin-bottom: 2rem;
}
.mb-2{
margin-bottom: 0.5rem;
}
@@ -2286,6 +2237,9 @@
.h-16{
height: 4rem;
}
.h-48{
height: 12rem;
}
.h-6{
height: 1.5rem;
}
@@ -2331,6 +2285,9 @@
.min-h-screen{
min-height: 100vh;
}
.w-0{
width: 0px;
}
.w-1\/2{
width: 50%;
}
@@ -2343,12 +2300,21 @@
.w-16{
width: 4rem;
}
.w-24{
width: 6rem;
}
.w-28{
width: 7rem;
}
.w-3{
width: 0.75rem;
}
.w-3\/12{
width: 25%;
}
.w-32{
width: 8rem;
}
.w-44{
width: 11rem;
}
@@ -2458,6 +2424,9 @@
.cursor-pointer{
cursor: pointer;
}
.touch-none{
touch-action: none;
}
.select-none{
-webkit-user-select: none;
-moz-user-select: none;
@@ -2893,6 +2862,10 @@
--tw-text-opacity: 1;
color: rgb(239 68 68 / var(--tw-text-opacity));
}
.text-white{
--tw-text-opacity: 1;
color: rgb(255 255 255 / var(--tw-text-opacity));
}
.underline{
text-decoration-line: underline;
}
@@ -3035,8 +3008,6 @@ body {
height: 100vh;
margin: 0;
overflow: hidden;
grid-template-columns: auto 1fr auto;
grid-template-rows: auto 1fr auto;
background: var(--bg-color) var(--bg-img);
color: var(--fg-color);
min-height: -webkit-fill-available;
@@ -3046,87 +3017,6 @@ body {
font-family: Arial, sans-serif;
}
/**
+------------------+------------------+------------------+
|                                                        |
|                     .comfyui-body-                     |
|                          top                           |
|                    (spans all cols)                    |
|                                                        |
+------------------+------------------+------------------+
|                  |                  |                  |
|  .comfyui-body-  |  #graph-canvas   |  .comfyui-body-  |
|       left       |                  |      right       |
|                  |                  |                  |
|                  |                  |                  |
+------------------+------------------+------------------+
|                                                        |
|                     .comfyui-body-                     |
|                         bottom                         |
|                    (spans all cols)                    |
|                                                        |
+------------------+------------------+------------------+
*/
.comfyui-body-top {
order: -5;
/* Span across all columns */
grid-column: 1/-1;
/* Position at the first row */
grid-row: 1;
/* Top menu bar dropdown needs to be above of graph canvas splitter overlay which is z-index: 999 */
/* Top menu bar z-index needs to be higher than bottom menu bar z-index as by default
pysssss's image feed is located at body-bottom, and it can overlap with the queue button, which
is located in body-top. */
z-index: 1001;
display: flex;
flex-direction: column;
}
.comfyui-body-left {
order: -4;
/* Position in the first column */
grid-column: 1;
/* Position below the top element */
grid-row: 2;
z-index: 10;
display: flex;
}
.graph-canvas-container {
width: 100%;
height: 100%;
order: -3;
grid-column: 2;
grid-row: 2;
position: relative;
overflow: hidden;
}
#graph-canvas {
width: 100%;
height: 100%;
touch-action: none;
}
.comfyui-body-right {
order: -2;
z-index: 10;
grid-column: 3;
grid-row: 2;
}
.comfyui-body-bottom {
order: 4;
/* Span across all columns */
grid-column: 1/-1;
grid-row: 3;
/* Bottom menu bar dropdown needs to be above of graph canvas splitter overlay which is z-index: 999 */
z-index: 1000;
display: flex;
flex-direction: column;
}
.comfy-multiline-input {
background-color: var(--comfy-input-bg);
color: var(--input-text);
@@ -3541,84 +3431,6 @@ dialog::backdrop {
justify-content: center;
}
#comfy-settings-dialog {
padding: 0;
width: 41rem;
}
#comfy-settings-dialog tr > td:first-child {
text-align: right;
}
#comfy-settings-dialog tbody button,
#comfy-settings-dialog table > button {
background-color: var(--bg-color);
border: 1px var(--border-color) solid;
border-radius: 0;
color: var(--input-text);
font-size: 1rem;
padding: 0.5rem;
}
#comfy-settings-dialog button:hover {
background-color: var(--tr-odd-bg-color);
}
/* General CSS for tables */
.comfy-table {
border-collapse: collapse;
color: var(--input-text);
font-family: Arial, sans-serif;
width: 100%;
}
.comfy-table caption {
position: sticky;
top: 0;
background-color: var(--bg-color);
color: var(--input-text);
font-size: 1rem;
font-weight: bold;
padding: 8px;
text-align: center;
border-bottom: 1px solid var(--border-color);
}
.comfy-table caption .comfy-btn {
position: absolute;
top: -2px;
right: 0;
bottom: 0;
cursor: pointer;
border: none;
height: 100%;
border-radius: 0;
aspect-ratio: 1/1;
-webkit-user-select: none;
-moz-user-select: none;
user-select: none;
font-size: 20px;
}
.comfy-table caption .comfy-btn:focus {
outline: none;
}
.comfy-table tr:nth-child(even) {
background-color: var(--tr-even-bg-color);
}
.comfy-table tr:nth-child(odd) {
background-color: var(--tr-odd-bg-color);
}
.comfy-table td,
.comfy-table th {
border: 1px solid var(--border-color);
padding: 8px;
}
/* Context menu */
.litegraph .dialog {
@@ -3718,24 +3530,6 @@ dialog::backdrop {
will-change: transform;
}
@media only screen and (max-width: 450px) {
#comfy-settings-dialog .comfy-table tbody {
display: grid;
}
#comfy-settings-dialog .comfy-table tr {
display: grid;
}
#comfy-settings-dialog tr > td:first-child {
text-align: center;
border-bottom: none;
padding-bottom: 0;
}
#comfy-settings-dialog tr > td:not(:first-child) {
text-align: center;
border-top: none;
}
}
audio.comfy-audio.empty-audio-widget {
display: none;
}
@@ -3746,7 +3540,6 @@ audio.comfy-audio.empty-audio-widget {
left: 0;
width: 100%;
height: 100%;
pointer-events: none;
}
/* Set auto complete panel's width as it is not accessible within vue-root */
@@ -3926,7 +3719,7 @@ audio.comfy-audio.empty-audio-widget {
padding-top: 0px
}
.prompt-dialog-content[data-v-3df70997] {
.prompt-dialog-content[data-v-4f1e3bbe] {
white-space: pre-wrap;
}
@@ -3944,17 +3737,17 @@ audio.comfy-audio.empty-audio-widget {
margin-bottom: 1rem;
}
.comfy-error-report[data-v-3faf7785] {
.comfy-error-report[data-v-e5000be2] {
display: flex;
flex-direction: column;
gap: 1rem;
}
.action-container[data-v-3faf7785] {
.action-container[data-v-e5000be2] {
display: flex;
gap: 1rem;
justify-content: flex-end;
}
.wrapper-pre[data-v-3faf7785] {
.wrapper-pre[data-v-e5000be2] {
white-space: pre-wrap;
word-wrap: break-word;
}
@@ -4023,13 +3816,13 @@ audio.comfy-audio.empty-audio-widget {
padding: 0px;
}
.form-input[data-v-1451da7b] .input-slider .p-inputnumber input,
.form-input[data-v-1451da7b] .input-slider .slider-part {
.form-input[data-v-a29c257f] .input-slider .p-inputnumber input,
.form-input[data-v-a29c257f] .input-slider .slider-part {
width: 5rem
}
.form-input[data-v-1451da7b] .p-inputtext,
.form-input[data-v-1451da7b] .p-select {
.form-input[data-v-a29c257f] .p-inputtext,
.form-input[data-v-a29c257f] .p-select {
width: 11rem
}
@@ -4319,26 +4112,26 @@ audio.comfy-audio.empty-audio-widget {
position: relative;
}
[data-v-250ab9af] .p-terminal .xterm {
[data-v-873a313f] .p-terminal .xterm {
overflow-x: auto;
}
[data-v-250ab9af] .p-terminal .xterm-screen {
[data-v-873a313f] .p-terminal .xterm-screen {
background-color: black;
overflow-y: hidden;
}
[data-v-90a7f075] .p-terminal .xterm {
[data-v-14fef2e4] .p-terminal .xterm {
overflow-x: auto;
}
[data-v-90a7f075] .p-terminal .xterm-screen {
[data-v-14fef2e4] .p-terminal .xterm-screen {
background-color: black;
overflow-y: hidden;
}
[data-v-03daf1c8] .p-terminal .xterm {
[data-v-cf0c7d52] .p-terminal .xterm {
overflow-x: auto;
}
[data-v-03daf1c8] .p-terminal .xterm-screen {
[data-v-cf0c7d52] .p-terminal .xterm-screen {
background-color: black;
overflow-y: hidden;
}
@@ -4650,28 +4443,28 @@ audio.comfy-audio.empty-audio-widget {
box-sizing: border-box;
}
.tree-node[data-v-654109c7] {
.tree-node[data-v-a945b5a8] {
width: 100%;
display: flex;
align-items: center;
justify-content: space-between;
}
.leaf-count-badge[data-v-654109c7] {
.leaf-count-badge[data-v-a945b5a8] {
margin-left: 0.5rem;
}
.node-content[data-v-654109c7] {
.node-content[data-v-a945b5a8] {
display: flex;
align-items: center;
flex-grow: 1;
}
.leaf-label[data-v-654109c7] {
.leaf-label[data-v-a945b5a8] {
margin-left: 0.5rem;
}
[data-v-654109c7] .editable-text span {
[data-v-a945b5a8] .editable-text span {
word-break: break-all;
}
[data-v-976a6d58] .tree-explorer-node-label {
[data-v-e3a237e6] .tree-explorer-node-label {
width: 100%;
display: flex;
align-items: center;
@@ -4684,10 +4477,10 @@ audio.comfy-audio.empty-audio-widget {
* By setting the position to relative on the parent and using an absolutely positioned pseudo-element,
* we can create a visual indicator for the drop target without affecting the layout of other elements.
*/
[data-v-976a6d58] .p-tree-node-content:has(.tree-folder) {
[data-v-e3a237e6] .p-tree-node-content:has(.tree-folder) {
position: relative;
}
[data-v-976a6d58] .p-tree-node-content:has(.tree-folder.can-drop)::after {
[data-v-e3a237e6] .p-tree-node-content:has(.tree-folder.can-drop)::after {
content: '';
position: absolute;
top: 0;
@@ -4790,7 +4583,7 @@ audio.comfy-audio.empty-audio-widget {
vertical-align: top;
}
[data-v-0bb2ac55] .pi-fake-spacer {
[data-v-3be51840] .pi-fake-spacer {
height: 1px;
width: 16px;
}

View File

@@ -1,7 +1,7 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { bA as BaseStyle, bB as script$s, bZ as script$t, o as openBlock, f as createElementBlock, as as mergeProps, m as createBaseVNode, E as toDisplayString, bS as Ripple, r as resolveDirective, i as withDirectives, y as createBlock, C as resolveDynamicComponent, bi as script$u, bK as resolveComponent, ai as normalizeClass, co as createSlots, z as withCtx, aU as script$v, cf as script$w, F as Fragment, D as renderList, a7 as createTextVNode, c9 as setAttribute, cv as normalizeProps, A as renderSlot, B as createCommentVNode, b_ as script$x, ce as equals, cA as script$y, br as script$z, cE as getFirstFocusableElement, c8 as OverlayEventBus, cU as getVNodeProp, cc as resolveFieldData, ds as invokeElementMethod, bP as getAttribute, cV as getNextElementSibling, c3 as getOuterWidth, cW as getPreviousElementSibling, l as script$A, bR as script$B, bU as script$C, bJ as script$E, cd as isNotEmpty, ar as withModifiers, d5 as getOuterHeight, bT as UniqueComponentId, cY as _default, bC as ZIndex, bE as focus, b$ as addStyle, c4 as absolutePosition, c0 as ConnectedOverlayScrollHandler, c1 as isTouchDevice, dt as FilterOperator, bI as script$F, cs as script$G, bH as FocusTrap, k as createVNode, bL as Transition, bf as withKeys, c6 as getIndex, cu as script$H, cX as isClickable, cZ as clearSelection, ca as localeComparator, cn as sort, cG as FilterService, dl as FilterMatchMode, bO as findSingle, cJ as findIndexInList, c5 as find, du as exportCSV, cR as getOffset, c_ as isRTL, dv as getHiddenElementOuterWidth, dw as getHiddenElementOuterHeight, dx as reorderArray, bW as removeClass, bD as addClass, ci as isEmpty, cH as script$I, ck as script$J } from "./index-DqqhYDnY.js";
import { s as script$D } from "./index-DXE47DZl.js";
import { bG as BaseStyle, bH as script$s, bX as script$t, o as openBlock, f as createElementBlock, at as mergeProps, m as createBaseVNode, E as toDisplayString, bO as Ripple, r as resolveDirective, i as withDirectives, y as createBlock, C as resolveDynamicComponent, bm as script$u, bR as resolveComponent, aj as normalizeClass, cp as createSlots, z as withCtx, aY as script$v, cf as script$w, F as Fragment, D as renderList, a8 as createTextVNode, c8 as setAttribute, cx as normalizeProps, A as renderSlot, B as createCommentVNode, bY as script$x, ce as equals, cF as script$y, bv as script$z, cJ as getFirstFocusableElement, c7 as OverlayEventBus, cZ as getVNodeProp, cc as resolveFieldData, dD as invokeElementMethod, bK as getAttribute, c_ as getNextElementSibling, c2 as getOuterWidth, c$ as getPreviousElementSibling, l as script$A, bN as script$B, bQ as script$C, cl as script$E, cd as isNotEmpty, as as withModifiers, da as getOuterHeight, bP as UniqueComponentId, d1 as _default, bZ as ZIndex, bL as focus, b_ as addStyle, c3 as absolutePosition, b$ as ConnectedOverlayScrollHandler, c0 as isTouchDevice, dE as FilterOperator, ca as script$F, ct as script$G, cB as FocusTrap, k as createVNode, bI as Transition, bj as withKeys, c5 as getIndex, cv as script$H, d0 as isClickable, d2 as clearSelection, c9 as localeComparator, co as sort, cL as FilterService, dx as FilterMatchMode, bJ as findSingle, cO as findIndexInList, c4 as find, dF as exportCSV, cW as getOffset, d3 as isRTL, dG as getHiddenElementOuterWidth, dH as getHiddenElementOuterHeight, dI as reorderArray, bT as removeClass, bU as addClass, ci as isEmpty, cM as script$I, ck as script$J } from "./index-Bv0b06LE.js";
import { s as script$D } from "./index-Dzu9WL4p.js";
var ColumnStyle = BaseStyle.extend({
name: "column"
});
@@ -8787,4 +8787,4 @@ export {
script as h,
script$l as s
};
//# sourceMappingURL=index-BapOFhAR.js.map
//# sourceMappingURL=index-CgMyWf7n.js.map

View File

@@ -1,6 +1,6 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { bZ as script$1, o as openBlock, f as createElementBlock, as as mergeProps, m as createBaseVNode } from "./index-DqqhYDnY.js";
import { bX as script$1, o as openBlock, f as createElementBlock, at as mergeProps, m as createBaseVNode } from "./index-Bv0b06LE.js";
var script = {
name: "BarsIcon",
"extends": script$1
@@ -24,4 +24,4 @@ script.render = render;
export {
script as s
};
//# sourceMappingURL=index-DXE47DZl.js.map
//# sourceMappingURL=index-Dzu9WL4p.js.map

View File

@@ -1,6 +1,6 @@
var __defProp = Object.defineProperty;
var __name = (target, value2) => __defProp(target, "name", { value: value2, configurable: true });
import { bA as BaseStyle, bB as script$6, o as openBlock, f as createElementBlock, as as mergeProps, cJ as findIndexInList, c5 as find, bK as resolveComponent, y as createBlock, C as resolveDynamicComponent, z as withCtx, m as createBaseVNode, E as toDisplayString, A as renderSlot, B as createCommentVNode, ai as normalizeClass, bO as findSingle, F as Fragment, bL as Transition, i as withDirectives, v as vShow, bT as UniqueComponentId } from "./index-DqqhYDnY.js";
import { bG as BaseStyle, bH as script$6, o as openBlock, f as createElementBlock, at as mergeProps, cO as findIndexInList, c4 as find, bR as resolveComponent, y as createBlock, C as resolveDynamicComponent, z as withCtx, m as createBaseVNode, E as toDisplayString, A as renderSlot, B as createCommentVNode, aj as normalizeClass, bJ as findSingle, F as Fragment, bI as Transition, i as withDirectives, v as vShow, bP as UniqueComponentId } from "./index-Bv0b06LE.js";
var classes$4 = {
root: /* @__PURE__ */ __name(function root(_ref) {
var instance = _ref.instance;
@@ -536,4 +536,4 @@ export {
script as d,
script$4 as s
};
//# sourceMappingURL=index-BNlqgrYT.js.map
//# sourceMappingURL=index-SeIZOWJp.js.map

View File

@@ -1,6 +1,6 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { an as useKeybindingStore, L as useCommandStore, a as useSettingStore, dp as KeyComboImpl, dq as KeybindingImpl } from "./index-DqqhYDnY.js";
import { ao as useKeybindingStore, J as useCommandStore, a as useSettingStore, dA as KeyComboImpl, dB as KeybindingImpl } from "./index-Bv0b06LE.js";
const CORE_KEYBINDINGS = [
{
combo: {
@@ -186,7 +186,7 @@ const useKeybindingService = /* @__PURE__ */ __name(() => {
return;
}
const target = event.composedPath()[0];
if (!keyCombo.hasModifier && (target.tagName === "TEXTAREA" || target.tagName === "INPUT" || target.tagName === "SPAN" && target.classList.contains("property_value"))) {
if (keyCombo.isReservedByTextInput && (target.tagName === "TEXTAREA" || target.tagName === "INPUT" || target.tagName === "SPAN" && target.classList.contains("property_value"))) {
return;
}
const keybinding = keybindingStore.getKeybinding(keyCombo);
@@ -247,4 +247,4 @@ const useKeybindingService = /* @__PURE__ */ __name(() => {
export {
useKeybindingService as u
};
//# sourceMappingURL=keybindingService-DEgCutrm.js.map
//# sourceMappingURL=keybindingService-DyjX-nxF.js.map

View File

@@ -1,6 +1,6 @@
var __defProp = Object.defineProperty;
var __name = (target, value) => __defProp(target, "name", { value, configurable: true });
import { I as defineStore, U as ref, c as computed } from "./index-DqqhYDnY.js";
import { a1 as defineStore, T as ref, c as computed } from "./index-Bv0b06LE.js";
const useServerConfigStore = defineStore("serverConfig", () => {
const serverConfigById = ref({});
const serverConfigs = computed(() => {
@@ -87,4 +87,4 @@ const useServerConfigStore = defineStore("serverConfig", () => {
export {
useServerConfigStore as u
};
//# sourceMappingURL=serverConfigStore-Kb5DJVFt.js.map
//# sourceMappingURL=serverConfigStore-D2Vr0L0h.js.map

4
web/index.html vendored
View File

@@ -6,8 +6,8 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no">
<link rel="stylesheet" type="text/css" href="user.css" />
<link rel="stylesheet" type="text/css" href="materialdesignicons.min.css" />
<script type="module" crossorigin src="./assets/index-DqqhYDnY.js"></script>
<link rel="stylesheet" crossorigin href="./assets/index-C1Hb_Yo9.css">
<script type="module" crossorigin src="./assets/index-Bv0b06LE.js"></script>
<link rel="stylesheet" crossorigin href="./assets/index-CBxvvAzM.css">
</head>
<body class="litegraph grid">
<div id="vue-app"></div>

2
web/scripts/domWidget.js vendored Normal file
View File

@@ -0,0 +1,2 @@
// Shim for scripts/domWidget.ts
export const DOMWidgetImpl = window.comfyAPI.domWidget.DOMWidgetImpl;

View File

@@ -330,7 +330,7 @@
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"v1-5-pruned-emaonly.safetensors"
"v1-5-pruned-emaonly-fp16.safetensors"
]
}
],
@@ -440,8 +440,8 @@
"extra": {},
"version": 0.4,
"models": [{
"name": "v1-5-pruned-emaonly.safetensors",
"url": "https://huggingface.co/Comfy-Org/stable-diffusion-v1-5-archive/resolve/main/v1-5-pruned-emaonly.safetensors?download=true",
"name": "v1-5-pruned-emaonly-fp16.safetensors",
"url": "https://huggingface.co/Comfy-Org/stable-diffusion-v1-5-archive/resolve/main/v1-5-pruned-emaonly-fp16.safetensors?download=true",
"directory": "checkpoints"
}]
}