Compare commits
70 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
9f4b181ab3 | ||
|
|
cbbf077593 | ||
|
|
0c04a6ae78 | ||
|
|
416ccc9e45 | ||
|
|
ff2ff02168 | ||
|
|
4c5c4ddeda | ||
|
|
79badea452 | ||
|
|
37e5390f5f | ||
|
|
a4f59bc65e | ||
|
|
ca457f7ba1 | ||
|
|
cd6f615038 | ||
|
|
517669aaa3 | ||
|
|
e4e1bff605 | ||
|
|
d6656b0c0c | ||
|
|
f4cdedea62 | ||
|
|
39b1fc4ccc | ||
|
|
0b25f47bd9 | ||
|
|
bda1482a27 | ||
|
|
19ee5d9d8b | ||
|
|
61b50720d0 | ||
|
|
0f954f34af | ||
|
|
5262901c5c | ||
|
|
cc550d5908 | ||
|
|
6d1a3f7d00 | ||
|
|
1b3a650f19 | ||
|
|
e83063bf24 | ||
|
|
558b7d8b22 | ||
|
|
caf2074773 | ||
|
|
bdf393792d | ||
|
|
4e14032c02 | ||
|
|
59d58b1158 | ||
|
|
563291ee51 | ||
|
|
6c0377f43e | ||
|
|
2cddbf0821 | ||
|
|
60749f345d | ||
|
|
d4426dce7c | ||
|
|
d9d7f3c619 | ||
|
|
fd5dfb812c | ||
|
|
3dfdddcc91 | ||
|
|
5747bc6457 | ||
|
|
5bea1d2ec9 | ||
|
|
5def9fbc83 | ||
|
|
7a7efe8424 | ||
|
|
44db978531 | ||
|
|
1c8d11e48a | ||
|
|
a220d11e6b | ||
|
|
23827ca312 | ||
|
|
0fd4e6c778 | ||
|
|
e2fafe0686 | ||
|
|
6579632201 | ||
|
|
ac2f0523ca | ||
|
|
fbf68c4e52 | ||
|
|
93477f8efe | ||
|
|
8af9a91e0c | ||
|
|
005d2d3a13 | ||
|
|
1e21f4c14e | ||
|
|
9a616b81c1 | ||
|
|
3bed56bb13 | ||
|
|
4e402b11c6 | ||
|
|
48272448ad | ||
|
|
f7695b5f9e | ||
|
|
452179fe4f | ||
|
|
bf9a90a145 | ||
|
|
c1b92b719d | ||
|
|
cdc3b97dd5 | ||
|
|
8d4e06324f | ||
|
|
57e8bf6a9f | ||
|
|
0ee322ec5f | ||
|
|
79d5ceae6e | ||
|
|
2d5b3e0078 |
@@ -33,12 +33,12 @@ def pull(repo, remote_name='origin', branch='master'):
|
||||
|
||||
user = repo.default_signature
|
||||
tree = repo.index.write_tree()
|
||||
commit = repo.create_commit('HEAD',
|
||||
user,
|
||||
user,
|
||||
'Merge!',
|
||||
tree,
|
||||
[repo.head.target, remote_master_id])
|
||||
repo.create_commit('HEAD',
|
||||
user,
|
||||
user,
|
||||
'Merge!',
|
||||
tree,
|
||||
[repo.head.target, remote_master_id])
|
||||
# We need to do this or git CLI will think we are still merging.
|
||||
repo.state_cleanup()
|
||||
else:
|
||||
|
||||
@@ -3,8 +3,8 @@ name: Python Linting
|
||||
on: [push, pull_request]
|
||||
|
||||
jobs:
|
||||
pylint:
|
||||
name: Run Pylint
|
||||
ruff:
|
||||
name: Run Ruff
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
@@ -16,8 +16,8 @@ jobs:
|
||||
with:
|
||||
python-version: 3.x
|
||||
|
||||
- name: Install Pylint
|
||||
run: pip install pylint
|
||||
- name: Install Ruff
|
||||
run: pip install ruff
|
||||
|
||||
- name: Run Pylint
|
||||
run: pylint --rcfile=.pylintrc $(find . -type f -name "*.py")
|
||||
- name: Run Ruff
|
||||
run: ruff check .
|
||||
53
.github/workflows/test-ci.yml
vendored
53
.github/workflows/test-ci.yml
vendored
@@ -20,7 +20,8 @@ jobs:
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
os: [macos, linux, windows]
|
||||
# os: [macos, linux, windows]
|
||||
os: [macos, linux]
|
||||
python_version: ["3.9", "3.10", "3.11", "3.12"]
|
||||
cuda_version: ["12.1"]
|
||||
torch_version: ["stable"]
|
||||
@@ -31,9 +32,9 @@ jobs:
|
||||
- os: linux
|
||||
runner_label: [self-hosted, Linux]
|
||||
flags: ""
|
||||
- os: windows
|
||||
runner_label: [self-hosted, Windows]
|
||||
flags: ""
|
||||
# - os: windows
|
||||
# runner_label: [self-hosted, Windows]
|
||||
# flags: ""
|
||||
runs-on: ${{ matrix.runner_label }}
|
||||
steps:
|
||||
- name: Test Workflows
|
||||
@@ -45,28 +46,28 @@ jobs:
|
||||
google_credentials: ${{ secrets.GCS_SERVICE_ACCOUNT_JSON }}
|
||||
comfyui_flags: ${{ matrix.flags }}
|
||||
|
||||
test-win-nightly:
|
||||
strategy:
|
||||
fail-fast: true
|
||||
matrix:
|
||||
os: [windows]
|
||||
python_version: ["3.9", "3.10", "3.11", "3.12"]
|
||||
cuda_version: ["12.1"]
|
||||
torch_version: ["nightly"]
|
||||
include:
|
||||
- os: windows
|
||||
runner_label: [self-hosted, Windows]
|
||||
flags: ""
|
||||
runs-on: ${{ matrix.runner_label }}
|
||||
steps:
|
||||
- name: Test Workflows
|
||||
uses: comfy-org/comfy-action@main
|
||||
with:
|
||||
os: ${{ matrix.os }}
|
||||
python_version: ${{ matrix.python_version }}
|
||||
torch_version: ${{ matrix.torch_version }}
|
||||
google_credentials: ${{ secrets.GCS_SERVICE_ACCOUNT_JSON }}
|
||||
comfyui_flags: ${{ matrix.flags }}
|
||||
# test-win-nightly:
|
||||
# strategy:
|
||||
# fail-fast: true
|
||||
# matrix:
|
||||
# os: [windows]
|
||||
# python_version: ["3.9", "3.10", "3.11", "3.12"]
|
||||
# cuda_version: ["12.1"]
|
||||
# torch_version: ["nightly"]
|
||||
# include:
|
||||
# - os: windows
|
||||
# runner_label: [self-hosted, Windows]
|
||||
# flags: ""
|
||||
# runs-on: ${{ matrix.runner_label }}
|
||||
# steps:
|
||||
# - name: Test Workflows
|
||||
# uses: comfy-org/comfy-action@main
|
||||
# with:
|
||||
# os: ${{ matrix.os }}
|
||||
# python_version: ${{ matrix.python_version }}
|
||||
# torch_version: ${{ matrix.torch_version }}
|
||||
# google_credentials: ${{ secrets.GCS_SERVICE_ACCOUNT_JSON }}
|
||||
# comfyui_flags: ${{ matrix.flags }}
|
||||
|
||||
test-unix-nightly:
|
||||
strategy:
|
||||
|
||||
2
.github/workflows/test-launch.yml
vendored
2
.github/workflows/test-launch.yml
vendored
@@ -28,7 +28,7 @@ jobs:
|
||||
- name: Start ComfyUI server
|
||||
run: |
|
||||
python main.py --cpu 2>&1 | tee console_output.log &
|
||||
wait-for-it --service 127.0.0.1:8188 -t 600
|
||||
wait-for-it --service 127.0.0.1:8188 -t 30
|
||||
working-directory: ComfyUI
|
||||
- name: Check for unhandled exceptions in server log
|
||||
run: |
|
||||
|
||||
24
CODEOWNERS
24
CODEOWNERS
@@ -1 +1,23 @@
|
||||
* @comfyanonymous
|
||||
# Admins
|
||||
* @comfyanonymous
|
||||
|
||||
# Note: Github teams syntax cannot be used here as the repo is not owned by Comfy-Org.
|
||||
# Inlined the team members for now.
|
||||
|
||||
# Maintainers
|
||||
*.md @yoland68 @robinjhuang @huchenlei @webfiltered @pythongosssss @ltdrdata @Kosinkadink
|
||||
/tests/ @yoland68 @robinjhuang @huchenlei @webfiltered @pythongosssss @ltdrdata @Kosinkadink
|
||||
/tests-unit/ @yoland68 @robinjhuang @huchenlei @webfiltered @pythongosssss @ltdrdata @Kosinkadink
|
||||
/notebooks/ @yoland68 @robinjhuang @huchenlei @webfiltered @pythongosssss @ltdrdata @Kosinkadink
|
||||
/script_examples/ @yoland68 @robinjhuang @huchenlei @webfiltered @pythongosssss @ltdrdata @Kosinkadink
|
||||
/.github/ @yoland68 @robinjhuang @huchenlei @webfiltered @pythongosssss @ltdrdata @Kosinkadink
|
||||
|
||||
# Python web server
|
||||
/api_server/ @yoland68 @robinjhuang @huchenlei @webfiltered @pythongosssss @ltdrdata
|
||||
/app/ @yoland68 @robinjhuang @huchenlei @webfiltered @pythongosssss @ltdrdata
|
||||
|
||||
# Frontend assets
|
||||
/web/ @huchenlei @webfiltered @pythongosssss
|
||||
|
||||
# Extra nodes
|
||||
/comfy_extras/ @yoland68 @robinjhuang @huchenlei @pythongosssss @ltdrdata @Kosinkadink
|
||||
|
||||
@@ -101,6 +101,8 @@ Workflow examples can be found on the [Examples page](https://comfyanonymous.git
|
||||
| `Q` | Toggle visibility of the queue |
|
||||
| `H` | Toggle visibility of history |
|
||||
| `R` | Refresh graph |
|
||||
| `F` | Show/Hide menu |
|
||||
| `.` | Fit view to selection (Whole graph when nothing is selected) |
|
||||
| Double-Click LMB | Open node quick search palette |
|
||||
| `Shift` + Drag | Move multiple wires at once |
|
||||
| `Ctrl` + `Alt` + LMB | Disconnect all wires from clicked slot |
|
||||
@@ -145,7 +147,7 @@ AMD users can install rocm and pytorch with pip if you don't have it already ins
|
||||
|
||||
This is the command to install the nightly with ROCm 6.2 which might have some performance improvements:
|
||||
|
||||
```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2```
|
||||
```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2.4```
|
||||
|
||||
### NVIDIA
|
||||
|
||||
@@ -219,6 +221,8 @@ You can enable experimental memory efficient attention on pytorch 2.5 in ComfyUI
|
||||
|
||||
```TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention```
|
||||
|
||||
You can also try setting this env variable `PYTORCH_TUNABLEOP_ENABLED=1` which might speed things up at the cost of a very slow initial run.
|
||||
|
||||
# Notes
|
||||
|
||||
Only parts of the graph that have an output with all the correct inputs will be executed.
|
||||
|
||||
@@ -40,7 +40,7 @@ class InternalRoutes:
|
||||
return web.json_response("".join([(l["t"] + " - " + l["m"]) for l in app.logger.get_logs()]))
|
||||
|
||||
@self.routes.get('/logs/raw')
|
||||
async def get_logs(request):
|
||||
async def get_raw_logs(request):
|
||||
self.terminal_service.update_size()
|
||||
return web.json_response({
|
||||
"entries": list(app.logger.get_logs()),
|
||||
|
||||
167
app/model_manager.py
Normal file
167
app/model_manager.py
Normal file
@@ -0,0 +1,167 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import time
|
||||
import logging
|
||||
import folder_paths
|
||||
import glob
|
||||
from aiohttp import web
|
||||
from PIL import Image
|
||||
from io import BytesIO
|
||||
from folder_paths import map_legacy, filter_files_extensions, filter_files_content_types
|
||||
|
||||
|
||||
class ModelFileManager:
|
||||
def __init__(self) -> None:
|
||||
self.cache: dict[str, tuple[list[dict], dict[str, float], float]] = {}
|
||||
|
||||
def get_cache(self, key: str, default=None) -> tuple[list[dict], dict[str, float], float] | None:
|
||||
return self.cache.get(key, default)
|
||||
|
||||
def set_cache(self, key: str, value: tuple[list[dict], dict[str, float], float]):
|
||||
self.cache[key] = value
|
||||
|
||||
def clear_cache(self):
|
||||
self.cache.clear()
|
||||
|
||||
def add_routes(self, routes):
|
||||
# NOTE: This is an experiment to replace `/models`
|
||||
@routes.get("/experiment/models")
|
||||
async def get_model_folders(request):
|
||||
model_types = list(folder_paths.folder_names_and_paths.keys())
|
||||
folder_black_list = ["configs", "custom_nodes"]
|
||||
output_folders: list[dict] = []
|
||||
for folder in model_types:
|
||||
if folder in folder_black_list:
|
||||
continue
|
||||
output_folders.append({"name": folder, "folders": folder_paths.get_folder_paths(folder)})
|
||||
return web.json_response(output_folders)
|
||||
|
||||
# NOTE: This is an experiment to replace `/models/{folder}`
|
||||
@routes.get("/experiment/models/{folder}")
|
||||
async def get_all_models(request):
|
||||
folder = request.match_info.get("folder", None)
|
||||
if not folder in folder_paths.folder_names_and_paths:
|
||||
return web.Response(status=404)
|
||||
files = self.get_model_file_list(folder)
|
||||
return web.json_response(files)
|
||||
|
||||
@routes.get("/experiment/models/preview/{folder}/{path_index}/{filename:.*}")
|
||||
async def get_model_preview(request):
|
||||
folder_name = request.match_info.get("folder", None)
|
||||
path_index = int(request.match_info.get("path_index", None))
|
||||
filename = request.match_info.get("filename", None)
|
||||
|
||||
if not folder_name in folder_paths.folder_names_and_paths:
|
||||
return web.Response(status=404)
|
||||
|
||||
folders = folder_paths.folder_names_and_paths[folder_name]
|
||||
folder = folders[0][path_index]
|
||||
full_filename = os.path.join(folder, filename)
|
||||
|
||||
preview_files = self.get_model_previews(full_filename)
|
||||
default_preview_file = preview_files[0] if len(preview_files) > 0 else None
|
||||
if default_preview_file is None or not os.path.isfile(default_preview_file):
|
||||
return web.Response(status=404)
|
||||
|
||||
try:
|
||||
with Image.open(default_preview_file) as img:
|
||||
img_bytes = BytesIO()
|
||||
img.save(img_bytes, format="WEBP")
|
||||
img_bytes.seek(0)
|
||||
return web.Response(body=img_bytes.getvalue(), content_type="image/webp")
|
||||
except:
|
||||
return web.Response(status=404)
|
||||
|
||||
def get_model_file_list(self, folder_name: str):
|
||||
folder_name = map_legacy(folder_name)
|
||||
folders = folder_paths.folder_names_and_paths[folder_name]
|
||||
output_list: list[dict] = []
|
||||
|
||||
for index, folder in enumerate(folders[0]):
|
||||
if not os.path.isdir(folder):
|
||||
continue
|
||||
out = self.cache_model_file_list_(folder)
|
||||
if out is None:
|
||||
out = self.recursive_search_models_(folder, index)
|
||||
self.set_cache(folder, out)
|
||||
output_list.extend(out[0])
|
||||
|
||||
return output_list
|
||||
|
||||
def cache_model_file_list_(self, folder: str):
|
||||
model_file_list_cache = self.get_cache(folder)
|
||||
|
||||
if model_file_list_cache is None:
|
||||
return None
|
||||
if not os.path.isdir(folder):
|
||||
return None
|
||||
if os.path.getmtime(folder) != model_file_list_cache[1]:
|
||||
return None
|
||||
for x in model_file_list_cache[1]:
|
||||
time_modified = model_file_list_cache[1][x]
|
||||
folder = x
|
||||
if os.path.getmtime(folder) != time_modified:
|
||||
return None
|
||||
|
||||
return model_file_list_cache
|
||||
|
||||
def recursive_search_models_(self, directory: str, pathIndex: int) -> tuple[list[str], dict[str, float], float]:
|
||||
if not os.path.isdir(directory):
|
||||
return [], {}, time.perf_counter()
|
||||
|
||||
excluded_dir_names = [".git"]
|
||||
# TODO use settings
|
||||
include_hidden_files = False
|
||||
|
||||
result: list[str] = []
|
||||
dirs: dict[str, float] = {}
|
||||
|
||||
for dirpath, subdirs, filenames in os.walk(directory, followlinks=True, topdown=True):
|
||||
subdirs[:] = [d for d in subdirs if d not in excluded_dir_names]
|
||||
if not include_hidden_files:
|
||||
subdirs[:] = [d for d in subdirs if not d.startswith(".")]
|
||||
filenames = [f for f in filenames if not f.startswith(".")]
|
||||
|
||||
filenames = filter_files_extensions(filenames, folder_paths.supported_pt_extensions)
|
||||
|
||||
for file_name in filenames:
|
||||
try:
|
||||
relative_path = os.path.relpath(os.path.join(dirpath, file_name), directory)
|
||||
result.append(relative_path)
|
||||
except:
|
||||
logging.warning(f"Warning: Unable to access {file_name}. Skipping this file.")
|
||||
continue
|
||||
|
||||
for d in subdirs:
|
||||
path: str = os.path.join(dirpath, d)
|
||||
try:
|
||||
dirs[path] = os.path.getmtime(path)
|
||||
except FileNotFoundError:
|
||||
logging.warning(f"Warning: Unable to access {path}. Skipping this path.")
|
||||
continue
|
||||
|
||||
return [{"name": f, "pathIndex": pathIndex} for f in result], dirs, time.perf_counter()
|
||||
|
||||
def get_model_previews(self, filepath: str) -> list[str]:
|
||||
dirname = os.path.dirname(filepath)
|
||||
|
||||
if not os.path.exists(dirname):
|
||||
return []
|
||||
|
||||
basename = os.path.splitext(filepath)[0]
|
||||
match_files = glob.glob(f"{basename}.*", recursive=False)
|
||||
image_files = filter_files_content_types(match_files, "image")
|
||||
|
||||
result: list[str] = []
|
||||
|
||||
for filename in image_files:
|
||||
_basename = os.path.splitext(filename)[0]
|
||||
if _basename == basename:
|
||||
result.append(filename)
|
||||
if _basename == f"{basename}.preview":
|
||||
result.append(filename)
|
||||
return result
|
||||
|
||||
def __exit__(self, exc_type, exc_value, traceback):
|
||||
self.clear_cache()
|
||||
@@ -2,11 +2,9 @@
|
||||
#and modified
|
||||
|
||||
import torch
|
||||
import torch as th
|
||||
import torch.nn as nn
|
||||
|
||||
from ..ldm.modules.diffusionmodules.util import (
|
||||
zero_module,
|
||||
timestep_embedding,
|
||||
)
|
||||
|
||||
@@ -415,7 +413,6 @@ class ControlNet(nn.Module):
|
||||
out_output = []
|
||||
out_middle = []
|
||||
|
||||
hs = []
|
||||
if self.num_classes is not None:
|
||||
assert y.shape[0] == x.shape[0]
|
||||
emb = emb + self.label_emb(y)
|
||||
|
||||
@@ -1,10 +1,8 @@
|
||||
import math
|
||||
from typing import List, Optional, Tuple
|
||||
|
||||
import numpy as np
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
from einops import rearrange
|
||||
from torch import Tensor
|
||||
|
||||
from comfy.ldm.modules.diffusionmodules.mmdit import DismantledBlock, PatchEmbed, VectorEmbedder, TimestepEmbedder, get_2d_sincos_pos_embed_torch
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
import torch
|
||||
from typing import Dict, Optional
|
||||
from typing import Optional
|
||||
import comfy.ldm.modules.diffusionmodules.mmdit
|
||||
|
||||
class ControlNet(comfy.ldm.modules.diffusionmodules.mmdit.MMDiT):
|
||||
|
||||
@@ -104,6 +104,7 @@ attn_group = parser.add_mutually_exclusive_group()
|
||||
attn_group.add_argument("--use-split-cross-attention", action="store_true", help="Use the split cross attention optimization. Ignored when xformers is used.")
|
||||
attn_group.add_argument("--use-quad-cross-attention", action="store_true", help="Use the sub-quadratic cross attention optimization . Ignored when xformers is used.")
|
||||
attn_group.add_argument("--use-pytorch-cross-attention", action="store_true", help="Use the new pytorch 2.0 cross attention function.")
|
||||
attn_group.add_argument("--use-sage-attention", action="store_true", help="Use sage attention.")
|
||||
|
||||
parser.add_argument("--disable-xformers", action="store_true", help="Disable xformers.")
|
||||
|
||||
|
||||
43
comfy/comfy_types/README.md
Normal file
43
comfy/comfy_types/README.md
Normal file
@@ -0,0 +1,43 @@
|
||||
# Comfy Typing
|
||||
## Type hinting for ComfyUI Node development
|
||||
|
||||
This module provides type hinting and concrete convenience types for node developers.
|
||||
If cloned to the custom_nodes directory of ComfyUI, types can be imported using:
|
||||
|
||||
```python
|
||||
from comfy_types import IO, ComfyNodeABC, CheckLazyMixin
|
||||
|
||||
class ExampleNode(ComfyNodeABC):
|
||||
@classmethod
|
||||
def INPUT_TYPES(s) -> InputTypeDict:
|
||||
return {"required": {}}
|
||||
```
|
||||
|
||||
Full example is in [examples/example_nodes.py](examples/example_nodes.py).
|
||||
|
||||
# Types
|
||||
A few primary types are documented below. More complete information is available via the docstrings on each type.
|
||||
|
||||
## `IO`
|
||||
|
||||
A string enum of built-in and a few custom data types. Includes the following special types and their requisite plumbing:
|
||||
|
||||
- `ANY`: `"*"`
|
||||
- `NUMBER`: `"FLOAT,INT"`
|
||||
- `PRIMITIVE`: `"STRING,FLOAT,INT,BOOLEAN"`
|
||||
|
||||
## `ComfyNodeABC`
|
||||
|
||||
An abstract base class for nodes, offering type-hinting / autocomplete, and somewhat-alright docstrings.
|
||||
|
||||
### Type hinting for `INPUT_TYPES`
|
||||
|
||||

|
||||
|
||||
### `INPUT_TYPES` return dict
|
||||
|
||||

|
||||
|
||||
### Options for individual inputs
|
||||
|
||||

|
||||
@@ -1,5 +1,6 @@
|
||||
import torch
|
||||
from typing import Callable, Protocol, TypedDict, Optional, List
|
||||
from .node_typing import IO, InputTypeDict, ComfyNodeABC, CheckLazyMixin
|
||||
|
||||
|
||||
class UnetApplyFunction(Protocol):
|
||||
@@ -30,3 +31,15 @@ class UnetParams(TypedDict):
|
||||
|
||||
|
||||
UnetWrapperFunction = Callable[[UnetApplyFunction, UnetParams], torch.Tensor]
|
||||
|
||||
|
||||
__all__ = [
|
||||
"UnetWrapperFunction",
|
||||
UnetApplyConds.__name__,
|
||||
UnetParams.__name__,
|
||||
UnetApplyFunction.__name__,
|
||||
IO.__name__,
|
||||
InputTypeDict.__name__,
|
||||
ComfyNodeABC.__name__,
|
||||
CheckLazyMixin.__name__,
|
||||
]
|
||||
28
comfy/comfy_types/examples/example_nodes.py
Normal file
28
comfy/comfy_types/examples/example_nodes.py
Normal file
@@ -0,0 +1,28 @@
|
||||
from comfy_types import IO, ComfyNodeABC, InputTypeDict
|
||||
from inspect import cleandoc
|
||||
|
||||
|
||||
class ExampleNode(ComfyNodeABC):
|
||||
"""An example node that just adds 1 to an input integer.
|
||||
|
||||
* Requires an IDE configured with analysis paths etc to be worth looking at.
|
||||
* Not intended for use in ComfyUI.
|
||||
"""
|
||||
|
||||
DESCRIPTION = cleandoc(__doc__)
|
||||
CATEGORY = "examples"
|
||||
|
||||
@classmethod
|
||||
def INPUT_TYPES(s) -> InputTypeDict:
|
||||
return {
|
||||
"required": {
|
||||
"input_int": (IO.INT, {"defaultInput": True}),
|
||||
}
|
||||
}
|
||||
|
||||
RETURN_TYPES = (IO.INT,)
|
||||
RETURN_NAMES = ("input_plus_one",)
|
||||
FUNCTION = "execute"
|
||||
|
||||
def execute(self, input_int: int):
|
||||
return (input_int + 1,)
|
||||
BIN
comfy/comfy_types/examples/input_options.png
Normal file
BIN
comfy/comfy_types/examples/input_options.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 19 KiB |
BIN
comfy/comfy_types/examples/input_types.png
Normal file
BIN
comfy/comfy_types/examples/input_types.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 16 KiB |
BIN
comfy/comfy_types/examples/required_hint.png
Normal file
BIN
comfy/comfy_types/examples/required_hint.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 19 KiB |
274
comfy/comfy_types/node_typing.py
Normal file
274
comfy/comfy_types/node_typing.py
Normal file
@@ -0,0 +1,274 @@
|
||||
"""Comfy-specific type hinting"""
|
||||
|
||||
from __future__ import annotations
|
||||
from typing import Literal, TypedDict
|
||||
from abc import ABC, abstractmethod
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class StrEnum(str, Enum):
|
||||
"""Base class for string enums. Python's StrEnum is not available until 3.11."""
|
||||
|
||||
def __str__(self) -> str:
|
||||
return self.value
|
||||
|
||||
|
||||
class IO(StrEnum):
|
||||
"""Node input/output data types.
|
||||
|
||||
Includes functionality for ``"*"`` (`ANY`) and ``"MULTI,TYPES"``.
|
||||
"""
|
||||
|
||||
STRING = "STRING"
|
||||
IMAGE = "IMAGE"
|
||||
MASK = "MASK"
|
||||
LATENT = "LATENT"
|
||||
BOOLEAN = "BOOLEAN"
|
||||
INT = "INT"
|
||||
FLOAT = "FLOAT"
|
||||
CONDITIONING = "CONDITIONING"
|
||||
SAMPLER = "SAMPLER"
|
||||
SIGMAS = "SIGMAS"
|
||||
GUIDER = "GUIDER"
|
||||
NOISE = "NOISE"
|
||||
CLIP = "CLIP"
|
||||
CONTROL_NET = "CONTROL_NET"
|
||||
VAE = "VAE"
|
||||
MODEL = "MODEL"
|
||||
CLIP_VISION = "CLIP_VISION"
|
||||
CLIP_VISION_OUTPUT = "CLIP_VISION_OUTPUT"
|
||||
STYLE_MODEL = "STYLE_MODEL"
|
||||
GLIGEN = "GLIGEN"
|
||||
UPSCALE_MODEL = "UPSCALE_MODEL"
|
||||
AUDIO = "AUDIO"
|
||||
WEBCAM = "WEBCAM"
|
||||
POINT = "POINT"
|
||||
FACE_ANALYSIS = "FACE_ANALYSIS"
|
||||
BBOX = "BBOX"
|
||||
SEGS = "SEGS"
|
||||
|
||||
ANY = "*"
|
||||
"""Always matches any type, but at a price.
|
||||
|
||||
Causes some functionality issues (e.g. reroutes, link types), and should be avoided whenever possible.
|
||||
"""
|
||||
NUMBER = "FLOAT,INT"
|
||||
"""A float or an int - could be either"""
|
||||
PRIMITIVE = "STRING,FLOAT,INT,BOOLEAN"
|
||||
"""Could be any of: string, float, int, or bool"""
|
||||
|
||||
def __ne__(self, value: object) -> bool:
|
||||
if self == "*" or value == "*":
|
||||
return False
|
||||
if not isinstance(value, str):
|
||||
return True
|
||||
a = frozenset(self.split(","))
|
||||
b = frozenset(value.split(","))
|
||||
return not (b.issubset(a) or a.issubset(b))
|
||||
|
||||
|
||||
class InputTypeOptions(TypedDict):
|
||||
"""Provides type hinting for the return type of the INPUT_TYPES node function.
|
||||
|
||||
Due to IDE limitations with unions, for now all options are available for all types (e.g. `label_on` is hinted even when the type is not `IO.BOOLEAN`).
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_datatypes
|
||||
"""
|
||||
|
||||
default: bool | str | float | int | list | tuple
|
||||
"""The default value of the widget"""
|
||||
defaultInput: bool
|
||||
"""Defaults to an input slot rather than a widget"""
|
||||
forceInput: bool
|
||||
"""`defaultInput` and also don't allow converting to a widget"""
|
||||
lazy: bool
|
||||
"""Declares that this input uses lazy evaluation"""
|
||||
rawLink: bool
|
||||
"""When a link exists, rather than receiving the evaluated value, you will receive the link (i.e. `["nodeId", <outputIndex>]`). Designed for node expansion."""
|
||||
tooltip: str
|
||||
"""Tooltip for the input (or widget), shown on pointer hover"""
|
||||
# class InputTypeNumber(InputTypeOptions):
|
||||
# default: float | int
|
||||
min: float
|
||||
"""The minimum value of a number (``FLOAT`` | ``INT``)"""
|
||||
max: float
|
||||
"""The maximum value of a number (``FLOAT`` | ``INT``)"""
|
||||
step: float
|
||||
"""The amount to increment or decrement a widget by when stepping up/down (``FLOAT`` | ``INT``)"""
|
||||
round: float
|
||||
"""Floats are rounded by this value (``FLOAT``)"""
|
||||
# class InputTypeBoolean(InputTypeOptions):
|
||||
# default: bool
|
||||
label_on: str
|
||||
"""The label to use in the UI when the bool is True (``BOOLEAN``)"""
|
||||
label_on: str
|
||||
"""The label to use in the UI when the bool is False (``BOOLEAN``)"""
|
||||
# class InputTypeString(InputTypeOptions):
|
||||
# default: str
|
||||
multiline: bool
|
||||
"""Use a multiline text box (``STRING``)"""
|
||||
placeholder: str
|
||||
"""Placeholder text to display in the UI when empty (``STRING``)"""
|
||||
# Deprecated:
|
||||
# defaultVal: str
|
||||
dynamicPrompts: bool
|
||||
"""Causes the front-end to evaluate dynamic prompts (``STRING``)"""
|
||||
|
||||
|
||||
class HiddenInputTypeDict(TypedDict):
|
||||
"""Provides type hinting for the hidden entry of node INPUT_TYPES."""
|
||||
|
||||
node_id: Literal["UNIQUE_ID"]
|
||||
"""UNIQUE_ID is the unique identifier of the node, and matches the id property of the node on the client side. It is commonly used in client-server communications (see messages)."""
|
||||
unique_id: Literal["UNIQUE_ID"]
|
||||
"""UNIQUE_ID is the unique identifier of the node, and matches the id property of the node on the client side. It is commonly used in client-server communications (see messages)."""
|
||||
prompt: Literal["PROMPT"]
|
||||
"""PROMPT is the complete prompt sent by the client to the server. See the prompt object for a full description."""
|
||||
extra_pnginfo: Literal["EXTRA_PNGINFO"]
|
||||
"""EXTRA_PNGINFO is a dictionary that will be copied into the metadata of any .png files saved. Custom nodes can store additional information in this dictionary for saving (or as a way to communicate with a downstream node)."""
|
||||
dynprompt: Literal["DYNPROMPT"]
|
||||
"""DYNPROMPT is an instance of comfy_execution.graph.DynamicPrompt. It differs from PROMPT in that it may mutate during the course of execution in response to Node Expansion."""
|
||||
|
||||
|
||||
class InputTypeDict(TypedDict):
|
||||
"""Provides type hinting for node INPUT_TYPES.
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_more_on_inputs
|
||||
"""
|
||||
|
||||
required: dict[str, tuple[IO, InputTypeOptions]]
|
||||
"""Describes all inputs that must be connected for the node to execute."""
|
||||
optional: dict[str, tuple[IO, InputTypeOptions]]
|
||||
"""Describes inputs which do not need to be connected."""
|
||||
hidden: HiddenInputTypeDict
|
||||
"""Offers advanced functionality and server-client communication.
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_more_on_inputs#hidden-inputs
|
||||
"""
|
||||
|
||||
|
||||
class ComfyNodeABC(ABC):
|
||||
"""Abstract base class for Comfy nodes. Includes the names and expected types of attributes.
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview
|
||||
"""
|
||||
|
||||
DESCRIPTION: str
|
||||
"""Node description, shown as a tooltip when hovering over the node.
|
||||
|
||||
Usage::
|
||||
|
||||
# Explicitly define the description
|
||||
DESCRIPTION = "Example description here."
|
||||
|
||||
# Use the docstring of the node class.
|
||||
DESCRIPTION = cleandoc(__doc__)
|
||||
"""
|
||||
CATEGORY: str
|
||||
"""The category of the node, as per the "Add Node" menu.
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#category
|
||||
"""
|
||||
EXPERIMENTAL: bool
|
||||
"""Flags a node as experimental, informing users that it may change or not work as expected."""
|
||||
DEPRECATED: bool
|
||||
"""Flags a node as deprecated, indicating to users that they should find alternatives to this node."""
|
||||
|
||||
@classmethod
|
||||
@abstractmethod
|
||||
def INPUT_TYPES(s) -> InputTypeDict:
|
||||
"""Defines node inputs.
|
||||
|
||||
* Must include the ``required`` key, which describes all inputs that must be connected for the node to execute.
|
||||
* The ``optional`` key can be added to describe inputs which do not need to be connected.
|
||||
* The ``hidden`` key offers some advanced functionality. More info at: https://docs.comfy.org/essentials/custom_node_more_on_inputs#hidden-inputs
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#input-types
|
||||
"""
|
||||
return {"required": {}}
|
||||
|
||||
OUTPUT_NODE: bool
|
||||
"""Flags this node as an output node, causing any inputs it requires to be executed.
|
||||
|
||||
If a node is not connected to any output nodes, that node will not be executed. Usage::
|
||||
|
||||
OUTPUT_NODE = True
|
||||
|
||||
From the docs:
|
||||
|
||||
By default, a node is not considered an output. Set ``OUTPUT_NODE = True`` to specify that it is.
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#output-node
|
||||
"""
|
||||
INPUT_IS_LIST: bool
|
||||
"""A flag indicating if this node implements the additional code necessary to deal with OUTPUT_IS_LIST nodes.
|
||||
|
||||
All inputs of ``type`` will become ``list[type]``, regardless of how many items are passed in. This also affects ``check_lazy_status``.
|
||||
|
||||
From the docs:
|
||||
|
||||
A node can also override the default input behaviour and receive the whole list in a single call. This is done by setting a class attribute `INPUT_IS_LIST` to ``True``.
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_lists#list-processing
|
||||
"""
|
||||
OUTPUT_IS_LIST: tuple[bool]
|
||||
"""A tuple indicating which node outputs are lists, but will be connected to nodes that expect individual items.
|
||||
|
||||
Connected nodes that do not implement `INPUT_IS_LIST` will be executed once for every item in the list.
|
||||
|
||||
A ``tuple[bool]``, where the items match those in `RETURN_TYPES`::
|
||||
|
||||
RETURN_TYPES = (IO.INT, IO.INT, IO.STRING)
|
||||
OUTPUT_IS_LIST = (True, True, False) # The string output will be handled normally
|
||||
|
||||
From the docs:
|
||||
|
||||
In order to tell Comfy that the list being returned should not be wrapped, but treated as a series of data for sequential processing,
|
||||
the node should provide a class attribute `OUTPUT_IS_LIST`, which is a ``tuple[bool]``, of the same length as `RETURN_TYPES`,
|
||||
specifying which outputs which should be so treated.
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_lists#list-processing
|
||||
"""
|
||||
|
||||
RETURN_TYPES: tuple[IO]
|
||||
"""A tuple representing the outputs of this node.
|
||||
|
||||
Usage::
|
||||
|
||||
RETURN_TYPES = (IO.INT, "INT", "CUSTOM_TYPE")
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#return-types
|
||||
"""
|
||||
RETURN_NAMES: tuple[str]
|
||||
"""The output slot names for each item in `RETURN_TYPES`, e.g. ``RETURN_NAMES = ("count", "filter_string")``
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#return-names
|
||||
"""
|
||||
OUTPUT_TOOLTIPS: tuple[str]
|
||||
"""A tuple of strings to use as tooltips for node outputs, one for each item in `RETURN_TYPES`."""
|
||||
FUNCTION: str
|
||||
"""The name of the function to execute as a literal string, e.g. `FUNCTION = "execute"`
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_server_overview#function
|
||||
"""
|
||||
|
||||
|
||||
class CheckLazyMixin:
|
||||
"""Provides a basic check_lazy_status implementation and type hinting for nodes that use lazy inputs."""
|
||||
|
||||
def check_lazy_status(self, **kwargs) -> list[str]:
|
||||
"""Returns a list of input names that should be evaluated.
|
||||
|
||||
This basic mixin impl. requires all inputs.
|
||||
|
||||
:kwargs: All node inputs will be included here. If the input is ``None``, it should be assumed that it has not yet been evaluated. \
|
||||
When using ``INPUT_IS_LIST = True``, unevaluated will instead be ``(None,)``.
|
||||
|
||||
Params should match the nodes execution ``FUNCTION`` (self, and all inputs by name).
|
||||
Will be executed repeatedly until it returns an empty list, or all requested items were already evaluated (and sent as params).
|
||||
|
||||
Comfy Docs: https://docs.comfy.org/essentials/custom_node_lazy_evaluation#defining-check-lazy-status
|
||||
"""
|
||||
|
||||
need = [name for name in kwargs if kwargs[name] is None]
|
||||
return need
|
||||
@@ -36,6 +36,10 @@ import comfy.cldm.mmdit
|
||||
import comfy.ldm.hydit.controlnet
|
||||
import comfy.ldm.flux.controlnet
|
||||
import comfy.cldm.dit_embedder
|
||||
from typing import TYPE_CHECKING
|
||||
if TYPE_CHECKING:
|
||||
from comfy.hooks import HookGroup
|
||||
|
||||
|
||||
def broadcast_image_to(tensor, target_batch_size, batched_number):
|
||||
current_batch_size = tensor.shape[0]
|
||||
@@ -78,6 +82,7 @@ class ControlBase:
|
||||
self.concat_mask = False
|
||||
self.extra_concat_orig = []
|
||||
self.extra_concat = None
|
||||
self.extra_hooks: HookGroup = None
|
||||
self.preprocess_image = lambda a: a
|
||||
|
||||
def set_cond_hint(self, cond_hint, strength=1.0, timestep_percent_range=(0.0, 1.0), vae=None, extra_concat=[]):
|
||||
@@ -115,6 +120,14 @@ class ControlBase:
|
||||
if self.previous_controlnet is not None:
|
||||
out += self.previous_controlnet.get_models()
|
||||
return out
|
||||
|
||||
def get_extra_hooks(self):
|
||||
out = []
|
||||
if self.extra_hooks is not None:
|
||||
out.append(self.extra_hooks)
|
||||
if self.previous_controlnet is not None:
|
||||
out += self.previous_controlnet.get_extra_hooks()
|
||||
return out
|
||||
|
||||
def copy_to(self, c):
|
||||
c.cond_hint_original = self.cond_hint_original
|
||||
@@ -130,6 +143,7 @@ class ControlBase:
|
||||
c.strength_type = self.strength_type
|
||||
c.concat_mask = self.concat_mask
|
||||
c.extra_concat_orig = self.extra_concat_orig.copy()
|
||||
c.extra_hooks = self.extra_hooks.clone() if self.extra_hooks else None
|
||||
c.preprocess_image = self.preprocess_image
|
||||
|
||||
def inference_memory_requirements(self, dtype):
|
||||
@@ -200,10 +214,10 @@ class ControlNet(ControlBase):
|
||||
self.concat_mask = concat_mask
|
||||
self.preprocess_image = preprocess_image
|
||||
|
||||
def get_control(self, x_noisy, t, cond, batched_number):
|
||||
def get_control(self, x_noisy, t, cond, batched_number, transformer_options):
|
||||
control_prev = None
|
||||
if self.previous_controlnet is not None:
|
||||
control_prev = self.previous_controlnet.get_control(x_noisy, t, cond, batched_number)
|
||||
control_prev = self.previous_controlnet.get_control(x_noisy, t, cond, batched_number, transformer_options)
|
||||
|
||||
if self.timestep_range is not None:
|
||||
if t[0] > self.timestep_range[0] or t[0] < self.timestep_range[1]:
|
||||
@@ -283,7 +297,6 @@ class ControlLoraOps:
|
||||
class Linear(torch.nn.Module, comfy.ops.CastWeightBiasOp):
|
||||
def __init__(self, in_features: int, out_features: int, bias: bool = True,
|
||||
device=None, dtype=None) -> None:
|
||||
factory_kwargs = {'device': device, 'dtype': dtype}
|
||||
super().__init__()
|
||||
self.in_features = in_features
|
||||
self.out_features = out_features
|
||||
@@ -368,7 +381,6 @@ class ControlLora(ControlNet):
|
||||
self.control_model.to(comfy.model_management.get_torch_device())
|
||||
diffusion_model = model.diffusion_model
|
||||
sd = diffusion_model.state_dict()
|
||||
cm = self.control_model.state_dict()
|
||||
|
||||
for k in sd:
|
||||
weight = sd[k]
|
||||
@@ -758,10 +770,10 @@ class T2IAdapter(ControlBase):
|
||||
height = math.ceil(height / unshuffle_amount) * unshuffle_amount
|
||||
return width, height
|
||||
|
||||
def get_control(self, x_noisy, t, cond, batched_number):
|
||||
def get_control(self, x_noisy, t, cond, batched_number, transformer_options):
|
||||
control_prev = None
|
||||
if self.previous_controlnet is not None:
|
||||
control_prev = self.previous_controlnet.get_control(x_noisy, t, cond, batched_number)
|
||||
control_prev = self.previous_controlnet.get_control(x_noisy, t, cond, batched_number, transformer_options)
|
||||
|
||||
if self.timestep_range is not None:
|
||||
if t[0] > self.timestep_range[0] or t[0] < self.timestep_range[1]:
|
||||
@@ -809,7 +821,7 @@ def load_t2i_adapter(t2i_data, model_options={}): #TODO: model_options
|
||||
for i in range(4):
|
||||
for j in range(2):
|
||||
prefix_replace["adapter.body.{}.resnets.{}.".format(i, j)] = "body.{}.".format(i * 2 + j)
|
||||
prefix_replace["adapter.body.{}.".format(i, j)] = "body.{}.".format(i * 2)
|
||||
prefix_replace["adapter.body.{}.".format(i, )] = "body.{}.".format(i * 2)
|
||||
prefix_replace["adapter."] = ""
|
||||
t2i_data = comfy.utils.state_dict_prefix_replace(t2i_data, prefix_replace)
|
||||
keys = t2i_data.keys()
|
||||
|
||||
@@ -157,16 +157,23 @@ vae_conversion_map_attn = [
|
||||
]
|
||||
|
||||
|
||||
def reshape_weight_for_sd(w):
|
||||
def reshape_weight_for_sd(w, conv3d=False):
|
||||
# convert HF linear weights to SD conv2d weights
|
||||
return w.reshape(*w.shape, 1, 1)
|
||||
if conv3d:
|
||||
return w.reshape(*w.shape, 1, 1, 1)
|
||||
else:
|
||||
return w.reshape(*w.shape, 1, 1)
|
||||
|
||||
|
||||
def convert_vae_state_dict(vae_state_dict):
|
||||
mapping = {k: k for k in vae_state_dict.keys()}
|
||||
conv3d = False
|
||||
for k, v in mapping.items():
|
||||
for sd_part, hf_part in vae_conversion_map:
|
||||
v = v.replace(hf_part, sd_part)
|
||||
if v.endswith(".conv.weight"):
|
||||
if not conv3d and vae_state_dict[k].ndim == 5:
|
||||
conv3d = True
|
||||
mapping[k] = v
|
||||
for k, v in mapping.items():
|
||||
if "attentions" in k:
|
||||
@@ -179,7 +186,7 @@ def convert_vae_state_dict(vae_state_dict):
|
||||
for weight_name in weights_to_convert:
|
||||
if f"mid.attn_1.{weight_name}.weight" in k:
|
||||
logging.debug(f"Reshaping {k} for SD format")
|
||||
new_state_dict[k] = reshape_weight_for_sd(v)
|
||||
new_state_dict[k] = reshape_weight_for_sd(v, conv3d=conv3d)
|
||||
return new_state_dict
|
||||
|
||||
|
||||
|
||||
@@ -1,10 +1,9 @@
|
||||
#code taken from: https://github.com/wl-zhao/UniPC and modified
|
||||
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
import math
|
||||
|
||||
from tqdm.auto import trange, tqdm
|
||||
from tqdm.auto import trange
|
||||
|
||||
|
||||
class NoiseScheduleVP:
|
||||
@@ -704,7 +703,6 @@ class UniPC:
|
||||
):
|
||||
# t_0 = 1. / self.noise_schedule.total_N if t_end is None else t_end
|
||||
# t_T = self.noise_schedule.T if t_start is None else t_start
|
||||
device = x.device
|
||||
steps = len(timesteps) - 1
|
||||
if method == 'multistep':
|
||||
assert steps >= order
|
||||
|
||||
@@ -1,3 +1,4 @@
|
||||
import math
|
||||
import torch
|
||||
from torch import nn
|
||||
from .ldm.modules.attention import CrossAttention
|
||||
|
||||
690
comfy/hooks.py
Normal file
690
comfy/hooks.py
Normal file
@@ -0,0 +1,690 @@
|
||||
from __future__ import annotations
|
||||
from typing import TYPE_CHECKING, Callable
|
||||
import enum
|
||||
import math
|
||||
import torch
|
||||
import numpy as np
|
||||
import itertools
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from comfy.model_patcher import ModelPatcher, PatcherInjection
|
||||
from comfy.model_base import BaseModel
|
||||
from comfy.sd import CLIP
|
||||
import comfy.lora
|
||||
import comfy.model_management
|
||||
import comfy.patcher_extension
|
||||
from node_helpers import conditioning_set_values
|
||||
|
||||
class EnumHookMode(enum.Enum):
|
||||
MinVram = "minvram"
|
||||
MaxSpeed = "maxspeed"
|
||||
|
||||
class EnumHookType(enum.Enum):
|
||||
Weight = "weight"
|
||||
Patch = "patch"
|
||||
ObjectPatch = "object_patch"
|
||||
AddModels = "add_models"
|
||||
Callbacks = "callbacks"
|
||||
Wrappers = "wrappers"
|
||||
SetInjections = "add_injections"
|
||||
|
||||
class EnumWeightTarget(enum.Enum):
|
||||
Model = "model"
|
||||
Clip = "clip"
|
||||
|
||||
class _HookRef:
|
||||
pass
|
||||
|
||||
# NOTE: this is an example of how the should_register function should look
|
||||
def default_should_register(hook: 'Hook', model: 'ModelPatcher', model_options: dict, target: EnumWeightTarget, registered: list[Hook]):
|
||||
return True
|
||||
|
||||
|
||||
class Hook:
|
||||
def __init__(self, hook_type: EnumHookType=None, hook_ref: _HookRef=None, hook_id: str=None,
|
||||
hook_keyframe: 'HookKeyframeGroup'=None):
|
||||
self.hook_type = hook_type
|
||||
self.hook_ref = hook_ref if hook_ref else _HookRef()
|
||||
self.hook_id = hook_id
|
||||
self.hook_keyframe = hook_keyframe if hook_keyframe else HookKeyframeGroup()
|
||||
self.custom_should_register = default_should_register
|
||||
self.auto_apply_to_nonpositive = False
|
||||
|
||||
@property
|
||||
def strength(self):
|
||||
return self.hook_keyframe.strength
|
||||
|
||||
def initialize_timesteps(self, model: 'BaseModel'):
|
||||
self.reset()
|
||||
self.hook_keyframe.initialize_timesteps(model)
|
||||
|
||||
def reset(self):
|
||||
self.hook_keyframe.reset()
|
||||
|
||||
def clone(self, subtype: Callable=None):
|
||||
if subtype is None:
|
||||
subtype = type(self)
|
||||
c: Hook = subtype()
|
||||
c.hook_type = self.hook_type
|
||||
c.hook_ref = self.hook_ref
|
||||
c.hook_id = self.hook_id
|
||||
c.hook_keyframe = self.hook_keyframe
|
||||
c.custom_should_register = self.custom_should_register
|
||||
# TODO: make this do something
|
||||
c.auto_apply_to_nonpositive = self.auto_apply_to_nonpositive
|
||||
return c
|
||||
|
||||
def should_register(self, model: 'ModelPatcher', model_options: dict, target: EnumWeightTarget, registered: list[Hook]):
|
||||
return self.custom_should_register(self, model, model_options, target, registered)
|
||||
|
||||
def add_hook_patches(self, model: 'ModelPatcher', model_options: dict, target: EnumWeightTarget, registered: list[Hook]):
|
||||
raise NotImplementedError("add_hook_patches should be defined for Hook subclasses")
|
||||
|
||||
def on_apply(self, model: 'ModelPatcher', transformer_options: dict[str]):
|
||||
pass
|
||||
|
||||
def on_unapply(self, model: 'ModelPatcher', transformer_options: dict[str]):
|
||||
pass
|
||||
|
||||
def __eq__(self, other: 'Hook'):
|
||||
return self.__class__ == other.__class__ and self.hook_ref == other.hook_ref
|
||||
|
||||
def __hash__(self):
|
||||
return hash(self.hook_ref)
|
||||
|
||||
class WeightHook(Hook):
|
||||
def __init__(self, strength_model=1.0, strength_clip=1.0):
|
||||
super().__init__(hook_type=EnumHookType.Weight)
|
||||
self.weights: dict = None
|
||||
self.weights_clip: dict = None
|
||||
self.need_weight_init = True
|
||||
self._strength_model = strength_model
|
||||
self._strength_clip = strength_clip
|
||||
|
||||
@property
|
||||
def strength_model(self):
|
||||
return self._strength_model * self.strength
|
||||
|
||||
@property
|
||||
def strength_clip(self):
|
||||
return self._strength_clip * self.strength
|
||||
|
||||
def add_hook_patches(self, model: 'ModelPatcher', model_options: dict, target: EnumWeightTarget, registered: list[Hook]):
|
||||
if not self.should_register(model, model_options, target, registered):
|
||||
return False
|
||||
weights = None
|
||||
if target == EnumWeightTarget.Model:
|
||||
strength = self._strength_model
|
||||
else:
|
||||
strength = self._strength_clip
|
||||
|
||||
if self.need_weight_init:
|
||||
key_map = {}
|
||||
if target == EnumWeightTarget.Model:
|
||||
key_map = comfy.lora.model_lora_keys_unet(model.model, key_map)
|
||||
else:
|
||||
key_map = comfy.lora.model_lora_keys_clip(model.model, key_map)
|
||||
weights = comfy.lora.load_lora(self.weights, key_map, log_missing=False)
|
||||
else:
|
||||
if target == EnumWeightTarget.Model:
|
||||
weights = self.weights
|
||||
else:
|
||||
weights = self.weights_clip
|
||||
model.add_hook_patches(hook=self, patches=weights, strength_patch=strength)
|
||||
registered.append(self)
|
||||
return True
|
||||
# TODO: add logs about any keys that were not applied
|
||||
|
||||
def clone(self, subtype: Callable=None):
|
||||
if subtype is None:
|
||||
subtype = type(self)
|
||||
c: WeightHook = super().clone(subtype)
|
||||
c.weights = self.weights
|
||||
c.weights_clip = self.weights_clip
|
||||
c.need_weight_init = self.need_weight_init
|
||||
c._strength_model = self._strength_model
|
||||
c._strength_clip = self._strength_clip
|
||||
return c
|
||||
|
||||
class PatchHook(Hook):
|
||||
def __init__(self):
|
||||
super().__init__(hook_type=EnumHookType.Patch)
|
||||
self.patches: dict = None
|
||||
|
||||
def clone(self, subtype: Callable=None):
|
||||
if subtype is None:
|
||||
subtype = type(self)
|
||||
c: PatchHook = super().clone(subtype)
|
||||
c.patches = self.patches
|
||||
return c
|
||||
# TODO: add functionality
|
||||
|
||||
class ObjectPatchHook(Hook):
|
||||
def __init__(self):
|
||||
super().__init__(hook_type=EnumHookType.ObjectPatch)
|
||||
self.object_patches: dict = None
|
||||
|
||||
def clone(self, subtype: Callable=None):
|
||||
if subtype is None:
|
||||
subtype = type(self)
|
||||
c: ObjectPatchHook = super().clone(subtype)
|
||||
c.object_patches = self.object_patches
|
||||
return c
|
||||
# TODO: add functionality
|
||||
|
||||
class AddModelsHook(Hook):
|
||||
def __init__(self, key: str=None, models: list['ModelPatcher']=None):
|
||||
super().__init__(hook_type=EnumHookType.AddModels)
|
||||
self.key = key
|
||||
self.models = models
|
||||
self.append_when_same = True
|
||||
|
||||
def clone(self, subtype: Callable=None):
|
||||
if subtype is None:
|
||||
subtype = type(self)
|
||||
c: AddModelsHook = super().clone(subtype)
|
||||
c.key = self.key
|
||||
c.models = self.models.copy() if self.models else self.models
|
||||
c.append_when_same = self.append_when_same
|
||||
return c
|
||||
# TODO: add functionality
|
||||
|
||||
class CallbackHook(Hook):
|
||||
def __init__(self, key: str=None, callback: Callable=None):
|
||||
super().__init__(hook_type=EnumHookType.Callbacks)
|
||||
self.key = key
|
||||
self.callback = callback
|
||||
|
||||
def clone(self, subtype: Callable=None):
|
||||
if subtype is None:
|
||||
subtype = type(self)
|
||||
c: CallbackHook = super().clone(subtype)
|
||||
c.key = self.key
|
||||
c.callback = self.callback
|
||||
return c
|
||||
# TODO: add functionality
|
||||
|
||||
class WrapperHook(Hook):
|
||||
def __init__(self, wrappers_dict: dict[str, dict[str, dict[str, list[Callable]]]]=None):
|
||||
super().__init__(hook_type=EnumHookType.Wrappers)
|
||||
self.wrappers_dict = wrappers_dict
|
||||
|
||||
def clone(self, subtype: Callable=None):
|
||||
if subtype is None:
|
||||
subtype = type(self)
|
||||
c: WrapperHook = super().clone(subtype)
|
||||
c.wrappers_dict = self.wrappers_dict
|
||||
return c
|
||||
|
||||
def add_hook_patches(self, model: 'ModelPatcher', model_options: dict, target: EnumWeightTarget, registered: list[Hook]):
|
||||
if not self.should_register(model, model_options, target, registered):
|
||||
return False
|
||||
add_model_options = {"transformer_options": self.wrappers_dict}
|
||||
comfy.patcher_extension.merge_nested_dicts(model_options, add_model_options, copy_dict1=False)
|
||||
registered.append(self)
|
||||
return True
|
||||
|
||||
class SetInjectionsHook(Hook):
|
||||
def __init__(self, key: str=None, injections: list['PatcherInjection']=None):
|
||||
super().__init__(hook_type=EnumHookType.SetInjections)
|
||||
self.key = key
|
||||
self.injections = injections
|
||||
|
||||
def clone(self, subtype: Callable=None):
|
||||
if subtype is None:
|
||||
subtype = type(self)
|
||||
c: SetInjectionsHook = super().clone(subtype)
|
||||
c.key = self.key
|
||||
c.injections = self.injections.copy() if self.injections else self.injections
|
||||
return c
|
||||
|
||||
def add_hook_injections(self, model: 'ModelPatcher'):
|
||||
# TODO: add functionality
|
||||
pass
|
||||
|
||||
class HookGroup:
|
||||
def __init__(self):
|
||||
self.hooks: list[Hook] = []
|
||||
|
||||
def add(self, hook: Hook):
|
||||
if hook not in self.hooks:
|
||||
self.hooks.append(hook)
|
||||
|
||||
def contains(self, hook: Hook):
|
||||
return hook in self.hooks
|
||||
|
||||
def clone(self):
|
||||
c = HookGroup()
|
||||
for hook in self.hooks:
|
||||
c.add(hook.clone())
|
||||
return c
|
||||
|
||||
def clone_and_combine(self, other: 'HookGroup'):
|
||||
c = self.clone()
|
||||
if other is not None:
|
||||
for hook in other.hooks:
|
||||
c.add(hook.clone())
|
||||
return c
|
||||
|
||||
def set_keyframes_on_hooks(self, hook_kf: 'HookKeyframeGroup'):
|
||||
if hook_kf is None:
|
||||
hook_kf = HookKeyframeGroup()
|
||||
else:
|
||||
hook_kf = hook_kf.clone()
|
||||
for hook in self.hooks:
|
||||
hook.hook_keyframe = hook_kf
|
||||
|
||||
def get_dict_repr(self):
|
||||
d: dict[EnumHookType, dict[Hook, None]] = {}
|
||||
for hook in self.hooks:
|
||||
with_type = d.setdefault(hook.hook_type, {})
|
||||
with_type[hook] = None
|
||||
return d
|
||||
|
||||
def get_hooks_for_clip_schedule(self):
|
||||
scheduled_hooks: dict[WeightHook, list[tuple[tuple[float,float], HookKeyframe]]] = {}
|
||||
for hook in self.hooks:
|
||||
# only care about WeightHooks, for now
|
||||
if hook.hook_type == EnumHookType.Weight:
|
||||
hook_schedule = []
|
||||
# if no hook keyframes, assign default value
|
||||
if len(hook.hook_keyframe.keyframes) == 0:
|
||||
hook_schedule.append(((0.0, 1.0), None))
|
||||
scheduled_hooks[hook] = hook_schedule
|
||||
continue
|
||||
# find ranges of values
|
||||
prev_keyframe = hook.hook_keyframe.keyframes[0]
|
||||
for keyframe in hook.hook_keyframe.keyframes:
|
||||
if keyframe.start_percent > prev_keyframe.start_percent and not math.isclose(keyframe.strength, prev_keyframe.strength):
|
||||
hook_schedule.append(((prev_keyframe.start_percent, keyframe.start_percent), prev_keyframe))
|
||||
prev_keyframe = keyframe
|
||||
elif keyframe.start_percent == prev_keyframe.start_percent:
|
||||
prev_keyframe = keyframe
|
||||
# create final range, assuming last start_percent was not 1.0
|
||||
if not math.isclose(prev_keyframe.start_percent, 1.0):
|
||||
hook_schedule.append(((prev_keyframe.start_percent, 1.0), prev_keyframe))
|
||||
scheduled_hooks[hook] = hook_schedule
|
||||
# hooks should not have their schedules in a list of tuples
|
||||
all_ranges: list[tuple[float, float]] = []
|
||||
for range_kfs in scheduled_hooks.values():
|
||||
for t_range, keyframe in range_kfs:
|
||||
all_ranges.append(t_range)
|
||||
# turn list of ranges into boundaries
|
||||
boundaries_set = set(itertools.chain.from_iterable(all_ranges))
|
||||
boundaries_set.add(0.0)
|
||||
boundaries = sorted(boundaries_set)
|
||||
real_ranges = [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]
|
||||
# with real ranges defined, give appropriate hooks w/ keyframes for each range
|
||||
scheduled_keyframes: list[tuple[tuple[float,float], list[tuple[WeightHook, HookKeyframe]]]] = []
|
||||
for t_range in real_ranges:
|
||||
hooks_schedule = []
|
||||
for hook, val in scheduled_hooks.items():
|
||||
keyframe = None
|
||||
# check if is a keyframe that works for the current t_range
|
||||
for stored_range, stored_kf in val:
|
||||
# if stored start is less than current end, then fits - give it assigned keyframe
|
||||
if stored_range[0] < t_range[1] and stored_range[1] > t_range[0]:
|
||||
keyframe = stored_kf
|
||||
break
|
||||
hooks_schedule.append((hook, keyframe))
|
||||
scheduled_keyframes.append((t_range, hooks_schedule))
|
||||
return scheduled_keyframes
|
||||
|
||||
def reset(self):
|
||||
for hook in self.hooks:
|
||||
hook.reset()
|
||||
|
||||
@staticmethod
|
||||
def combine_all_hooks(hooks_list: list['HookGroup'], require_count=0) -> 'HookGroup':
|
||||
actual: list[HookGroup] = []
|
||||
for group in hooks_list:
|
||||
if group is not None:
|
||||
actual.append(group)
|
||||
if len(actual) < require_count:
|
||||
raise Exception(f"Need at least {require_count} hooks to combine, but only had {len(actual)}.")
|
||||
# if no hooks, then return None
|
||||
if len(actual) == 0:
|
||||
return None
|
||||
# if only 1 hook, just return itself without cloning
|
||||
elif len(actual) == 1:
|
||||
return actual[0]
|
||||
final_hook: HookGroup = None
|
||||
for hook in actual:
|
||||
if final_hook is None:
|
||||
final_hook = hook.clone()
|
||||
else:
|
||||
final_hook = final_hook.clone_and_combine(hook)
|
||||
return final_hook
|
||||
|
||||
|
||||
class HookKeyframe:
|
||||
def __init__(self, strength: float, start_percent=0.0, guarantee_steps=1):
|
||||
self.strength = strength
|
||||
# scheduling
|
||||
self.start_percent = float(start_percent)
|
||||
self.start_t = 999999999.9
|
||||
self.guarantee_steps = guarantee_steps
|
||||
|
||||
def clone(self):
|
||||
c = HookKeyframe(strength=self.strength,
|
||||
start_percent=self.start_percent, guarantee_steps=self.guarantee_steps)
|
||||
c.start_t = self.start_t
|
||||
return c
|
||||
|
||||
class HookKeyframeGroup:
|
||||
def __init__(self):
|
||||
self.keyframes: list[HookKeyframe] = []
|
||||
self._current_keyframe: HookKeyframe = None
|
||||
self._current_used_steps = 0
|
||||
self._current_index = 0
|
||||
self._current_strength = None
|
||||
self._curr_t = -1.
|
||||
|
||||
# properties shadow those of HookWeightsKeyframe
|
||||
@property
|
||||
def strength(self):
|
||||
if self._current_keyframe is not None:
|
||||
return self._current_keyframe.strength
|
||||
return 1.0
|
||||
|
||||
def reset(self):
|
||||
self._current_keyframe = None
|
||||
self._current_used_steps = 0
|
||||
self._current_index = 0
|
||||
self._current_strength = None
|
||||
self.curr_t = -1.
|
||||
self._set_first_as_current()
|
||||
|
||||
def add(self, keyframe: HookKeyframe):
|
||||
# add to end of list, then sort
|
||||
self.keyframes.append(keyframe)
|
||||
self.keyframes = get_sorted_list_via_attr(self.keyframes, "start_percent")
|
||||
self._set_first_as_current()
|
||||
|
||||
def _set_first_as_current(self):
|
||||
if len(self.keyframes) > 0:
|
||||
self._current_keyframe = self.keyframes[0]
|
||||
else:
|
||||
self._current_keyframe = None
|
||||
|
||||
def has_index(self, index: int):
|
||||
return index >= 0 and index < len(self.keyframes)
|
||||
|
||||
def is_empty(self):
|
||||
return len(self.keyframes) == 0
|
||||
|
||||
def clone(self):
|
||||
c = HookKeyframeGroup()
|
||||
for keyframe in self.keyframes:
|
||||
c.keyframes.append(keyframe.clone())
|
||||
c._set_first_as_current()
|
||||
return c
|
||||
|
||||
def initialize_timesteps(self, model: 'BaseModel'):
|
||||
for keyframe in self.keyframes:
|
||||
keyframe.start_t = model.model_sampling.percent_to_sigma(keyframe.start_percent)
|
||||
|
||||
def prepare_current_keyframe(self, curr_t: float) -> bool:
|
||||
if self.is_empty():
|
||||
return False
|
||||
if curr_t == self._curr_t:
|
||||
return False
|
||||
prev_index = self._current_index
|
||||
prev_strength = self._current_strength
|
||||
# if met guaranteed steps, look for next keyframe in case need to switch
|
||||
if self._current_used_steps >= self._current_keyframe.guarantee_steps:
|
||||
# if has next index, loop through and see if need to switch
|
||||
if self.has_index(self._current_index+1):
|
||||
for i in range(self._current_index+1, len(self.keyframes)):
|
||||
eval_c = self.keyframes[i]
|
||||
# check if start_t is greater or equal to curr_t
|
||||
# NOTE: t is in terms of sigmas, not percent, so bigger number = earlier step in sampling
|
||||
if eval_c.start_t >= curr_t:
|
||||
self._current_index = i
|
||||
self._current_strength = eval_c.strength
|
||||
self._current_keyframe = eval_c
|
||||
self._current_used_steps = 0
|
||||
# if guarantee_steps greater than zero, stop searching for other keyframes
|
||||
if self._current_keyframe.guarantee_steps > 0:
|
||||
break
|
||||
# if eval_c is outside the percent range, stop looking further
|
||||
else: break
|
||||
# update steps current context is used
|
||||
self._current_used_steps += 1
|
||||
# update current timestep this was performed on
|
||||
self._curr_t = curr_t
|
||||
# return True if keyframe changed, False if no change
|
||||
return prev_index != self._current_index and prev_strength != self._current_strength
|
||||
|
||||
|
class InterpolationMethod:
    LINEAR = "linear"
    EASE_IN = "ease_in"
    EASE_OUT = "ease_out"
    EASE_IN_OUT = "ease_in_out"

    _LIST = [LINEAR, EASE_IN, EASE_OUT, EASE_IN_OUT]

    @classmethod
    def get_weights(cls, num_from: float, num_to: float, length: int, method: str, reverse=False):
        diff = num_to - num_from
        if method == cls.LINEAR:
            weights = torch.linspace(num_from, num_to, length)
        elif method == cls.EASE_IN:
            index = torch.linspace(0, 1, length)
            weights = diff * np.power(index, 2) + num_from
        elif method == cls.EASE_OUT:
            index = torch.linspace(0, 1, length)
            weights = diff * (1 - np.power(1 - index, 2)) + num_from
        elif method == cls.EASE_IN_OUT:
            index = torch.linspace(0, 1, length)
            weights = diff * ((1 - np.cos(index * np.pi)) / 2) + num_from
        else:
            raise ValueError(f"Unrecognized interpolation method '{method}'.")
        if reverse:
            weights = weights.flip(dims=(0,))
        return weights

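For intuition, the easing modes above map an index in [0, 1] onto weights between num_from and num_to; a quick standalone check with plain torch (mirroring the formulas rather than calling the class):

import torch

num_from, num_to, length = 0.0, 1.0, 5
t = torch.linspace(0, 1, length)
curves = {
    "linear": t,
    "ease_in": t ** 2,              # slow start, fast finish
    "ease_out": 1 - (1 - t) ** 2,   # fast start, slow finish
    "ease_in_out": (1 - torch.cos(t * torch.pi)) / 2,
}
for name, w in curves.items():
    print(name, (num_from + (num_to - num_from) * w).tolist())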
def get_sorted_list_via_attr(objects: list, attr: str) -> list:
    if not objects:
        return objects
    elif len(objects) <= 1:
        return [x for x in objects]
    # now that we know we have to sort, do it following these rules:
    # a) if objects have same value of attribute, maintain their relative order
    # b) perform sorting of the groups of objects with same attributes
    unique_attrs = {}
    for o in objects:
        val_attr = getattr(o, attr)
        attr_list: list = unique_attrs.get(val_attr, list())
        attr_list.append(o)
        if val_attr not in unique_attrs:
            unique_attrs[val_attr] = attr_list
    # now that we have the unique attr values grouped together in relative order, sort them by key
    sorted_attrs = dict(sorted(unique_attrs.items()))
    # now flatten out the dict into a list to return
    sorted_list = []
    for object_list in sorted_attrs.values():
        sorted_list.extend(object_list)
    return sorted_list

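Since Python's built-in sort is stable, the grouping above behaves like a plain key sort for hashable attribute values; a tiny check with throwaway objects (not part of this module):

from operator import attrgetter
from types import SimpleNamespace

items = [SimpleNamespace(start_percent=p, tag=t) for p, t in [(0.5, "a"), (0.0, "b"), (0.5, "c")]]
stable = sorted(items, key=attrgetter("start_percent"))  # equal keys keep their input order
print([x.tag for x in stable])  # ['b', 'a', 'c']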
def create_hook_lora(lora: dict[str, torch.Tensor], strength_model: float, strength_clip: float):
    hook_group = HookGroup()
    hook = WeightHook(strength_model=strength_model, strength_clip=strength_clip)
    hook_group.add(hook)
    hook.weights = lora
    return hook_group

def create_hook_model_as_lora(weights_model, weights_clip, strength_model: float, strength_clip: float):
    hook_group = HookGroup()
    hook = WeightHook(strength_model=strength_model, strength_clip=strength_clip)
    hook_group.add(hook)
    patches_model = None
    patches_clip = None
    if weights_model is not None:
        patches_model = {}
        for key in weights_model:
            patches_model[key] = ("model_as_lora", (weights_model[key],))
    if weights_clip is not None:
        patches_clip = {}
        for key in weights_clip:
            patches_clip[key] = ("model_as_lora", (weights_clip[key],))
    hook.weights = patches_model
    hook.weights_clip = patches_clip
    hook.need_weight_init = False
    return hook_group

def get_patch_weights_from_model(model: 'ModelPatcher', discard_model_sampling=True):
    if model is None:
        return None
    patches_model: dict[str, torch.Tensor] = model.model.state_dict()
    if discard_model_sampling:
        # do not include ANY model_sampling components of the model that should act as a patch
        for key in list(patches_model.keys()):
            if key.startswith("model_sampling"):
                patches_model.pop(key, None)
    return patches_model

# NOTE: this function shows how to register weight hooks directly on the ModelPatchers
def load_hook_lora_for_models(model: 'ModelPatcher', clip: 'CLIP', lora: dict[str, torch.Tensor],
                              strength_model: float, strength_clip: float):
    key_map = {}
    if model is not None:
        key_map = comfy.lora.model_lora_keys_unet(model.model, key_map)
    if clip is not None:
        key_map = comfy.lora.model_lora_keys_clip(clip.cond_stage_model, key_map)

    hook_group = HookGroup()
    hook = WeightHook()
    hook_group.add(hook)
    loaded: dict[str] = comfy.lora.load_lora(lora, key_map)
    if model is not None:
        new_modelpatcher = model.clone()
        k = new_modelpatcher.add_hook_patches(hook=hook, patches=loaded, strength_patch=strength_model)
    else:
        k = ()
        new_modelpatcher = None

    if clip is not None:
        new_clip = clip.clone()
        k1 = new_clip.patcher.add_hook_patches(hook=hook, patches=loaded, strength_patch=strength_clip)
    else:
        k1 = ()
        new_clip = None
    k = set(k)
    k1 = set(k1)
    for x in loaded:
        if (x not in k) and (x not in k1):
            print(f"NOT LOADED {x}")
    return (new_modelpatcher, new_clip, hook_group)

def _combine_hooks_from_values(c_dict: dict[str, HookGroup], values: dict[str, HookGroup], cache: dict[tuple[HookGroup, HookGroup], HookGroup]):
    hooks_key = 'hooks'
    # if hooks only exist in one dict, do what's needed so that it ends up in c_dict
    if hooks_key not in values:
        return
    if hooks_key not in c_dict:
        hooks_value = values.get(hooks_key, None)
        if hooks_value is not None:
            c_dict[hooks_key] = hooks_value
        return
    # otherwise, need to combine with minimum duplication via cache
    hooks_tuple = (c_dict[hooks_key], values[hooks_key])
    cached_hooks = cache.get(hooks_tuple, None)
    if cached_hooks is None:
        new_hooks = hooks_tuple[0].clone_and_combine(hooks_tuple[1])
        cache[hooks_tuple] = new_hooks
        c_dict[hooks_key] = new_hooks
    else:
        c_dict[hooks_key] = cache[hooks_tuple]

def conditioning_set_values_with_hooks(conditioning, values={}, append_hooks=True):
    c = []
    hooks_combine_cache: dict[tuple[HookGroup, HookGroup], HookGroup] = {}
    for t in conditioning:
        n = [t[0], t[1].copy()]
        for k in values:
            if append_hooks and k == 'hooks':
                _combine_hooks_from_values(n[1], values, hooks_combine_cache)
            else:
                n[1][k] = values[k]
        c.append(n)

    return c

def set_hooks_for_conditioning(cond, hooks: HookGroup, append_hooks=True):
    if hooks is None:
        return cond
    return conditioning_set_values_with_hooks(cond, {'hooks': hooks}, append_hooks=append_hooks)

def set_timesteps_for_conditioning(cond, timestep_range: tuple[float,float]):
    if timestep_range is None:
        return cond
    return conditioning_set_values(cond, {"start_percent": timestep_range[0],
                                          "end_percent": timestep_range[1]})

def set_mask_for_conditioning(cond, mask: torch.Tensor, set_cond_area: str, strength: float):
    if mask is None:
        return cond
    set_area_to_bounds = False
    if set_cond_area != 'default':
        set_area_to_bounds = True
    if len(mask.shape) < 3:
        mask = mask.unsqueeze(0)
    return conditioning_set_values(cond, {'mask': mask,
                                          'set_area_to_bounds': set_area_to_bounds,
                                          'mask_strength': strength})

def combine_conditioning(conds: list):
    combined_conds = []
    for cond in conds:
        combined_conds.extend(cond)
    return combined_conds

def combine_with_new_conds(conds: list, new_conds: list):
    combined_conds = []
    for c, new_c in zip(conds, new_conds):
        combined_conds.append(combine_conditioning([c, new_c]))
    return combined_conds

def set_conds_props(conds: list, strength: float, set_cond_area: str,
                    mask: torch.Tensor=None, hooks: HookGroup=None, timesteps_range: tuple[float,float]=None, append_hooks=True):
    final_conds = []
    for c in conds:
        # first, apply lora_hook to conditioning, if provided
        c = set_hooks_for_conditioning(c, hooks, append_hooks=append_hooks)
        # next, apply mask to conditioning
        c = set_mask_for_conditioning(cond=c, mask=mask, strength=strength, set_cond_area=set_cond_area)
        # apply timesteps, if present
        c = set_timesteps_for_conditioning(cond=c, timestep_range=timesteps_range)
        # finally, store the prepared conditioning
        final_conds.append(c)
    return final_conds

def set_conds_props_and_combine(conds: list, new_conds: list, strength: float=1.0, set_cond_area: str="default",
                                mask: torch.Tensor=None, hooks: HookGroup=None, timesteps_range: tuple[float,float]=None, append_hooks=True):
    combined_conds = []
    for c, masked_c in zip(conds, new_conds):
        # first, apply lora_hook to new conditioning, if provided
        masked_c = set_hooks_for_conditioning(masked_c, hooks, append_hooks=append_hooks)
        # next, apply mask to new conditioning, if provided
        masked_c = set_mask_for_conditioning(cond=masked_c, mask=mask, set_cond_area=set_cond_area, strength=strength)
        # apply timesteps, if present
        masked_c = set_timesteps_for_conditioning(cond=masked_c, timestep_range=timesteps_range)
        # finally, combine with existing conditioning and store
        combined_conds.append(combine_conditioning([c, masked_c]))
    return combined_conds

def set_default_conds_and_combine(conds: list, new_conds: list,
                                  hooks: HookGroup=None, timesteps_range: tuple[float,float]=None, append_hooks=True):
    combined_conds = []
    for c, new_c in zip(conds, new_conds):
        # first, apply lora_hook to new conditioning, if provided
        new_c = set_hooks_for_conditioning(new_c, hooks, append_hooks=append_hooks)
        # next, add default_cond key to cond so that during sampling, it can be identified
        new_c = conditioning_set_values(new_c, {'default': True})
        # apply timesteps, if present
        new_c = set_timesteps_for_conditioning(cond=new_c, timestep_range=timesteps_range)
        # finally, combine with existing conditioning and store
        combined_conds.append(combine_conditioning([c, new_c]))
    return combined_conds

@@ -11,7 +11,6 @@ import numpy as np
# Transfer from the input time (sigma) used in EDM to that (t) used in DEIS.

def edm2t(edm_steps, epsilon_s=1e-3, sigma_min=0.002, sigma_max=80):
    vp_sigma = lambda beta_d, beta_min: lambda t: (np.e ** (0.5 * beta_d * (t ** 2) + beta_min * t) - 1) ** 0.5
    vp_sigma_inv = lambda beta_d, beta_min: lambda sigma: ((beta_min ** 2 + 2 * beta_d * (sigma ** 2 + 1).log()).sqrt() - beta_min) / beta_d
    vp_beta_d = 2 * (np.log(torch.tensor(sigma_min).cpu() ** 2 + 1) / epsilon_s - np.log(torch.tensor(sigma_max).cpu() ** 2 + 1)) / (epsilon_s - 1)
    vp_beta_min = np.log(torch.tensor(sigma_max).cpu() ** 2 + 1) - 0.5 * vp_beta_d

@@ -175,12 +175,14 @@ def sample_euler_ancestral(model, x, sigmas, extra_args=None, callback=None, dis
        sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1], eta=eta)
        if callback is not None:
            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
        d = to_d(x, sigmas[i], denoised)
        # Euler method
        dt = sigma_down - sigmas[i]
        x = x + d * dt
        if sigmas[i + 1] > 0:
            x = x + noise_sampler(sigmas[i], sigmas[i + 1]) * s_noise * sigma_up

        if sigma_down == 0:
            x = denoised
        else:
            d = to_d(x, sigmas[i], denoised)
            # Euler method
            dt = sigma_down - sigmas[i]
            x = x + d * dt + noise_sampler(sigmas[i], sigmas[i + 1]) * s_noise * sigma_up
    return x

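For reference, get_ancestral_step comes from k-diffusion and splits each step into a deterministic move to sigma_down plus re-injected noise sigma_up; a hedged sketch of the usual formula (my paraphrase, not code from this diff):

import math

def ancestral_step_sketch(sigma_from, sigma_to, eta=1.0):
    sigma_up = min(sigma_to, eta * math.sqrt(sigma_to ** 2 * (sigma_from ** 2 - sigma_to ** 2) / sigma_from ** 2))
    sigma_down = math.sqrt(sigma_to ** 2 - sigma_up ** 2)
    return sigma_down, sigma_up

print(ancestral_step_sketch(2.0, 1.0))  # step target and the noise scale added back afterwards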
@torch.no_grad()
@@ -192,19 +194,22 @@ def sample_euler_ancestral_RF(model, x, sigmas, extra_args=None, callback=None,
    for i in trange(len(sigmas) - 1, disable=disable):
        denoised = model(x, sigmas[i] * s_in, **extra_args)
        # sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1], eta=eta)
        downstep_ratio = 1 + (sigmas[i+1]/sigmas[i] - 1) * eta
        sigma_down = sigmas[i+1] * downstep_ratio
        alpha_ip1 = 1 - sigmas[i+1]
        alpha_down = 1 - sigma_down
        renoise_coeff = (sigmas[i+1]**2 - sigma_down**2*alpha_ip1**2/alpha_down**2)**0.5
        if callback is not None:
            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})

        # Euler method
        sigma_down_i_ratio = sigma_down / sigmas[i]
        x = sigma_down_i_ratio * x + (1 - sigma_down_i_ratio) * denoised
        if sigmas[i + 1] > 0 and eta > 0:
            x = (alpha_ip1/alpha_down) * x + noise_sampler(sigmas[i], sigmas[i + 1]) * s_noise * renoise_coeff
        if sigmas[i + 1] == 0:
            x = denoised
        else:
            downstep_ratio = 1 + (sigmas[i + 1] / sigmas[i] - 1) * eta
            sigma_down = sigmas[i + 1] * downstep_ratio
            alpha_ip1 = 1 - sigmas[i + 1]
            alpha_down = 1 - sigma_down
            renoise_coeff = (sigmas[i + 1]**2 - sigma_down**2 * alpha_ip1**2 / alpha_down**2)**0.5
            # Euler method
            sigma_down_i_ratio = sigma_down / sigmas[i]
            x = sigma_down_i_ratio * x + (1 - sigma_down_i_ratio) * denoised
            if eta > 0:
                x = (alpha_ip1 / alpha_down) * x + noise_sampler(sigmas[i], sigmas[i + 1]) * s_noise * renoise_coeff
    return x

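A quick sanity check of the rectified-flow re-noising above (my own sketch of the intent, not repo code): after the deterministic move to sigma_down, the sample is rescaled by alpha_{i+1}/alpha_down and fresh noise scaled by renoise_coeff is added, so the total noise level lands back on sigmas[i + 1].

import math

sigma_i, sigma_ip1, eta = 0.6, 0.4, 1.0
downstep_ratio = 1 + (sigma_ip1 / sigma_i - 1) * eta
sigma_down = sigma_ip1 * downstep_ratio
alpha_ip1, alpha_down = 1 - sigma_ip1, 1 - sigma_down
renoise_coeff = (sigma_ip1 ** 2 - sigma_down ** 2 * alpha_ip1 ** 2 / alpha_down ** 2) ** 0.5

carried = (alpha_ip1 / alpha_down) ** 2 * sigma_down ** 2  # variance carried over from sigma_down
print(math.isclose(carried + renoise_coeff ** 2, sigma_ip1 ** 2))  # True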
@torch.no_grad()
@@ -317,7 +322,6 @@ def sample_dpm_2_ancestral_RF(model, x, sigmas, extra_args=None, callback=None,
    s_in = x.new_ones([x.shape[0]])
    for i in trange(len(sigmas) - 1, disable=disable):
        denoised = model(x, sigmas[i] * s_in, **extra_args)
        sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1], eta=eta)
        downstep_ratio = 1 + (sigmas[i+1]/sigmas[i] - 1) * eta
        sigma_down = sigmas[i+1] * downstep_ratio
        alpha_ip1 = 1 - sigmas[i+1]

@@ -352,3 +352,27 @@ class LTXV(LatentFormat):
        ]

        self.latent_rgb_factors_bias = [-0.0571, -0.1657, -0.2512]

class HunyuanVideo(LatentFormat):
    latent_channels = 16
    scale_factor = 0.476986
    latent_rgb_factors = [
        [-0.0395, -0.0331, 0.0445],
        [ 0.0696, 0.0795, 0.0518],
        [ 0.0135, -0.0945, -0.0282],
        [ 0.0108, -0.0250, -0.0765],
        [-0.0209, 0.0032, 0.0224],
        [-0.0804, -0.0254, -0.0639],
        [-0.0991, 0.0271, -0.0669],
        [-0.0646, -0.0422, -0.0400],
        [-0.0696, -0.0595, -0.0894],
        [-0.0799, -0.0208, -0.0375],
        [ 0.1166, 0.1627, 0.0962],
        [ 0.1165, 0.0432, 0.0407],
        [-0.2315, -0.1920, -0.1355],
        [-0.0270, 0.0401, -0.0821],
        [-0.0616, -0.0997, -0.0727],
        [ 0.0249, -0.0469, -0.1703]
    ]

    latent_rgb_factors_bias = [ 0.0259, -0.0192, -0.0761]

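These per-channel factors are what latent preview code typically uses to project a latent into an approximate RGB image; a hedged sketch of that projection (shapes and usage assumed, not taken from this diff):

import torch

latent_channels = 16
factors = torch.randn(latent_channels, 3)   # stand-in for latent_rgb_factors
bias = torch.zeros(3)                       # stand-in for latent_rgb_factors_bias
latent = torch.randn(latent_channels, 64, 64)

# one RGB value per pixel as a linear combination of the latent channels, plus the bias
rgb = torch.einsum("chw,cr->rhw", latent, factors) + bias.view(3, 1, 1)
print(rgb.shape)  # torch.Size([3, 64, 64])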
||||
@@ -2,7 +2,7 @@
|
||||
|
||||
import torch
|
||||
from torch import nn
|
||||
from typing import Literal, Dict, Any
|
||||
from typing import Literal
|
||||
import math
|
||||
import comfy.ops
|
||||
ops = comfy.ops.disable_weight_init
|
||||
@@ -97,7 +97,7 @@ def get_activation(activation: Literal["elu", "snake", "none"], antialias=False,
|
||||
raise ValueError(f"Unknown activation {activation}")
|
||||
|
||||
if antialias:
|
||||
act = Activation1d(act)
|
||||
act = Activation1d(act) # noqa: F821 Activation1d is not defined
|
||||
|
||||
return act
|
||||
|
||||
|
||||
@@ -158,7 +158,6 @@ class RotaryEmbedding(nn.Module):
|
||||
def forward(self, t):
|
||||
# device = self.inv_freq.device
|
||||
device = t.device
|
||||
dtype = t.dtype
|
||||
|
||||
# t = t.to(torch.float32)
|
||||
|
||||
@@ -170,7 +169,7 @@ class RotaryEmbedding(nn.Module):
|
||||
if self.scale is None:
|
||||
return freqs, 1.
|
||||
|
||||
power = (torch.arange(seq_len, device = device) - (seq_len // 2)) / self.scale_base
|
||||
power = (torch.arange(seq_len, device = device) - (seq_len // 2)) / self.scale_base # noqa: F821 seq_len is not defined
|
||||
scale = comfy.ops.cast_to_input(self.scale, t) ** rearrange(power, 'n -> n 1')
|
||||
scale = torch.cat((scale, scale), dim = -1)
|
||||
|
||||
@@ -229,9 +228,9 @@ class FeedForward(nn.Module):
|
||||
linear_in = GLU(dim, inner_dim, activation, dtype=dtype, device=device, operations=operations)
|
||||
else:
|
||||
linear_in = nn.Sequential(
|
||||
Rearrange('b n d -> b d n') if use_conv else nn.Identity(),
|
||||
rearrange('b n d -> b d n') if use_conv else nn.Identity(),
|
||||
operations.Linear(dim, inner_dim, bias = not no_bias, dtype=dtype, device=device) if not use_conv else operations.Conv1d(dim, inner_dim, conv_kernel_size, padding = (conv_kernel_size // 2), bias = not no_bias, dtype=dtype, device=device),
|
||||
Rearrange('b n d -> b d n') if use_conv else nn.Identity(),
|
||||
rearrange('b n d -> b d n') if use_conv else nn.Identity(),
|
||||
activation
|
||||
)
|
||||
|
||||
@@ -246,9 +245,9 @@ class FeedForward(nn.Module):
|
||||
|
||||
self.ff = nn.Sequential(
|
||||
linear_in,
|
||||
Rearrange('b d n -> b n d') if use_conv else nn.Identity(),
|
||||
rearrange('b d n -> b n d') if use_conv else nn.Identity(),
|
||||
linear_out,
|
||||
Rearrange('b n d -> b d n') if use_conv else nn.Identity(),
|
||||
rearrange('b n d -> b d n') if use_conv else nn.Identity(),
|
||||
)
|
||||
|
||||
def forward(self, x):
|
||||
@@ -346,18 +345,13 @@ class Attention(nn.Module):
|
||||
|
||||
# determine masking
|
||||
masks = []
|
||||
final_attn_mask = None # The mask that will be applied to the attention matrix, taking all masks into account
|
||||
|
||||
if input_mask is not None:
|
||||
input_mask = rearrange(input_mask, 'b j -> b 1 1 j')
|
||||
masks.append(~input_mask)
|
||||
|
||||
# Other masks will be added here later
|
||||
|
||||
if len(masks) > 0:
|
||||
final_attn_mask = ~or_reduce(masks)
|
||||
|
||||
n, device = q.shape[-2], q.device
|
||||
n = q.shape[-2]
|
||||
|
||||
causal = self.causal if causal is None else causal
|
||||
|
||||
|
||||
@@ -2,8 +2,8 @@
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
from torch import Tensor, einsum
|
||||
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, TypeVar, Union
|
||||
from torch import Tensor
|
||||
from typing import List, Union
|
||||
from einops import rearrange
|
||||
import math
|
||||
import comfy.ops
|
||||
|
||||
@@ -147,7 +147,6 @@ class DoubleAttention(nn.Module):
|
||||
|
||||
bsz, seqlen1, _ = c.shape
|
||||
bsz, seqlen2, _ = x.shape
|
||||
seqlen = seqlen1 + seqlen2
|
||||
|
||||
cq, ck, cv = self.w1q(c), self.w1k(c), self.w1v(c)
|
||||
cq = cq.view(bsz, seqlen1, self.n_heads, self.head_dim)
|
||||
|
||||
@@ -16,7 +16,6 @@
|
||||
along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||
"""
|
||||
|
||||
import torch
|
||||
import torchvision
|
||||
from torch import nn
|
||||
from .common import LayerNorm2d_op
|
||||
|
||||
@@ -4,9 +4,12 @@ import comfy.ops
|
||||
def pad_to_patch_size(img, patch_size=(2, 2), padding_mode="circular"):
|
||||
if padding_mode == "circular" and (torch.jit.is_tracing() or torch.jit.is_scripting()):
|
||||
padding_mode = "reflect"
|
||||
pad_h = (patch_size[0] - img.shape[-2] % patch_size[0]) % patch_size[0]
|
||||
pad_w = (patch_size[1] - img.shape[-1] % patch_size[1]) % patch_size[1]
|
||||
return torch.nn.functional.pad(img, (0, pad_w, 0, pad_h), mode=padding_mode)
|
||||
|
||||
pad = ()
|
||||
for i in range(img.ndim - 2):
|
||||
pad = (0, (patch_size[i] - img.shape[i + 2] % patch_size[i]) % patch_size[i]) + pad
|
||||
|
||||
return torch.nn.functional.pad(img, pad, mode=padding_mode)
|
||||
|
||||
try:
|
||||
rms_norm_torch = torch.nn.functional.rms_norm
|
||||
|
||||
@@ -6,9 +6,7 @@ import math
|
||||
from torch import Tensor, nn
|
||||
from einops import rearrange, repeat
|
||||
|
||||
from .layers import (DoubleStreamBlock, EmbedND, LastLayer,
|
||||
MLPEmbedder, SingleStreamBlock,
|
||||
timestep_embedding)
|
||||
from .layers import (timestep_embedding)
|
||||
|
||||
from .model import Flux
|
||||
import comfy.ldm.common_dit
|
||||
|
||||
@@ -114,7 +114,7 @@ class Modulation(nn.Module):
|
||||
|
||||
|
||||
class DoubleStreamBlock(nn.Module):
|
||||
def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False, dtype=None, device=None, operations=None):
|
||||
def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False, flipped_img_txt=False, dtype=None, device=None, operations=None):
|
||||
super().__init__()
|
||||
|
||||
mlp_hidden_dim = int(hidden_size * mlp_ratio)
|
||||
@@ -141,8 +141,9 @@ class DoubleStreamBlock(nn.Module):
|
||||
nn.GELU(approximate="tanh"),
|
||||
operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
|
||||
)
|
||||
self.flipped_img_txt = flipped_img_txt
|
||||
|
||||
def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor):
|
||||
def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor, attn_mask=None):
|
||||
img_mod1, img_mod2 = self.img_mod(vec)
|
||||
txt_mod1, txt_mod2 = self.txt_mod(vec)
|
||||
|
||||
@@ -160,12 +161,22 @@ class DoubleStreamBlock(nn.Module):
|
||||
txt_q, txt_k, txt_v = txt_qkv.view(txt_qkv.shape[0], txt_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
|
||||
txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
|
||||
|
||||
# run actual attention
|
||||
attn = attention(torch.cat((txt_q, img_q), dim=2),
|
||||
torch.cat((txt_k, img_k), dim=2),
|
||||
torch.cat((txt_v, img_v), dim=2), pe=pe)
|
||||
if self.flipped_img_txt:
|
||||
# run actual attention
|
||||
attn = attention(torch.cat((img_q, txt_q), dim=2),
|
||||
torch.cat((img_k, txt_k), dim=2),
|
||||
torch.cat((img_v, txt_v), dim=2),
|
||||
pe=pe, mask=attn_mask)
|
||||
|
||||
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
|
||||
img_attn, txt_attn = attn[:, : img.shape[1]], attn[:, img.shape[1]:]
|
||||
else:
|
||||
# run actual attention
|
||||
attn = attention(torch.cat((txt_q, img_q), dim=2),
|
||||
torch.cat((txt_k, img_k), dim=2),
|
||||
torch.cat((txt_v, img_v), dim=2),
|
||||
pe=pe, mask=attn_mask)
|
||||
|
||||
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1]:]
|
||||
|
||||
# calculate the img bloks
|
||||
img = img + img_mod1.gate * self.img_attn.proj(img_attn)
|
||||
@@ -217,7 +228,7 @@ class SingleStreamBlock(nn.Module):
|
||||
self.mlp_act = nn.GELU(approximate="tanh")
|
||||
self.modulation = Modulation(hidden_size, double=False, dtype=dtype, device=device, operations=operations)
|
||||
|
||||
def forward(self, x: Tensor, vec: Tensor, pe: Tensor) -> Tensor:
|
||||
def forward(self, x: Tensor, vec: Tensor, pe: Tensor, attn_mask=None) -> Tensor:
|
||||
mod, _ = self.modulation(vec)
|
||||
x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift
|
||||
qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
|
||||
@@ -226,7 +237,7 @@ class SingleStreamBlock(nn.Module):
|
||||
q, k = self.norm(q, k, v)
|
||||
|
||||
# compute attention
|
||||
attn = attention(q, k, v, pe=pe)
|
||||
attn = attention(q, k, v, pe=pe, mask=attn_mask)
|
||||
# compute activation in mlp stream, cat again and run second linear layer
|
||||
output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
|
||||
x += mod.gate * output
|
||||
|
||||
@@ -1,14 +1,15 @@
|
||||
import torch
|
||||
from einops import rearrange
|
||||
from torch import Tensor
|
||||
|
||||
from comfy.ldm.modules.attention import optimized_attention
|
||||
import comfy.model_management
|
||||
|
||||
def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor) -> Tensor:
|
||||
def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor, mask=None) -> Tensor:
|
||||
q, k = apply_rope(q, k, pe)
|
||||
|
||||
heads = q.shape[1]
|
||||
x = optimized_attention(q, k, v, heads, skip_reshape=True)
|
||||
x = optimized_attention(q, k, v, heads, skip_reshape=True, mask=mask)
|
||||
return x
|
||||
|
||||
|
||||
@@ -33,3 +34,4 @@ def apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor):
|
||||
xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
|
||||
xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
|
||||
return xq_out.reshape(*xq.shape).type_as(xq), xk_out.reshape(*xk.shape).type_as(xk)
|
||||
|
||||
|
||||
@@ -4,6 +4,8 @@ from dataclasses import dataclass
|
||||
|
||||
import torch
|
||||
from torch import Tensor, nn
|
||||
from einops import rearrange, repeat
|
||||
import comfy.ldm.common_dit
|
||||
|
||||
from .layers import (
|
||||
DoubleStreamBlock,
|
||||
@@ -14,9 +16,6 @@ from .layers import (
|
||||
timestep_embedding,
|
||||
)
|
||||
|
||||
from einops import rearrange, repeat
|
||||
import comfy.ldm.common_dit
|
||||
|
||||
@dataclass
|
||||
class FluxParams:
|
||||
in_channels: int
|
||||
@@ -98,8 +97,9 @@ class Flux(nn.Module):
|
||||
timesteps: Tensor,
|
||||
y: Tensor,
|
||||
guidance: Tensor = None,
|
||||
control=None,
|
||||
control = None,
|
||||
transformer_options={},
|
||||
attn_mask: Tensor = None,
|
||||
) -> Tensor:
|
||||
patches_replace = transformer_options.get("patches_replace", {})
|
||||
if img.ndim != 3 or txt.ndim != 3:
|
||||
@@ -124,14 +124,27 @@ class Flux(nn.Module):
|
||||
if ("double_block", i) in blocks_replace:
|
||||
def block_wrap(args):
|
||||
out = {}
|
||||
out["img"], out["txt"] = block(img=args["img"], txt=args["txt"], vec=args["vec"], pe=args["pe"])
|
||||
out["img"], out["txt"] = block(img=args["img"],
|
||||
txt=args["txt"],
|
||||
vec=args["vec"],
|
||||
pe=args["pe"],
|
||||
attn_mask=args.get("attn_mask"))
|
||||
return out
|
||||
|
||||
out = blocks_replace[("double_block", i)]({"img": img, "txt": txt, "vec": vec, "pe": pe}, {"original_block": block_wrap})
|
||||
out = blocks_replace[("double_block", i)]({"img": img,
|
||||
"txt": txt,
|
||||
"vec": vec,
|
||||
"pe": pe,
|
||||
"attn_mask": attn_mask},
|
||||
{"original_block": block_wrap})
|
||||
txt = out["txt"]
|
||||
img = out["img"]
|
||||
else:
|
||||
img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
|
||||
img, txt = block(img=img,
|
||||
txt=txt,
|
||||
vec=vec,
|
||||
pe=pe,
|
||||
attn_mask=attn_mask)
|
||||
|
||||
if control is not None: # Controlnet
|
||||
control_i = control.get("input")
|
||||
@@ -146,13 +159,20 @@ class Flux(nn.Module):
|
||||
if ("single_block", i) in blocks_replace:
|
||||
def block_wrap(args):
|
||||
out = {}
|
||||
out["img"] = block(args["img"], vec=args["vec"], pe=args["pe"])
|
||||
out["img"] = block(args["img"],
|
||||
vec=args["vec"],
|
||||
pe=args["pe"],
|
||||
attn_mask=args.get("attn_mask"))
|
||||
return out
|
||||
|
||||
out = blocks_replace[("single_block", i)]({"img": img, "vec": vec, "pe": pe}, {"original_block": block_wrap})
|
||||
out = blocks_replace[("single_block", i)]({"img": img,
|
||||
"vec": vec,
|
||||
"pe": pe,
|
||||
"attn_mask": attn_mask},
|
||||
{"original_block": block_wrap})
|
||||
img = out["img"]
|
||||
else:
|
||||
img = block(img, vec=vec, pe=pe)
|
||||
img = block(img, vec=vec, pe=pe, attn_mask=attn_mask)
|
||||
|
||||
if control is not None: # Controlnet
|
||||
control_o = control.get("output")
|
||||
@@ -181,5 +201,5 @@ class Flux(nn.Module):
|
||||
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
|
||||
|
||||
txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
|
||||
out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control, transformer_options)
|
||||
out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control, transformer_options, attn_mask=kwargs.get("attention_mask", None))
|
||||
return rearrange(out, "b (h w) (c ph pw) -> b c (h ph) (w pw)", h=h_len, w=w_len, ph=2, pw=2)[:,:,:h,:w]
|
||||
|
||||
@@ -461,8 +461,6 @@ class AsymmDiTJoint(nn.Module):
|
||||
pH, pW = H // self.patch_size, W // self.patch_size
|
||||
x = self.embed_x(x) # (B, N, D), where N = T * H * W / patch_size ** 2
|
||||
assert x.ndim == 3
|
||||
B = x.size(0)
|
||||
|
||||
|
||||
pH, pW = H // self.patch_size, W // self.patch_size
|
||||
N = T * pH * pW
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
#original code from https://github.com/genmoai/models under apache 2.0 license
|
||||
#adapted to ComfyUI
|
||||
|
||||
from typing import Optional, Tuple
|
||||
from typing import Optional
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
#original code from https://github.com/genmoai/models under apache 2.0 license
|
||||
#adapted to ComfyUI
|
||||
|
||||
from typing import Callable, List, Optional, Tuple, Union
|
||||
from typing import List, Optional, Tuple, Union
|
||||
from functools import partial
|
||||
import math
|
||||
|
||||
|
||||
330 comfy/ldm/hunyuan_video/model.py (new file)
@@ -0,0 +1,330 @@
|
||||
#Based on Flux code because of weird hunyuan video code license.
|
||||
|
||||
import torch
|
||||
import comfy.ldm.flux.layers
|
||||
import comfy.ldm.modules.diffusionmodules.mmdit
|
||||
from comfy.ldm.modules.attention import optimized_attention
|
||||
|
||||
|
||||
from dataclasses import dataclass
|
||||
from einops import repeat
|
||||
|
||||
from torch import Tensor, nn
|
||||
|
||||
from comfy.ldm.flux.layers import (
|
||||
DoubleStreamBlock,
|
||||
EmbedND,
|
||||
LastLayer,
|
||||
MLPEmbedder,
|
||||
SingleStreamBlock,
|
||||
timestep_embedding
|
||||
)
|
||||
|
||||
import comfy.ldm.common_dit
|
||||
|
||||
|
||||
@dataclass
|
||||
class HunyuanVideoParams:
|
||||
in_channels: int
|
||||
out_channels: int
|
||||
vec_in_dim: int
|
||||
context_in_dim: int
|
||||
hidden_size: int
|
||||
mlp_ratio: float
|
||||
num_heads: int
|
||||
depth: int
|
||||
depth_single_blocks: int
|
||||
axes_dim: list
|
||||
theta: int
|
||||
patch_size: list
|
||||
qkv_bias: bool
|
||||
guidance_embed: bool
|
||||
|
||||
|
||||
class SelfAttentionRef(nn.Module):
|
||||
def __init__(self, dim: int, qkv_bias: bool = False, dtype=None, device=None, operations=None):
|
||||
super().__init__()
|
||||
self.qkv = operations.Linear(dim, dim * 3, bias=qkv_bias, dtype=dtype, device=device)
|
||||
self.proj = operations.Linear(dim, dim, dtype=dtype, device=device)
|
||||
|
||||
|
||||
class TokenRefinerBlock(nn.Module):
|
||||
def __init__(
|
||||
self,
|
||||
hidden_size,
|
||||
heads,
|
||||
dtype=None,
|
||||
device=None,
|
||||
operations=None
|
||||
):
|
||||
super().__init__()
|
||||
self.heads = heads
|
||||
mlp_hidden_dim = hidden_size * 4
|
||||
|
||||
self.adaLN_modulation = nn.Sequential(
|
||||
nn.SiLU(),
|
||||
operations.Linear(hidden_size, 2 * hidden_size, bias=True, dtype=dtype, device=device),
|
||||
)
|
||||
|
||||
self.norm1 = operations.LayerNorm(hidden_size, elementwise_affine=True, eps=1e-6, dtype=dtype, device=device)
|
||||
self.self_attn = SelfAttentionRef(hidden_size, True, dtype=dtype, device=device, operations=operations)
|
||||
|
||||
self.norm2 = operations.LayerNorm(hidden_size, elementwise_affine=True, eps=1e-6, dtype=dtype, device=device)
|
||||
|
||||
self.mlp = nn.Sequential(
|
||||
operations.Linear(hidden_size, mlp_hidden_dim, bias=True, dtype=dtype, device=device),
|
||||
nn.SiLU(),
|
||||
operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
|
||||
)
|
||||
|
||||
def forward(self, x, c, mask):
|
||||
mod1, mod2 = self.adaLN_modulation(c).chunk(2, dim=1)
|
||||
|
||||
norm_x = self.norm1(x)
|
||||
qkv = self.self_attn.qkv(norm_x)
|
||||
q, k, v = qkv.reshape(qkv.shape[0], qkv.shape[1], 3, self.heads, -1).permute(2, 0, 3, 1, 4)
|
||||
attn = optimized_attention(q, k, v, self.heads, mask=mask, skip_reshape=True)
|
||||
|
||||
x = x + self.self_attn.proj(attn) * mod1.unsqueeze(1)
|
||||
x = x + self.mlp(self.norm2(x)) * mod2.unsqueeze(1)
|
||||
return x
|
||||
|
||||
|
||||
class IndividualTokenRefiner(nn.Module):
|
||||
def __init__(
|
||||
self,
|
||||
hidden_size,
|
||||
heads,
|
||||
num_blocks,
|
||||
dtype=None,
|
||||
device=None,
|
||||
operations=None
|
||||
):
|
||||
super().__init__()
|
||||
self.blocks = nn.ModuleList(
|
||||
[
|
||||
TokenRefinerBlock(
|
||||
hidden_size=hidden_size,
|
||||
heads=heads,
|
||||
dtype=dtype,
|
||||
device=device,
|
||||
operations=operations
|
||||
)
|
||||
for _ in range(num_blocks)
|
||||
]
|
||||
)
|
||||
|
||||
def forward(self, x, c, mask):
|
||||
m = None
|
||||
if mask is not None:
|
||||
m = mask.view(mask.shape[0], 1, 1, mask.shape[1]).repeat(1, 1, mask.shape[1], 1)
|
||||
m = m + m.transpose(2, 3)
|
||||
|
||||
for block in self.blocks:
|
||||
x = block(x, c, m)
|
||||
return x
|
||||
|
||||
|
||||
|
||||
class TokenRefiner(nn.Module):
|
||||
def __init__(
|
||||
self,
|
||||
text_dim,
|
||||
hidden_size,
|
||||
heads,
|
||||
num_blocks,
|
||||
dtype=None,
|
||||
device=None,
|
||||
operations=None
|
||||
):
|
||||
super().__init__()
|
||||
|
||||
self.input_embedder = operations.Linear(text_dim, hidden_size, bias=True, dtype=dtype, device=device)
|
||||
self.t_embedder = MLPEmbedder(256, hidden_size, dtype=dtype, device=device, operations=operations)
|
||||
self.c_embedder = MLPEmbedder(text_dim, hidden_size, dtype=dtype, device=device, operations=operations)
|
||||
self.individual_token_refiner = IndividualTokenRefiner(hidden_size, heads, num_blocks, dtype=dtype, device=device, operations=operations)
|
||||
|
||||
def forward(
|
||||
self,
|
||||
x,
|
||||
timesteps,
|
||||
mask,
|
||||
):
|
||||
t = self.t_embedder(timestep_embedding(timesteps, 256, time_factor=1.0).to(x.dtype))
|
||||
# m = mask.float().unsqueeze(-1)
|
||||
# c = (x.float() * m).sum(dim=1) / m.sum(dim=1) #TODO: the following works when the x.shape is the same length as the tokens but might break otherwise
|
||||
c = x.sum(dim=1) / x.shape[1]
|
||||
|
||||
c = t + self.c_embedder(c.to(x.dtype))
|
||||
x = self.input_embedder(x)
|
||||
x = self.individual_token_refiner(x, c, mask)
|
||||
return x
|
||||
|
||||
class HunyuanVideo(nn.Module):
|
||||
"""
|
||||
Transformer model for flow matching on sequences.
|
||||
"""
|
||||
|
||||
def __init__(self, image_model=None, final_layer=True, dtype=None, device=None, operations=None, **kwargs):
|
||||
super().__init__()
|
||||
self.dtype = dtype
|
||||
params = HunyuanVideoParams(**kwargs)
|
||||
self.params = params
|
||||
self.patch_size = params.patch_size
|
||||
self.in_channels = params.in_channels
|
||||
self.out_channels = params.out_channels
|
||||
if params.hidden_size % params.num_heads != 0:
|
||||
raise ValueError(
|
||||
f"Hidden size {params.hidden_size} must be divisible by num_heads {params.num_heads}"
|
||||
)
|
||||
pe_dim = params.hidden_size // params.num_heads
|
||||
if sum(params.axes_dim) != pe_dim:
|
||||
raise ValueError(f"Got {params.axes_dim} but expected positional dim {pe_dim}")
|
||||
self.hidden_size = params.hidden_size
|
||||
self.num_heads = params.num_heads
|
||||
self.pe_embedder = EmbedND(dim=pe_dim, theta=params.theta, axes_dim=params.axes_dim)
|
||||
|
||||
self.img_in = comfy.ldm.modules.diffusionmodules.mmdit.PatchEmbed(None, self.patch_size, self.in_channels, self.hidden_size, conv3d=True, dtype=dtype, device=device, operations=operations)
|
||||
self.time_in = MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size, dtype=dtype, device=device, operations=operations)
|
||||
self.vector_in = MLPEmbedder(params.vec_in_dim, self.hidden_size, dtype=dtype, device=device, operations=operations)
|
||||
self.guidance_in = (
|
||||
MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size, dtype=dtype, device=device, operations=operations) if params.guidance_embed else nn.Identity()
|
||||
)
|
||||
|
||||
self.txt_in = TokenRefiner(params.context_in_dim, self.hidden_size, self.num_heads, 2, dtype=dtype, device=device, operations=operations)
|
||||
|
||||
self.double_blocks = nn.ModuleList(
|
||||
[
|
||||
DoubleStreamBlock(
|
||||
self.hidden_size,
|
||||
self.num_heads,
|
||||
mlp_ratio=params.mlp_ratio,
|
||||
qkv_bias=params.qkv_bias,
|
||||
flipped_img_txt=True,
|
||||
dtype=dtype, device=device, operations=operations
|
||||
)
|
||||
for _ in range(params.depth)
|
||||
]
|
||||
)
|
||||
|
||||
self.single_blocks = nn.ModuleList(
|
||||
[
|
||||
SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio, dtype=dtype, device=device, operations=operations)
|
||||
for _ in range(params.depth_single_blocks)
|
||||
]
|
||||
)
|
||||
|
||||
if final_layer:
|
||||
self.final_layer = LastLayer(self.hidden_size, self.patch_size[-1], self.out_channels, dtype=dtype, device=device, operations=operations)
|
||||
|
||||
def forward_orig(
|
||||
self,
|
||||
img: Tensor,
|
||||
img_ids: Tensor,
|
||||
txt: Tensor,
|
||||
txt_ids: Tensor,
|
||||
txt_mask: Tensor,
|
||||
timesteps: Tensor,
|
||||
y: Tensor,
|
||||
guidance: Tensor = None,
|
||||
control=None,
|
||||
transformer_options={},
|
||||
) -> Tensor:
|
||||
patches_replace = transformer_options.get("patches_replace", {})
|
||||
|
||||
initial_shape = list(img.shape)
|
||||
# running on sequences img
|
||||
img = self.img_in(img)
|
||||
vec = self.time_in(timestep_embedding(timesteps, 256, time_factor=1.0).to(img.dtype))
|
||||
|
||||
vec = vec + self.vector_in(y[:, :self.params.vec_in_dim])
|
||||
|
||||
if self.params.guidance_embed:
|
||||
if guidance is None:
|
||||
raise ValueError("Didn't get guidance strength for guidance distilled model.")
|
||||
vec = vec + self.guidance_in(timestep_embedding(guidance, 256).to(img.dtype))
|
||||
|
||||
if txt_mask is not None and not torch.is_floating_point(txt_mask):
|
||||
txt_mask = (txt_mask - 1).to(img.dtype) * torch.finfo(img.dtype).max
|
||||
|
||||
txt = self.txt_in(txt, timesteps, txt_mask)
|
||||
|
||||
ids = torch.cat((img_ids, txt_ids), dim=1)
|
||||
pe = self.pe_embedder(ids)
|
||||
|
||||
img_len = img.shape[1]
|
||||
if txt_mask is not None:
|
||||
attn_mask_len = img_len + txt.shape[1]
|
||||
attn_mask = torch.zeros((1, 1, attn_mask_len), dtype=img.dtype, device=img.device)
|
||||
attn_mask[:, 0, img_len:] = txt_mask
|
||||
else:
|
||||
attn_mask = None
|
||||
|
||||
blocks_replace = patches_replace.get("dit", {})
|
||||
for i, block in enumerate(self.double_blocks):
|
||||
if ("double_block", i) in blocks_replace:
|
||||
def block_wrap(args):
|
||||
out = {}
|
||||
out["img"], out["txt"] = block(img=args["img"], txt=args["txt"], vec=args["vec"], pe=args["pe"], attn_mask=args["attention_mask"])
|
||||
return out
|
||||
|
||||
out = blocks_replace[("double_block", i)]({"img": img, "txt": txt, "vec": vec, "pe": pe, "attention_mask": attn_mask}, {"original_block": block_wrap})
|
||||
txt = out["txt"]
|
||||
img = out["img"]
|
||||
else:
|
||||
img, txt = block(img=img, txt=txt, vec=vec, pe=pe, attn_mask=attn_mask)
|
||||
|
||||
if control is not None: # Controlnet
|
||||
control_i = control.get("input")
|
||||
if i < len(control_i):
|
||||
add = control_i[i]
|
||||
if add is not None:
|
||||
img += add
|
||||
|
||||
img = torch.cat((img, txt), 1)
|
||||
|
||||
for i, block in enumerate(self.single_blocks):
|
||||
if ("single_block", i) in blocks_replace:
|
||||
def block_wrap(args):
|
||||
out = {}
|
||||
out["img"] = block(args["img"], vec=args["vec"], pe=args["pe"], attn_mask=args["attention_mask"])
|
||||
return out
|
||||
|
||||
out = blocks_replace[("single_block", i)]({"img": img, "vec": vec, "pe": pe, "attention_mask": attn_mask}, {"original_block": block_wrap})
|
||||
img = out["img"]
|
||||
else:
|
||||
img = block(img, vec=vec, pe=pe, attn_mask=attn_mask)
|
||||
|
||||
if control is not None: # Controlnet
|
||||
control_o = control.get("output")
|
||||
if i < len(control_o):
|
||||
add = control_o[i]
|
||||
if add is not None:
|
||||
img[:, : img_len] += add
|
||||
|
||||
img = img[:, : img_len]
|
||||
|
||||
img = self.final_layer(img, vec) # (N, T, patch_size ** 2 * out_channels)
|
||||
|
||||
shape = initial_shape[-3:]
|
||||
for i in range(len(shape)):
|
||||
shape[i] = shape[i] // self.patch_size[i]
|
||||
img = img.reshape([img.shape[0]] + shape + [self.out_channels] + self.patch_size)
|
||||
img = img.permute(0, 4, 1, 5, 2, 6, 3, 7)
|
||||
img = img.reshape(initial_shape)
|
||||
return img
|
||||
|
||||
def forward(self, x, timestep, context, y, guidance, attention_mask=None, control=None, transformer_options={}, **kwargs):
|
||||
bs, c, t, h, w = x.shape
|
||||
patch_size = self.patch_size
|
||||
t_len = ((t + (patch_size[0] // 2)) // patch_size[0])
|
||||
h_len = ((h + (patch_size[1] // 2)) // patch_size[1])
|
||||
w_len = ((w + (patch_size[2] // 2)) // patch_size[2])
|
||||
img_ids = torch.zeros((t_len, h_len, w_len, 3), device=x.device, dtype=x.dtype)
|
||||
img_ids[:, :, :, 0] = img_ids[:, :, :, 0] + torch.linspace(0, t_len - 1, steps=t_len, device=x.device, dtype=x.dtype).reshape(-1, 1, 1)
|
||||
img_ids[:, :, :, 1] = img_ids[:, :, :, 1] + torch.linspace(0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype).reshape(1, -1, 1)
|
||||
img_ids[:, :, :, 2] = img_ids[:, :, :, 2] + torch.linspace(0, w_len - 1, steps=w_len, device=x.device, dtype=x.dtype).reshape(1, 1, -1)
|
||||
img_ids = repeat(img_ids, "t h w c -> b (t h w) c", b=bs)
|
||||
txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
|
||||
out = self.forward_orig(x, img_ids, context, txt_ids, attention_mask, timestep, y, guidance, control, transformer_options)
|
||||
return out
|
||||
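The txt_mask handling in forward_orig above turns a 0/1 padding mask into an additive attention bias; a hedged standalone illustration of the same trick:

import torch

txt_mask = torch.tensor([[1, 1, 1, 0, 0]])  # 1 = real token, 0 = padding
bias = (txt_mask - 1).to(torch.float32) * torch.finfo(torch.float32).max
print(bias)  # 0 for real tokens, a huge negative value for padded ones
# added to attention logits before softmax, this effectively zeroes attention to padding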
@@ -1,24 +1,17 @@
|
||||
from typing import Any, Optional
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
from torch.utils import checkpoint
|
||||
|
||||
from comfy.ldm.modules.diffusionmodules.mmdit import (
|
||||
Mlp,
|
||||
TimestepEmbedder,
|
||||
PatchEmbed,
|
||||
RMSNorm,
|
||||
)
|
||||
from comfy.ldm.modules.diffusionmodules.util import timestep_embedding
|
||||
from .poolers import AttentionPool
|
||||
|
||||
import comfy.latent_formats
|
||||
from .models import HunYuanDiTBlock, calc_rope
|
||||
|
||||
from .posemb_layers import get_2d_rotary_pos_embed, get_fill_resize_and_crop
|
||||
|
||||
|
||||
class HunYuanControlNet(nn.Module):
|
||||
@@ -171,9 +164,6 @@ class HunYuanControlNet(nn.Module):
|
||||
),
|
||||
)
|
||||
|
||||
# Image embedding
|
||||
num_patches = self.x_embedder.num_patches
|
||||
|
||||
# HUnYuanDiT Blocks
|
||||
self.blocks = nn.ModuleList(
|
||||
[
|
||||
|
||||
@@ -1,8 +1,6 @@
|
||||
from typing import Any
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
import comfy.ops
|
||||
from comfy.ldm.modules.diffusionmodules.mmdit import Mlp, TimestepEmbedder, PatchEmbed, RMSNorm
|
||||
@@ -250,9 +248,6 @@ class HunYuanDiT(nn.Module):
|
||||
operations.Linear(hidden_size * 4, hidden_size, bias=True, dtype=dtype, device=device),
|
||||
)
|
||||
|
||||
# Image embedding
|
||||
num_patches = self.x_embedder.num_patches
|
||||
|
||||
# HUnYuanDiT Blocks
|
||||
self.blocks = nn.ModuleList([
|
||||
HunYuanDiTBlock(hidden_size=hidden_size,
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
from comfy.ldm.modules.attention import optimized_attention
|
||||
import comfy.ops
|
||||
|
||||
|
||||
@@ -379,6 +379,7 @@ class LTXVModel(torch.nn.Module):
|
||||
positional_embedding_max_pos=[20, 2048, 2048],
|
||||
dtype=None, device=None, operations=None, **kwargs):
|
||||
super().__init__()
|
||||
self.generator = None
|
||||
self.dtype = dtype
|
||||
self.out_channels = in_channels
|
||||
self.inner_dim = num_attention_heads * attention_head_dim
|
||||
@@ -415,7 +416,7 @@ class LTXVModel(torch.nn.Module):
|
||||
|
||||
self.patchifier = SymmetricPatchifier(1)
|
||||
|
||||
def forward(self, x, timestep, context, attention_mask, frame_rate=25, guiding_latent=None, transformer_options={}, **kwargs):
|
||||
def forward(self, x, timestep, context, attention_mask, frame_rate=25, guiding_latent=None, guiding_latent_noise_scale=0, transformer_options={}, **kwargs):
|
||||
patches_replace = transformer_options.get("patches_replace", {})
|
||||
|
||||
indices_grid = self.patchifier.get_grid(
|
||||
@@ -431,10 +432,22 @@ class LTXVModel(torch.nn.Module):
|
||||
ts = torch.ones([x.shape[0], 1, x.shape[2], x.shape[3], x.shape[4]], device=x.device, dtype=x.dtype)
|
||||
input_ts = timestep.view([timestep.shape[0]] + [1] * (x.ndim - 1))
|
||||
ts *= input_ts
|
||||
ts[:, :, 0] = 0.0
|
||||
ts[:, :, 0] = guiding_latent_noise_scale * (input_ts[:, :, 0] ** 2)
|
||||
timestep = self.patchifier.patchify(ts)
|
||||
input_x = x.clone()
|
||||
x[:, :, 0] = guiding_latent[:, :, 0]
|
||||
if guiding_latent_noise_scale > 0:
|
||||
if self.generator is None:
|
||||
self.generator = torch.Generator(device=x.device).manual_seed(42)
|
||||
elif self.generator.device != x.device:
|
||||
self.generator = torch.Generator(device=x.device).set_state(self.generator.get_state())
|
||||
|
||||
noise_shape = [guiding_latent.shape[0], guiding_latent.shape[1], 1, guiding_latent.shape[3], guiding_latent.shape[4]]
|
||||
scale = guiding_latent_noise_scale * (input_ts ** 2)
|
||||
guiding_noise = scale * torch.randn(size=noise_shape, device=x.device, generator=self.generator)
|
||||
|
||||
x[:, :, 0] = guiding_noise[:, :, 0] + x[:, :, 0] * (1.0 - scale[:, :, 0])
|
||||
|
||||
|
||||
orig_shape = list(x.shape)
|
||||
|
||||
|
||||
@@ -3,7 +3,7 @@ from torch import nn
|
||||
from functools import partial
|
||||
import math
|
||||
from einops import rearrange
|
||||
from typing import Any, Mapping, Optional, Tuple, Union, List
|
||||
from typing import Optional, Tuple, Union
|
||||
from .conv_nd_factory import make_conv_nd, make_linear_nd
|
||||
from .pixel_norm import PixelNorm
|
||||
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
from typing import Tuple, Union
|
||||
|
||||
import torch
|
||||
|
||||
from .dual_conv3d import DualConv3d
|
||||
from .causal_conv3d import CausalConv3d
|
||||
|
||||
@@ -1,10 +1,12 @@
|
||||
import logging
|
||||
import math
|
||||
import torch
|
||||
from contextlib import contextmanager
|
||||
from typing import Any, Dict, List, Optional, Tuple, Union
|
||||
from typing import Any, Dict, Tuple, Union
|
||||
|
||||
from comfy.ldm.modules.distributions.distributions import DiagonalGaussianDistribution
|
||||
|
||||
from comfy.ldm.util import instantiate_from_config
|
||||
from comfy.ldm.util import get_obj_from_str, instantiate_from_config
|
||||
from comfy.ldm.modules.ema import LitEma
|
||||
import comfy.ops
|
||||
|
||||
@@ -52,7 +54,7 @@ class AbstractAutoencoder(torch.nn.Module):
|
||||
|
||||
if self.use_ema:
|
||||
self.model_ema = LitEma(self, decay=ema_decay)
|
||||
logpy.info(f"Keeping EMAs of {len(list(self.model_ema.buffers()))}.")
|
||||
logging.info(f"Keeping EMAs of {len(list(self.model_ema.buffers()))}.")
|
||||
|
||||
def get_input(self, batch) -> Any:
|
||||
raise NotImplementedError()
|
||||
@@ -68,14 +70,14 @@ class AbstractAutoencoder(torch.nn.Module):
|
||||
self.model_ema.store(self.parameters())
|
||||
self.model_ema.copy_to(self)
|
||||
if context is not None:
|
||||
logpy.info(f"{context}: Switched to EMA weights")
|
||||
logging.info(f"{context}: Switched to EMA weights")
|
||||
try:
|
||||
yield None
|
||||
finally:
|
||||
if self.use_ema:
|
||||
self.model_ema.restore(self.parameters())
|
||||
if context is not None:
|
||||
logpy.info(f"{context}: Restored training weights")
|
||||
logging.info(f"{context}: Restored training weights")
|
||||
|
||||
def encode(self, *args, **kwargs) -> torch.Tensor:
|
||||
raise NotImplementedError("encode()-method of abstract base class called")
|
||||
@@ -84,7 +86,7 @@ class AbstractAutoencoder(torch.nn.Module):
|
||||
raise NotImplementedError("decode()-method of abstract base class called")
|
||||
|
||||
def instantiate_optimizer_from_config(self, params, lr, cfg):
|
||||
logpy.info(f"loading >>> {cfg['target']} <<< optimizer from config")
|
||||
logging.info(f"loading >>> {cfg['target']} <<< optimizer from config")
|
||||
return get_obj_from_str(cfg["target"])(
|
||||
params, lr=lr, **cfg.get("params", dict())
|
||||
)
|
||||
@@ -112,7 +114,7 @@ class AutoencodingEngine(AbstractAutoencoder):
|
||||
|
||||
self.encoder: torch.nn.Module = instantiate_from_config(encoder_config)
|
||||
self.decoder: torch.nn.Module = instantiate_from_config(decoder_config)
|
||||
self.regularization: AbstractRegularizer = instantiate_from_config(
|
||||
self.regularization = instantiate_from_config(
|
||||
regularizer_config
|
||||
)
|
||||
|
||||
@@ -160,12 +162,19 @@ class AutoencodingEngineLegacy(AutoencodingEngine):
|
||||
},
|
||||
**kwargs,
|
||||
)
|
||||
self.quant_conv = comfy.ops.disable_weight_init.Conv2d(
|
||||
|
||||
if ddconfig.get("conv3d", False):
|
||||
conv_op = comfy.ops.disable_weight_init.Conv3d
|
||||
else:
|
||||
conv_op = comfy.ops.disable_weight_init.Conv2d
|
||||
|
||||
self.quant_conv = conv_op(
|
||||
(1 + ddconfig["double_z"]) * ddconfig["z_channels"],
|
||||
(1 + ddconfig["double_z"]) * embed_dim,
|
||||
1,
|
||||
)
|
||||
self.post_quant_conv = comfy.ops.disable_weight_init.Conv2d(embed_dim, ddconfig["z_channels"], 1)
|
||||
|
||||
self.post_quant_conv = conv_op(embed_dim, ddconfig["z_channels"], 1)
|
||||
self.embed_dim = embed_dim
|
||||
|
||||
def get_autoencoder_params(self) -> list:
|
||||
|
||||
@@ -15,6 +15,9 @@ if model_management.xformers_enabled():
|
||||
import xformers
|
||||
import xformers.ops
|
||||
|
||||
if model_management.sage_attention_enabled():
|
||||
from sageattention import sageattn
|
||||
|
||||
from comfy.cli_args import args
|
||||
import comfy.ops
|
||||
ops = comfy.ops.disable_weight_init
|
||||
@@ -157,8 +160,6 @@ def attention_sub_quad(query, key, value, heads, mask=None, attn_precision=None,
|
||||
b, _, dim_head = query.shape
|
||||
dim_head //= heads
|
||||
|
||||
scale = dim_head ** -0.5
|
||||
|
||||
if skip_reshape:
|
||||
query = query.reshape(b * heads, -1, dim_head)
|
||||
value = value.reshape(b * heads, -1, dim_head)
|
||||
@@ -177,9 +178,8 @@ def attention_sub_quad(query, key, value, heads, mask=None, attn_precision=None,
|
||||
bytes_per_token = torch.finfo(query.dtype).bits//8
|
||||
batch_x_heads, q_tokens, _ = query.shape
|
||||
_, _, k_tokens = key.shape
|
||||
qk_matmul_size_bytes = batch_x_heads * bytes_per_token * q_tokens * k_tokens
|
||||
|
||||
mem_free_total, mem_free_torch = model_management.get_free_memory(query.device, True)
|
||||
mem_free_total, _ = model_management.get_free_memory(query.device, True)
|
||||
|
||||
kv_chunk_size_min = None
|
||||
kv_chunk_size = None
|
||||
@@ -230,7 +230,6 @@ def attention_split(q, k, v, heads, mask=None, attn_precision=None, skip_reshape
|
||||
|
||||
scale = dim_head ** -0.5
|
||||
|
||||
h = heads
|
||||
if skip_reshape:
|
||||
q, k, v = map(
|
||||
lambda t: t.reshape(b * heads, -1, dim_head),
|
||||
@@ -344,12 +343,9 @@ except:
|
||||
pass
|
||||
|
||||
def attention_xformers(q, k, v, heads, mask=None, attn_precision=None, skip_reshape=False):
|
||||
if skip_reshape:
|
||||
b, _, _, dim_head = q.shape
|
||||
else:
|
||||
b, _, dim_head = q.shape
|
||||
dim_head //= heads
|
||||
|
||||
b = q.shape[0]
|
||||
dim_head = q.shape[-1]
|
||||
# check to make sure xformers isn't broken
|
||||
disabled_xformers = False
|
||||
|
||||
if BROKEN_XFORMERS:
|
||||
@@ -364,35 +360,44 @@ def attention_xformers(q, k, v, heads, mask=None, attn_precision=None, skip_resh
|
||||
return attention_pytorch(q, k, v, heads, mask, skip_reshape=skip_reshape)
|
||||
|
||||
if skip_reshape:
|
||||
q, k, v = map(
|
||||
lambda t: t.reshape(b * heads, -1, dim_head),
|
||||
# b h k d -> b k h d
|
||||
q, k, v = map(
|
||||
lambda t: t.permute(0, 2, 1, 3),
|
||||
(q, k, v),
|
||||
)
|
||||
# actually do the reshaping
|
||||
else:
|
||||
dim_head //= heads
|
||||
q, k, v = map(
|
||||
lambda t: t.reshape(b, -1, heads, dim_head),
|
||||
(q, k, v),
|
||||
)
|
||||
|
||||
if mask is not None:
|
||||
# add a singleton batch dimension
|
||||
if mask.ndim == 2:
|
||||
mask = mask.unsqueeze(0)
|
||||
# add a singleton heads dimension
|
||||
if mask.ndim == 3:
|
||||
mask = mask.unsqueeze(1)
|
||||
# pad to a multiple of 8
|
||||
pad = 8 - mask.shape[-1] % 8
|
||||
mask_out = torch.empty([q.shape[0], q.shape[2], q.shape[1], mask.shape[-1] + pad], dtype=q.dtype, device=q.device)
|
||||
# the xformers docs says that it's allowed to have a mask of shape (1, Nq, Nk)
|
||||
# but when using separated heads, the shape has to be (B, H, Nq, Nk)
|
||||
# in flux, this matrix ends up being over 1GB
|
||||
# here, we create a mask with the same batch/head size as the input mask (potentially singleton or full)
|
||||
mask_out = torch.empty([mask.shape[0], mask.shape[1], q.shape[1], mask.shape[-1] + pad], dtype=q.dtype, device=q.device)
|
||||
|
||||
mask_out[..., :mask.shape[-1]] = mask
|
||||
# doesn't this remove the padding again??
|
||||
mask = mask_out[..., :mask.shape[-1]]
|
||||
mask = mask.expand(b, heads, -1, -1)
|
||||
|
||||
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=mask)
|
||||
|
||||
if skip_reshape:
|
||||
out = (
|
||||
out.unsqueeze(0)
|
||||
.reshape(b, heads, -1, dim_head)
|
||||
.permute(0, 2, 1, 3)
|
||||
.reshape(b, -1, heads * dim_head)
|
||||
)
|
||||
else:
|
||||
out = (
|
||||
out.reshape(b, -1, heads * dim_head)
|
||||
)
|
||||
out = (
|
||||
out.reshape(b, -1, heads * dim_head)
|
||||
)
|
||||
|
||||
return out
|
||||
|
||||
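The reworked xformers mask handling above pads the key dimension of the bias to a multiple of 8, presumably for the alignment its kernels expect, then slices back to the real length (so only the underlying storage changes) and finally expands to (B, H, Nq, Nk); a hedged standalone sketch of the same steps:

import torch

b, heads, n_q, n_k = 2, 8, 77, 77
mask = torch.zeros(1, 1, n_q, n_k)  # broadcastable batch/head dims

pad = 8 - mask.shape[-1] % 8
mask_out = torch.empty(mask.shape[0], mask.shape[1], n_q, n_k + pad)
mask_out[..., :n_k] = mask
mask = mask_out[..., :n_k]            # same logical shape, 8-aligned storage
mask = mask.expand(b, heads, -1, -1)  # separated heads need a full (B, H, Nq, Nk) bias
print(mask.shape)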
@@ -414,32 +419,85 @@ def attention_pytorch(q, k, v, heads, mask=None, attn_precision=None, skip_resha
|
||||
(q, k, v),
|
||||
)
|
||||
|
||||
if SDP_BATCH_LIMIT >= q.shape[0]:
|
||||
if mask is not None:
|
||||
# add a batch dimension if there isn't already one
|
||||
if mask.ndim == 2:
|
||||
mask = mask.unsqueeze(0)
|
||||
# add a heads dimension if there isn't already one
|
||||
if mask.ndim == 3:
|
||||
mask = mask.unsqueeze(1)
|
||||
|
||||
if SDP_BATCH_LIMIT >= b:
|
||||
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
|
||||
out = (
|
||||
out.transpose(1, 2).reshape(b, -1, heads * dim_head)
|
||||
)
|
||||
else:
|
||||
out = torch.empty((q.shape[0], q.shape[2], heads * dim_head), dtype=q.dtype, layout=q.layout, device=q.device)
|
||||
for i in range(0, q.shape[0], SDP_BATCH_LIMIT):
|
||||
out[i : i + SDP_BATCH_LIMIT] = torch.nn.functional.scaled_dot_product_attention(q[i : i + SDP_BATCH_LIMIT], k[i : i + SDP_BATCH_LIMIT], v[i : i + SDP_BATCH_LIMIT], attn_mask=mask, dropout_p=0.0, is_causal=False).transpose(1, 2).reshape(-1, q.shape[2], heads * dim_head)
|
||||
out = torch.empty((b, q.shape[2], heads * dim_head), dtype=q.dtype, layout=q.layout, device=q.device)
|
||||
for i in range(0, b, SDP_BATCH_LIMIT):
|
||||
m = mask
|
||||
if mask is not None:
|
||||
if mask.shape[0] > 1:
|
||||
m = mask[i : i + SDP_BATCH_LIMIT]
|
||||
|
||||
out[i : i + SDP_BATCH_LIMIT] = torch.nn.functional.scaled_dot_product_attention(
|
||||
q[i : i + SDP_BATCH_LIMIT],
|
||||
k[i : i + SDP_BATCH_LIMIT],
|
||||
v[i : i + SDP_BATCH_LIMIT],
|
||||
attn_mask=m,
|
||||
dropout_p=0.0, is_causal=False
|
||||
).transpose(1, 2).reshape(-1, q.shape[2], heads * dim_head)
|
||||
return out
|
||||
|
||||
|
||||
def attention_sage(q, k, v, heads, mask=None, attn_precision=None, skip_reshape=False):
|
||||
if skip_reshape:
|
||||
b, _, _, dim_head = q.shape
|
||||
tensor_layout="HND"
|
||||
else:
|
||||
b, _, dim_head = q.shape
|
||||
dim_head //= heads
|
||||
q, k, v = map(
|
||||
lambda t: t.view(b, -1, heads, dim_head),
|
||||
(q, k, v),
|
||||
)
|
||||
tensor_layout="NHD"
|
||||
|
||||
if mask is not None:
|
||||
# add a batch dimension if there isn't already one
|
||||
if mask.ndim == 2:
|
||||
mask = mask.unsqueeze(0)
|
||||
# add a heads dimension if there isn't already one
|
||||
if mask.ndim == 3:
|
||||
mask = mask.unsqueeze(1)
|
||||
|
||||
out = sageattn(q, k, v, attn_mask=mask, is_causal=False, tensor_layout=tensor_layout)
|
||||
if tensor_layout == "HND":
|
||||
out = (
|
||||
out.transpose(1, 2).reshape(b, -1, heads * dim_head)
|
||||
)
|
||||
else:
|
||||
out = out.reshape(b, -1, heads * dim_head)
|
||||
return out
|
||||
|
||||
|
||||
optimized_attention = attention_basic

if model_management.xformers_enabled():
    logging.info("Using xformers cross attention")
if model_management.sage_attention_enabled():
    logging.info("Using sage attention")
    optimized_attention = attention_sage
elif model_management.xformers_enabled():
    logging.info("Using xformers attention")
    optimized_attention = attention_xformers
elif model_management.pytorch_attention_enabled():
    logging.info("Using pytorch cross attention")
    logging.info("Using pytorch attention")
    optimized_attention = attention_pytorch
else:
    if args.use_split_cross_attention:
        logging.info("Using split optimization for cross attention")
        logging.info("Using split optimization for attention")
        optimized_attention = attention_split
    else:
        logging.info("Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention")
        logging.info("Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention")
        optimized_attention = attention_sub_quad

optimized_attention_masked = optimized_attention

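# The block above is an import-time dispatch: a module-level callable is rebound to the
# first backend the runtime reports as available. A condensed sketch of the pattern with
# hypothetical availability checks (not the real model_management API):
import logging

def pick_attention(backends):
    # backends: list of (name, available_fn, impl_fn), in priority order
    for name, available, impl in backends:
        if available():
            logging.info("Using %s attention", name)
            return impl
    raise RuntimeError("no attention backend available")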
@@ -1,5 +1,4 @@
import logging
import math
from functools import partial
from typing import Dict, Optional, List

import numpy as np
@@ -72,45 +71,33 @@ class PatchEmbed(nn.Module):
            strict_img_size: bool = True,
            dynamic_img_pad: bool = True,
            padding_mode='circular',
            conv3d=False,
            dtype=None,
            device=None,
            operations=None,
    ):
        super().__init__()
        self.patch_size = (patch_size, patch_size)
        try:
            len(patch_size)
            self.patch_size = patch_size
        except:
        if conv3d:
            self.patch_size = (patch_size, patch_size, patch_size)
        else:
            self.patch_size = (patch_size, patch_size)
        self.padding_mode = padding_mode
        if img_size is not None:
            self.img_size = (img_size, img_size)
            self.grid_size = tuple([s // p for s, p in zip(self.img_size, self.patch_size)])
            self.num_patches = self.grid_size[0] * self.grid_size[1]
        else:
            self.img_size = None
            self.grid_size = None
            self.num_patches = None

        # flatten spatial dim and transpose to channels last, kept for bwd compat
        self.flatten = flatten
        self.strict_img_size = strict_img_size
        self.dynamic_img_pad = dynamic_img_pad

        self.proj = operations.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size, bias=bias, dtype=dtype, device=device)
        if conv3d:
            self.proj = operations.Conv3d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size, bias=bias, dtype=dtype, device=device)
        else:
            self.proj = operations.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size, bias=bias, dtype=dtype, device=device)
        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()

    def forward(self, x):
        # B, C, H, W = x.shape
        # if self.img_size is not None:
        #     if self.strict_img_size:
        #         _assert(H == self.img_size[0], f"Input height ({H}) doesn't match model ({self.img_size[0]}).")
        #         _assert(W == self.img_size[1], f"Input width ({W}) doesn't match model ({self.img_size[1]}).")
        #     elif not self.dynamic_img_pad:
        #         _assert(
        #             H % self.patch_size[0] == 0,
        #             f"Input height ({H}) should be divisible by patch size ({self.patch_size[0]})."
        #         )
        #         _assert(
        #             W % self.patch_size[1] == 0,
        #             f"Input width ({W}) should be divisible by patch size ({self.patch_size[1]})."
        #         )
        if self.dynamic_img_pad:
            x = comfy.ldm.common_dit.pad_to_patch_size(x, self.patch_size, padding_mode=self.padding_mode)
        x = self.proj(x)

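# pad_to_patch_size (used in the dynamic_img_pad branch above) only has to round each
# spatial dimension up to a multiple of the patch size before the strided projection.
# A minimal sketch under that assumption; the real helper in comfy.ldm.common_dit may differ.
import torch
import torch.nn.functional as F

def pad_to_patch_size_sketch(x, patch_size=(2, 2), padding_mode="circular"):
    # x: [B, C, H, W]; pad H and W up to the next multiple of the patch size
    pad_h = (patch_size[0] - x.shape[-2] % patch_size[0]) % patch_size[0]
    pad_w = (patch_size[1] - x.shape[-1] % patch_size[1]) % patch_size[1]
    return F.pad(x, (0, pad_w, 0, pad_h), mode=padding_mode)

x = torch.randn(1, 16, 31, 45)
print(pad_to_patch_size_sketch(x).shape)  # torch.Size([1, 16, 32, 46])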
@@ -3,7 +3,6 @@ import math
import torch
import torch.nn as nn
import numpy as np
from typing import Optional, Any
import logging

from comfy import model_management
@@ -44,51 +43,100 @@ def Normalize(in_channels, num_groups=32):
    return ops.GroupNorm(num_groups=num_groups, num_channels=in_channels, eps=1e-6, affine=True)


class VideoConv3d(nn.Module):
    def __init__(self, n_channels, out_channels, kernel_size, stride=1, dilation=1, padding_mode='replicate', padding=1, **kwargs):
        super().__init__()

        self.padding_mode = padding_mode
        if padding != 0:
            padding = (padding, padding, padding, padding, kernel_size - 1, 0)
        else:
            kwargs["padding"] = padding

        self.padding = padding
        self.conv = ops.Conv3d(n_channels, out_channels, kernel_size, stride=stride, dilation=dilation, **kwargs)

    def forward(self, x):
        if self.padding != 0:
            x = torch.nn.functional.pad(x, self.padding, mode=self.padding_mode)
        return self.conv(x)

def interpolate_up(x, scale_factor):
    try:
        return torch.nn.functional.interpolate(x, scale_factor=scale_factor, mode="nearest")
    except: #operation not implemented for bf16
        orig_shape = list(x.shape)
        out_shape = orig_shape[:2]
        for i in range(len(orig_shape) - 2):
            out_shape.append(round(orig_shape[i + 2] * scale_factor[i]))
        out = torch.empty(out_shape, dtype=x.dtype, layout=x.layout, device=x.device)
        split = 8
        l = out.shape[1] // split
        for i in range(0, out.shape[1], l):
            out[:,i:i+l] = torch.nn.functional.interpolate(x[:,i:i+l].to(torch.float32), scale_factor=scale_factor, mode="nearest").to(x.dtype)
        return out

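# The padding tuple in VideoConv3d above is causal in time: F.pad orders pads from the last
# dimension inward, so (1, 1, 1, 1, kernel_size - 1, 0) pads width and height symmetrically
# but pads the temporal axis only toward the past. Standalone shape check:
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 5, 8, 8)  # [B, C, T, H, W]
k = 3                           # kernel_size
y = F.pad(x, (1, 1, 1, 1, k - 1, 0), mode="replicate")
print(y.shape)  # torch.Size([1, 3, 7, 10, 10]); a kernel-3 Conv3d then keeps T at 5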
class Upsample(nn.Module):
    def __init__(self, in_channels, with_conv):
    def __init__(self, in_channels, with_conv, conv_op=ops.Conv2d, scale_factor=2.0):
        super().__init__()
        self.with_conv = with_conv
        self.scale_factor = scale_factor

        if self.with_conv:
            self.conv = ops.Conv2d(in_channels,
            self.conv = conv_op(in_channels,
                                 in_channels,
                                 kernel_size=3,
                                 stride=1,
                                 padding=1)

    def forward(self, x):
        try:
            x = torch.nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")
        except: #operation not implemented for bf16
            b, c, h, w = x.shape
            out = torch.empty((b, c, h*2, w*2), dtype=x.dtype, layout=x.layout, device=x.device)
            split = 8
            l = out.shape[1] // split
            for i in range(0, out.shape[1], l):
                out[:,i:i+l] = torch.nn.functional.interpolate(x[:,i:i+l].to(torch.float32), scale_factor=2.0, mode="nearest").to(x.dtype)
            del x
            x = out
        scale_factor = self.scale_factor
        if isinstance(scale_factor, (int, float)):
            scale_factor = (scale_factor,) * (x.ndim - 2)

        if x.ndim == 5 and scale_factor[0] > 1.0:
            t = x.shape[2]
            if t > 1:
                a, b = x.split((1, t - 1), dim=2)
                del x
                b = interpolate_up(b, scale_factor)
            else:
                a = x

            a = interpolate_up(a.squeeze(2), scale_factor=scale_factor[1:]).unsqueeze(2)
            if t > 1:
                x = torch.cat((a, b), dim=2)
            else:
                x = a
        else:
            x = interpolate_up(x, scale_factor)
        if self.with_conv:
            x = self.conv(x)
        return x


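# The 5-D branch of Upsample.forward above avoids duplicating the first frame in time:
# frame 0 is upsampled spatially only, the remaining frames get the full (T, H, W) scale,
# and the parts are re-concatenated. Standalone shape sketch (plain F.interpolate,
# ignoring the bf16 fallback in interpolate_up):
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 5, 8, 8)  # [B, C, T, H, W]
first, rest = x.split((1, x.shape[2] - 1), dim=2)
rest = F.interpolate(rest, scale_factor=(2.0, 2.0, 2.0), mode="nearest")
first = F.interpolate(first.squeeze(2), scale_factor=(2.0, 2.0), mode="nearest").unsqueeze(2)
y = torch.cat((first, rest), dim=2)
print(y.shape)  # torch.Size([1, 4, 9, 16, 16]): T goes 5 -> 2*5 - 1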
class Downsample(nn.Module):
    def __init__(self, in_channels, with_conv):
    def __init__(self, in_channels, with_conv, stride=2, conv_op=ops.Conv2d):
        super().__init__()
        self.with_conv = with_conv
        if self.with_conv:
            # no asymmetric padding in torch conv, must do it ourselves
            self.conv = ops.Conv2d(in_channels,
            self.conv = conv_op(in_channels,
                                 in_channels,
                                 kernel_size=3,
                                 stride=2,
                                 stride=stride,
                                 padding=0)

    def forward(self, x):
        if self.with_conv:
            pad = (0,1,0,1)
            x = torch.nn.functional.pad(x, pad, mode="constant", value=0)
            if x.ndim == 4:
                pad = (0, 1, 0, 1)
                mode = "constant"
                x = torch.nn.functional.pad(x, pad, mode=mode, value=0)
            elif x.ndim == 5:
                pad = (1, 1, 1, 1, 2, 0)
                mode = "replicate"
                x = torch.nn.functional.pad(x, pad, mode=mode)
            x = self.conv(x)
        else:
            x = torch.nn.functional.avg_pool2d(x, kernel_size=2, stride=2)
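# The asymmetric (0, 1, 0, 1) pad above adds one pixel on the right and bottom only, so a
# kernel-3, stride-2, padding-0 conv halves H and W exactly. Quick check:
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 8, 16, 16)
conv = nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=0)
print(conv(F.pad(x, (0, 1, 0, 1))).shape)  # torch.Size([1, 8, 8, 8])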
@@ -97,7 +145,7 @@ class Downsample(nn.Module):
|
||||
|
||||
class ResnetBlock(nn.Module):
|
||||
def __init__(self, *, in_channels, out_channels=None, conv_shortcut=False,
|
||||
dropout, temb_channels=512):
|
||||
dropout, temb_channels=512, conv_op=ops.Conv2d):
|
||||
super().__init__()
|
||||
self.in_channels = in_channels
|
||||
out_channels = in_channels if out_channels is None else out_channels
|
||||
@@ -106,7 +154,7 @@ class ResnetBlock(nn.Module):
|
||||
|
||||
self.swish = torch.nn.SiLU(inplace=True)
|
||||
self.norm1 = Normalize(in_channels)
|
||||
self.conv1 = ops.Conv2d(in_channels,
|
||||
self.conv1 = conv_op(in_channels,
|
||||
out_channels,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
@@ -116,20 +164,20 @@ class ResnetBlock(nn.Module):
|
||||
out_channels)
|
||||
self.norm2 = Normalize(out_channels)
|
||||
self.dropout = torch.nn.Dropout(dropout, inplace=True)
|
||||
self.conv2 = ops.Conv2d(out_channels,
|
||||
self.conv2 = conv_op(out_channels,
|
||||
out_channels,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
padding=1)
|
||||
if self.in_channels != self.out_channels:
|
||||
if self.use_conv_shortcut:
|
||||
self.conv_shortcut = ops.Conv2d(in_channels,
|
||||
self.conv_shortcut = conv_op(in_channels,
|
||||
out_channels,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
padding=1)
|
||||
else:
|
||||
self.nin_shortcut = ops.Conv2d(in_channels,
|
||||
self.nin_shortcut = conv_op(in_channels,
|
||||
out_channels,
|
||||
kernel_size=1,
|
||||
stride=1,
|
||||
@@ -163,7 +211,6 @@ def slice_attention(q, k, v):
|
||||
|
||||
mem_free_total = model_management.get_free_memory(q.device)
|
||||
|
||||
gb = 1024 ** 3
|
||||
tensor_size = q.shape[0] * q.shape[1] * k.shape[2] * q.element_size()
|
||||
modifier = 3 if q.element_size() == 2 else 2.5
|
||||
mem_required = tensor_size * modifier
|
||||
@@ -196,21 +243,25 @@ def slice_attention(q, k, v):
|
||||
|
||||
def normal_attention(q, k, v):
|
||||
# compute attention
|
||||
b,c,h,w = q.shape
|
||||
orig_shape = q.shape
|
||||
b = orig_shape[0]
|
||||
c = orig_shape[1]
|
||||
|
||||
q = q.reshape(b,c,h*w)
|
||||
q = q.permute(0,2,1) # b,hw,c
|
||||
k = k.reshape(b,c,h*w) # b,c,hw
|
||||
v = v.reshape(b,c,h*w)
|
||||
q = q.reshape(b, c, -1)
|
||||
q = q.permute(0, 2, 1) # b,hw,c
|
||||
k = k.reshape(b, c, -1) # b,c,hw
|
||||
v = v.reshape(b, c, -1)
|
||||
|
||||
r1 = slice_attention(q, k, v)
|
||||
h_ = r1.reshape(b,c,h,w)
|
||||
h_ = r1.reshape(orig_shape)
|
||||
del r1
|
||||
return h_
|
||||
|
||||
def xformers_attention(q, k, v):
|
||||
# compute attention
|
||||
B, C, H, W = q.shape
|
||||
orig_shape = q.shape
|
||||
B = orig_shape[0]
|
||||
C = orig_shape[1]
|
||||
q, k, v = map(
|
||||
lambda t: t.view(B, C, -1).transpose(1, 2).contiguous(),
|
||||
(q, k, v),
|
||||
@@ -218,14 +269,16 @@ def xformers_attention(q, k, v):
|
||||
|
||||
try:
|
||||
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)
|
||||
out = out.transpose(1, 2).reshape(B, C, H, W)
|
||||
except NotImplementedError as e:
|
||||
out = slice_attention(q.view(B, -1, C), k.view(B, -1, C).transpose(1, 2), v.view(B, -1, C).transpose(1, 2)).reshape(B, C, H, W)
|
||||
out = out.transpose(1, 2).reshape(orig_shape)
|
||||
except NotImplementedError:
|
||||
out = slice_attention(q.view(B, -1, C), k.view(B, -1, C).transpose(1, 2), v.view(B, -1, C).transpose(1, 2)).reshape(orig_shape)
|
||||
return out
|
||||
|
||||
def pytorch_attention(q, k, v):
|
||||
# compute attention
|
||||
B, C, H, W = q.shape
|
||||
orig_shape = q.shape
|
||||
B = orig_shape[0]
|
||||
C = orig_shape[1]
|
||||
q, k, v = map(
|
||||
lambda t: t.view(B, 1, C, -1).transpose(2, 3).contiguous(),
|
||||
(q, k, v),
|
||||
@@ -233,35 +286,35 @@ def pytorch_attention(q, k, v):
|
||||
|
||||
try:
|
||||
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
|
||||
out = out.transpose(2, 3).reshape(B, C, H, W)
|
||||
except model_management.OOM_EXCEPTION as e:
|
||||
out = out.transpose(2, 3).reshape(orig_shape)
|
||||
except model_management.OOM_EXCEPTION:
|
||||
logging.warning("scaled_dot_product_attention OOMed: switched to slice attention")
|
||||
out = slice_attention(q.view(B, -1, C), k.view(B, -1, C).transpose(1, 2), v.view(B, -1, C).transpose(1, 2)).reshape(B, C, H, W)
|
||||
out = slice_attention(q.view(B, -1, C), k.view(B, -1, C).transpose(1, 2), v.view(B, -1, C).transpose(1, 2)).reshape(orig_shape)
|
||||
return out
|
||||
|
||||
|
||||
class AttnBlock(nn.Module):
|
||||
def __init__(self, in_channels):
|
||||
def __init__(self, in_channels, conv_op=ops.Conv2d):
|
||||
super().__init__()
|
||||
self.in_channels = in_channels
|
||||
|
||||
self.norm = Normalize(in_channels)
|
||||
self.q = ops.Conv2d(in_channels,
|
||||
self.q = conv_op(in_channels,
|
||||
in_channels,
|
||||
kernel_size=1,
|
||||
stride=1,
|
||||
padding=0)
|
||||
self.k = ops.Conv2d(in_channels,
|
||||
self.k = conv_op(in_channels,
|
||||
in_channels,
|
||||
kernel_size=1,
|
||||
stride=1,
|
||||
padding=0)
|
||||
self.v = ops.Conv2d(in_channels,
|
||||
self.v = conv_op(in_channels,
|
||||
in_channels,
|
||||
kernel_size=1,
|
||||
stride=1,
|
||||
padding=0)
|
||||
self.proj_out = ops.Conv2d(in_channels,
|
||||
self.proj_out = conv_op(in_channels,
|
||||
in_channels,
|
||||
kernel_size=1,
|
||||
stride=1,
|
||||
@@ -291,8 +344,8 @@ class AttnBlock(nn.Module):
|
||||
return x+h_
|
||||
|
||||
|
||||
def make_attn(in_channels, attn_type="vanilla", attn_kwargs=None):
|
||||
return AttnBlock(in_channels)
|
||||
def make_attn(in_channels, attn_type="vanilla", attn_kwargs=None, conv_op=ops.Conv2d):
|
||||
return AttnBlock(in_channels, conv_op=conv_op)
|
||||
|
||||
|
||||
class Model(nn.Module):
|
||||
@@ -451,6 +504,7 @@ class Encoder(nn.Module):
|
||||
def __init__(self, *, ch, out_ch, ch_mult=(1,2,4,8), num_res_blocks,
|
||||
attn_resolutions, dropout=0.0, resamp_with_conv=True, in_channels,
|
||||
resolution, z_channels, double_z=True, use_linear_attn=False, attn_type="vanilla",
|
||||
conv3d=False, time_compress=None,
|
||||
**ignore_kwargs):
|
||||
super().__init__()
|
||||
if use_linear_attn: attn_type = "linear"
|
||||
@@ -461,8 +515,15 @@ class Encoder(nn.Module):
|
||||
self.resolution = resolution
|
||||
self.in_channels = in_channels
|
||||
|
||||
if conv3d:
|
||||
conv_op = VideoConv3d
|
||||
mid_attn_conv_op = ops.Conv3d
|
||||
else:
|
||||
conv_op = ops.Conv2d
|
||||
mid_attn_conv_op = ops.Conv2d
|
||||
|
||||
# downsampling
|
||||
self.conv_in = ops.Conv2d(in_channels,
|
||||
self.conv_in = conv_op(in_channels,
|
||||
self.ch,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
@@ -481,15 +542,20 @@ class Encoder(nn.Module):
|
||||
block.append(ResnetBlock(in_channels=block_in,
|
||||
out_channels=block_out,
|
||||
temb_channels=self.temb_ch,
|
||||
dropout=dropout))
|
||||
dropout=dropout,
|
||||
conv_op=conv_op))
|
||||
block_in = block_out
|
||||
if curr_res in attn_resolutions:
|
||||
attn.append(make_attn(block_in, attn_type=attn_type))
|
||||
attn.append(make_attn(block_in, attn_type=attn_type, conv_op=conv_op))
|
||||
down = nn.Module()
|
||||
down.block = block
|
||||
down.attn = attn
|
||||
if i_level != self.num_resolutions-1:
|
||||
down.downsample = Downsample(block_in, resamp_with_conv)
|
||||
stride = 2
|
||||
if time_compress is not None:
|
||||
if (self.num_resolutions - 1 - i_level) > math.log2(time_compress):
|
||||
stride = (1, 2, 2)
|
||||
down.downsample = Downsample(block_in, resamp_with_conv, stride=stride, conv_op=conv_op)
|
||||
curr_res = curr_res // 2
|
||||
self.down.append(down)
|
||||
|
||||
@@ -498,16 +564,18 @@ class Encoder(nn.Module):
|
||||
self.mid.block_1 = ResnetBlock(in_channels=block_in,
|
||||
out_channels=block_in,
|
||||
temb_channels=self.temb_ch,
|
||||
dropout=dropout)
|
||||
self.mid.attn_1 = make_attn(block_in, attn_type=attn_type)
|
||||
dropout=dropout,
|
||||
conv_op=conv_op)
|
||||
self.mid.attn_1 = make_attn(block_in, attn_type=attn_type, conv_op=mid_attn_conv_op)
|
||||
self.mid.block_2 = ResnetBlock(in_channels=block_in,
|
||||
out_channels=block_in,
|
||||
temb_channels=self.temb_ch,
|
||||
dropout=dropout)
|
||||
dropout=dropout,
|
||||
conv_op=conv_op)
|
||||
|
||||
# end
|
||||
self.norm_out = Normalize(block_in)
|
||||
self.conv_out = ops.Conv2d(block_in,
|
||||
self.conv_out = conv_op(block_in,
|
||||
2*z_channels if double_z else z_channels,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
@@ -545,9 +613,10 @@ class Decoder(nn.Module):
|
||||
conv_out_op=ops.Conv2d,
|
||||
resnet_op=ResnetBlock,
|
||||
attn_op=AttnBlock,
|
||||
conv3d=False,
|
||||
time_compress=None,
|
||||
**ignorekwargs):
|
||||
super().__init__()
|
||||
if use_linear_attn: attn_type = "linear"
|
||||
self.ch = ch
|
||||
self.temb_ch = 0
|
||||
self.num_resolutions = len(ch_mult)
|
||||
@@ -557,8 +626,15 @@ class Decoder(nn.Module):
|
||||
self.give_pre_end = give_pre_end
|
||||
self.tanh_out = tanh_out
|
||||
|
||||
# compute in_ch_mult, block_in and curr_res at lowest res
|
||||
in_ch_mult = (1,)+tuple(ch_mult)
|
||||
if conv3d:
|
||||
conv_op = VideoConv3d
|
||||
conv_out_op = VideoConv3d
|
||||
mid_attn_conv_op = ops.Conv3d
|
||||
else:
|
||||
conv_op = ops.Conv2d
|
||||
mid_attn_conv_op = ops.Conv2d
|
||||
|
||||
# compute block_in and curr_res at lowest res
|
||||
block_in = ch*ch_mult[self.num_resolutions-1]
|
||||
curr_res = resolution // 2**(self.num_resolutions-1)
|
||||
self.z_shape = (1,z_channels,curr_res,curr_res)
|
||||
@@ -566,7 +642,7 @@ class Decoder(nn.Module):
|
||||
self.z_shape, np.prod(self.z_shape)))
|
||||
|
||||
# z to block_in
|
||||
self.conv_in = ops.Conv2d(z_channels,
|
||||
self.conv_in = conv_op(z_channels,
|
||||
block_in,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
@@ -577,12 +653,14 @@ class Decoder(nn.Module):
|
||||
self.mid.block_1 = resnet_op(in_channels=block_in,
|
||||
out_channels=block_in,
|
||||
temb_channels=self.temb_ch,
|
||||
dropout=dropout)
|
||||
self.mid.attn_1 = attn_op(block_in)
|
||||
dropout=dropout,
|
||||
conv_op=conv_op)
|
||||
self.mid.attn_1 = attn_op(block_in, conv_op=mid_attn_conv_op)
|
||||
self.mid.block_2 = resnet_op(in_channels=block_in,
|
||||
out_channels=block_in,
|
||||
temb_channels=self.temb_ch,
|
||||
dropout=dropout)
|
||||
dropout=dropout,
|
||||
conv_op=conv_op)
|
||||
|
||||
# upsampling
|
||||
self.up = nn.ModuleList()
|
||||
@@ -594,15 +672,21 @@ class Decoder(nn.Module):
|
||||
block.append(resnet_op(in_channels=block_in,
|
||||
out_channels=block_out,
|
||||
temb_channels=self.temb_ch,
|
||||
dropout=dropout))
|
||||
dropout=dropout,
|
||||
conv_op=conv_op))
|
||||
block_in = block_out
|
||||
if curr_res in attn_resolutions:
|
||||
attn.append(attn_op(block_in))
|
||||
attn.append(attn_op(block_in, conv_op=conv_op))
|
||||
up = nn.Module()
|
||||
up.block = block
|
||||
up.attn = attn
|
||||
if i_level != 0:
|
||||
up.upsample = Upsample(block_in, resamp_with_conv)
|
||||
scale_factor = 2.0
|
||||
if time_compress is not None:
|
||||
if i_level > math.log2(time_compress):
|
||||
scale_factor = (1.0, 2.0, 2.0)
|
||||
|
||||
up.upsample = Upsample(block_in, resamp_with_conv, conv_op=conv_op, scale_factor=scale_factor)
|
||||
curr_res = curr_res * 2
|
||||
self.up.insert(0, up) # prepend to get consistent order
|
||||
|
||||
|
||||
@@ -9,12 +9,12 @@ import logging
|
||||
from .util import (
|
||||
checkpoint,
|
||||
avg_pool_nd,
|
||||
zero_module,
|
||||
timestep_embedding,
|
||||
AlphaBlender,
|
||||
)
|
||||
from ..attention import SpatialTransformer, SpatialVideoTransformer, default
|
||||
from comfy.ldm.util import exists
|
||||
import comfy.patcher_extension
|
||||
import comfy.ops
|
||||
ops = comfy.ops.disable_weight_init
|
||||
|
||||
@@ -47,6 +47,15 @@ def forward_timestep_embed(ts, x, emb, context=None, transformer_options={}, out
|
||||
elif isinstance(layer, Upsample):
|
||||
x = layer(x, output_shape=output_shape)
|
||||
else:
|
||||
if "patches" in transformer_options and "forward_timestep_embed_patch" in transformer_options["patches"]:
|
||||
found_patched = False
|
||||
for class_type, handler in transformer_options["patches"]["forward_timestep_embed_patch"]:
|
||||
if isinstance(layer, class_type):
|
||||
x = handler(layer, x, emb, context, transformer_options, output_shape, time_context, num_video_frames, image_only_indicator)
|
||||
found_patched = True
|
||||
break
|
||||
if found_patched:
|
||||
continue
|
||||
x = layer(x)
|
||||
return x
|
||||
|
||||
@@ -819,6 +828,13 @@ class UNetModel(nn.Module):
|
||||
)
|
||||
|
||||
def forward(self, x, timesteps=None, context=None, y=None, control=None, transformer_options={}, **kwargs):
|
||||
return comfy.patcher_extension.WrapperExecutor.new_class_executor(
|
||||
self._forward,
|
||||
self,
|
||||
comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
|
||||
).execute(x, timesteps, context, y, control, transformer_options, **kwargs)
|
||||
|
||||
def _forward(self, x, timesteps=None, context=None, y=None, control=None, transformer_options={}, **kwargs):
|
||||
"""
|
||||
Apply the model to an input batch.
|
||||
:param x: an [N x C x ...] Tensor of inputs.
|
||||
|
||||
@@ -4,7 +4,6 @@ import numpy as np
|
||||
from functools import partial
|
||||
|
||||
from .util import extract_into_tensor, make_beta_schedule
|
||||
from comfy.ldm.util import default
|
||||
|
||||
|
||||
class AbstractLowScaleModel(nn.Module):
|
||||
|
||||
@@ -8,7 +8,6 @@
|
||||
# thanks!
|
||||
|
||||
|
||||
import os
|
||||
import math
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
@@ -30,10 +30,10 @@ class DiagonalGaussianDistribution(object):
|
||||
self.std = torch.exp(0.5 * self.logvar)
|
||||
self.var = torch.exp(self.logvar)
|
||||
if self.deterministic:
|
||||
self.var = self.std = torch.zeros_like(self.mean).to(device=self.parameters.device)
|
||||
self.var = self.std = torch.zeros_like(self.mean, device=self.parameters.device)
|
||||
|
||||
def sample(self):
|
||||
x = self.mean + self.std * torch.randn(self.mean.shape).to(device=self.parameters.device)
|
||||
x = self.mean + self.std * torch.randn(self.mean.shape, device=self.parameters.device)
|
||||
return x
|
||||
|
||||
def kl(self, other=None):
|
||||
|
||||
@@ -22,7 +22,6 @@ except ImportError:
|
||||
from typing import Optional, NamedTuple, List
|
||||
from typing_extensions import Protocol
|
||||
|
||||
from torch import Tensor
|
||||
from typing import List
|
||||
|
||||
from comfy import model_management
|
||||
@@ -172,7 +171,7 @@ def _get_attention_scores_no_kv_chunking(
|
||||
del attn_scores
|
||||
except model_management.OOM_EXCEPTION:
|
||||
logging.warning("ran out of memory while running softmax in _get_attention_scores_no_kv_chunking, trying slower in place softmax instead")
|
||||
attn_scores -= attn_scores.max(dim=-1, keepdim=True).values
|
||||
attn_scores -= attn_scores.max(dim=-1, keepdim=True).values # noqa: F821 attn_scores is not defined
|
||||
torch.exp(attn_scores, out=attn_scores)
|
||||
summed = torch.sum(attn_scores, dim=-1, keepdim=True)
|
||||
attn_scores /= summed
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
import functools
|
||||
from typing import Callable, Iterable, Union
|
||||
from typing import Iterable, Union
|
||||
|
||||
import torch
|
||||
from einops import rearrange, repeat
|
||||
@@ -194,6 +194,7 @@ def make_time_attn(
|
||||
attn_kwargs=None,
|
||||
alpha: float = 0,
|
||||
merge_strategy: str = "learned",
|
||||
conv_op=ops.Conv2d,
|
||||
):
|
||||
return partialclass(
|
||||
AttnVideoBlock, in_channels, alpha=alpha, merge_strategy=merge_strategy
|
||||
|
||||
@@ -133,7 +133,6 @@ class AdamWwithEMAandWings(optim.Optimizer):
|
||||
exp_avgs = []
|
||||
exp_avg_sqs = []
|
||||
ema_params_with_grad = []
|
||||
state_sums = []
|
||||
max_exp_avg_sqs = []
|
||||
state_steps = []
|
||||
amsgrad = group['amsgrad']
|
||||
|
||||
@@ -33,7 +33,7 @@ LORA_CLIP_MAP = {
|
||||
}
|
||||
|
||||
|
||||
def load_lora(lora, to_load):
def load_lora(lora, to_load, log_missing=True):
    patch_dict = {}
    loaded_keys = set()
    for x in to_load:
@@ -213,9 +213,10 @@ def load_lora(lora, to_load):
            patch_dict[to_load[x]] = ("set", (set_weight,))
            loaded_keys.add(set_weight_name)

    for x in lora.keys():
        if x not in loaded_keys:
            logging.warning("lora key not loaded: {}".format(x))
    if log_missing:
        for x in lora.keys():
            if x not in loaded_keys:
                logging.warning("lora key not loaded: {}".format(x))

    return patch_dict

@@ -373,6 +374,18 @@ def model_lora_keys_unet(model, key_map={}):
            key_lora = k[len("diffusion_model."):-len(".weight")]
            key_map["{}".format(key_lora)] = k

    if isinstance(model, comfy.model_base.HunyuanVideo):
        for k in sdk:
            if k.startswith("diffusion_model.") and k.endswith(".weight"):
                # diffusion-pipe lora format
                key_lora = k
                key_lora = key_lora.replace("_mod.lin.", "_mod.linear.").replace("_attn.qkv.", "_attn_qkv.").replace("_attn.proj.", "_attn_proj.")
                key_lora = key_lora.replace("mlp.0.", "mlp.fc1.").replace("mlp.2.", "mlp.fc2.")
                key_lora = key_lora.replace(".modulation.lin.", ".modulation.linear.")
                key_lora = key_lora[len("diffusion_model."):-len(".weight")]
                key_map["transformer.{}".format(key_lora)] = k
                key_map["diffusion_model.{}".format(key_lora)] = k # Old loras

    return key_map


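# The HunyuanVideo branch above rewrites diffusion-pipe key names into the loader's naming
# and registers each key under both a "transformer." and a "diffusion_model." prefix so
# older LoRAs keep working. Minimal remapping sketch with a hypothetical state-dict key:
replacements = [
    ("_mod.lin.", "_mod.linear."),
    ("_attn.qkv.", "_attn_qkv."),
    ("_attn.proj.", "_attn_proj."),
    ("mlp.0.", "mlp.fc1."),
    ("mlp.2.", "mlp.fc2."),
    (".modulation.lin.", ".modulation.linear."),
]

def remap(key):
    for old, new in replacements:
        key = key.replace(old, new)
    return key

k = "diffusion_model.double_blocks.0.img_mod.lin.weight"  # hypothetical key
key_lora = remap(k)[len("diffusion_model."):-len(".weight")]
print(key_lora)  # double_blocks.0.img_mod.linear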
@@ -429,7 +442,7 @@ def pad_tensor_to_shape(tensor: torch.Tensor, new_shape: list[int]) -> torch.Ten

    return padded_tensor

def calculate_weight(patches, weight, key, intermediate_dtype=torch.float32):
def calculate_weight(patches, weight, key, intermediate_dtype=torch.float32, original_weights=None):
    for p in patches:
        strength = p[0]
        v = p[1]
@@ -471,6 +484,11 @@ def calculate_weight(patches, weight, key, intermediate_dtype=torch.float32):
                weight += function(strength * comfy.model_management.cast_to_device(diff, weight.device, weight.dtype))
        elif patch_type == "set":
            weight.copy_(v[0])
        elif patch_type == "model_as_lora":
            target_weight: torch.Tensor = v[0]
            diff_weight = comfy.model_management.cast_to_device(target_weight, weight.device, intermediate_dtype) - \
                comfy.model_management.cast_to_device(original_weights[key][0][0], weight.device, intermediate_dtype)
            weight += function(strength * comfy.model_management.cast_to_device(diff_weight, weight.device, weight.dtype))
        elif patch_type == "lora": #lora/locon
            mat1 = comfy.model_management.cast_to_device(v[0], weight.device, intermediate_dtype)
            mat2 = comfy.model_management.cast_to_device(v[1], weight.device, intermediate_dtype)

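# The new "model_as_lora" patch type above treats a reference model's weights as the patch:
# the delta between the target weight and the original, unpatched weight is added back,
# scaled by strength. Sketch of the arithmetic only, ignoring device/dtype casting:
import torch

def apply_model_as_lora(weight, target_weight, original_weight, strength=1.0):
    diff = target_weight.to(torch.float32) - original_weight.to(torch.float32)
    weight += (strength * diff).to(weight.dtype)  # in-place update, like the patcher
    return weight

w = torch.zeros(4)
print(apply_model_as_lora(w, torch.ones(4), torch.zeros(4), strength=0.5))  # tensor([0.5000, ...])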
@@ -31,14 +31,19 @@ import comfy.ldm.audio.dit
|
||||
import comfy.ldm.audio.embedders
|
||||
import comfy.ldm.flux.model
|
||||
import comfy.ldm.lightricks.model
|
||||
import comfy.ldm.hunyuan_video.model
|
||||
|
||||
import comfy.model_management
|
||||
import comfy.patcher_extension
|
||||
import comfy.conds
|
||||
import comfy.ops
|
||||
from enum import Enum
|
||||
from . import utils
|
||||
import comfy.latent_formats
|
||||
import math
|
||||
from typing import TYPE_CHECKING
|
||||
if TYPE_CHECKING:
|
||||
from comfy.model_patcher import ModelPatcher
|
||||
|
||||
class ModelType(Enum):
|
||||
EPS = 1
|
||||
@@ -95,6 +100,7 @@ class BaseModel(torch.nn.Module):
|
||||
self.model_config = model_config
|
||||
self.manual_cast_dtype = model_config.manual_cast_dtype
|
||||
self.device = device
|
||||
self.current_patcher: 'ModelPatcher' = None
|
||||
|
||||
if not unet_config.get("disable_unet_model_creation", False):
|
||||
if model_config.custom_operations is None:
|
||||
@@ -120,6 +126,13 @@ class BaseModel(torch.nn.Module):
|
||||
self.memory_usage_factor = model_config.memory_usage_factor
|
||||
|
||||
def apply_model(self, x, t, c_concat=None, c_crossattn=None, control=None, transformer_options={}, **kwargs):
|
||||
return comfy.patcher_extension.WrapperExecutor.new_class_executor(
|
||||
self._apply_model,
|
||||
self,
|
||||
comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.APPLY_MODEL, transformer_options)
|
||||
).execute(x, t, c_concat, c_crossattn, control, transformer_options, **kwargs)
|
||||
|
||||
def _apply_model(self, x, t, c_concat=None, c_crossattn=None, control=None, transformer_options={}, **kwargs):
|
||||
sigma = t
|
||||
xc = self.model_sampling.calculate_input(sigma, x)
|
||||
if c_concat is not None:
|
||||
@@ -415,7 +428,6 @@ class SVD_img2vid(BaseModel):
|
||||
|
||||
latent_image = kwargs.get("concat_latent_image", None)
|
||||
noise = kwargs.get("noise", None)
|
||||
device = kwargs["device"]
|
||||
|
||||
if latent_image is None:
|
||||
latent_image = torch.zeros_like(noise)
|
||||
@@ -675,6 +687,7 @@ class StableAudio1(BaseModel):
|
||||
sd["{}{}".format(k, l)] = s[l]
|
||||
return sd
|
||||
|
||||
|
||||
class HunyuanDiT(BaseModel):
|
||||
def __init__(self, model_config, model_type=ModelType.V_PREDICTION, device=None):
|
||||
super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.hydit.models.HunYuanDiT)
|
||||
@@ -699,8 +712,6 @@ class HunyuanDiT(BaseModel):
|
||||
|
||||
width = kwargs.get("width", 768)
|
||||
height = kwargs.get("height", 768)
|
||||
crop_w = kwargs.get("crop_w", 0)
|
||||
crop_h = kwargs.get("crop_h", 0)
|
||||
target_width = kwargs.get("target_width", width)
|
||||
target_height = kwargs.get("target_height", height)
|
||||
|
||||
@@ -757,6 +768,16 @@ class Flux(BaseModel):
|
||||
cross_attn = kwargs.get("cross_attn", None)
|
||||
if cross_attn is not None:
|
||||
out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
|
||||
# upscale the attention mask, since now we
|
||||
attention_mask = kwargs.get("attention_mask", None)
|
||||
if attention_mask is not None:
|
||||
shape = kwargs["noise"].shape
|
||||
mask_ref_size = kwargs["attention_mask_img_shape"]
|
||||
# the model will pad to the patch size, and then divide
|
||||
# essentially dividing and rounding up
|
||||
(h_tok, w_tok) = (math.ceil(shape[2] / self.diffusion_model.patch_size), math.ceil(shape[3] / self.diffusion_model.patch_size))
|
||||
attention_mask = utils.upscale_dit_mask(attention_mask, mask_ref_size, (h_tok, w_tok))
|
||||
out['attention_mask'] = comfy.conds.CONDRegular(attention_mask)
|
||||
out['guidance'] = comfy.conds.CONDRegular(torch.FloatTensor([kwargs.get("guidance", 3.5)]))
|
||||
return out
|
||||
|
||||
@@ -792,5 +813,27 @@ class LTXV(BaseModel):
|
||||
if guiding_latent is not None:
|
||||
out['guiding_latent'] = comfy.conds.CONDRegular(guiding_latent)
|
||||
|
||||
guiding_latent_noise_scale = kwargs.get("guiding_latent_noise_scale", None)
|
||||
if guiding_latent_noise_scale is not None:
|
||||
out["guiding_latent_noise_scale"] = comfy.conds.CONDConstant(guiding_latent_noise_scale)
|
||||
|
||||
out['frame_rate'] = comfy.conds.CONDConstant(kwargs.get("frame_rate", 25))
|
||||
return out
|
||||
|
||||
class HunyuanVideo(BaseModel):
|
||||
def __init__(self, model_config, model_type=ModelType.FLOW, device=None):
|
||||
super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.hunyuan_video.model.HunyuanVideo)
|
||||
|
||||
def encode_adm(self, **kwargs):
|
||||
return kwargs["pooled_output"]
|
||||
|
||||
def extra_conds(self, **kwargs):
|
||||
out = super().extra_conds(**kwargs)
|
||||
attention_mask = kwargs.get("attention_mask", None)
|
||||
if attention_mask is not None:
|
||||
out['attention_mask'] = comfy.conds.CONDRegular(attention_mask)
|
||||
cross_attn = kwargs.get("cross_attn", None)
|
||||
if cross_attn is not None:
|
||||
out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
|
||||
out['guidance'] = comfy.conds.CONDRegular(torch.FloatTensor([kwargs.get("guidance", 6.0)]))
|
||||
return out
|
||||
|
||||
@@ -133,6 +133,26 @@ def detect_unet_config(state_dict, key_prefix):
        unet_config["image_model"] = "hydit1"
        return unet_config

    if '{}txt_in.individual_token_refiner.blocks.0.norm1.weight'.format(key_prefix) in state_dict_keys: #Hunyuan Video
        dit_config = {}
        dit_config["image_model"] = "hunyuan_video"
        dit_config["in_channels"] = 16
        dit_config["patch_size"] = [1, 2, 2]
        dit_config["out_channels"] = 16
        dit_config["vec_in_dim"] = 768
        dit_config["context_in_dim"] = 4096
        dit_config["hidden_size"] = 3072
        dit_config["mlp_ratio"] = 4.0
        dit_config["num_heads"] = 24
        dit_config["depth"] = count_blocks(state_dict_keys, '{}double_blocks.'.format(key_prefix) + '{}.')
        dit_config["depth_single_blocks"] = count_blocks(state_dict_keys, '{}single_blocks.'.format(key_prefix) + '{}.')
        dit_config["axes_dim"] = [16, 56, 56]
        dit_config["theta"] = 256
        dit_config["qkv_bias"] = True
        guidance_keys = list(filter(lambda a: a.startswith("{}guidance_in.".format(key_prefix)), state_dict_keys))
        dit_config["guidance_embed"] = len(guidance_keys) > 0
        return dit_config

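# count_blocks (used above to detect depth and depth_single_blocks) only needs the number of
# consecutive block indices present under a prefix. A hypothetical standalone version, for
# illustration; the real helper may be implemented differently:
def count_blocks_sketch(state_dict_keys, prefix_format):
    count = 0
    while any(k.startswith(prefix_format.format(count)) for k in state_dict_keys):
        count += 1
    return count

keys = ["double_blocks.0.img_attn.qkv.weight", "double_blocks.1.img_attn.qkv.weight"]
print(count_blocks_sketch(keys, "double_blocks.{}."))  # 2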
if '{}double_blocks.0.img_attn.norm.key_norm.scale'.format(key_prefix) in state_dict_keys: #Flux
|
||||
dit_config = {}
|
||||
dit_config["image_model"] = "flux"
|
||||
@@ -216,7 +236,6 @@ def detect_unet_config(state_dict, key_prefix):
|
||||
|
||||
num_res_blocks = []
|
||||
channel_mult = []
|
||||
attention_resolutions = []
|
||||
transformer_depth = []
|
||||
transformer_depth_output = []
|
||||
context_dim = None
|
||||
@@ -388,7 +407,6 @@ def convert_config(unet_config):
|
||||
t_out += [d] * (res + 1)
|
||||
s *= 2
|
||||
transformer_depth = t_in
|
||||
transformer_depth_output = t_out
|
||||
new_config["transformer_depth"] = t_in
|
||||
new_config["transformer_depth_output"] = t_out
|
||||
new_config["transformer_depth_middle"] = transformer_depth_middle
|
||||
|
||||
@@ -23,6 +23,8 @@ from comfy.cli_args import args
|
||||
import torch
|
||||
import sys
|
||||
import platform
|
||||
import weakref
|
||||
import gc
|
||||
|
||||
class VRAMState(Enum):
|
||||
DISABLED = 0 #No vram present: no need to move models to vram
|
||||
@@ -287,11 +289,27 @@ def module_size(module):
|
||||
|
||||
class LoadedModel:
|
||||
def __init__(self, model):
|
||||
self.model = model
|
||||
self._set_model(model)
|
||||
self.device = model.load_device
|
||||
self.weights_loaded = False
|
||||
self.real_model = None
|
||||
self.currently_used = True
|
||||
self.model_finalizer = None
|
||||
self._patcher_finalizer = None
|
||||
|
||||
def _set_model(self, model):
|
||||
self._model = weakref.ref(model)
|
||||
if model.parent is not None:
|
||||
self._parent_model = weakref.ref(model.parent)
|
||||
self._patcher_finalizer = weakref.finalize(model, self._switch_parent)
|
||||
|
||||
def _switch_parent(self):
|
||||
model = self._parent_model()
|
||||
if model is not None:
|
||||
self._set_model(model)
|
||||
|
||||
@property
|
||||
def model(self):
|
||||
return self._model()
|
||||
|
||||
def model_memory(self):
|
||||
return self.model.model_size()
|
||||
@@ -306,32 +324,23 @@ class LoadedModel:
|
||||
return self.model_memory()
|
||||
|
||||
def model_load(self, lowvram_model_memory=0, force_patch_weights=False):
|
||||
patch_model_to = self.device
|
||||
|
||||
self.model.model_patches_to(self.device)
|
||||
self.model.model_patches_to(self.model.model_dtype())
|
||||
|
||||
load_weights = not self.weights_loaded
|
||||
# if self.model.loaded_size() > 0:
|
||||
use_more_vram = lowvram_model_memory
|
||||
if use_more_vram == 0:
|
||||
use_more_vram = 1e32
|
||||
self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
|
||||
real_model = self.model.model
|
||||
|
||||
if self.model.loaded_size() > 0:
|
||||
use_more_vram = lowvram_model_memory
|
||||
if use_more_vram == 0:
|
||||
use_more_vram = 1e32
|
||||
self.model_use_more_vram(use_more_vram)
|
||||
else:
|
||||
try:
|
||||
self.real_model = self.model.patch_model(device_to=patch_model_to, lowvram_model_memory=lowvram_model_memory, load_weights=load_weights, force_patch_weights=force_patch_weights)
|
||||
except Exception as e:
|
||||
self.model.unpatch_model(self.model.offload_device)
|
||||
self.model_unload()
|
||||
raise e
|
||||
|
||||
if is_intel_xpu() and not args.disable_ipex_optimize and 'ipex' in globals() and self.real_model is not None:
|
||||
if is_intel_xpu() and not args.disable_ipex_optimize and 'ipex' in globals() and real_model is not None:
|
||||
with torch.no_grad():
|
||||
self.real_model = ipex.optimize(self.real_model.eval(), inplace=True, graph_mode=True, concat_linear=True)
|
||||
real_model = ipex.optimize(real_model.eval(), inplace=True, graph_mode=True, concat_linear=True)
|
||||
|
||||
self.weights_loaded = True
|
||||
return self.real_model
|
||||
self.real_model = weakref.ref(real_model)
|
||||
self.model_finalizer = weakref.finalize(real_model, cleanup_models)
|
||||
return real_model
|
||||
|
||||
def should_reload_model(self, force_patch_weights=False):
|
||||
if force_patch_weights and self.model.lowvram_patch_counter() > 0:
|
||||
@@ -344,18 +353,26 @@ class LoadedModel:
|
||||
freed = self.model.partially_unload(self.model.offload_device, memory_to_free)
|
||||
if freed >= memory_to_free:
|
||||
return False
|
||||
self.model.unpatch_model(self.model.offload_device, unpatch_weights=unpatch_weights)
|
||||
self.model.model_patches_to(self.model.offload_device)
|
||||
self.weights_loaded = self.weights_loaded and not unpatch_weights
|
||||
self.model.detach(unpatch_weights)
|
||||
self.model_finalizer.detach()
|
||||
self.model_finalizer = None
|
||||
self.real_model = None
|
||||
return True
|
||||
|
||||
def model_use_more_vram(self, extra_memory):
|
||||
return self.model.partially_load(self.device, extra_memory)
|
||||
def model_use_more_vram(self, extra_memory, force_patch_weights=False):
|
||||
return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
|
||||
|
||||
def __eq__(self, other):
|
||||
return self.model is other.model
|
||||
|
||||
def __del__(self):
|
||||
if self._patcher_finalizer is not None:
|
||||
self._patcher_finalizer.detach()
|
||||
|
||||
def is_dead(self):
|
||||
return self.real_model() is not None and self.model is None
|
||||
|
||||
|
||||
def use_more_memory(extra_memory, loaded_models, device):
|
||||
for m in loaded_models:
|
||||
if m.device == device:
|
||||
@@ -386,38 +403,8 @@ def extra_reserved_memory():
|
||||
def minimum_inference_memory():
|
||||
return (1024 * 1024 * 1024) * 0.8 + extra_reserved_memory()
|
||||
|
||||
def unload_model_clones(model, unload_weights_only=True, force_unload=True):
|
||||
to_unload = []
|
||||
for i in range(len(current_loaded_models)):
|
||||
if model.is_clone(current_loaded_models[i].model):
|
||||
to_unload = [i] + to_unload
|
||||
|
||||
if len(to_unload) == 0:
|
||||
return True
|
||||
|
||||
same_weights = 0
|
||||
for i in to_unload:
|
||||
if model.clone_has_same_weights(current_loaded_models[i].model):
|
||||
same_weights += 1
|
||||
|
||||
if same_weights == len(to_unload):
|
||||
unload_weight = False
|
||||
else:
|
||||
unload_weight = True
|
||||
|
||||
if not force_unload:
|
||||
if unload_weights_only and unload_weight == False:
|
||||
return None
|
||||
else:
|
||||
unload_weight = True
|
||||
|
||||
for i in to_unload:
|
||||
logging.debug("unload clone {} {}".format(i, unload_weight))
|
||||
current_loaded_models.pop(i).model_unload(unpatch_weights=unload_weight)
|
||||
|
||||
return unload_weight
|
||||
|
||||
def free_memory(memory_required, device, keep_loaded=[]):
|
||||
cleanup_models_gc()
|
||||
unloaded_model = []
|
||||
can_unload = []
|
||||
unloaded_models = []
|
||||
@@ -425,7 +412,7 @@ def free_memory(memory_required, device, keep_loaded=[]):
|
||||
for i in range(len(current_loaded_models) -1, -1, -1):
|
||||
shift_model = current_loaded_models[i]
|
||||
if shift_model.device == device:
|
||||
if shift_model not in keep_loaded:
|
||||
if shift_model not in keep_loaded and not shift_model.is_dead():
|
||||
can_unload.append((-shift_model.model_offloaded_memory(), sys.getrefcount(shift_model.model), shift_model.model_memory(), i))
|
||||
shift_model.currently_used = False
|
||||
|
||||
@@ -454,6 +441,7 @@ def free_memory(memory_required, device, keep_loaded=[]):
|
||||
return unloaded_models
|
||||
|
||||
def load_models_gpu(models, memory_required=0, force_patch_weights=False, minimum_memory_required=None, force_full_load=False):
|
||||
cleanup_models_gc()
|
||||
global vram_state
|
||||
|
||||
inference_memory = minimum_inference_memory()
|
||||
@@ -466,11 +454,9 @@ def load_models_gpu(models, memory_required=0, force_patch_weights=False, minimu
|
||||
models = set(models)
|
||||
|
||||
models_to_load = []
|
||||
models_already_loaded = []
|
||||
|
||||
for x in models:
|
||||
loaded_model = LoadedModel(x)
|
||||
loaded = None
|
||||
|
||||
try:
|
||||
loaded_model_index = current_loaded_models.index(loaded_model)
|
||||
except:
|
||||
@@ -478,51 +464,35 @@ def load_models_gpu(models, memory_required=0, force_patch_weights=False, minimu
|
||||
|
||||
if loaded_model_index is not None:
|
||||
loaded = current_loaded_models[loaded_model_index]
|
||||
if loaded.should_reload_model(force_patch_weights=force_patch_weights): #TODO: cleanup this model reload logic
|
||||
current_loaded_models.pop(loaded_model_index).model_unload(unpatch_weights=True)
|
||||
loaded = None
|
||||
else:
|
||||
loaded.currently_used = True
|
||||
models_already_loaded.append(loaded)
|
||||
|
||||
if loaded is None:
|
||||
loaded.currently_used = True
|
||||
models_to_load.append(loaded)
|
||||
else:
|
||||
if hasattr(x, "model"):
|
||||
logging.info(f"Requested to load {x.model.__class__.__name__}")
|
||||
models_to_load.append(loaded_model)
|
||||
|
||||
if len(models_to_load) == 0:
|
||||
devs = set(map(lambda a: a.device, models_already_loaded))
|
||||
for d in devs:
|
||||
if d != torch.device("cpu"):
|
||||
free_memory(extra_mem + offloaded_memory(models_already_loaded, d), d, models_already_loaded)
|
||||
free_mem = get_free_memory(d)
|
||||
if free_mem < minimum_memory_required:
|
||||
logging.info("Unloading models for lowram load.") #TODO: partial model unloading when this case happens, also handle the opposite case where models can be unlowvramed.
|
||||
models_to_load = free_memory(minimum_memory_required, d)
|
||||
logging.info("{} models unloaded.".format(len(models_to_load)))
|
||||
else:
|
||||
use_more_memory(free_mem - minimum_memory_required, models_already_loaded, d)
|
||||
if len(models_to_load) == 0:
|
||||
return
|
||||
|
||||
logging.info(f"Loading {len(models_to_load)} new model{'s' if len(models_to_load) > 1 else ''}")
|
||||
for loaded_model in models_to_load:
|
||||
to_unload = []
|
||||
for i in range(len(current_loaded_models)):
|
||||
if loaded_model.model.is_clone(current_loaded_models[i].model):
|
||||
to_unload = [i] + to_unload
|
||||
for i in to_unload:
|
||||
current_loaded_models.pop(i).model.detach(unpatch_all=False)
|
||||
|
||||
total_memory_required = {}
|
||||
for loaded_model in models_to_load:
|
||||
unload_model_clones(loaded_model.model, unload_weights_only=True, force_unload=False) #unload clones where the weights are different
|
||||
total_memory_required[loaded_model.device] = total_memory_required.get(loaded_model.device, 0) + loaded_model.model_memory_required(loaded_model.device)
|
||||
|
||||
for loaded_model in models_already_loaded:
|
||||
total_memory_required[loaded_model.device] = total_memory_required.get(loaded_model.device, 0) + loaded_model.model_memory_required(loaded_model.device)
|
||||
|
||||
for loaded_model in models_to_load:
|
||||
weights_unloaded = unload_model_clones(loaded_model.model, unload_weights_only=False, force_unload=False) #unload the rest of the clones where the weights can stay loaded
|
||||
if weights_unloaded is not None:
|
||||
loaded_model.weights_loaded = not weights_unloaded
|
||||
|
||||
for device in total_memory_required:
|
||||
if device != torch.device("cpu"):
|
||||
free_memory(total_memory_required[device] * 1.1 + extra_mem, device, models_already_loaded)
|
||||
free_memory(total_memory_required[device] * 1.1 + extra_mem, device)
|
||||
|
||||
for device in total_memory_required:
|
||||
if device != torch.device("cpu"):
|
||||
free_mem = get_free_memory(device)
|
||||
if free_mem < minimum_memory_required:
|
||||
models_l = free_memory(minimum_memory_required, device)
|
||||
logging.info("{} models unloaded.".format(len(models_l)))
|
||||
|
||||
for loaded_model in models_to_load:
|
||||
model = loaded_model.model
|
||||
@@ -542,19 +512,10 @@ def load_models_gpu(models, memory_required=0, force_patch_weights=False, minimu
|
||||
if vram_set_state == VRAMState.NO_VRAM:
|
||||
lowvram_model_memory = 64 * 1024 * 1024
|
||||
|
||||
cur_loaded_model = loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
|
||||
loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
|
||||
current_loaded_models.insert(0, loaded_model)
|
||||
|
||||
|
||||
devs = set(map(lambda a: a.device, models_already_loaded))
|
||||
for d in devs:
|
||||
if d != torch.device("cpu"):
|
||||
free_mem = get_free_memory(d)
|
||||
if free_mem > minimum_memory_required:
|
||||
use_more_memory(free_mem - minimum_memory_required, models_already_loaded, d)
|
||||
return
|
||||
|
||||
|
||||
def load_model_gpu(model):
|
||||
return load_models_gpu([model])
|
||||
|
||||
@@ -568,21 +529,35 @@ def loaded_models(only_currently_used=False):
|
||||
output.append(m.model)
|
||||
return output
|
||||
|
||||
def cleanup_models(keep_clone_weights_loaded=False):
|
||||
|
||||
def cleanup_models_gc():
|
||||
do_gc = False
|
||||
for i in range(len(current_loaded_models)):
|
||||
cur = current_loaded_models[i]
|
||||
if cur.is_dead():
|
||||
logging.info("Potential memory leak detected with model {}, doing a full garbage collect, for maximum performance avoid circular references in the model code.".format(cur.real_model().__class__.__name__))
|
||||
do_gc = True
|
||||
break
|
||||
|
||||
if do_gc:
|
||||
gc.collect()
|
||||
soft_empty_cache()
|
||||
|
||||
for i in range(len(current_loaded_models)):
|
||||
cur = current_loaded_models[i]
|
||||
if cur.is_dead():
|
||||
logging.warning("WARNING, memory leak with model {}. Please make sure it is not being referenced from somewhere.".format(cur.real_model().__class__.__name__))
|
||||
|
||||
|
||||
|
||||
def cleanup_models():
|
||||
to_delete = []
|
||||
for i in range(len(current_loaded_models)):
|
||||
#TODO: very fragile function needs improvement
|
||||
num_refs = sys.getrefcount(current_loaded_models[i].model)
|
||||
if num_refs <= 2:
|
||||
if not keep_clone_weights_loaded:
|
||||
to_delete = [i] + to_delete
|
||||
#TODO: find a less fragile way to do this.
|
||||
elif sys.getrefcount(current_loaded_models[i].real_model) <= 3: #references from .real_model + the .model
|
||||
to_delete = [i] + to_delete
|
||||
if current_loaded_models[i].real_model() is None:
|
||||
to_delete = [i] + to_delete
|
||||
|
||||
for i in to_delete:
|
||||
x = current_loaded_models.pop(i)
|
||||
x.model_unload()
|
||||
del x
|
||||
|
||||
def dtype_size(dtype):
|
||||
@@ -606,7 +581,7 @@ def unet_offload_device():
|
||||
|
||||
def unet_inital_load_device(parameters, dtype):
|
||||
torch_dev = get_torch_device()
|
||||
if vram_state == VRAMState.HIGH_VRAM:
|
||||
if vram_state == VRAMState.HIGH_VRAM or vram_state == VRAMState.SHARED:
|
||||
return torch_dev
|
||||
|
||||
cpu_dev = torch.device("cpu")
|
||||
@@ -720,7 +695,7 @@ def text_encoder_initial_device(load_device, offload_device, model_size=0):
|
||||
return offload_device
|
||||
|
||||
if is_device_mps(load_device):
|
||||
return offload_device
|
||||
return load_device
|
||||
|
||||
mem_l = get_free_memory(load_device)
|
||||
mem_o = get_free_memory(offload_device)
|
||||
@@ -862,6 +837,8 @@ def cast_to_device(tensor, device, dtype, copy=False):
|
||||
non_blocking = device_supports_non_blocking(device)
|
||||
return cast_to(tensor, dtype=dtype, device=device, non_blocking=non_blocking, copy=copy)
|
||||
|
||||
def sage_attention_enabled():
|
||||
return args.use_sage_attention
|
||||
|
||||
def xformers_enabled():
|
||||
global directml_enabled
|
||||
|
||||
@@ -16,6 +16,8 @@
|
||||
along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
from typing import Optional, Callable
|
||||
import torch
|
||||
import copy
|
||||
import inspect
|
||||
@@ -28,6 +30,9 @@ import comfy.utils
|
||||
import comfy.float
|
||||
import comfy.model_management
|
||||
import comfy.lora
|
||||
import comfy.hooks
|
||||
import comfy.patcher_extension
|
||||
from comfy.patcher_extension import CallbacksMP, WrappersMP, PatcherInjection
|
||||
from comfy.comfy_types import UnetWrapperFunction
|
||||
|
||||
def string_to_seed(data):
|
||||
@@ -76,6 +81,17 @@ def set_model_options_pre_cfg_function(model_options, pre_cfg_function, disable_
|
||||
model_options["disable_cfg1_optimization"] = True
|
||||
return model_options
|
||||
|
||||
def create_model_options_clone(orig_model_options: dict):
|
||||
return comfy.patcher_extension.copy_nested_dicts(orig_model_options)
|
||||
|
||||
def create_hook_patches_clone(orig_hook_patches):
|
||||
new_hook_patches = {}
|
||||
for hook_ref in orig_hook_patches:
|
||||
new_hook_patches[hook_ref] = {}
|
||||
for k in orig_hook_patches[hook_ref]:
|
||||
new_hook_patches[hook_ref][k] = orig_hook_patches[hook_ref][k][:]
|
||||
return new_hook_patches
|
||||
|
||||
def wipe_lowvram_weight(m):
|
||||
if hasattr(m, "prev_comfy_cast_weights"):
|
||||
m.comfy_cast_weights = m.prev_comfy_cast_weights
|
||||
@@ -119,6 +135,49 @@ def get_key_weight(model, key):
|
||||
|
||||
return weight, set_func, convert_func
|
||||
|
||||
class AutoPatcherEjector:
|
||||
def __init__(self, model: 'ModelPatcher', skip_and_inject_on_exit_only=False):
|
||||
self.model = model
|
||||
self.was_injected = False
|
||||
self.prev_skip_injection = False
|
||||
self.skip_and_inject_on_exit_only = skip_and_inject_on_exit_only
|
||||
|
||||
def __enter__(self):
|
||||
self.was_injected = False
|
||||
self.prev_skip_injection = self.model.skip_injection
|
||||
if self.skip_and_inject_on_exit_only:
|
||||
self.model.skip_injection = True
|
||||
if self.model.is_injected:
|
||||
self.model.eject_model()
|
||||
self.was_injected = True
|
||||
|
||||
def __exit__(self, *args):
|
||||
if self.skip_and_inject_on_exit_only:
|
||||
self.model.skip_injection = self.prev_skip_injection
|
||||
self.model.inject_model()
|
||||
if self.was_injected and not self.model.skip_injection:
|
||||
self.model.inject_model()
|
||||
self.model.skip_injection = self.prev_skip_injection
|
||||
|
||||
class MemoryCounter:
|
||||
def __init__(self, initial: int, minimum=0):
|
||||
self.value = initial
|
||||
self.minimum = minimum
|
||||
# TODO: add a safe limit besides 0
|
||||
|
||||
def use(self, weight: torch.Tensor):
|
||||
weight_size = weight.nelement() * weight.element_size()
|
||||
if self.is_useable(weight_size):
|
||||
self.decrement(weight_size)
|
||||
return True
|
||||
return False
|
||||
|
||||
def is_useable(self, used: int):
|
||||
return self.value - used > self.minimum
|
||||
|
||||
def decrement(self, used: int):
|
||||
self.value -= used
|
||||
|
||||
class ModelPatcher:
|
||||
def __init__(self, model, load_device, offload_device, size=0, weight_inplace_update=False):
|
||||
self.size = size
|
||||
@@ -139,6 +198,25 @@ class ModelPatcher:
|
||||
self.offload_device = offload_device
|
||||
self.weight_inplace_update = weight_inplace_update
|
||||
self.patches_uuid = uuid.uuid4()
|
||||
self.parent = None
|
||||
|
||||
self.attachments: dict[str] = {}
|
||||
self.additional_models: dict[str, list[ModelPatcher]] = {}
|
||||
self.callbacks: dict[str, dict[str, list[Callable]]] = CallbacksMP.init_callbacks()
|
||||
self.wrappers: dict[str, dict[str, list[Callable]]] = WrappersMP.init_wrappers()
|
||||
|
||||
self.is_injected = False
|
||||
self.skip_injection = False
|
||||
self.injections: dict[str, list[PatcherInjection]] = {}
|
||||
|
||||
self.hook_patches: dict[comfy.hooks._HookRef] = {}
|
||||
self.hook_patches_backup: dict[comfy.hooks._HookRef] = {}
|
||||
self.hook_backup: dict[str, tuple[torch.Tensor, torch.device]] = {}
|
||||
self.cached_hook_patches: dict[comfy.hooks.HookGroup, dict[str, torch.Tensor]] = {}
|
||||
self.current_hooks: Optional[comfy.hooks.HookGroup] = None
|
||||
self.forced_hooks: Optional[comfy.hooks.HookGroup] = None # NOTE: only used for CLIP at this time
|
||||
self.is_clip = False
|
||||
self.hook_mode = comfy.hooks.EnumHookMode.MaxSpeed
|
||||
|
||||
if not hasattr(self.model, 'model_loaded_weight_memory'):
|
||||
self.model.model_loaded_weight_memory = 0
|
||||
@@ -149,6 +227,9 @@ class ModelPatcher:
|
||||
if not hasattr(self.model, 'model_lowvram'):
|
||||
self.model.model_lowvram = False
|
||||
|
||||
if not hasattr(self.model, 'current_weight_patches_uuid'):
|
||||
self.model.current_weight_patches_uuid = None
|
||||
|
||||
def model_size(self):
|
||||
if self.size > 0:
|
||||
return self.size
|
||||
@@ -162,7 +243,7 @@ class ModelPatcher:
|
||||
return self.model.lowvram_patch_counter
|
||||
|
||||
def clone(self):
|
||||
n = ModelPatcher(self.model, self.load_device, self.offload_device, self.size, weight_inplace_update=self.weight_inplace_update)
|
||||
n = self.__class__(self.model, self.load_device, self.offload_device, self.size, weight_inplace_update=self.weight_inplace_update)
|
||||
n.patches = {}
|
||||
for k in self.patches:
|
||||
n.patches[k] = self.patches[k][:]
|
||||
@@ -172,6 +253,48 @@ class ModelPatcher:
|
||||
n.model_options = copy.deepcopy(self.model_options)
|
||||
n.backup = self.backup
|
||||
n.object_patches_backup = self.object_patches_backup
|
||||
n.parent = self
|
||||
|
||||
# attachments
|
||||
n.attachments = {}
|
||||
for k in self.attachments:
|
||||
if hasattr(self.attachments[k], "on_model_patcher_clone"):
|
||||
n.attachments[k] = self.attachments[k].on_model_patcher_clone()
|
||||
else:
|
||||
n.attachments[k] = self.attachments[k]
|
||||
# additional models
|
||||
for k, c in self.additional_models.items():
|
||||
n.additional_models[k] = [x.clone() for x in c]
|
||||
# callbacks
|
||||
for k, c in self.callbacks.items():
|
||||
n.callbacks[k] = {}
|
||||
for k1, c1 in c.items():
|
||||
n.callbacks[k][k1] = c1.copy()
|
||||
# sample wrappers
|
||||
for k, w in self.wrappers.items():
|
||||
n.wrappers[k] = {}
|
||||
for k1, w1 in w.items():
|
||||
n.wrappers[k][k1] = w1.copy()
|
||||
# injection
|
||||
n.is_injected = self.is_injected
|
||||
n.skip_injection = self.skip_injection
|
||||
for k, i in self.injections.items():
|
||||
n.injections[k] = i.copy()
|
||||
# hooks
|
||||
n.hook_patches = create_hook_patches_clone(self.hook_patches)
|
||||
n.hook_patches_backup = create_hook_patches_clone(self.hook_patches_backup)
|
||||
for group in self.cached_hook_patches:
|
||||
n.cached_hook_patches[group] = {}
|
||||
for k in self.cached_hook_patches[group]:
|
||||
n.cached_hook_patches[group][k] = self.cached_hook_patches[group][k]
|
||||
n.hook_backup = self.hook_backup
|
||||
n.current_hooks = self.current_hooks.clone() if self.current_hooks else self.current_hooks
|
||||
n.forced_hooks = self.forced_hooks.clone() if self.forced_hooks else self.forced_hooks
|
||||
n.is_clip = self.is_clip
|
||||
n.hook_mode = self.hook_mode
|
||||
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_CLONE):
|
||||
callback(self, n)
|
||||
return n
|
||||
|
||||
def is_clone(self, other):
|
||||
@@ -179,10 +302,29 @@ class ModelPatcher:
|
||||
return True
|
||||
return False
|
||||
|
||||
def clone_has_same_weights(self, clone):
|
||||
def clone_has_same_weights(self, clone: 'ModelPatcher'):
|
||||
if not self.is_clone(clone):
|
||||
return False
|
||||
|
||||
if self.current_hooks != clone.current_hooks:
|
||||
return False
|
||||
if self.forced_hooks != clone.forced_hooks:
|
||||
return False
|
||||
if self.hook_patches.keys() != clone.hook_patches.keys():
|
||||
return False
|
||||
if self.attachments.keys() != clone.attachments.keys():
|
||||
return False
|
||||
if self.additional_models.keys() != clone.additional_models.keys():
|
||||
return False
|
||||
for key in self.callbacks:
|
||||
if len(self.callbacks[key]) != len(clone.callbacks[key]):
|
||||
return False
|
||||
for key in self.wrappers:
|
||||
if len(self.wrappers[key]) != len(clone.wrappers[key]):
|
||||
return False
|
||||
if self.injections.keys() != clone.injections.keys():
|
||||
return False
|
||||
|
||||
if len(self.patches) == 0 and len(clone.patches) == 0:
|
||||
return True
|
||||
|
||||
@@ -251,6 +393,12 @@ class ModelPatcher:
|
||||
def set_model_output_block_patch(self, patch):
|
||||
self.set_model_patch(patch, "output_block_patch")
|
||||
|
||||
def set_model_emb_patch(self, patch):
|
||||
self.set_model_patch(patch, "emb_patch")
|
||||
|
||||
def set_model_forward_timestep_embed_patch(self, patch):
|
||||
self.set_model_patch(patch, "forward_timestep_embed_patch")
|
||||
|
||||
def add_object_patch(self, name, obj):
|
||||
self.object_patches[name] = obj
|
||||
|
||||
@@ -289,27 +437,28 @@ class ModelPatcher:
|
||||
return self.model.get_dtype()
|
||||
|
||||
def add_patches(self, patches, strength_patch=1.0, strength_model=1.0):
|
||||
p = set()
|
||||
model_sd = self.model.state_dict()
|
||||
for k in patches:
|
||||
offset = None
|
||||
function = None
|
||||
if isinstance(k, str):
|
||||
key = k
|
||||
else:
|
||||
offset = k[1]
|
||||
key = k[0]
|
||||
if len(k) > 2:
|
||||
function = k[2]
|
||||
with self.use_ejected():
|
||||
p = set()
|
||||
model_sd = self.model.state_dict()
|
||||
for k in patches:
|
||||
offset = None
|
||||
function = None
|
||||
if isinstance(k, str):
|
||||
key = k
|
||||
else:
|
||||
offset = k[1]
|
||||
key = k[0]
|
||||
if len(k) > 2:
|
||||
function = k[2]
|
||||
|
||||
if key in model_sd:
|
||||
p.add(k)
|
||||
current_patches = self.patches.get(key, [])
|
||||
current_patches.append((strength_patch, patches[k], strength_model, offset, function))
|
||||
self.patches[key] = current_patches
|
||||
if key in model_sd:
|
||||
p.add(k)
|
||||
current_patches = self.patches.get(key, [])
|
||||
current_patches.append((strength_patch, patches[k], strength_model, offset, function))
|
||||
self.patches[key] = current_patches
|
||||
|
||||
self.patches_uuid = uuid.uuid4()
|
||||
return list(p)
|
||||
self.patches_uuid = uuid.uuid4()
|
||||
return list(p)
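The key handling above accepts either a plain state-dict key or a tuple of (key, offset) or (key, offset, function); only keys found in the model's state dict are recorded, and patches_uuid is rerolled so loaders can tell the weights changed. A hypothetical sketch of a caller (the patcher instance, the key, and the patch payload are invented; the exact payload format is whatever comfy.lora.calculate_weight expects):
# --- illustrative example, not part of this diff ---
import torch
# 'patcher' is assumed to be an existing ModelPatcher; key and payload are made up.
lora_payload = ("diff", [torch.zeros(4, 4)])   # hypothetical payload shape/format
patches = {
    "diffusion_model.middle_block.0.in_layers.2.weight": lora_payload,   # plain key
    # ("some.other.key", offset, function) is also accepted, per the tuple branch above
}
added_keys = patcher.add_patches(patches, strength_patch=0.75, strength_model=1.0)
# added_keys lists only the entries whose key exists in the model state dict
# --- end example ---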
|
||||
|
||||
def get_key_patches(self, filter_prefix=None):
|
||||
model_sd = self.model_state_dict()
|
||||
@@ -319,9 +468,12 @@ class ModelPatcher:
|
||||
if not k.startswith(filter_prefix):
|
||||
continue
|
||||
bk = self.backup.get(k, None)
|
||||
hbk = self.hook_backup.get(k, None)
|
||||
weight, set_func, convert_func = get_key_weight(self.model, k)
|
||||
if bk is not None:
|
||||
weight = bk.weight
|
||||
if hbk is not None:
|
||||
weight = hbk[0]
|
||||
if convert_func is None:
|
||||
convert_func = lambda a, **kwargs: a
|
||||
|
||||
@@ -332,13 +484,14 @@ class ModelPatcher:
|
||||
return p
|
||||
|
||||
def model_state_dict(self, filter_prefix=None):
|
||||
sd = self.model.state_dict()
|
||||
keys = list(sd.keys())
|
||||
if filter_prefix is not None:
|
||||
for k in keys:
|
||||
if not k.startswith(filter_prefix):
|
||||
sd.pop(k)
|
||||
return sd
|
||||
with self.use_ejected():
|
||||
sd = self.model.state_dict()
|
||||
keys = list(sd.keys())
|
||||
if filter_prefix is not None:
|
||||
for k in keys:
|
||||
if not k.startswith(filter_prefix):
|
||||
sd.pop(k)
|
||||
return sd
|
||||
|
||||
def patch_weight_to_device(self, key, device_to=None, inplace_update=False):
|
||||
if key not in self.patches:
|
||||
@@ -383,105 +536,117 @@ class ModelPatcher:
|
||||
return loading
|
||||
|
||||
def load(self, device_to=None, lowvram_model_memory=0, force_patch_weights=False, full_load=False):
|
||||
mem_counter = 0
|
||||
patch_counter = 0
|
||||
lowvram_counter = 0
|
||||
loading = self._load_list()
|
||||
with self.use_ejected():
|
||||
self.unpatch_hooks()
|
||||
mem_counter = 0
|
||||
patch_counter = 0
|
||||
lowvram_counter = 0
|
||||
loading = self._load_list()
|
||||
|
||||
load_completely = []
|
||||
loading.sort(reverse=True)
|
||||
for x in loading:
|
||||
n = x[1]
|
||||
m = x[2]
|
||||
params = x[3]
|
||||
module_mem = x[0]
|
||||
load_completely = []
|
||||
loading.sort(reverse=True)
|
||||
for x in loading:
|
||||
n = x[1]
|
||||
m = x[2]
|
||||
params = x[3]
|
||||
module_mem = x[0]
|
||||
|
||||
lowvram_weight = False
|
||||
lowvram_weight = False
|
||||
|
||||
if not full_load and hasattr(m, "comfy_cast_weights"):
|
||||
if mem_counter + module_mem >= lowvram_model_memory:
|
||||
lowvram_weight = True
|
||||
lowvram_counter += 1
|
||||
if hasattr(m, "prev_comfy_cast_weights"): #Already lowvramed
|
||||
if not full_load and hasattr(m, "comfy_cast_weights"):
|
||||
if mem_counter + module_mem >= lowvram_model_memory:
|
||||
lowvram_weight = True
|
||||
lowvram_counter += 1
|
||||
if hasattr(m, "prev_comfy_cast_weights"): #Already lowvramed
|
||||
continue
|
||||
|
||||
weight_key = "{}.weight".format(n)
|
||||
bias_key = "{}.bias".format(n)
|
||||
|
||||
if lowvram_weight:
|
||||
if weight_key in self.patches:
|
||||
if force_patch_weights:
|
||||
self.patch_weight_to_device(weight_key)
|
||||
else:
|
||||
m.weight_function = LowVramPatch(weight_key, self.patches)
|
||||
patch_counter += 1
|
||||
if bias_key in self.patches:
|
||||
if force_patch_weights:
|
||||
self.patch_weight_to_device(bias_key)
|
||||
else:
|
||||
m.bias_function = LowVramPatch(bias_key, self.patches)
|
||||
patch_counter += 1
|
||||
|
||||
m.prev_comfy_cast_weights = m.comfy_cast_weights
|
||||
m.comfy_cast_weights = True
|
||||
else:
|
||||
if hasattr(m, "comfy_cast_weights"):
|
||||
if m.comfy_cast_weights:
|
||||
wipe_lowvram_weight(m)
|
||||
|
||||
if full_load or mem_counter + module_mem < lowvram_model_memory:
|
||||
mem_counter += module_mem
|
||||
load_completely.append((module_mem, n, m, params))
|
||||
|
||||
load_completely.sort(reverse=True)
|
||||
for x in load_completely:
|
||||
n = x[1]
|
||||
m = x[2]
|
||||
params = x[3]
|
||||
if hasattr(m, "comfy_patched_weights"):
|
||||
if m.comfy_patched_weights == True:
|
||||
continue
|
||||
|
||||
weight_key = "{}.weight".format(n)
|
||||
bias_key = "{}.bias".format(n)
|
||||
for param in params:
|
||||
self.patch_weight_to_device("{}.{}".format(n, param), device_to=device_to)
|
||||
|
||||
if lowvram_weight:
|
||||
if weight_key in self.patches:
|
||||
if force_patch_weights:
|
||||
self.patch_weight_to_device(weight_key)
|
||||
else:
|
||||
m.weight_function = LowVramPatch(weight_key, self.patches)
|
||||
patch_counter += 1
|
||||
if bias_key in self.patches:
|
||||
if force_patch_weights:
|
||||
self.patch_weight_to_device(bias_key)
|
||||
else:
|
||||
m.bias_function = LowVramPatch(bias_key, self.patches)
|
||||
patch_counter += 1
|
||||
logging.debug("lowvram: loaded module regularly {} {}".format(n, m))
|
||||
m.comfy_patched_weights = True
|
||||
|
||||
m.prev_comfy_cast_weights = m.comfy_cast_weights
|
||||
m.comfy_cast_weights = True
|
||||
for x in load_completely:
|
||||
x[2].to(device_to)
|
||||
|
||||
if lowvram_counter > 0:
|
||||
logging.info("loaded partially {} {} {}".format(lowvram_model_memory / (1024 * 1024), mem_counter / (1024 * 1024), patch_counter))
|
||||
self.model.model_lowvram = True
|
||||
else:
|
||||
if hasattr(m, "comfy_cast_weights"):
|
||||
if m.comfy_cast_weights:
|
||||
wipe_lowvram_weight(m)
|
||||
logging.info("loaded completely {} {} {}".format(lowvram_model_memory / (1024 * 1024), mem_counter / (1024 * 1024), full_load))
|
||||
self.model.model_lowvram = False
|
||||
if full_load:
|
||||
self.model.to(device_to)
|
||||
mem_counter = self.model_size()
|
||||
|
||||
if full_load or mem_counter + module_mem < lowvram_model_memory:
|
||||
mem_counter += module_mem
|
||||
load_completely.append((module_mem, n, m, params))
|
||||
self.model.lowvram_patch_counter += patch_counter
|
||||
self.model.device = device_to
|
||||
self.model.model_loaded_weight_memory = mem_counter
|
||||
self.model.current_weight_patches_uuid = self.patches_uuid
|
||||
|
||||
load_completely.sort(reverse=True)
|
||||
for x in load_completely:
|
||||
n = x[1]
|
||||
m = x[2]
|
||||
params = x[3]
|
||||
if hasattr(m, "comfy_patched_weights"):
|
||||
if m.comfy_patched_weights == True:
|
||||
continue
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_LOAD):
|
||||
callback(self, device_to, lowvram_model_memory, force_patch_weights, full_load)
|
||||
|
||||
for param in params:
|
||||
self.patch_weight_to_device("{}.{}".format(n, param), device_to=device_to)
|
||||
|
||||
logging.debug("lowvram: loaded module regularly {} {}".format(n, m))
|
||||
m.comfy_patched_weights = True
|
||||
|
||||
for x in load_completely:
|
||||
x[2].to(device_to)
|
||||
|
||||
if lowvram_counter > 0:
|
||||
logging.info("loaded partially {} {} {}".format(lowvram_model_memory / (1024 * 1024), mem_counter / (1024 * 1024), patch_counter))
|
||||
self.model.model_lowvram = True
|
||||
else:
|
||||
logging.info("loaded completely {} {} {}".format(lowvram_model_memory / (1024 * 1024), mem_counter / (1024 * 1024), full_load))
|
||||
self.model.model_lowvram = False
|
||||
if full_load:
|
||||
self.model.to(device_to)
|
||||
mem_counter = self.model_size()
|
||||
|
||||
self.model.lowvram_patch_counter += patch_counter
|
||||
self.model.device = device_to
|
||||
self.model.model_loaded_weight_memory = mem_counter
|
||||
self.apply_hooks(self.forced_hooks, force_apply=True)
|
||||
|
||||
def patch_model(self, device_to=None, lowvram_model_memory=0, load_weights=True, force_patch_weights=False):
|
||||
for k in self.object_patches:
|
||||
old = comfy.utils.set_attr(self.model, k, self.object_patches[k])
|
||||
if k not in self.object_patches_backup:
|
||||
self.object_patches_backup[k] = old
|
||||
with self.use_ejected():
|
||||
for k in self.object_patches:
|
||||
old = comfy.utils.set_attr(self.model, k, self.object_patches[k])
|
||||
if k not in self.object_patches_backup:
|
||||
self.object_patches_backup[k] = old
|
||||
|
||||
if lowvram_model_memory == 0:
|
||||
full_load = True
|
||||
else:
|
||||
full_load = False
|
||||
if lowvram_model_memory == 0:
|
||||
full_load = True
|
||||
else:
|
||||
full_load = False
|
||||
|
||||
if load_weights:
|
||||
self.load(device_to, lowvram_model_memory=lowvram_model_memory, force_patch_weights=force_patch_weights, full_load=full_load)
|
||||
if load_weights:
|
||||
self.load(device_to, lowvram_model_memory=lowvram_model_memory, force_patch_weights=force_patch_weights, full_load=full_load)
|
||||
self.inject_model()
|
||||
return self.model
|
||||
|
||||
def unpatch_model(self, device_to=None, unpatch_weights=True):
|
||||
self.eject_model()
|
||||
if unpatch_weights:
|
||||
self.unpatch_hooks()
|
||||
if self.model.model_lowvram:
|
||||
for m in self.model.modules():
|
||||
wipe_lowvram_weight(m)
|
||||
@@ -498,6 +663,7 @@ class ModelPatcher:
|
||||
else:
|
||||
comfy.utils.set_attr_param(self.model, k, bk.weight)
|
||||
|
||||
self.model.current_weight_patches_uuid = None
|
||||
self.backup.clear()
|
||||
|
||||
if device_to is not None:
|
||||
@@ -516,69 +682,92 @@ class ModelPatcher:
|
||||
self.object_patches_backup.clear()
|
||||
|
||||
def partially_unload(self, device_to, memory_to_free=0):
|
||||
memory_freed = 0
|
||||
patch_counter = 0
|
||||
unload_list = self._load_list()
|
||||
unload_list.sort()
|
||||
for unload in unload_list:
|
||||
if memory_to_free < memory_freed:
|
||||
break
|
||||
module_mem = unload[0]
|
||||
n = unload[1]
|
||||
m = unload[2]
|
||||
params = unload[3]
|
||||
with self.use_ejected():
|
||||
memory_freed = 0
|
||||
patch_counter = 0
|
||||
unload_list = self._load_list()
|
||||
unload_list.sort()
|
||||
for unload in unload_list:
|
||||
if memory_to_free < memory_freed:
|
||||
break
|
||||
module_mem = unload[0]
|
||||
n = unload[1]
|
||||
m = unload[2]
|
||||
params = unload[3]
|
||||
|
||||
lowvram_possible = hasattr(m, "comfy_cast_weights")
|
||||
if hasattr(m, "comfy_patched_weights") and m.comfy_patched_weights == True:
|
||||
move_weight = True
|
||||
for param in params:
|
||||
key = "{}.{}".format(n, param)
|
||||
bk = self.backup.get(key, None)
|
||||
if bk is not None:
|
||||
if not lowvram_possible:
|
||||
move_weight = False
|
||||
break
|
||||
lowvram_possible = hasattr(m, "comfy_cast_weights")
|
||||
if hasattr(m, "comfy_patched_weights") and m.comfy_patched_weights == True:
|
||||
move_weight = True
|
||||
for param in params:
|
||||
key = "{}.{}".format(n, param)
|
||||
bk = self.backup.get(key, None)
|
||||
if bk is not None:
|
||||
if not lowvram_possible:
|
||||
move_weight = False
|
||||
break
|
||||
|
||||
if bk.inplace_update:
|
||||
comfy.utils.copy_to_param(self.model, key, bk.weight)
|
||||
else:
|
||||
comfy.utils.set_attr_param(self.model, key, bk.weight)
|
||||
self.backup.pop(key)
|
||||
if bk.inplace_update:
|
||||
comfy.utils.copy_to_param(self.model, key, bk.weight)
|
||||
else:
|
||||
comfy.utils.set_attr_param(self.model, key, bk.weight)
|
||||
self.backup.pop(key)
|
||||
|
||||
weight_key = "{}.weight".format(n)
|
||||
bias_key = "{}.bias".format(n)
|
||||
if move_weight:
|
||||
m.to(device_to)
|
||||
if lowvram_possible:
|
||||
if weight_key in self.patches:
|
||||
m.weight_function = LowVramPatch(weight_key, self.patches)
|
||||
patch_counter += 1
|
||||
if bias_key in self.patches:
|
||||
m.bias_function = LowVramPatch(bias_key, self.patches)
|
||||
patch_counter += 1
|
||||
|
||||
weight_key = "{}.weight".format(n)
|
||||
bias_key = "{}.bias".format(n)
|
||||
if move_weight:
|
||||
m.to(device_to)
|
||||
if lowvram_possible:
|
||||
if weight_key in self.patches:
|
||||
m.weight_function = LowVramPatch(weight_key, self.patches)
|
||||
patch_counter += 1
|
||||
if bias_key in self.patches:
|
||||
m.bias_function = LowVramPatch(bias_key, self.patches)
|
||||
patch_counter += 1
|
||||
m.prev_comfy_cast_weights = m.comfy_cast_weights
|
||||
m.comfy_cast_weights = True
|
||||
m.comfy_patched_weights = False
|
||||
memory_freed += module_mem
|
||||
logging.debug("freed {}".format(n))
|
||||
|
||||
m.prev_comfy_cast_weights = m.comfy_cast_weights
|
||||
m.comfy_cast_weights = True
|
||||
m.comfy_patched_weights = False
|
||||
memory_freed += module_mem
|
||||
logging.debug("freed {}".format(n))
|
||||
self.model.model_lowvram = True
|
||||
self.model.lowvram_patch_counter += patch_counter
|
||||
self.model.model_loaded_weight_memory -= memory_freed
|
||||
return memory_freed
|
||||
|
||||
self.model.model_lowvram = True
|
||||
self.model.lowvram_patch_counter += patch_counter
|
||||
self.model.model_loaded_weight_memory -= memory_freed
|
||||
return memory_freed
|
||||
def partially_load(self, device_to, extra_memory=0, force_patch_weights=False):
|
||||
with self.use_ejected(skip_and_inject_on_exit_only=True):
|
||||
unpatch_weights = self.model.current_weight_patches_uuid is not None and (self.model.current_weight_patches_uuid != self.patches_uuid or force_patch_weights)
|
||||
# TODO: force_patch_weights should not unload + reload full model
|
||||
used = self.model.model_loaded_weight_memory
|
||||
self.unpatch_model(self.offload_device, unpatch_weights=unpatch_weights)
|
||||
if unpatch_weights:
|
||||
extra_memory += (used - self.model.model_loaded_weight_memory)
|
||||
|
||||
def partially_load(self, device_to, extra_memory=0):
|
||||
self.unpatch_model(unpatch_weights=False)
|
||||
self.patch_model(load_weights=False)
|
||||
full_load = False
|
||||
if self.model.model_lowvram == False:
|
||||
return 0
|
||||
if self.model.model_loaded_weight_memory + extra_memory > self.model_size():
|
||||
full_load = True
|
||||
current_used = self.model.model_loaded_weight_memory
|
||||
self.load(device_to, lowvram_model_memory=current_used + extra_memory, full_load=full_load)
|
||||
return self.model.model_loaded_weight_memory - current_used
|
||||
self.patch_model(load_weights=False)
|
||||
full_load = False
|
||||
if self.model.model_lowvram == False and self.model.model_loaded_weight_memory > 0:
|
||||
self.apply_hooks(self.forced_hooks, force_apply=True)
|
||||
return 0
|
||||
if self.model.model_loaded_weight_memory + extra_memory > self.model_size():
|
||||
full_load = True
|
||||
current_used = self.model.model_loaded_weight_memory
|
||||
try:
|
||||
self.load(device_to, lowvram_model_memory=current_used + extra_memory, force_patch_weights=force_patch_weights, full_load=full_load)
|
||||
except Exception as e:
|
||||
self.detach()
|
||||
raise e
|
||||
|
||||
return self.model.model_loaded_weight_memory - current_used
|
||||
|
||||
def detach(self, unpatch_all=True):
|
||||
self.eject_model()
|
||||
self.model_patches_to(self.offload_device)
|
||||
if unpatch_all:
|
||||
self.unpatch_model(self.offload_device, unpatch_weights=unpatch_all)
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_DETACH):
|
||||
callback(self, unpatch_all)
|
||||
return self.model
|
||||
|
||||
def current_loaded_device(self):
|
||||
return self.model.device
|
||||
@@ -586,3 +775,346 @@ class ModelPatcher:
|
||||
def calculate_weight(self, patches, weight, key, intermediate_dtype=torch.float32):
|
||||
print("WARNING the ModelPatcher.calculate_weight function is deprecated, please use: comfy.lora.calculate_weight instead")
|
||||
return comfy.lora.calculate_weight(patches, weight, key, intermediate_dtype=intermediate_dtype)
|
||||
|
||||
def cleanup(self):
|
||||
self.clean_hooks()
|
||||
if hasattr(self.model, "current_patcher"):
|
||||
self.model.current_patcher = None
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_CLEANUP):
|
||||
callback(self)
|
||||
|
||||
def add_callback(self, call_type: str, callback: Callable):
|
||||
self.add_callback_with_key(call_type, None, callback)
|
||||
|
||||
def add_callback_with_key(self, call_type: str, key: str, callback: Callable):
|
||||
c = self.callbacks.setdefault(call_type, {}).setdefault(key, [])
|
||||
c.append(callback)
|
||||
|
||||
def remove_callbacks_with_key(self, call_type: str, key: str):
|
||||
c = self.callbacks.get(call_type, {})
|
||||
if key in c:
|
||||
c.pop(key)
|
||||
|
||||
def get_callbacks(self, call_type: str, key: str):
|
||||
return self.callbacks.get(call_type, {}).get(key, [])
|
||||
|
||||
def get_all_callbacks(self, call_type: str):
|
||||
c_list = []
|
||||
for c in self.callbacks.get(call_type, {}).values():
|
||||
c_list.extend(c)
|
||||
return c_list
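A short sketch of the per-instance callback API added above; the ON_CLONE signature (original, clone) follows the loop at the end of clone(), while the patcher instance and key name are assumed for illustration.
# --- illustrative example, not part of this diff ---
from comfy.patcher_extension import CallbacksMP

def log_clone(original_patcher, new_patcher):
    # runs at the end of ModelPatcher.clone(), see the ON_CLONE loop above
    print("cloned model patcher")

patcher.add_callback(CallbacksMP.ON_CLONE, log_clone)                      # unkeyed
patcher.add_callback_with_key(CallbacksMP.ON_CLONE, "my_ext", log_clone)   # keyed
patcher.remove_callbacks_with_key(CallbacksMP.ON_CLONE, "my_ext")          # removes only the keyed one
# --- end example ---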
|
||||
|
||||
def add_wrapper(self, wrapper_type: str, wrapper: Callable):
|
||||
self.add_wrapper_with_key(wrapper_type, None, wrapper)
|
||||
|
||||
def add_wrapper_with_key(self, wrapper_type: str, key: str, wrapper: Callable):
|
||||
w = self.wrappers.setdefault(wrapper_type, {}).setdefault(key, [])
|
||||
w.append(wrapper)
|
||||
|
||||
def remove_wrappers_with_key(self, wrapper_type: str, key: str):
|
||||
w = self.wrappers.get(wrapper_type, {})
|
||||
if key in w:
|
||||
w.pop(key)
|
||||
|
||||
def get_wrappers(self, wrapper_type: str, key: str):
|
||||
return self.wrappers.get(wrapper_type, {}).get(key, [])
|
||||
|
||||
def get_all_wrappers(self, wrapper_type: str):
|
||||
w_list = []
|
||||
for w in self.wrappers.get(wrapper_type, {}).values():
|
||||
w_list.extend(w)
|
||||
return w_list
|
||||
|
||||
def set_attachments(self, key: str, attachment):
|
||||
self.attachments[key] = attachment
|
||||
|
||||
def remove_attachments(self, key: str):
|
||||
if key in self.attachments:
|
||||
self.attachments.pop(key)
|
||||
|
||||
def get_attachment(self, key: str):
|
||||
return self.attachments.get(key, None)
|
||||
|
||||
def set_injections(self, key: str, injections: list[PatcherInjection]):
|
||||
self.injections[key] = injections
|
||||
|
||||
def remove_injections(self, key: str):
|
||||
if key in self.injections:
|
||||
self.injections.pop(key)
|
||||
|
||||
def set_additional_models(self, key: str, models: list['ModelPatcher']):
|
||||
self.additional_models[key] = models
|
||||
|
||||
def remove_additional_models(self, key: str):
|
||||
if key in self.additional_models:
|
||||
self.additional_models.pop(key)
|
||||
|
||||
def get_additional_models_with_key(self, key: str):
|
||||
return self.additional_models.get(key, [])
|
||||
|
||||
def get_additional_models(self):
|
||||
all_models = []
|
||||
for models in self.additional_models.values():
|
||||
all_models.extend(models)
|
||||
return all_models
|
||||
|
||||
def get_nested_additional_models(self):
|
||||
def _evaluate_sub_additional_models(prev_models: list[ModelPatcher], cache_set: set[ModelPatcher]):
|
||||
'''Make sure circular references do not cause infinite recursion.'''
|
||||
next_models = []
|
||||
for model in prev_models:
|
||||
candidates = model.get_additional_models()
|
||||
for c in candidates:
|
||||
if c not in cache_set:
|
||||
next_models.append(c)
|
||||
cache_set.add(c)
|
||||
if len(next_models) == 0:
|
||||
return prev_models
|
||||
return prev_models + _evaluate_sub_additional_models(next_models, cache_set)
|
||||
|
||||
all_models = self.get_additional_models()
|
||||
models_set = set(all_models)
|
||||
real_all_models = _evaluate_sub_additional_models(prev_models=all_models, cache_set=models_set)
|
||||
return real_all_models
|
||||
|
||||
def use_ejected(self, skip_and_inject_on_exit_only=False):
|
||||
return AutoPatcherEjector(self, skip_and_inject_on_exit_only=skip_and_inject_on_exit_only)
|
||||
|
||||
def inject_model(self):
|
||||
if self.is_injected or self.skip_injection:
|
||||
return
|
||||
for injections in self.injections.values():
|
||||
for inj in injections:
|
||||
inj.inject(self)
|
||||
self.is_injected = True
|
||||
if self.is_injected:
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_INJECT_MODEL):
|
||||
callback(self)
|
||||
|
||||
def eject_model(self):
|
||||
if not self.is_injected:
|
||||
return
|
||||
for injections in self.injections.values():
|
||||
for inj in injections:
|
||||
inj.eject(self)
|
||||
self.is_injected = False
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_EJECT_MODEL):
|
||||
callback(self)
|
||||
|
||||
def pre_run(self):
|
||||
if hasattr(self.model, "current_patcher"):
|
||||
self.model.current_patcher = self
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_PRE_RUN):
|
||||
callback(self)
|
||||
|
||||
def prepare_state(self, timestep):
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_PREPARE_STATE):
|
||||
callback(self, timestep)
|
||||
|
||||
def restore_hook_patches(self):
|
||||
if len(self.hook_patches_backup) > 0:
|
||||
self.hook_patches = self.hook_patches_backup
|
||||
self.hook_patches_backup = {}
|
||||
|
||||
def set_hook_mode(self, hook_mode: comfy.hooks.EnumHookMode):
|
||||
self.hook_mode = hook_mode
|
||||
|
||||
def prepare_hook_patches_current_keyframe(self, t: torch.Tensor, hook_group: comfy.hooks.HookGroup):
|
||||
curr_t = t[0]
|
||||
reset_current_hooks = False
|
||||
for hook in hook_group.hooks:
|
||||
changed = hook.hook_keyframe.prepare_current_keyframe(curr_t=curr_t)
|
||||
# if keyframe changed, remove any cached HookGroups that contain hook with the same hook_ref;
|
||||
# this will cause the weights to be recalculated when sampling
|
||||
if changed:
|
||||
# reset current_hooks if contains hook that changed
|
||||
if self.current_hooks is not None:
|
||||
for current_hook in self.current_hooks.hooks:
|
||||
if current_hook == hook:
|
||||
reset_current_hooks = True
|
||||
break
|
||||
for cached_group in list(self.cached_hook_patches.keys()):
|
||||
if cached_group.contains(hook):
|
||||
self.cached_hook_patches.pop(cached_group)
|
||||
if reset_current_hooks:
|
||||
self.patch_hooks(None)
|
||||
|
||||
def register_all_hook_patches(self, hooks_dict: dict[comfy.hooks.EnumHookType, dict[comfy.hooks.Hook, None]], target: comfy.hooks.EnumWeightTarget, model_options: dict=None):
|
||||
self.restore_hook_patches()
|
||||
registered_hooks: list[comfy.hooks.Hook] = []
|
||||
# handle WrapperHooks, if model_options provided
|
||||
if model_options is not None:
|
||||
for hook in hooks_dict.get(comfy.hooks.EnumHookType.Wrappers, {}):
|
||||
hook.add_hook_patches(self, model_options, target, registered_hooks)
|
||||
# handle WeightHooks
|
||||
weight_hooks_to_register: list[comfy.hooks.WeightHook] = []
|
||||
for hook in hooks_dict.get(comfy.hooks.EnumHookType.Weight, {}):
|
||||
if hook.hook_ref not in self.hook_patches:
|
||||
weight_hooks_to_register.append(hook)
|
||||
if len(weight_hooks_to_register) > 0:
|
||||
# clone hook_patches to become backup so that any non-dynamic hooks will return to their original state
|
||||
self.hook_patches_backup = create_hook_patches_clone(self.hook_patches)
|
||||
for hook in weight_hooks_to_register:
|
||||
hook.add_hook_patches(self, model_options, target, registered_hooks)
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_REGISTER_ALL_HOOK_PATCHES):
|
||||
callback(self, hooks_dict, target)
|
||||
|
||||
def add_hook_patches(self, hook: comfy.hooks.WeightHook, patches, strength_patch=1.0, strength_model=1.0):
|
||||
with self.use_ejected():
|
||||
# NOTE: this mirrors behavior of add_patches func
|
||||
current_hook_patches: dict[str,list] = self.hook_patches.get(hook.hook_ref, {})
|
||||
p = set()
|
||||
model_sd = self.model.state_dict()
|
||||
for k in patches:
|
||||
offset = None
|
||||
function = None
|
||||
if isinstance(k, str):
|
||||
key = k
|
||||
else:
|
||||
offset = k[1]
|
||||
key = k[0]
|
||||
if len(k) > 2:
|
||||
function = k[2]
|
||||
|
||||
if key in model_sd:
|
||||
p.add(k)
|
||||
current_patches: list[tuple] = current_hook_patches.get(key, [])
|
||||
current_patches.append((strength_patch, patches[k], strength_model, offset, function))
|
||||
current_hook_patches[key] = current_patches
|
||||
self.hook_patches[hook.hook_ref] = current_hook_patches
|
||||
# since should care about these patches too to determine if same model, reroll patches_uuid
|
||||
self.patches_uuid = uuid.uuid4()
|
||||
return list(p)
|
||||
|
||||
def get_combined_hook_patches(self, hooks: comfy.hooks.HookGroup):
|
||||
# combined_patches will contain weights of all relevant hooks, per key
|
||||
combined_patches = {}
|
||||
if hooks is not None:
|
||||
for hook in hooks.hooks:
|
||||
hook_patches: dict = self.hook_patches.get(hook.hook_ref, {})
|
||||
for key in hook_patches.keys():
|
||||
current_patches: list[tuple] = combined_patches.get(key, [])
|
||||
if math.isclose(hook.strength, 1.0):
|
||||
current_patches.extend(hook_patches[key])
|
||||
else:
|
||||
# patches are stored as tuples: (strength_patch, (tuple_with_weights,), strength_model)
|
||||
for patch in hook_patches[key]:
|
||||
new_patch = list(patch)
|
||||
new_patch[0] *= hook.strength
|
||||
current_patches.append(tuple(new_patch))
|
||||
combined_patches[key] = current_patches
|
||||
return combined_patches
|
||||
|
||||
def apply_hooks(self, hooks: comfy.hooks.HookGroup, transformer_options: dict=None, force_apply=False):
|
||||
# TODO: return transformer_options dict with any additions from hooks
|
||||
if self.current_hooks == hooks and (not force_apply or (not self.is_clip and hooks is None)):
|
||||
return {}
|
||||
self.patch_hooks(hooks=hooks)
|
||||
for callback in self.get_all_callbacks(CallbacksMP.ON_APPLY_HOOKS):
|
||||
callback(self, hooks)
|
||||
return {}
|
||||
|
||||
def patch_hooks(self, hooks: comfy.hooks.HookGroup):
|
||||
with self.use_ejected():
|
||||
self.unpatch_hooks()
|
||||
if hooks is not None:
|
||||
model_sd_keys = list(self.model_state_dict().keys())
|
||||
memory_counter = None
|
||||
if self.hook_mode == comfy.hooks.EnumHookMode.MaxSpeed:
|
||||
# TODO: minimum_counter should have a minimum that conforms to loaded model requirements
|
||||
memory_counter = MemoryCounter(initial=comfy.model_management.get_free_memory(self.load_device),
|
||||
minimum=comfy.model_management.minimum_inference_memory()*2)
|
||||
# if have cached weights for hooks, use it
|
||||
cached_weights = self.cached_hook_patches.get(hooks, None)
|
||||
if cached_weights is not None:
|
||||
for key in cached_weights:
|
||||
if key not in model_sd_keys:
|
||||
print(f"WARNING cached hook could not patch. key does not exist in model: {key}")
|
||||
continue
|
||||
self.patch_cached_hook_weights(cached_weights=cached_weights, key=key, memory_counter=memory_counter)
|
||||
else:
|
||||
relevant_patches = self.get_combined_hook_patches(hooks=hooks)
|
||||
original_weights = None
|
||||
if len(relevant_patches) > 0:
|
||||
original_weights = self.get_key_patches()
|
||||
for key in relevant_patches:
|
||||
if key not in model_sd_keys:
|
||||
print(f"WARNING cached hook would not patch. key does not exist in model: {key}")
|
||||
continue
|
||||
self.patch_hook_weight_to_device(hooks=hooks, combined_patches=relevant_patches, key=key, original_weights=original_weights,
|
||||
memory_counter=memory_counter)
|
||||
self.current_hooks = hooks
|
||||
|
||||
def patch_cached_hook_weights(self, cached_weights: dict, key: str, memory_counter: MemoryCounter):
|
||||
if key not in self.hook_backup:
|
||||
weight: torch.Tensor = comfy.utils.get_attr(self.model, key)
|
||||
target_device = self.offload_device
|
||||
if self.hook_mode == comfy.hooks.EnumHookMode.MaxSpeed:
|
||||
used = memory_counter.use(weight)
|
||||
if used:
|
||||
target_device = weight.device
|
||||
self.hook_backup[key] = (weight.to(device=target_device, copy=True), weight.device)
|
||||
comfy.utils.copy_to_param(self.model, key, cached_weights[key][0].to(device=cached_weights[key][1]))
|
||||
|
||||
def clear_cached_hook_weights(self):
|
||||
self.cached_hook_patches.clear()
|
||||
self.patch_hooks(None)
|
||||
|
||||
def patch_hook_weight_to_device(self, hooks: comfy.hooks.HookGroup, combined_patches: dict, key: str, original_weights: dict, memory_counter: MemoryCounter):
|
||||
if key not in combined_patches:
|
||||
return
|
||||
|
||||
weight, set_func, convert_func = get_key_weight(self.model, key)
|
||||
weight: torch.Tensor
|
||||
if key not in self.hook_backup:
|
||||
target_device = self.offload_device
|
||||
if self.hook_mode == comfy.hooks.EnumHookMode.MaxSpeed:
|
||||
used = memory_counter.use(weight)
|
||||
if used:
|
||||
target_device = weight.device
|
||||
self.hook_backup[key] = (weight.to(device=target_device, copy=True), weight.device)
|
||||
# TODO: properly handle LowVramPatch, if it ends up an issue
|
||||
temp_weight = comfy.model_management.cast_to_device(weight, weight.device, torch.float32, copy=True)
|
||||
if convert_func is not None:
|
||||
temp_weight = convert_func(temp_weight, inplace=True)
|
||||
|
||||
out_weight = comfy.lora.calculate_weight(combined_patches[key],
|
||||
temp_weight,
|
||||
key, original_weights=original_weights)
|
||||
del original_weights[key]
|
||||
if set_func is None:
|
||||
out_weight = comfy.float.stochastic_rounding(out_weight, weight.dtype, seed=string_to_seed(key))
|
||||
comfy.utils.copy_to_param(self.model, key, out_weight)
|
||||
else:
|
||||
set_func(out_weight, inplace_update=True, seed=string_to_seed(key))
|
||||
if self.hook_mode == comfy.hooks.EnumHookMode.MaxSpeed:
|
||||
# TODO: disable caching if not enough system RAM to do so
|
||||
target_device = self.offload_device
|
||||
used = memory_counter.use(weight)
|
||||
if used:
|
||||
target_device = weight.device
|
||||
self.cached_hook_patches.setdefault(hooks, {})
|
||||
self.cached_hook_patches[hooks][key] = (out_weight.to(device=target_device, copy=False), weight.device)
|
||||
del temp_weight
|
||||
del out_weight
|
||||
del weight
|
||||
|
||||
def unpatch_hooks(self) -> None:
|
||||
with self.use_ejected():
|
||||
if len(self.hook_backup) == 0:
|
||||
self.current_hooks = None
|
||||
return
|
||||
keys = list(self.hook_backup.keys())
|
||||
for k in keys:
|
||||
comfy.utils.copy_to_param(self.model, k, self.hook_backup[k][0].to(device=self.hook_backup[k][1]))
|
||||
|
||||
self.hook_backup.clear()
|
||||
self.current_hooks = None
|
||||
|
||||
def clean_hooks(self):
|
||||
self.unpatch_hooks()
|
||||
self.clear_cached_hook_weights()
|
||||
|
||||
def __del__(self):
|
||||
self.detach(unpatch_all=False)
|
||||
|
||||
|
||||
@@ -243,7 +243,7 @@ class ModelSamplingDiscreteFlow(torch.nn.Module):
|
||||
return 1.0
|
||||
if percent >= 1.0:
|
||||
return 0.0
|
||||
return 1.0 - percent
|
||||
return time_snr_shift(self.shift, 1.0 - percent)
|
||||
|
||||
class StableCascadeSampling(ModelSamplingDiscrete):
|
||||
def __init__(self, model_config=None):
|
||||
@@ -336,4 +336,4 @@ class ModelSamplingFlux(torch.nn.Module):
|
||||
return 1.0
|
||||
if percent >= 1.0:
|
||||
return 0.0
|
||||
return 1.0 - percent
|
||||
return flux_time_shift(self.shift, 1.0, 1.0 - percent)
|
||||
|
||||
@@ -269,7 +269,7 @@ def fp8_linear(self, input):
|
||||
|
||||
if scale_input is None:
|
||||
scale_input = torch.ones((), device=input.device, dtype=torch.float32)
|
||||
inn = input.reshape(-1, input.shape[2]).to(dtype)
|
||||
inn = torch.clamp(input, min=-448, max=448).reshape(-1, input.shape[2]).to(dtype)
|
||||
else:
|
||||
scale_input = scale_input.to(input.device)
|
||||
inn = (input * (1.0 / scale_input).to(input.dtype)).reshape(-1, input.shape[2]).to(dtype)
|
||||
|
||||
comfy/patcher_extension.py (new file, 156 lines)
@@ -0,0 +1,156 @@
|
||||
from __future__ import annotations
from typing import Callable

class CallbacksMP:
    ON_CLONE = "on_clone"
    ON_LOAD = "on_load_after"
    ON_DETACH = "on_detach_after"
    ON_CLEANUP = "on_cleanup"
    ON_PRE_RUN = "on_pre_run"
    ON_PREPARE_STATE = "on_prepare_state"
    ON_APPLY_HOOKS = "on_apply_hooks"
    ON_REGISTER_ALL_HOOK_PATCHES = "on_register_all_hook_patches"
    ON_INJECT_MODEL = "on_inject_model"
    ON_EJECT_MODEL = "on_eject_model"

    # callbacks dict is in the format:
    # {"call_type": {"key": [Callable1, Callable2, ...]} }
    @classmethod
    def init_callbacks(cls) -> dict[str, dict[str, list[Callable]]]:
        return {}
|
||||
|
||||
def add_callback(call_type: str, callback: Callable, transformer_options: dict, is_model_options=False):
    add_callback_with_key(call_type, None, callback, transformer_options, is_model_options)

def add_callback_with_key(call_type: str, key: str, callback: Callable, transformer_options: dict, is_model_options=False):
    if is_model_options:
        transformer_options = transformer_options.setdefault("transformer_options", {})
    callbacks: dict[str, dict[str, list]] = transformer_options.setdefault("callbacks", {})
    c = callbacks.setdefault(call_type, {}).setdefault(key, [])
    c.append(callback)

def get_callbacks_with_key(call_type: str, key: str, transformer_options: dict, is_model_options=False):
    if is_model_options:
        transformer_options = transformer_options.get("transformer_options", {})
    c_list = []
    callbacks: dict[str, list] = transformer_options.get("callbacks", {})
    c_list.extend(callbacks.get(call_type, {}).get(key, []))
    return c_list

def get_all_callbacks(call_type: str, transformer_options: dict, is_model_options=False):
    if is_model_options:
        transformer_options = transformer_options.get("transformer_options", {})
    c_list = []
    callbacks: dict[str, list] = transformer_options.get("callbacks", {})
    for c in callbacks.get(call_type, {}).values():
        c_list.extend(c)
    return c_list
|
||||
|
||||
class WrappersMP:
    OUTER_SAMPLE = "outer_sample"
    SAMPLER_SAMPLE = "sampler_sample"
    CALC_COND_BATCH = "calc_cond_batch"
    APPLY_MODEL = "apply_model"
    DIFFUSION_MODEL = "diffusion_model"

    # wrappers dict is in the format:
    # {"wrapper_type": {"key": [Callable1, Callable2, ...]} }
    @classmethod
    def init_wrappers(cls) -> dict[str, dict[str, list[Callable]]]:
        return {}
|
||||
|
||||
def add_wrapper(wrapper_type: str, wrapper: Callable, transformer_options: dict, is_model_options=False):
    add_wrapper_with_key(wrapper_type, None, wrapper, transformer_options, is_model_options)

def add_wrapper_with_key(wrapper_type: str, key: str, wrapper: Callable, transformer_options: dict, is_model_options=False):
    if is_model_options:
        transformer_options = transformer_options.setdefault("transformer_options", {})
    wrappers: dict[str, dict[str, list]] = transformer_options.setdefault("wrappers", {})
    w = wrappers.setdefault(wrapper_type, {}).setdefault(key, [])
    w.append(wrapper)

def get_wrappers_with_key(wrapper_type: str, key: str, transformer_options: dict, is_model_options=False):
    if is_model_options:
        transformer_options = transformer_options.get("transformer_options", {})
    w_list = []
    wrappers: dict[str, list] = transformer_options.get("wrappers", {})
    w_list.extend(wrappers.get(wrapper_type, {}).get(key, []))
    return w_list

def get_all_wrappers(wrapper_type: str, transformer_options: dict, is_model_options=False):
    if is_model_options:
        transformer_options = transformer_options.get("transformer_options", {})
    w_list = []
    wrappers: dict[str, list] = transformer_options.get("wrappers", {})
    for w in wrappers.get(wrapper_type, {}).values():
        w_list.extend(w)
    return w_list
|
||||
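A sketch of how these helpers store wrappers inside a model_options dict; the wrapper function and key names are placeholders invented for illustration, only the helper names and storage layout come from the file above.
# --- illustrative example, not part of this diff ---
import comfy.patcher_extension as pe

def my_outer_sample_wrapper(executor, *args, **kwargs):
    return executor(*args, **kwargs)   # call through to the wrapped function

model_options = {}
pe.add_wrapper_with_key(pe.WrappersMP.OUTER_SAMPLE, "my_ext", my_outer_sample_wrapper,
                        model_options, is_model_options=True)
# stored as model_options["transformer_options"]["wrappers"][wrapper_type][key] -> [callables]
assert pe.get_all_wrappers(pe.WrappersMP.OUTER_SAMPLE, model_options,
                           is_model_options=True) == [my_outer_sample_wrapper]
# --- end example ---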
|
||||
class WrapperExecutor:
    """Handles call stack of wrappers around a function in an ordered manner."""
    def __init__(self, original: Callable, class_obj: object, wrappers: list[Callable], idx: int):
        # NOTE: class_obj exists so that wrappers surrounding a class method can access
        # the class instance at runtime via executor.class_obj
        self.original = original
        self.class_obj = class_obj
        self.wrappers = wrappers.copy()
        self.idx = idx
        self.is_last = idx == len(wrappers)

    def __call__(self, *args, **kwargs):
        """Calls the next wrapper or original function, whichever is appropriate."""
        new_executor = self._create_next_executor()
        return new_executor.execute(*args, **kwargs)

    def execute(self, *args, **kwargs):
        """Used to initiate executor internally - DO NOT use this if you received executor in wrapper."""
        args = list(args)
        kwargs = dict(kwargs)
        if self.is_last:
            return self.original(*args, **kwargs)
        return self.wrappers[self.idx](self, *args, **kwargs)

    def _create_next_executor(self) -> 'WrapperExecutor':
        new_idx = self.idx + 1
        if new_idx > len(self.wrappers):
            raise Exception("Wrapper idx exceeded available wrappers; something went very wrong.")
        if self.class_obj is None:
            return WrapperExecutor.new_executor(self.original, self.wrappers, new_idx)
        return WrapperExecutor.new_class_executor(self.original, self.class_obj, self.wrappers, new_idx)

    @classmethod
    def new_executor(cls, original: Callable, wrappers: list[Callable], idx=0):
        return cls(original, class_obj=None, wrappers=wrappers, idx=idx)

    @classmethod
    def new_class_executor(cls, original: Callable, class_obj: object, wrappers: list[Callable], idx=0):
        return cls(original, class_obj, wrappers, idx=idx)
|
||||
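An illustrative trace of the executor's call chain (function names and numbers are invented): each wrapper receives the executor for the next step and calls it to continue, and the innermost call reaches the original function.
# --- illustrative example, not part of this diff ---
from comfy.patcher_extension import WrapperExecutor

def original(x):
    return x + 1

def outer(executor, x):
    return executor(x) * 2     # runs first, wraps everything below

def inner(executor, x):
    return executor(x) + 10    # runs second, wraps the original

executor = WrapperExecutor.new_executor(original, [outer, inner])
assert executor.execute(3) == ((3 + 1) + 10) * 2   # == 28
# --- end example ---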
|
||||
class PatcherInjection:
    def __init__(self, inject: Callable, eject: Callable):
        self.inject = inject
        self.eject = eject
|
||||
|
||||
def copy_nested_dicts(input_dict: dict):
    new_dict = input_dict.copy()
    for key, value in input_dict.items():
        if isinstance(value, dict):
            new_dict[key] = copy_nested_dicts(value)
        elif isinstance(value, list):
            new_dict[key] = value.copy()
    return new_dict

def merge_nested_dicts(dict1: dict, dict2: dict, copy_dict1=True):
    if copy_dict1:
        merged_dict = copy_nested_dicts(dict1)
    else:
        merged_dict = dict1
    for key, value in dict2.items():
        if isinstance(value, dict):
            curr_value = merged_dict.setdefault(key, {})
            merged_dict[key] = merge_nested_dicts(value, curr_value)
        elif isinstance(value, list):
            merged_dict.setdefault(key, []).extend(value)
        else:
            merged_dict[key] = value
    return merged_dict
|
||||
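A small sketch of the merge semantics above (the dict values are invented): nested dicts merge recursively, lists concatenate, scalar values from the second dict win, and the first dict is deep-copied by default.
# --- illustrative example, not part of this diff ---
from comfy.patcher_extension import merge_nested_dicts

d1 = {"callbacks": {"on_clone": {"a": [1]}}, "flag": True}
d2 = {"callbacks": {"on_clone": {"a": [2], "b": [3]}}, "flag": False}
merged = merge_nested_dicts(d1, d2)
assert merged == {"callbacks": {"on_clone": {"a": [1, 2], "b": [3]}}, "flag": False}
assert d1["callbacks"]["on_clone"]["a"] == [1]   # original left untouched (copy_dict1=True)
# --- end example ---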
@@ -1,7 +1,15 @@
|
||||
import torch
|
||||
from __future__ import annotations
|
||||
import uuid
|
||||
import comfy.model_management
|
||||
import comfy.conds
|
||||
import comfy.utils
|
||||
import comfy.hooks
|
||||
import comfy.patcher_extension
|
||||
from typing import TYPE_CHECKING
|
||||
if TYPE_CHECKING:
|
||||
from comfy.model_patcher import ModelPatcher
|
||||
from comfy.model_base import BaseModel
|
||||
from comfy.controlnet import ControlBase
|
||||
|
||||
def prepare_mask(noise_mask, shape, device):
|
||||
return comfy.utils.reshape_mask(noise_mask, shape).to(device)
|
||||
@@ -10,9 +18,43 @@ def get_models_from_cond(cond, model_type):
|
||||
models = []
|
||||
for c in cond:
|
||||
if model_type in c:
|
||||
models += [c[model_type]]
|
||||
if isinstance(c[model_type], list):
|
||||
models += c[model_type]
|
||||
else:
|
||||
models += [c[model_type]]
|
||||
return models
|
||||
|
||||
def get_hooks_from_cond(cond, hooks_dict: dict[comfy.hooks.EnumHookType, dict[comfy.hooks.Hook, None]]):
|
||||
# get hooks from conds, and collect cnets so they can be checked for extra_hooks
|
||||
cnets: list[ControlBase] = []
|
||||
for c in cond:
|
||||
if 'hooks' in c:
|
||||
for hook in c['hooks'].hooks:
|
||||
hook: comfy.hooks.Hook
|
||||
with_type = hooks_dict.setdefault(hook.hook_type, {})
|
||||
with_type[hook] = None
|
||||
if 'control' in c:
|
||||
cnets.append(c['control'])
|
||||
|
||||
def get_extra_hooks_from_cnet(cnet: ControlBase, _list: list):
|
||||
if cnet.extra_hooks is not None:
|
||||
_list.append(cnet.extra_hooks)
|
||||
if cnet.previous_controlnet is None:
|
||||
return _list
|
||||
return get_extra_hooks_from_cnet(cnet.previous_controlnet, _list)
|
||||
|
||||
hooks_list = []
|
||||
cnets = set(cnets)
|
||||
for base_cnet in cnets:
|
||||
get_extra_hooks_from_cnet(base_cnet, hooks_list)
|
||||
extra_hooks = comfy.hooks.HookGroup.combine_all_hooks(hooks_list)
|
||||
if extra_hooks is not None:
|
||||
for hook in extra_hooks.hooks:
|
||||
with_type = hooks_dict.setdefault(hook.hook_type, {})
|
||||
with_type[hook] = None
|
||||
|
||||
return hooks_dict
|
||||
|
||||
def convert_cond(cond):
|
||||
out = []
|
||||
for c in cond:
|
||||
@@ -22,17 +64,22 @@ def convert_cond(cond):
|
||||
model_conds["c_crossattn"] = comfy.conds.CONDCrossAttn(c[0]) #TODO: remove
|
||||
temp["cross_attn"] = c[0]
|
||||
temp["model_conds"] = model_conds
|
||||
temp["uuid"] = uuid.uuid4()
|
||||
out.append(temp)
|
||||
return out
|
||||
|
||||
def get_additional_models(conds, dtype):
|
||||
"""loads additional models in conditioning"""
|
||||
cnets = []
|
||||
cnets: list[ControlBase] = []
|
||||
gligen = []
|
||||
add_models = []
|
||||
hooks: dict[comfy.hooks.EnumHookType, dict[comfy.hooks.Hook, None]] = {}
|
||||
|
||||
for k in conds:
|
||||
cnets += get_models_from_cond(conds[k], "control")
|
||||
gligen += get_models_from_cond(conds[k], "gligen")
|
||||
add_models += get_models_from_cond(conds[k], "additional_models")
|
||||
get_hooks_from_cond(conds[k], hooks)
|
||||
|
||||
control_nets = set(cnets)
|
||||
|
||||
@@ -43,7 +90,9 @@ def get_additional_models(conds, dtype):
|
||||
inference_memory += m.inference_memory_requirements(dtype)
|
||||
|
||||
gligen = [x[1] for x in gligen]
|
||||
models = control_models + gligen
|
||||
hook_models = [x.model for x in hooks.get(comfy.hooks.EnumHookType.AddModels, {}).keys()]
|
||||
models = control_models + gligen + add_models + hook_models
|
||||
|
||||
return models, inference_memory
|
||||
|
||||
def cleanup_additional_models(models):
|
||||
@@ -53,10 +102,10 @@ def cleanup_additional_models(models):
|
||||
m.cleanup()
|
||||
|
||||
|
||||
def prepare_sampling(model, noise_shape, conds):
|
||||
device = model.load_device
|
||||
real_model = None
|
||||
def prepare_sampling(model: 'ModelPatcher', noise_shape, conds):
|
||||
real_model: 'BaseModel' = None
|
||||
models, inference_memory = get_additional_models(conds, model.model_dtype())
|
||||
models += model.get_nested_additional_models() # TODO: does this require inference_memory update?
|
||||
memory_required = model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory
|
||||
minimum_memory_required = model.memory_required([noise_shape[0]] + list(noise_shape[1:])) + inference_memory
|
||||
comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required, minimum_memory_required=minimum_memory_required)
|
||||
@@ -72,3 +121,14 @@ def cleanup_models(conds, models):
|
||||
control_cleanup += get_models_from_cond(conds[k], "control")
|
||||
|
||||
cleanup_additional_models(set(control_cleanup))
|
||||
|
||||
def prepare_model_patcher(model: 'ModelPatcher', conds, model_options: dict):
|
||||
# check for hooks in conds - if not registered, see if can be applied
|
||||
hooks = {}
|
||||
for k in conds:
|
||||
get_hooks_from_cond(conds[k], hooks)
|
||||
# add wrappers and callbacks from ModelPatcher to transformer_options
|
||||
model_options["transformer_options"]["wrappers"] = comfy.patcher_extension.copy_nested_dicts(model.wrappers)
|
||||
model_options["transformer_options"]["callbacks"] = comfy.patcher_extension.copy_nested_dicts(model.callbacks)
|
||||
# register hooks on model/model_options
|
||||
model.register_all_hook_patches(hooks, comfy.hooks.EnumWeightTarget.Model, model_options)
|
||||
|
||||
@@ -1,11 +1,21 @@
|
||||
from __future__ import annotations
|
||||
from .k_diffusion import sampling as k_diffusion_sampling
|
||||
from .extra_samplers import uni_pc
|
||||
from typing import TYPE_CHECKING
|
||||
if TYPE_CHECKING:
|
||||
from comfy.model_patcher import ModelPatcher
|
||||
from comfy.model_base import BaseModel
|
||||
from comfy.controlnet import ControlBase
|
||||
import torch
|
||||
import collections
|
||||
from comfy import model_management
|
||||
import math
|
||||
import logging
|
||||
import comfy.samplers
|
||||
import comfy.sampler_helpers
|
||||
import comfy.model_patcher
|
||||
import comfy.patcher_extension
|
||||
import comfy.hooks
|
||||
import scipy.stats
|
||||
import numpy
|
||||
|
||||
@@ -70,6 +80,7 @@ def get_area_and_mult(conds, x_in, timestep_in):
|
||||
for c in model_conds:
|
||||
conditioning[c] = model_conds[c].process_cond(batch_size=x_in.shape[0], device=x_in.device, area=area)
|
||||
|
||||
hooks = conds.get('hooks', None)
|
||||
control = conds.get('control', None)
|
||||
|
||||
patches = None
|
||||
@@ -85,8 +96,8 @@ def get_area_and_mult(conds, x_in, timestep_in):
|
||||
|
||||
patches['middle_patch'] = [gligen_patch]
|
||||
|
||||
cond_obj = collections.namedtuple('cond_obj', ['input_x', 'mult', 'conditioning', 'area', 'control', 'patches'])
|
||||
return cond_obj(input_x, mult, conditioning, area, control, patches)
|
||||
cond_obj = collections.namedtuple('cond_obj', ['input_x', 'mult', 'conditioning', 'area', 'control', 'patches', 'uuid', 'hooks'])
|
||||
return cond_obj(input_x, mult, conditioning, area, control, patches, conds['uuid'], hooks)
|
||||
|
||||
def cond_equal_size(c1, c2):
|
||||
if c1 is c2:
|
||||
@@ -119,11 +130,6 @@ def can_concat_cond(c1, c2):
|
||||
return cond_equal_size(c1.conditioning, c2.conditioning)
|
||||
|
||||
def cond_cat(c_list):
|
||||
c_crossattn = []
|
||||
c_concat = []
|
||||
c_adm = []
|
||||
crossattn_max_len = 0
|
||||
|
||||
temp = {}
|
||||
for x in c_list:
|
||||
for k in x:
|
||||
@@ -138,110 +144,184 @@ def cond_cat(c_list):
|
||||
|
||||
return out
|
||||
|
||||
def calc_cond_batch(model, conds, x_in, timestep, model_options):
|
||||
def finalize_default_conds(model: 'BaseModel', hooked_to_run: dict[comfy.hooks.HookGroup,list[tuple[tuple,int]]], default_conds: list[list[dict]], x_in, timestep):
|
||||
# need to figure out remaining unmasked area for conds
|
||||
default_mults = []
|
||||
for _ in default_conds:
|
||||
default_mults.append(torch.ones_like(x_in))
|
||||
# look through each finalized cond in hooked_to_run for 'mult' and subtract it from each cond
|
||||
for lora_hooks, to_run in hooked_to_run.items():
|
||||
for cond_obj, i in to_run:
|
||||
# if no default_cond for cond_type, do nothing
|
||||
if len(default_conds[i]) == 0:
|
||||
continue
|
||||
area: list[int] = cond_obj.area
|
||||
if area is not None:
|
||||
curr_default_mult: torch.Tensor = default_mults[i]
|
||||
dims = len(area) // 2
|
||||
for i in range(dims):
|
||||
curr_default_mult = curr_default_mult.narrow(i + 2, area[i + dims], area[i])
|
||||
curr_default_mult -= cond_obj.mult
|
||||
else:
|
||||
default_mults[i] -= cond_obj.mult
|
||||
# for each default_mult, ReLU to make negatives=0, and then check for any nonzeros
|
||||
for i, mult in enumerate(default_mults):
|
||||
# if no default_cond for cond type, do nothing
|
||||
if len(default_conds[i]) == 0:
|
||||
continue
|
||||
torch.nn.functional.relu(mult, inplace=True)
|
||||
# if mult is all zeros, then don't add default_cond
|
||||
if torch.max(mult) == 0.0:
|
||||
continue
|
||||
|
||||
cond = default_conds[i]
|
||||
for x in cond:
|
||||
# do get_area_and_mult to get all the expected values
|
||||
p = comfy.samplers.get_area_and_mult(x, x_in, timestep)
|
||||
if p is None:
|
||||
continue
|
||||
# replace p's mult with calculated mult
|
||||
p = p._replace(mult=mult)
|
||||
if p.hooks is not None:
|
||||
model.current_patcher.prepare_hook_patches_current_keyframe(timestep, p.hooks)
|
||||
hooked_to_run.setdefault(p.hooks, list())
|
||||
hooked_to_run[p.hooks] += [(p, i)]
|
||||
|
||||
def calc_cond_batch(model: 'BaseModel', conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
|
||||
executor = comfy.patcher_extension.WrapperExecutor.new_executor(
|
||||
_calc_cond_batch,
|
||||
comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.CALC_COND_BATCH, model_options, is_model_options=True)
|
||||
)
|
||||
return executor.execute(model, conds, x_in, timestep, model_options)
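calc_cond_batch now routes through any CALC_COND_BATCH wrappers found in model_options before reaching _calc_cond_batch. A hedged sketch of such a wrapper from the extension side (argument order mirrors _calc_cond_batch; model_options is assumed to be the sampler's options dict and the key name is invented):
# --- illustrative example, not part of this diff ---
import comfy.patcher_extension as pe

def passthrough_calc_cond_batch(executor, model, conds, x_in, timestep, model_options):
    # pre/post processing could go around the real cond batch calculation
    return executor(model, conds, x_in, timestep, model_options)

pe.add_wrapper_with_key(pe.WrappersMP.CALC_COND_BATCH, "my_ext",
                        passthrough_calc_cond_batch, model_options, is_model_options=True)
# --- end example ---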
|
||||
|
||||
def _calc_cond_batch(model: 'BaseModel', conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
|
||||
out_conds = []
|
||||
out_counts = []
|
||||
to_run = []
|
||||
# separate conds by matching hooks
|
||||
hooked_to_run: dict[comfy.hooks.HookGroup,list[tuple[tuple,int]]] = {}
|
||||
default_conds = []
|
||||
has_default_conds = False
|
||||
|
||||
for i in range(len(conds)):
|
||||
out_conds.append(torch.zeros_like(x_in))
|
||||
out_counts.append(torch.ones_like(x_in) * 1e-37)
|
||||
|
||||
cond = conds[i]
|
||||
default_c = []
|
||||
if cond is not None:
|
||||
for x in cond:
|
||||
p = get_area_and_mult(x, x_in, timestep)
|
||||
if 'default' in x:
|
||||
default_c.append(x)
|
||||
has_default_conds = True
|
||||
continue
|
||||
p = comfy.samplers.get_area_and_mult(x, x_in, timestep)
|
||||
if p is None:
|
||||
continue
|
||||
if p.hooks is not None:
|
||||
model.current_patcher.prepare_hook_patches_current_keyframe(timestep, p.hooks)
|
||||
hooked_to_run.setdefault(p.hooks, list())
|
||||
hooked_to_run[p.hooks] += [(p, i)]
|
||||
default_conds.append(default_c)
|
||||
|
||||
to_run += [(p, i)]
|
||||
if has_default_conds:
|
||||
finalize_default_conds(model, hooked_to_run, default_conds, x_in, timestep)
|
||||
|
||||
while len(to_run) > 0:
|
||||
first = to_run[0]
|
||||
first_shape = first[0][0].shape
|
||||
to_batch_temp = []
|
||||
for x in range(len(to_run)):
|
||||
if can_concat_cond(to_run[x][0], first[0]):
|
||||
to_batch_temp += [x]
|
||||
model.current_patcher.prepare_state(timestep)
|
||||
|
||||
to_batch_temp.reverse()
|
||||
to_batch = to_batch_temp[:1]
|
||||
# run every hooked_to_run separately
|
||||
for hooks, to_run in hooked_to_run.items():
|
||||
while len(to_run) > 0:
|
||||
first = to_run[0]
|
||||
first_shape = first[0][0].shape
|
||||
to_batch_temp = []
|
||||
for x in range(len(to_run)):
|
||||
if can_concat_cond(to_run[x][0], first[0]):
|
||||
to_batch_temp += [x]
|
||||
|
||||
free_memory = model_management.get_free_memory(x_in.device)
|
||||
for i in range(1, len(to_batch_temp) + 1):
|
||||
batch_amount = to_batch_temp[:len(to_batch_temp)//i]
|
||||
input_shape = [len(batch_amount) * first_shape[0]] + list(first_shape)[1:]
|
||||
if model.memory_required(input_shape) * 1.5 < free_memory:
|
||||
to_batch = batch_amount
|
||||
break
|
||||
to_batch_temp.reverse()
|
||||
to_batch = to_batch_temp[:1]
|
||||
|
||||
input_x = []
|
||||
mult = []
|
||||
c = []
|
||||
cond_or_uncond = []
|
||||
area = []
|
||||
control = None
|
||||
patches = None
|
||||
for x in to_batch:
|
||||
o = to_run.pop(x)
|
||||
p = o[0]
|
||||
input_x.append(p.input_x)
|
||||
mult.append(p.mult)
|
||||
c.append(p.conditioning)
|
||||
area.append(p.area)
|
||||
cond_or_uncond.append(o[1])
|
||||
control = p.control
|
||||
patches = p.patches
|
||||
free_memory = model_management.get_free_memory(x_in.device)
|
||||
for i in range(1, len(to_batch_temp) + 1):
|
||||
batch_amount = to_batch_temp[:len(to_batch_temp)//i]
|
||||
input_shape = [len(batch_amount) * first_shape[0]] + list(first_shape)[1:]
|
||||
if model.memory_required(input_shape) * 1.5 < free_memory:
|
||||
to_batch = batch_amount
|
||||
break
|
||||
|
||||
batch_chunks = len(cond_or_uncond)
|
||||
input_x = torch.cat(input_x)
|
||||
c = cond_cat(c)
|
||||
timestep_ = torch.cat([timestep] * batch_chunks)
|
||||
input_x = []
|
||||
mult = []
|
||||
c = []
|
||||
cond_or_uncond = []
|
||||
uuids = []
|
||||
area = []
|
||||
control = None
|
||||
patches = None
|
||||
for x in to_batch:
|
||||
o = to_run.pop(x)
|
||||
p = o[0]
|
||||
input_x.append(p.input_x)
|
||||
mult.append(p.mult)
|
||||
c.append(p.conditioning)
|
||||
area.append(p.area)
|
||||
cond_or_uncond.append(o[1])
|
||||
uuids.append(p.uuid)
|
||||
control = p.control
|
||||
patches = p.patches
|
||||
|
||||
if control is not None:
|
||||
c['control'] = control.get_control(input_x, timestep_, c, len(cond_or_uncond))
|
||||
batch_chunks = len(cond_or_uncond)
|
||||
input_x = torch.cat(input_x)
|
||||
c = cond_cat(c)
|
||||
timestep_ = torch.cat([timestep] * batch_chunks)
|
||||
|
||||
transformer_options = {}
|
||||
if 'transformer_options' in model_options:
|
||||
transformer_options = model_options['transformer_options'].copy()
|
||||
transformer_options = model.current_patcher.apply_hooks(hooks=hooks)
|
||||
if 'transformer_options' in model_options:
|
||||
transformer_options = comfy.patcher_extension.merge_nested_dicts(transformer_options,
|
||||
model_options['transformer_options'],
|
||||
copy_dict1=False)
|
||||
|
||||
if patches is not None:
|
||||
if "patches" in transformer_options:
|
||||
cur_patches = transformer_options["patches"].copy()
|
||||
for p in patches:
|
||||
if p in cur_patches:
|
||||
cur_patches[p] = cur_patches[p] + patches[p]
|
||||
else:
|
||||
cur_patches[p] = patches[p]
|
||||
transformer_options["patches"] = cur_patches
|
||||
if patches is not None:
|
||||
# TODO: replace with merge_nested_dicts function
|
||||
if "patches" in transformer_options:
|
||||
cur_patches = transformer_options["patches"].copy()
|
||||
for p in patches:
|
||||
if p in cur_patches:
|
||||
cur_patches[p] = cur_patches[p] + patches[p]
|
||||
else:
|
||||
cur_patches[p] = patches[p]
|
||||
transformer_options["patches"] = cur_patches
|
||||
else:
|
||||
transformer_options["patches"] = patches
|
||||
|
||||
transformer_options["cond_or_uncond"] = cond_or_uncond[:]
|
||||
transformer_options["uuids"] = uuids[:]
|
||||
transformer_options["sigmas"] = timestep
|
||||
|
||||
c['transformer_options'] = transformer_options
|
||||
|
||||
if control is not None:
|
||||
c['control'] = control.get_control(input_x, timestep_, c, len(cond_or_uncond), transformer_options)
|
||||
|
||||
if 'model_function_wrapper' in model_options:
|
||||
output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
|
||||
else:
|
||||
transformer_options["patches"] = patches
|
||||
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
|
||||
|
||||
transformer_options["cond_or_uncond"] = cond_or_uncond[:]
|
||||
transformer_options["sigmas"] = timestep
|
||||
|
||||
c['transformer_options'] = transformer_options
|
||||
|
||||
if 'model_function_wrapper' in model_options:
|
||||
output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
|
||||
else:
|
||||
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
|
||||
|
||||
for o in range(batch_chunks):
|
||||
cond_index = cond_or_uncond[o]
|
||||
a = area[o]
|
||||
if a is None:
|
||||
out_conds[cond_index] += output[o] * mult[o]
|
||||
out_counts[cond_index] += mult[o]
|
||||
else:
|
||||
out_c = out_conds[cond_index]
|
||||
out_cts = out_counts[cond_index]
|
||||
dims = len(a) // 2
|
||||
for i in range(dims):
|
||||
out_c = out_c.narrow(i + 2, a[i + dims], a[i])
|
||||
out_cts = out_cts.narrow(i + 2, a[i + dims], a[i])
|
||||
out_c += output[o] * mult[o]
|
||||
out_cts += mult[o]
|
||||
for o in range(batch_chunks):
|
||||
cond_index = cond_or_uncond[o]
|
||||
a = area[o]
|
||||
if a is None:
|
||||
out_conds[cond_index] += output[o] * mult[o]
|
||||
out_counts[cond_index] += mult[o]
|
||||
else:
|
||||
out_c = out_conds[cond_index]
|
||||
out_cts = out_counts[cond_index]
|
||||
dims = len(a) // 2
|
||||
for i in range(dims):
|
||||
out_c = out_c.narrow(i + 2, a[i + dims], a[i])
|
||||
out_cts = out_cts.narrow(i + 2, a[i + dims], a[i])
|
||||
out_c += output[o] * mult[o]
|
||||
out_cts += mult[o]
|
||||
|
||||
for i in range(len(out_conds)):
|
||||
out_conds[i] /= out_counts[i]
|
||||
@@ -261,7 +341,7 @@ def cfg_function(model, cond_pred, uncond_pred, cond_scale, x, timestep, model_o
|
||||
cfg_result = uncond_pred + (cond_pred - uncond_pred) * cond_scale
|
||||
|
||||
for fn in model_options.get("sampler_post_cfg_function", []):
|
||||
args = {"denoised": cfg_result, "cond": cond, "uncond": uncond, "model": model, "uncond_denoised": uncond_pred, "cond_denoised": cond_pred,
|
||||
args = {"denoised": cfg_result, "cond": cond, "uncond": uncond, "cond_scale": cond_scale, "model": model, "uncond_denoised": uncond_pred, "cond_denoised": cond_pred,
|
||||
"sigma": timestep, "model_options": model_options, "input": x}
|
||||
cfg_result = fn(args)
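A minimal sketch (not part of this diff) of a post-CFG callback as consumed by the loop above; after this change the callback also receives "cond_scale". The registration line is a hypothetical illustration, not code from this PR:
def my_post_cfg(args):
    # args contains "denoised", "cond", "uncond", "cond_scale", "sigma", "model_options", "input", ...
    return args["denoised"]  # pass the prediction through unchanged
# hypothetical registration: model_options["sampler_post_cfg_function"] = [my_post_cfg]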
|
||||
|
||||
@@ -500,10 +580,15 @@ def calculate_start_end_timesteps(model, conds):
|
||||
|
||||
timestep_start = None
|
||||
timestep_end = None
|
||||
if 'start_percent' in x:
|
||||
timestep_start = s.percent_to_sigma(x['start_percent'])
|
||||
if 'end_percent' in x:
|
||||
timestep_end = s.percent_to_sigma(x['end_percent'])
|
||||
# handle clip hook schedule, if needed
|
||||
if 'clip_start_percent' in x:
|
||||
timestep_start = s.percent_to_sigma(max(x['clip_start_percent'], x.get('start_percent', 0.0)))
|
||||
timestep_end = s.percent_to_sigma(min(x['clip_end_percent'], x.get('end_percent', 1.0)))
|
||||
else:
|
||||
if 'start_percent' in x:
|
||||
timestep_start = s.percent_to_sigma(x['start_percent'])
|
||||
if 'end_percent' in x:
|
||||
timestep_end = s.percent_to_sigma(x['end_percent'])
|
||||
|
||||
if (timestep_start is not None) or (timestep_end is not None):
|
||||
n = x.copy()
|
||||
@@ -518,8 +603,6 @@ def pre_run_control(model, conds):
|
||||
for t in range(len(conds)):
|
||||
x = conds[t]
|
||||
|
||||
timestep_start = None
|
||||
timestep_end = None
|
||||
percent_to_timestep_function = lambda a: s.percent_to_sigma(a)
|
||||
if 'control' in x:
|
||||
x['control'].pre_run(model, percent_to_timestep_function)
|
||||
@@ -673,6 +756,12 @@ def process_conds(model, noise, conds, device, latent_image=None, denoise_mask=N
|
||||
if k != kk:
|
||||
create_cond_with_same_area_if_none(conds[kk], c)
|
||||
|
||||
for k in conds:
|
||||
for c in conds[k]:
|
||||
if 'hooks' in c:
|
||||
for hook in c['hooks'].hooks:
|
||||
hook.initialize_timesteps(model)
|
||||
|
||||
for k in conds:
|
||||
pre_run_control(model, conds[k])
|
||||
|
||||
@@ -685,9 +774,46 @@ def process_conds(model, noise, conds, device, latent_image=None, denoise_mask=N
|
||||
|
||||
return conds
|
||||
|
||||
|
||||
def preprocess_conds_hooks(conds: dict[str, list[dict[str]]]):
    # determine which ControlNets have extra_hooks that should be combined with normal hooks
    hook_replacement: dict[tuple[ControlBase, comfy.hooks.HookGroup], list[dict]] = {}
    for k in conds:
        for kk in conds[k]:
            if 'control' in kk:
                control: 'ControlBase' = kk['control']
                extra_hooks = control.get_extra_hooks()
                if len(extra_hooks) > 0:
                    hooks: comfy.hooks.HookGroup = kk.get('hooks', None)
                    to_replace = hook_replacement.setdefault((control, hooks), [])
                    to_replace.append(kk)
    # if nothing to replace, do nothing
    if len(hook_replacement) == 0:
        return

    # for optimal sampling performance, common ControlNets + hook combos should have identical hooks
    # on the cond dicts
    for key, conds_to_modify in hook_replacement.items():
        control = key[0]
        hooks = key[1]
        hooks = comfy.hooks.HookGroup.combine_all_hooks(control.get_extra_hooks() + [hooks])
        # if combined hooks are not None, set as new hooks for all relevant conds
        if hooks is not None:
            for cond in conds_to_modify:
                cond['hooks'] = hooks


def get_total_hook_groups_in_conds(conds: dict[str, list[dict[str]]]):
    hooks_set = set()
    for k in conds:
        for kk in conds[k]:
            hooks_set.add(kk.get('hooks', None))
    return len(hooks_set)
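A minimal sketch (not part of this diff) of how get_total_hook_groups_in_conds counts hook groups; conds without a 'hooks' key all fall into the single None group:
conds = {"positive": [{"hooks": None}, {}], "negative": [{}]}
assert get_total_hook_groups_in_conds(conds) == 1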
|
||||
|
||||
class CFGGuider:
|
||||
def __init__(self, model_patcher):
|
||||
self.model_patcher = model_patcher
|
||||
self.model_patcher: 'ModelPatcher' = model_patcher
|
||||
self.model_options = model_patcher.model_options
|
||||
self.original_conds = {}
|
||||
self.cfg = 1.0
|
||||
@@ -714,19 +840,17 @@ class CFGGuider:
|
||||
|
||||
self.conds = process_conds(self.inner_model, noise, self.conds, device, latent_image, denoise_mask, seed)
|
||||
|
||||
extra_args = {"model_options": self.model_options, "seed":seed}
|
||||
extra_args = {"model_options": comfy.model_patcher.create_model_options_clone(self.model_options), "seed": seed}
|
||||
|
||||
samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
|
||||
executor = comfy.patcher_extension.WrapperExecutor.new_class_executor(
|
||||
sampler.sample,
|
||||
sampler,
|
||||
comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.SAMPLER_SAMPLE, extra_args["model_options"], is_model_options=True)
|
||||
)
|
||||
samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
|
||||
return self.inner_model.process_latent_out(samples.to(torch.float32))
|
||||
|
||||
def sample(self, noise, latent_image, sampler, sigmas, denoise_mask=None, callback=None, disable_pbar=False, seed=None):
|
||||
if sigmas.shape[-1] == 0:
|
||||
return latent_image
|
||||
|
||||
self.conds = {}
|
||||
for k in self.original_conds:
|
||||
self.conds[k] = list(map(lambda a: a.copy(), self.original_conds[k]))
|
||||
|
||||
def outer_sample(self, noise, latent_image, sampler, sigmas, denoise_mask=None, callback=None, disable_pbar=False, seed=None):
|
||||
self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds)
|
||||
device = self.model_patcher.load_device
|
||||
|
||||
@@ -737,14 +861,48 @@ class CFGGuider:
|
||||
latent_image = latent_image.to(device)
|
||||
sigmas = sigmas.to(device)
|
||||
|
||||
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
|
||||
try:
|
||||
self.model_patcher.pre_run()
|
||||
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
|
||||
finally:
|
||||
self.model_patcher.cleanup()
|
||||
|
||||
comfy.sampler_helpers.cleanup_models(self.conds, self.loaded_models)
|
||||
del self.inner_model
|
||||
del self.conds
|
||||
del self.loaded_models
|
||||
return output
|
||||
|
||||
def sample(self, noise, latent_image, sampler, sigmas, denoise_mask=None, callback=None, disable_pbar=False, seed=None):
|
||||
if sigmas.shape[-1] == 0:
|
||||
return latent_image
|
||||
|
||||
self.conds = {}
|
||||
for k in self.original_conds:
|
||||
self.conds[k] = list(map(lambda a: a.copy(), self.original_conds[k]))
|
||||
preprocess_conds_hooks(self.conds)
|
||||
|
||||
try:
|
||||
orig_model_options = self.model_options
|
||||
self.model_options = comfy.model_patcher.create_model_options_clone(self.model_options)
|
||||
# if one hook type (or just None), then don't bother caching weights for hooks (will never change after first step)
|
||||
orig_hook_mode = self.model_patcher.hook_mode
|
||||
if get_total_hook_groups_in_conds(self.conds) <= 1:
|
||||
self.model_patcher.hook_mode = comfy.hooks.EnumHookMode.MinVram
|
||||
comfy.sampler_helpers.prepare_model_patcher(self.model_patcher, self.conds, self.model_options)
|
||||
executor = comfy.patcher_extension.WrapperExecutor.new_class_executor(
|
||||
self.outer_sample,
|
||||
self,
|
||||
comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.OUTER_SAMPLE, self.model_options, is_model_options=True)
|
||||
)
|
||||
output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
|
||||
finally:
|
||||
self.model_options = orig_model_options
|
||||
self.model_patcher.hook_mode = orig_hook_mode
|
||||
self.model_patcher.restore_hook_patches()
|
||||
|
||||
del self.conds
|
||||
return output
|
||||
|
||||
|
||||
def sample(model, noise, positive, negative, cfg, device, sampler, sigmas, model_options={}, latent_image=None, denoise_mask=None, callback=None, disable_pbar=False, seed=None):
|
||||
cfg_guider = CFGGuider(model)
|
||||
|
||||
109
comfy/sd.py
@@ -1,8 +1,10 @@
|
||||
from __future__ import annotations
|
||||
import torch
|
||||
from enum import Enum
|
||||
import logging
|
||||
|
||||
from comfy import model_management
|
||||
from comfy.utils import ProgressBar
|
||||
from .ldm.models.autoencoder import AutoencoderKL, AutoencodingEngine
|
||||
from .ldm.cascade.stage_a import StageA
|
||||
from .ldm.cascade.stage_c_coder import StageC_coder
|
||||
@@ -29,10 +31,12 @@ import comfy.text_encoders.flux
|
||||
import comfy.text_encoders.long_clipl
|
||||
import comfy.text_encoders.genmo
|
||||
import comfy.text_encoders.lt
|
||||
import comfy.text_encoders.hunyuan_video
|
||||
|
||||
import comfy.model_patcher
|
||||
import comfy.lora
|
||||
import comfy.lora_convert
|
||||
import comfy.hooks
|
||||
import comfy.t2i_adapter.adapter
|
||||
import comfy.taesd.taesd
|
||||
|
||||
@@ -98,9 +102,13 @@ class CLIP:
|
||||
|
||||
self.tokenizer = tokenizer(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data)
|
||||
self.patcher = comfy.model_patcher.ModelPatcher(self.cond_stage_model, load_device=load_device, offload_device=offload_device)
|
||||
self.patcher.hook_mode = comfy.hooks.EnumHookMode.MinVram
|
||||
self.patcher.is_clip = True
|
||||
self.apply_hooks_to_conds = None
|
||||
if params['device'] == load_device:
|
||||
model_management.load_models_gpu([self.patcher], force_full_load=True)
|
||||
self.layer_idx = None
|
||||
self.use_clip_schedule = False
|
||||
logging.debug("CLIP model load device: {}, offload device: {}, current: {}".format(load_device, offload_device, params['device']))
|
||||
|
||||
def clone(self):
|
||||
@@ -109,6 +117,8 @@ class CLIP:
|
||||
n.cond_stage_model = self.cond_stage_model
|
||||
n.tokenizer = self.tokenizer
|
||||
n.layer_idx = self.layer_idx
|
||||
n.use_clip_schedule = self.use_clip_schedule
|
||||
n.apply_hooks_to_conds = self.apply_hooks_to_conds
|
||||
return n
|
||||
|
||||
def add_patches(self, patches, strength_patch=1.0, strength_model=1.0):
|
||||
@@ -120,6 +130,69 @@ class CLIP:
|
||||
def tokenize(self, text, return_word_ids=False):
|
||||
return self.tokenizer.tokenize_with_weights(text, return_word_ids)
|
||||
|
||||
def add_hooks_to_dict(self, pooled_dict: dict[str]):
|
||||
if self.apply_hooks_to_conds:
|
||||
pooled_dict["hooks"] = self.apply_hooks_to_conds
|
||||
return pooled_dict
|
||||
|
||||
def encode_from_tokens_scheduled(self, tokens, unprojected=False, add_dict: dict[str]={}, show_pbar=True):
|
||||
all_cond_pooled: list[tuple[torch.Tensor, dict[str]]] = []
|
||||
all_hooks = self.patcher.forced_hooks
|
||||
if all_hooks is None or not self.use_clip_schedule:
|
||||
# if no hooks or shouldn't use clip schedule, do unscheduled encode_from_tokens and perform add_dict
|
||||
return_pooled = "unprojected" if unprojected else True
|
||||
pooled_dict = self.encode_from_tokens(tokens, return_pooled=return_pooled, return_dict=True)
|
||||
cond = pooled_dict.pop("cond")
|
||||
# add/update any keys with the provided add_dict
|
||||
pooled_dict.update(add_dict)
|
||||
all_cond_pooled.append([cond, pooled_dict])
|
||||
else:
|
||||
scheduled_keyframes = all_hooks.get_hooks_for_clip_schedule()
|
||||
|
||||
self.cond_stage_model.reset_clip_options()
|
||||
if self.layer_idx is not None:
|
||||
self.cond_stage_model.set_clip_options({"layer": self.layer_idx})
|
||||
if unprojected:
|
||||
self.cond_stage_model.set_clip_options({"projected_pooled": False})
|
||||
|
||||
self.load_model()
|
||||
all_hooks.reset()
|
||||
self.patcher.patch_hooks(None)
|
||||
if show_pbar:
|
||||
pbar = ProgressBar(len(scheduled_keyframes))
|
||||
|
||||
for scheduled_opts in scheduled_keyframes:
|
||||
t_range = scheduled_opts[0]
|
||||
# don't bother encoding any conds outside of start_percent and end_percent bounds
|
||||
if "start_percent" in add_dict:
|
||||
if t_range[1] < add_dict["start_percent"]:
|
||||
continue
|
||||
if "end_percent" in add_dict:
|
||||
if t_range[0] > add_dict["end_percent"]:
|
||||
continue
|
||||
hooks_keyframes = scheduled_opts[1]
|
||||
for hook, keyframe in hooks_keyframes:
|
||||
hook.hook_keyframe._current_keyframe = keyframe
|
||||
# apply appropriate hooks with values that match new hook_keyframe
|
||||
self.patcher.patch_hooks(all_hooks)
|
||||
# perform encoding as normal
|
||||
o = self.cond_stage_model.encode_token_weights(tokens)
|
||||
cond, pooled = o[:2]
|
||||
pooled_dict = {"pooled_output": pooled}
|
||||
# add clip_start_percent and clip_end_percent in pooled
|
||||
pooled_dict["clip_start_percent"] = t_range[0]
|
||||
pooled_dict["clip_end_percent"] = t_range[1]
|
||||
# add/update any keys with the provided add_dict
|
||||
pooled_dict.update(add_dict)
|
||||
# add hooks stored on clip
|
||||
self.add_hooks_to_dict(pooled_dict)
|
||||
all_cond_pooled.append([cond, pooled_dict])
|
||||
if show_pbar:
|
||||
pbar.update(1)
|
||||
model_management.throw_exception_if_processing_interrupted()
|
||||
all_hooks.reset()
|
||||
return all_cond_pooled
|
||||
|
||||
def encode_from_tokens(self, tokens, return_pooled=False, return_dict=False):
|
||||
self.cond_stage_model.reset_clip_options()
|
||||
|
||||
@@ -137,6 +210,7 @@ class CLIP:
|
||||
if len(o) > 2:
|
||||
for k in o[2]:
|
||||
out[k] = o[2][k]
|
||||
self.add_hooks_to_dict(out)
|
||||
return out
|
||||
|
||||
if return_pooled:
|
||||
@@ -233,8 +307,8 @@ class VAE:
|
||||
self.upscale_ratio = 4
|
||||
|
||||
self.latent_channels = ddconfig['z_channels'] = sd["decoder.conv_in.weight"].shape[1]
|
||||
if 'quant_conv.weight' in sd:
|
||||
self.first_stage_model = AutoencoderKL(ddconfig=ddconfig, embed_dim=4)
|
||||
if 'post_quant_conv.weight' in sd:
|
||||
self.first_stage_model = AutoencoderKL(ddconfig=ddconfig, embed_dim=sd['post_quant_conv.weight'].shape[1])
|
||||
else:
|
||||
self.first_stage_model = AutoencodingEngine(regularizer_config={'target': "comfy.ldm.models.autoencoder.DiagonalGaussianRegularizer"},
|
||||
encoder_config={'target': "comfy.ldm.modules.diffusionmodules.model.Encoder", 'params': ddconfig},
|
||||
@@ -271,6 +345,17 @@ class VAE:
|
||||
self.memory_used_encode = lambda shape, dtype: (70 * max(shape[2], 7) * shape[3] * shape[4]) * model_management.dtype_size(dtype)
|
||||
self.upscale_ratio = (lambda a: max(0, a * 8 - 7), 32, 32)
|
||||
self.working_dtypes = [torch.bfloat16, torch.float32]
|
||||
elif "decoder.conv_in.conv.weight" in sd:
|
||||
ddconfig = {'double_z': True, 'z_channels': 4, 'resolution': 256, 'in_channels': 3, 'out_ch': 3, 'ch': 128, 'ch_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_resolutions': [], 'dropout': 0.0}
|
||||
ddconfig["conv3d"] = True
|
||||
ddconfig["time_compress"] = 4
|
||||
self.upscale_ratio = (lambda a: max(0, a * 4 - 3), 8, 8)
|
||||
self.latent_dim = 3
|
||||
self.latent_channels = ddconfig['z_channels'] = sd["decoder.conv_in.conv.weight"].shape[1]
|
||||
self.first_stage_model = AutoencoderKL(ddconfig=ddconfig, embed_dim=sd['post_quant_conv.weight'].shape[1])
|
||||
self.memory_used_decode = lambda shape, dtype: (1500 * shape[2] * shape[3] * shape[4] * (4 * 8 * 8)) * model_management.dtype_size(dtype)
|
||||
self.memory_used_encode = lambda shape, dtype: (900 * max(shape[2], 2) * shape[3] * shape[4]) * model_management.dtype_size(dtype)
|
||||
self.working_dtypes = [torch.bfloat16, torch.float16, torch.float32]
|
||||
else:
|
||||
logging.warning("WARNING: No VAE weights detected, VAE not initalized.")
|
||||
self.first_stage_model = None
|
||||
@@ -362,7 +447,7 @@ class VAE:
|
||||
if pixel_samples is None:
|
||||
pixel_samples = torch.empty((samples_in.shape[0],) + tuple(out.shape[1:]), device=self.output_device)
|
||||
pixel_samples[x:x+batch_number] = out
|
||||
except model_management.OOM_EXCEPTION as e:
|
||||
except model_management.OOM_EXCEPTION:
|
||||
logging.warning("Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.")
|
||||
dims = samples_in.ndim - 2
|
||||
if dims == 1:
|
||||
@@ -417,7 +502,7 @@ class VAE:
|
||||
samples = torch.empty((pixel_samples.shape[0],) + tuple(out.shape[1:]), device=self.output_device)
|
||||
samples[x:x + batch_number] = out
|
||||
|
||||
except model_management.OOM_EXCEPTION as e:
|
||||
except model_management.OOM_EXCEPTION:
|
||||
logging.warning("Warning: Ran out of memory when regular VAE encoding, retrying with tiled VAE encoding.")
|
||||
if len(pixel_samples.shape) == 3:
|
||||
samples = self.encode_tiled_1d(pixel_samples)
|
||||
@@ -471,6 +556,7 @@ class CLIPType(Enum):
|
||||
FLUX = 6
|
||||
MOCHI = 7
|
||||
LTXV = 8
|
||||
HUNYUAN_VIDEO = 9
|
||||
|
||||
def load_clip(ckpt_paths, embedding_directory=None, clip_type=CLIPType.STABLE_DIFFUSION, model_options={}):
|
||||
clip_data = []
|
||||
@@ -486,6 +572,7 @@ class TEModel(Enum):
|
||||
T5_XXL = 4
|
||||
T5_XL = 5
|
||||
T5_BASE = 6
|
||||
LLAMA3_8 = 7
|
||||
|
||||
def detect_te_model(sd):
|
||||
if "text_model.encoder.layers.30.mlp.fc1.weight" in sd:
|
||||
@@ -502,6 +589,8 @@ def detect_te_model(sd):
|
||||
return TEModel.T5_XL
|
||||
if "encoder.block.0.layer.0.SelfAttention.k.weight" in sd:
|
||||
return TEModel.T5_BASE
|
||||
if "model.layers.0.post_attention_layernorm.weight" in sd:
|
||||
return TEModel.LLAMA3_8
|
||||
return None
|
||||
|
||||
|
||||
@@ -514,6 +603,14 @@ def t5xxl_detect(clip_data):
|
||||
|
||||
return {}
|
||||
|
||||
def llama_detect(clip_data):
|
||||
weight_name = "model.layers.0.self_attn.k_proj.weight"
|
||||
|
||||
for sd in clip_data:
|
||||
if weight_name in sd:
|
||||
return comfy.text_encoders.hunyuan_video.llama_detect(sd)
|
||||
|
||||
return {}
|
||||
|
||||
def load_text_encoder_state_dicts(state_dicts=[], embedding_directory=None, clip_type=CLIPType.STABLE_DIFFUSION, model_options={}):
|
||||
clip_data = state_dicts
|
||||
@@ -579,6 +676,9 @@ def load_text_encoder_state_dicts(state_dicts=[], embedding_directory=None, clip
|
||||
elif clip_type == CLIPType.FLUX:
|
||||
clip_target.clip = comfy.text_encoders.flux.flux_clip(**t5xxl_detect(clip_data))
|
||||
clip_target.tokenizer = comfy.text_encoders.flux.FluxTokenizer
|
||||
elif clip_type == CLIPType.HUNYUAN_VIDEO:
|
||||
clip_target.clip = comfy.text_encoders.hunyuan_video.hunyuan_video_clip(**llama_detect(clip_data))
|
||||
clip_target.tokenizer = comfy.text_encoders.hunyuan_video.HunyuanVideoTokenizer
|
||||
else:
|
||||
clip_target.clip = sdxl_clip.SDXLClipModel
|
||||
clip_target.tokenizer = sdxl_clip.SDXLTokenizer
|
||||
@@ -618,7 +718,6 @@ def load_checkpoint(config_path=None, ckpt_path=None, output_vae=True, output_cl
|
||||
config = yaml.safe_load(stream)
|
||||
model_config_params = config['model']['params']
|
||||
clip_config = model_config_params['cond_stage_config']
|
||||
scale_factor = model_config_params['scale_factor']
|
||||
|
||||
if "parameterization" in model_config_params:
|
||||
if model_config_params["parameterization"] == "v":
|
||||
|
||||
@@ -10,6 +10,7 @@ import comfy.clip_model
|
||||
import json
|
||||
import logging
|
||||
import numbers
|
||||
import re
|
||||
|
||||
def gen_empty_tokens(special_tokens, length):
|
||||
start_token = special_tokens.get("start", None)
|
||||
@@ -90,8 +91,11 @@ class SDClipModel(torch.nn.Module, ClipTokenWeightEncoder):
|
||||
if textmodel_json_config is None:
|
||||
textmodel_json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "sd1_clip_config.json")
|
||||
|
||||
with open(textmodel_json_config) as f:
|
||||
config = json.load(f)
|
||||
if isinstance(textmodel_json_config, dict):
|
||||
config = textmodel_json_config
|
||||
else:
|
||||
with open(textmodel_json_config) as f:
|
||||
config = json.load(f)
|
||||
|
||||
operations = model_options.get("custom_operations", None)
|
||||
scaled_fp8 = None
|
||||
@@ -196,11 +200,18 @@ class SDClipModel(torch.nn.Module, ClipTokenWeightEncoder):
|
||||
attention_mask = None
|
||||
if self.enable_attention_masks or self.zero_out_masked or self.return_attention_masks:
|
||||
attention_mask = torch.zeros_like(tokens)
|
||||
end_token = self.special_tokens.get("end", -1)
|
||||
end_token = self.special_tokens.get("end", None)
|
||||
if end_token is None:
|
||||
cmp_token = self.special_tokens.get("pad", -1)
|
||||
else:
|
||||
cmp_token = end_token
|
||||
|
||||
for x in range(attention_mask.shape[0]):
|
||||
for y in range(attention_mask.shape[1]):
|
||||
attention_mask[x, y] = 1
|
||||
if tokens[x, y] == end_token:
|
||||
if tokens[x, y] == cmp_token:
|
||||
if end_token is None:
|
||||
attention_mask[x, y] = 0
|
||||
break
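An illustrative sketch (not part of this diff) of the masking loop above with a made-up token row; the ids are hypothetical, with 102 playing the end token, so everything up to and including the end token is attended and the trailing padding is masked out:
import torch
tokens = torch.tensor([[101, 7, 8, 102, 0, 0]])
attention_mask = torch.zeros_like(tokens)
end_token = 102
cmp_token = end_token
for x in range(attention_mask.shape[0]):
    for y in range(attention_mask.shape[1]):
        attention_mask[x, y] = 1
        if tokens[x, y] == cmp_token:
            if end_token is None:
                attention_mask[x, y] = 0
            break
print(attention_mask)  # tensor([[1, 1, 1, 1, 0, 0]])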
|
||||
|
||||
attention_mask_model = None
|
||||
@@ -326,7 +337,6 @@ def expand_directory_list(directories):
|
||||
return list(dirs)
|
||||
|
||||
def bundled_embed(embed, prefix, suffix): #bundled embedding in lora format
|
||||
i = 0
|
||||
out_list = []
|
||||
for k in embed:
|
||||
if k.startswith(prefix) and k.endswith(suffix):
|
||||
@@ -382,7 +392,7 @@ def load_embed(embedding_name, embedding_directory, embedding_size, embed_key=No
|
||||
embed_out = safe_load_embed_zip(embed_path)
|
||||
else:
|
||||
embed = torch.load(embed_path, map_location="cpu")
|
||||
except Exception as e:
|
||||
except Exception:
|
||||
logging.warning("{}\n\nerror loading embedding, skipping loading: {}".format(traceback.format_exc(), embedding_name))
|
||||
return None
|
||||
|
||||
@@ -411,22 +421,31 @@ def load_embed(embedding_name, embedding_directory, embedding_size, embed_key=No
|
||||
return embed_out
|
||||
|
||||
class SDTokenizer:
|
||||
def __init__(self, tokenizer_path=None, max_length=77, pad_with_end=True, embedding_directory=None, embedding_size=768, embedding_key='clip_l', tokenizer_class=CLIPTokenizer, has_start_token=True, pad_to_max_length=True, min_length=None, pad_token=None, tokenizer_data={}):
|
||||
def __init__(self, tokenizer_path=None, max_length=77, pad_with_end=True, embedding_directory=None, embedding_size=768, embedding_key='clip_l', tokenizer_class=CLIPTokenizer, has_start_token=True, has_end_token=True, pad_to_max_length=True, min_length=None, pad_token=None, end_token=None, tokenizer_data={}):
|
||||
if tokenizer_path is None:
|
||||
tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "sd1_tokenizer")
|
||||
self.tokenizer = tokenizer_class.from_pretrained(tokenizer_path)
|
||||
self.max_length = max_length
|
||||
self.min_length = min_length
|
||||
self.end_token = None
|
||||
|
||||
empty = self.tokenizer('')["input_ids"]
|
||||
self.tokenizer_adds_end_token = has_end_token
|
||||
if has_start_token:
|
||||
self.tokens_start = 1
|
||||
self.start_token = empty[0]
|
||||
self.end_token = empty[1]
|
||||
if end_token is not None:
|
||||
self.end_token = end_token
|
||||
else:
|
||||
if has_end_token:
|
||||
self.end_token = empty[1]
|
||||
else:
|
||||
self.tokens_start = 0
|
||||
self.start_token = None
|
||||
self.end_token = empty[0]
|
||||
if end_token is not None:
|
||||
self.end_token = end_token
|
||||
else:
|
||||
self.end_token = empty[0]
|
||||
|
||||
if pad_token is not None:
|
||||
self.pad_token = pad_token
|
||||
@@ -451,13 +470,16 @@ class SDTokenizer:
|
||||
Takes a potential embedding name and tries to retrieve it.
|
||||
Returns a Tuple consisting of the embedding and any leftover string, embedding can be None.
|
||||
'''
|
||||
split_embed = embedding_name.split()
|
||||
embedding_name = split_embed[0]
|
||||
leftover = ' '.join(split_embed[1:])
|
||||
embed = load_embed(embedding_name, self.embedding_directory, self.embedding_size, self.embedding_key)
|
||||
if embed is None:
|
||||
stripped = embedding_name.strip(',')
|
||||
if len(stripped) < len(embedding_name):
|
||||
embed = load_embed(stripped, self.embedding_directory, self.embedding_size, self.embedding_key)
|
||||
return (embed, embedding_name[len(stripped):])
|
||||
return (embed, "")
|
||||
return (embed, "{} {}".format(embedding_name[len(stripped):], leftover))
|
||||
return (embed, leftover)
|
||||
|
||||
|
||||
def tokenize_with_weights(self, text:str, return_word_ids=False):
|
||||
@@ -471,13 +493,18 @@ class SDTokenizer:
|
||||
text = escape_important(text)
|
||||
parsed_weights = token_weights(text, 1.0)
|
||||
|
||||
#tokenize words
|
||||
# tokenize words
|
||||
tokens = []
|
||||
for weighted_segment, weight in parsed_weights:
|
||||
to_tokenize = unescape_important(weighted_segment).replace("\n", " ").split(' ')
|
||||
to_tokenize = unescape_important(weighted_segment)
|
||||
split = re.split(' {0}|\n{0}'.format(self.embedding_identifier), to_tokenize)
|
||||
to_tokenize = [split[0]]
|
||||
for i in range(1, len(split)):
|
||||
to_tokenize.append("{}{}".format(self.embedding_identifier, split[i]))
|
||||
|
||||
to_tokenize = [x for x in to_tokenize if x != ""]
|
||||
for word in to_tokenize:
|
||||
#if we find an embedding, deal with the embedding
|
||||
# if we find an embedding, deal with the embedding
|
||||
if word.startswith(self.embedding_identifier) and self.embedding_directory is not None:
|
||||
embedding_name = word[len(self.embedding_identifier):].strip('\n')
|
||||
embed, leftover = self._try_get_embedding(embedding_name)
|
||||
@@ -493,8 +520,11 @@ class SDTokenizer:
|
||||
word = leftover
|
||||
else:
|
||||
continue
|
||||
end = 999999999999
|
||||
if self.tokenizer_adds_end_token:
|
||||
end = -1
|
||||
#parse word
|
||||
tokens.append([(t, weight) for t in self.tokenizer(word)["input_ids"][self.tokens_start:-1]])
|
||||
tokens.append([(t, weight) for t in self.tokenizer(word)["input_ids"][self.tokens_start:end]])
|
||||
|
||||
#reshape token array to CLIP input size
|
||||
batched_tokens = []
|
||||
@@ -505,18 +535,24 @@ class SDTokenizer:
|
||||
for i, t_group in enumerate(tokens):
|
||||
#determine if we're going to try and keep the tokens in a single batch
|
||||
is_large = len(t_group) >= self.max_word_length
|
||||
if self.end_token is not None:
|
||||
has_end_token = 1
|
||||
else:
|
||||
has_end_token = 0
|
||||
|
||||
while len(t_group) > 0:
|
||||
if len(t_group) + len(batch) > self.max_length - 1:
|
||||
remaining_length = self.max_length - len(batch) - 1
|
||||
if len(t_group) + len(batch) > self.max_length - has_end_token:
|
||||
remaining_length = self.max_length - len(batch) - has_end_token
|
||||
#break word in two and add end token
|
||||
if is_large:
|
||||
batch.extend([(t,w,i+1) for t,w in t_group[:remaining_length]])
|
||||
batch.append((self.end_token, 1.0, 0))
|
||||
if self.end_token is not None:
|
||||
batch.append((self.end_token, 1.0, 0))
|
||||
t_group = t_group[remaining_length:]
|
||||
#add end token and pad
|
||||
else:
|
||||
batch.append((self.end_token, 1.0, 0))
|
||||
if self.end_token is not None:
|
||||
batch.append((self.end_token, 1.0, 0))
|
||||
if self.pad_to_max_length:
|
||||
batch.extend([(self.pad_token, 1.0, 0)] * (remaining_length))
|
||||
#start new batch
|
||||
@@ -529,7 +565,8 @@ class SDTokenizer:
|
||||
t_group = []
|
||||
|
||||
#fill last batch
|
||||
batch.append((self.end_token, 1.0, 0))
|
||||
if self.end_token is not None:
|
||||
batch.append((self.end_token, 1.0, 0))
|
||||
if self.pad_to_max_length:
|
||||
batch.extend([(self.pad_token, 1.0, 0)] * (self.max_length - len(batch)))
|
||||
if self.min_length is not None and len(batch) < self.min_length:
|
||||
|
||||
@@ -12,6 +12,7 @@ import comfy.text_encoders.hydit
|
||||
import comfy.text_encoders.flux
|
||||
import comfy.text_encoders.genmo
|
||||
import comfy.text_encoders.lt
|
||||
import comfy.text_encoders.hunyuan_video
|
||||
|
||||
from . import supported_models_base
|
||||
from . import latent_formats
|
||||
@@ -224,7 +225,6 @@ class SDXL(supported_models_base.BASE):
|
||||
|
||||
def process_clip_state_dict_for_saving(self, state_dict):
|
||||
replace_prefix = {}
|
||||
keys_to_replace = {}
|
||||
state_dict_g = diffusers_convert.convert_text_enc_state_dict_v20(state_dict, "clip_g")
|
||||
for k in state_dict:
|
||||
if k.startswith("clip_l"):
|
||||
@@ -527,7 +527,6 @@ class SD3(supported_models_base.BASE):
|
||||
clip_l = False
|
||||
clip_g = False
|
||||
t5 = False
|
||||
dtype_t5 = None
|
||||
pref = self.text_encoder_key_prefix[0]
|
||||
if "{}clip_l.transformer.text_model.final_layer_norm.weight".format(pref) in state_dict:
|
||||
clip_l = True
|
||||
@@ -740,6 +739,54 @@ class LTXV(supported_models_base.BASE):
|
||||
t5_detect = comfy.text_encoders.sd3_clip.t5_xxl_detect(state_dict, "{}t5xxl.transformer.".format(pref))
|
||||
return supported_models_base.ClipTarget(comfy.text_encoders.lt.LTXVT5Tokenizer, comfy.text_encoders.lt.ltxv_te(**t5_detect))
|
||||
|
||||
models = [Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV]
|
||||
class HunyuanVideo(supported_models_base.BASE):
|
||||
unet_config = {
|
||||
"image_model": "hunyuan_video",
|
||||
}
|
||||
|
||||
sampling_settings = {
|
||||
"shift": 7.0,
|
||||
}
|
||||
|
||||
unet_extra_config = {}
|
||||
latent_format = latent_formats.HunyuanVideo
|
||||
|
||||
memory_usage_factor = 2.0 #TODO
|
||||
|
||||
supported_inference_dtypes = [torch.bfloat16, torch.float32]
|
||||
|
||||
vae_key_prefix = ["vae."]
|
||||
text_encoder_key_prefix = ["text_encoders."]
|
||||
|
||||
def get_model(self, state_dict, prefix="", device=None):
|
||||
out = model_base.HunyuanVideo(self, device=device)
|
||||
return out
|
||||
|
||||
def process_unet_state_dict(self, state_dict):
|
||||
out_sd = {}
|
||||
for k in list(state_dict.keys()):
|
||||
key_out = k
|
||||
key_out = key_out.replace("txt_in.t_embedder.mlp.0.", "txt_in.t_embedder.in_layer.").replace("txt_in.t_embedder.mlp.2.", "txt_in.t_embedder.out_layer.")
|
||||
key_out = key_out.replace("txt_in.c_embedder.linear_1.", "txt_in.c_embedder.in_layer.").replace("txt_in.c_embedder.linear_2.", "txt_in.c_embedder.out_layer.")
|
||||
key_out = key_out.replace("_mod.linear.", "_mod.lin.").replace("_attn_qkv.", "_attn.qkv.")
|
||||
key_out = key_out.replace("mlp.fc1.", "mlp.0.").replace("mlp.fc2.", "mlp.2.")
|
||||
key_out = key_out.replace("_attn_q_norm.weight", "_attn.norm.query_norm.scale").replace("_attn_k_norm.weight", "_attn.norm.key_norm.scale")
|
||||
key_out = key_out.replace(".q_norm.weight", ".norm.query_norm.scale").replace(".k_norm.weight", ".norm.key_norm.scale")
|
||||
key_out = key_out.replace("_attn_proj.", "_attn.proj.")
|
||||
key_out = key_out.replace(".modulation.linear.", ".modulation.lin.")
|
||||
key_out = key_out.replace("_in.mlp.2.", "_in.out_layer.").replace("_in.mlp.0.", "_in.in_layer.")
|
||||
out_sd[key_out] = state_dict[k]
|
||||
return out_sd
|
||||
|
||||
def process_unet_state_dict_for_saving(self, state_dict):
|
||||
replace_prefix = {"": "model.model."}
|
||||
return utils.state_dict_prefix_replace(state_dict, replace_prefix)
|
||||
|
||||
def clip_target(self, state_dict={}):
|
||||
pref = self.text_encoder_key_prefix[0]
|
||||
hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}llama.transformer.".format(pref))
|
||||
return supported_models_base.ClipTarget(comfy.text_encoders.hunyuan_video.HunyuanVideoTokenizer, comfy.text_encoders.hunyuan_video.hunyuan_video_clip(**hunyuan_detect))
|
||||
|
||||
models = [Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideo]
|
||||
|
||||
models += [SVD_img2vid]
|
||||
|
||||
112
comfy/text_encoders/hunyuan_video.py
Normal file
@@ -0,0 +1,112 @@
|
||||
from comfy import sd1_clip
|
||||
import comfy.model_management
|
||||
import comfy.text_encoders.llama
|
||||
from transformers import LlamaTokenizerFast
|
||||
import torch
|
||||
import os
|
||||
|
||||
|
||||
def llama_detect(state_dict, prefix=""):
    out = {}
    t5_key = "{}model.norm.weight".format(prefix)
    if t5_key in state_dict:
        out["dtype_llama"] = state_dict[t5_key].dtype

    scaled_fp8_key = "{}scaled_fp8".format(prefix)
    if scaled_fp8_key in state_dict:
        out["llama_scaled_fp8"] = state_dict[scaled_fp8_key].dtype

    return out
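A minimal sketch (not part of this diff) of what llama_detect reports for a half-precision LLaMA state dict; the tensor below is a hypothetical stand-in:
import torch
sd = {"model.norm.weight": torch.zeros(4096, dtype=torch.float16)}
print(llama_detect(sd))  # {'dtype_llama': torch.float16}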
|
||||
|
||||
class LLAMA3Tokenizer(sd1_clip.SDTokenizer):
|
||||
def __init__(self, embedding_directory=None, tokenizer_data={}, min_length=256):
|
||||
tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "llama_tokenizer")
|
||||
super().__init__(tokenizer_path, embedding_directory=embedding_directory, pad_with_end=False, embedding_size=4096, embedding_key='llama', tokenizer_class=LlamaTokenizerFast, has_start_token=True, has_end_token=False, pad_to_max_length=False, max_length=99999999, pad_token=128258, end_token=128009, min_length=min_length)
|
||||
|
||||
class LLAMAModel(sd1_clip.SDClipModel):
|
||||
def __init__(self, device="cpu", layer="hidden", layer_idx=-3, dtype=None, attention_mask=True, model_options={}):
|
||||
llama_scaled_fp8 = model_options.get("llama_scaled_fp8", None)
|
||||
if llama_scaled_fp8 is not None:
|
||||
model_options = model_options.copy()
|
||||
model_options["scaled_fp8"] = llama_scaled_fp8
|
||||
|
||||
super().__init__(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config={}, dtype=dtype, special_tokens={"start": 128000, "pad": 128258}, layer_norm_hidden_state=False, model_class=comfy.text_encoders.llama.Llama2, enable_attention_masks=attention_mask, return_attention_masks=attention_mask, model_options=model_options)
|
||||
|
||||
|
||||
class HunyuanVideoTokenizer:
|
||||
def __init__(self, embedding_directory=None, tokenizer_data={}):
|
||||
clip_l_tokenizer_class = tokenizer_data.get("clip_l_tokenizer_class", sd1_clip.SDTokenizer)
|
||||
self.clip_l = clip_l_tokenizer_class(embedding_directory=embedding_directory)
|
||||
self.llama_template = """<|start_header_id|>system<|end_header_id|>\n\nDescribe the video by detailing the following aspects: 1. The main content and theme of the video.2. The color, shape, size, texture, quantity, text, and spatial relationships of the objects.3. Actions, events, behaviors temporal relationships, physical movement changes of the objects.4. background environment, light, style and atmosphere.5. camera angles, movements, and transitions used in the video:<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n""" # 95 tokens
|
||||
self.llama = LLAMA3Tokenizer(embedding_directory=embedding_directory, min_length=1)
|
||||
|
||||
def tokenize_with_weights(self, text:str, return_word_ids=False):
|
||||
out = {}
|
||||
out["l"] = self.clip_l.tokenize_with_weights(text, return_word_ids)
|
||||
|
||||
llama_text = "{}{}".format(self.llama_template, text)
|
||||
out["llama"] = self.llama.tokenize_with_weights(llama_text, return_word_ids)
|
||||
return out
|
||||
|
||||
def untokenize(self, token_weight_pair):
|
||||
return self.clip_l.untokenize(token_weight_pair)
|
||||
|
||||
def state_dict(self):
|
||||
return {}
|
||||
|
||||
|
||||
class HunyuanVideoClipModel(torch.nn.Module):
|
||||
def __init__(self, dtype_llama=None, device="cpu", dtype=None, model_options={}):
|
||||
super().__init__()
|
||||
dtype_llama = comfy.model_management.pick_weight_dtype(dtype_llama, dtype, device)
|
||||
clip_l_class = model_options.get("clip_l_class", sd1_clip.SDClipModel)
|
||||
self.clip_l = clip_l_class(device=device, dtype=dtype, return_projected_pooled=False, model_options=model_options)
|
||||
self.llama = LLAMAModel(device=device, dtype=dtype_llama, model_options=model_options)
|
||||
self.dtypes = set([dtype, dtype_llama])
|
||||
|
||||
def set_clip_options(self, options):
|
||||
self.clip_l.set_clip_options(options)
|
||||
self.llama.set_clip_options(options)
|
||||
|
||||
def reset_clip_options(self):
|
||||
self.clip_l.reset_clip_options()
|
||||
self.llama.reset_clip_options()
|
||||
|
||||
def encode_token_weights(self, token_weight_pairs):
|
||||
token_weight_pairs_l = token_weight_pairs["l"]
|
||||
token_weight_pairs_llama = token_weight_pairs["llama"]
|
||||
|
||||
llama_out, llama_pooled, llama_extra_out = self.llama.encode_token_weights(token_weight_pairs_llama)
|
||||
|
||||
template_end = 0
|
||||
for i, v in enumerate(token_weight_pairs_llama[0]):
|
||||
if v[0] == 128007: # <|end_header_id|>
|
||||
template_end = i
|
||||
|
||||
if llama_out.shape[1] > (template_end + 2):
|
||||
if token_weight_pairs_llama[0][template_end + 1][0] == 271:
|
||||
template_end += 2
|
||||
llama_out = llama_out[:, template_end:]
|
||||
llama_extra_out["attention_mask"] = llama_extra_out["attention_mask"][:, template_end:]
|
||||
if llama_extra_out["attention_mask"].sum() == torch.numel(llama_extra_out["attention_mask"]):
|
||||
llama_extra_out.pop("attention_mask") # attention mask is useless if no masked elements
|
||||
|
||||
l_out, l_pooled = self.clip_l.encode_token_weights(token_weight_pairs_l)
|
||||
return llama_out, l_pooled, llama_extra_out
|
||||
|
||||
def load_sd(self, sd):
|
||||
if "text_model.encoder.layers.1.mlp.fc1.weight" in sd:
|
||||
return self.clip_l.load_sd(sd)
|
||||
else:
|
||||
return self.llama.load_sd(sd)
|
||||
|
||||
|
||||
def hunyuan_video_clip(dtype_llama=None, llama_scaled_fp8=None):
|
||||
class HunyuanVideoClipModel_(HunyuanVideoClipModel):
|
||||
def __init__(self, device="cpu", dtype=None, model_options={}):
|
||||
if llama_scaled_fp8 is not None and "llama_scaled_fp8" not in model_options:
|
||||
model_options = model_options.copy()
|
||||
model_options["llama_scaled_fp8"] = llama_scaled_fp8
|
||||
super().__init__(dtype_llama=dtype_llama, device=device, dtype=dtype, model_options=model_options)
|
||||
return HunyuanVideoClipModel_
|
||||
226
comfy/text_encoders/llama.py
Normal file
@@ -0,0 +1,226 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
from dataclasses import dataclass
|
||||
from typing import Optional, Any
|
||||
|
||||
from comfy.ldm.modules.attention import optimized_attention_for_device
|
||||
import comfy.model_management
|
||||
import comfy.ldm.common_dit
|
||||
|
||||
import comfy.model_management
|
||||
|
||||
@dataclass
|
||||
class Llama2Config:
|
||||
vocab_size: int = 128320
|
||||
hidden_size: int = 4096
|
||||
intermediate_size: int = 14336
|
||||
num_hidden_layers: int = 32
|
||||
num_attention_heads: int = 32
|
||||
num_key_value_heads: int = 8
|
||||
max_position_embeddings: int = 8192
|
||||
rms_norm_eps: float = 1e-5
|
||||
rope_theta: float = 500000.0
|
||||
|
||||
class RMSNorm(nn.Module):
|
||||
def __init__(self, dim: int, eps: float = 1e-5, device=None, dtype=None):
|
||||
super().__init__()
|
||||
self.eps = eps
|
||||
self.weight = nn.Parameter(torch.empty(dim, device=device, dtype=dtype))
|
||||
|
||||
def forward(self, x: torch.Tensor):
|
||||
return comfy.ldm.common_dit.rms_norm(x, self.weight, self.eps)
|
||||
|
||||
|
||||
def rotate_half(x):
|
||||
"""Rotates half the hidden dims of the input."""
|
||||
x1 = x[..., : x.shape[-1] // 2]
|
||||
x2 = x[..., x.shape[-1] // 2 :]
|
||||
return torch.cat((-x2, x1), dim=-1)
|
||||
|
||||
|
||||
def precompute_freqs_cis(head_dim, seq_len, theta, device=None):
|
||||
theta_numerator = torch.arange(0, head_dim, 2, device=device).float()
|
||||
inv_freq = 1.0 / (theta ** (theta_numerator / head_dim))
|
||||
|
||||
position_ids = torch.arange(0, seq_len, device=device).unsqueeze(0)
|
||||
|
||||
inv_freq_expanded = inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
|
||||
position_ids_expanded = position_ids[:, None, :].float()
|
||||
freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
|
||||
emb = torch.cat((freqs, freqs), dim=-1)
|
||||
cos = emb.cos()
|
||||
sin = emb.sin()
|
||||
return (cos, sin)
|
||||
|
||||
|
||||
def apply_rope(xq, xk, freqs_cis):
|
||||
cos = freqs_cis[0].unsqueeze(1)
|
||||
sin = freqs_cis[1].unsqueeze(1)
|
||||
q_embed = (xq * cos) + (rotate_half(xq) * sin)
|
||||
k_embed = (xk * cos) + (rotate_half(xk) * sin)
|
||||
return q_embed, k_embed
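A shape-check sketch for the two RoPE helpers above (illustrative only, not part of this diff); the tensor sizes are arbitrary:
import torch
b, h, s, d = 1, 2, 8, 64
xq = torch.randn(b, h, s, d)
xk = torch.randn(b, h, s, d)
freqs_cis = precompute_freqs_cis(head_dim=d, seq_len=s, theta=500000.0)
q_rot, k_rot = apply_rope(xq, xk, freqs_cis)
assert q_rot.shape == xq.shape and k_rot.shape == xk.shape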
|
||||
|
||||
|
||||
class Attention(nn.Module):
|
||||
def __init__(self, config: Llama2Config, device=None, dtype=None, ops: Any = None):
|
||||
super().__init__()
|
||||
self.num_heads = config.num_attention_heads
|
||||
self.num_kv_heads = config.num_key_value_heads
|
||||
self.hidden_size = config.hidden_size
|
||||
self.head_dim = self.hidden_size // self.num_heads
|
||||
|
||||
ops = ops or nn
|
||||
self.q_proj = ops.Linear(config.hidden_size, config.hidden_size, bias=False, device=device, dtype=dtype)
|
||||
self.k_proj = ops.Linear(config.hidden_size, self.num_kv_heads * self.head_dim, bias=False, device=device, dtype=dtype)
|
||||
self.v_proj = ops.Linear(config.hidden_size, self.num_kv_heads * self.head_dim, bias=False, device=device, dtype=dtype)
|
||||
self.o_proj = ops.Linear(config.hidden_size, config.hidden_size, bias=False, device=device, dtype=dtype)
|
||||
|
||||
def forward(
|
||||
self,
|
||||
hidden_states: torch.Tensor,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
freqs_cis: Optional[torch.Tensor] = None,
|
||||
optimized_attention=None,
|
||||
):
|
||||
batch_size, seq_length, _ = hidden_states.shape
|
||||
|
||||
xq = self.q_proj(hidden_states)
|
||||
xk = self.k_proj(hidden_states)
|
||||
xv = self.v_proj(hidden_states)
|
||||
|
||||
xq = xq.view(batch_size, seq_length, self.num_heads, self.head_dim).transpose(1, 2)
|
||||
xk = xk.view(batch_size, seq_length, self.num_kv_heads, self.head_dim).transpose(1, 2)
|
||||
xv = xv.view(batch_size, seq_length, self.num_kv_heads, self.head_dim).transpose(1, 2)
|
||||
|
||||
xq, xk = apply_rope(xq, xk, freqs_cis=freqs_cis)
|
||||
|
||||
xk = xk.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
|
||||
xv = xv.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
|
||||
|
||||
output = optimized_attention(xq, xk, xv, self.num_heads, mask=attention_mask, skip_reshape=True)
|
||||
return self.o_proj(output)
|
||||
|
||||
class MLP(nn.Module):
|
||||
def __init__(self, config: Llama2Config, device=None, dtype=None, ops: Any = None):
|
||||
super().__init__()
|
||||
ops = ops or nn
|
||||
self.gate_proj = ops.Linear(config.hidden_size, config.intermediate_size, bias=False, device=device, dtype=dtype)
|
||||
self.up_proj = ops.Linear(config.hidden_size, config.intermediate_size, bias=False, device=device, dtype=dtype)
|
||||
self.down_proj = ops.Linear(config.intermediate_size, config.hidden_size, bias=False, device=device, dtype=dtype)
|
||||
|
||||
def forward(self, x):
|
||||
return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
|
||||
|
||||
class TransformerBlock(nn.Module):
|
||||
def __init__(self, config: Llama2Config, device=None, dtype=None, ops: Any = None):
|
||||
super().__init__()
|
||||
self.self_attn = Attention(config, device=device, dtype=dtype, ops=ops)
|
||||
self.mlp = MLP(config, device=device, dtype=dtype, ops=ops)
|
||||
self.input_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps, device=device, dtype=dtype)
|
||||
self.post_attention_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps, device=device, dtype=dtype)
|
||||
|
||||
def forward(
|
||||
self,
|
||||
x: torch.Tensor,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
freqs_cis: Optional[torch.Tensor] = None,
|
||||
optimized_attention=None,
|
||||
):
|
||||
# Self Attention
|
||||
residual = x
|
||||
x = self.input_layernorm(x)
|
||||
x = self.self_attn(
|
||||
hidden_states=x,
|
||||
attention_mask=attention_mask,
|
||||
freqs_cis=freqs_cis,
|
||||
optimized_attention=optimized_attention,
|
||||
)
|
||||
x = residual + x
|
||||
|
||||
# MLP
|
||||
residual = x
|
||||
x = self.post_attention_layernorm(x)
|
||||
x = self.mlp(x)
|
||||
x = residual + x
|
||||
|
||||
return x
|
||||
|
||||
class Llama2_(nn.Module):
|
||||
def __init__(self, config, device=None, dtype=None, ops=None):
|
||||
super().__init__()
|
||||
self.config = config
|
||||
self.vocab_size = config.vocab_size
|
||||
|
||||
self.embed_tokens = ops.Embedding(
|
||||
config.vocab_size,
|
||||
config.hidden_size,
|
||||
device=device,
|
||||
dtype=dtype
|
||||
)
|
||||
self.layers = nn.ModuleList([
|
||||
TransformerBlock(config, device=device, dtype=dtype, ops=ops)
|
||||
for _ in range(config.num_hidden_layers)
|
||||
])
|
||||
self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps, device=device, dtype=dtype)
|
||||
# self.lm_head = ops.Linear(config.hidden_size, config.vocab_size, bias=False, device=device, dtype=dtype)
|
||||
|
||||
def forward(self, x, attention_mask=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=None):
|
||||
x = self.embed_tokens(x, out_dtype=dtype)
|
||||
|
||||
freqs_cis = precompute_freqs_cis(self.config.hidden_size // self.config.num_attention_heads,
|
||||
x.shape[1],
|
||||
self.config.rope_theta,
|
||||
device=x.device)
|
||||
|
||||
mask = None
|
||||
if attention_mask is not None:
|
||||
mask = 1.0 - attention_mask.to(x.dtype).reshape((attention_mask.shape[0], 1, -1, attention_mask.shape[-1])).expand(attention_mask.shape[0], 1, attention_mask.shape[-1], attention_mask.shape[-1])
|
||||
mask = mask.masked_fill(mask.to(torch.bool), float("-inf"))
|
||||
|
||||
causal_mask = torch.empty(x.shape[1], x.shape[1], dtype=x.dtype, device=x.device).fill_(float("-inf")).triu_(1)
|
||||
if mask is not None:
|
||||
mask += causal_mask
|
||||
else:
|
||||
mask = causal_mask
|
||||
optimized_attention = optimized_attention_for_device(x.device, mask=mask is not None, small_input=True)
|
||||
|
||||
intermediate = None
|
||||
if intermediate_output is not None:
|
||||
if intermediate_output < 0:
|
||||
intermediate_output = len(self.layers) + intermediate_output
|
||||
|
||||
for i, layer in enumerate(self.layers):
|
||||
x = layer(
|
||||
x=x,
|
||||
attention_mask=mask,
|
||||
freqs_cis=freqs_cis,
|
||||
optimized_attention=optimized_attention,
|
||||
)
|
||||
if i == intermediate_output:
|
||||
intermediate = x.clone()
|
||||
|
||||
x = self.norm(x)
|
||||
if intermediate is not None and final_layer_norm_intermediate:
|
||||
intermediate = self.norm(intermediate)
|
||||
|
||||
return x, intermediate
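A small sketch (not part of this diff) of the causal mask built in the forward pass above: an upper-triangular matrix of -inf above the diagonal, added to the padding mask when one is present:
import torch
m = torch.empty(4, 4).fill_(float("-inf")).triu_(1)
# row i can attend to columns 0..i; entries above the diagonal stay -inf
print(m)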
|
||||
|
||||
|
||||
class Llama2(torch.nn.Module):
|
||||
def __init__(self, config_dict, dtype, device, operations):
|
||||
super().__init__()
|
||||
config = Llama2Config(**config_dict)
|
||||
self.num_layers = config.num_hidden_layers
|
||||
|
||||
self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
|
||||
self.dtype = dtype
|
||||
|
||||
def get_input_embeddings(self):
|
||||
return self.model.embed_tokens
|
||||
|
||||
def set_input_embeddings(self, embeddings):
|
||||
self.model.embed_tokens = embeddings
|
||||
|
||||
def forward(self, input_ids, *args, **kwargs):
|
||||
return self.model(input_ids, *args, **kwargs)
|
||||
410579
comfy/text_encoders/llama_tokenizer/tokenizer.json
Normal file
File diff suppressed because it is too large
2095
comfy/text_encoders/llama_tokenizer/tokenizer_config.json
Normal file
File diff suppressed because it is too large
@@ -1,4 +1,3 @@
|
||||
import os
|
||||
import torch
|
||||
|
||||
class SPieceTokenizer:
|
||||
|
||||
@@ -172,7 +172,6 @@ class T5LayerSelfAttention(torch.nn.Module):
|
||||
# self.dropout = nn.Dropout(config.dropout_rate)
|
||||
|
||||
def forward(self, x, mask=None, past_bias=None, optimized_attention=None):
|
||||
normed_hidden_states = self.layer_norm(x)
|
||||
output, past_bias = self.SelfAttention(self.layer_norm(x), mask=mask, past_bias=past_bias, optimized_attention=optimized_attention)
|
||||
# x = x + self.dropout(attention_output)
|
||||
x += output
|
||||
@@ -209,6 +208,11 @@ class T5Stack(torch.nn.Module):
|
||||
intermediate = None
|
||||
optimized_attention = optimized_attention_for_device(x.device, mask=attention_mask is not None, small_input=True)
|
||||
past_bias = None
|
||||
|
||||
if intermediate_output is not None:
|
||||
if intermediate_output < 0:
|
||||
intermediate_output = len(self.block) + intermediate_output
|
||||
|
||||
for i, l in enumerate(self.block):
|
||||
x, past_bias = l(x, mask, past_bias, optimized_attention)
|
||||
if i == intermediate_output:
|
||||
|
||||
@@ -26,6 +26,8 @@ import numpy as np
|
||||
from PIL import Image
|
||||
import logging
|
||||
import itertools
|
||||
from torch.nn.functional import interpolate
|
||||
from einops import rearrange
|
||||
|
||||
def load_torch_file(ckpt, safe_load=False, device=None):
|
||||
if device is None:
|
||||
@@ -46,7 +48,13 @@ def load_torch_file(ckpt, safe_load=False, device=None):
|
||||
if "state_dict" in pl_sd:
|
||||
sd = pl_sd["state_dict"]
|
||||
else:
|
||||
sd = pl_sd
|
||||
if len(pl_sd) == 1:
|
||||
key = list(pl_sd.keys())[0]
|
||||
sd = pl_sd[key]
|
||||
if not isinstance(sd, dict):
|
||||
sd = pl_sd
|
||||
else:
|
||||
sd = pl_sd
|
||||
return sd
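An illustrative sketch (not part of this diff) of the new single-key unwrapping: checkpoints whose weights sit under one top-level key are unwrapped, while flat state dicts pass through unchanged. The key names are hypothetical:
pl_sd = {"module": {"linear.weight": 0}}  # hypothetical wrapped checkpoint
# -> sd becomes pl_sd["module"], i.e. {"linear.weight": 0}
flat_sd = {"linear.weight": 0, "linear.bias": 1}
# -> sd stays flat_sd, since it has more than one top-level key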
|
||||
|
||||
def save_torch_file(sd, ckpt, metadata=None):
|
||||
@@ -797,6 +805,8 @@ def tiled_scale_multidim(samples, function, tile=(64, 64), overlap = 8, upscale_
|
||||
|
||||
for d in range(2, dims + 2):
|
||||
feather = round(get_upscale(d - 2, overlap[d - 2]))
|
||||
if feather >= mask.shape[d]:
|
||||
continue
|
||||
for t in range(feather):
|
||||
a = (t + 1) / feather
|
||||
mask.narrow(d, t, 1).mul_(a)
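A one-dimensional illustration (not part of this diff) of the feathering loop above, which ramps the tile blend mask linearly from 1/feather up to 1 along one edge:
import torch
mask = torch.ones(1, 1, 8)
feather = 4
d = 2
for t in range(feather):
    a = (t + 1) / feather
    mask.narrow(d, t, 1).mul_(a)
print(mask)  # tensor([[[0.2500, 0.5000, 0.7500, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000]]])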
|
||||
@@ -867,5 +877,46 @@ def reshape_mask(input_mask, output_shape):
|
||||
mask = torch.nn.functional.interpolate(input_mask, size=output_shape[2:], mode=scale_mode)
|
||||
if mask.shape[1] < output_shape[1]:
|
||||
mask = mask.repeat((1, output_shape[1]) + (1,) * dims)[:,:output_shape[1]]
|
||||
mask = comfy.utils.repeat_to_batch_size(mask, output_shape[0])
|
||||
mask = repeat_to_batch_size(mask, output_shape[0])
|
||||
return mask
|
||||
|
||||
def upscale_dit_mask(mask: torch.Tensor, img_size_in, img_size_out):
|
||||
hi, wi = img_size_in
|
||||
ho, wo = img_size_out
|
||||
# if it's already the correct size, no need to do anything
|
||||
if (hi, wi) == (ho, wo):
|
||||
return mask
|
||||
if mask.ndim == 2:
|
||||
mask = mask.unsqueeze(0)
|
||||
if mask.ndim != 3:
|
||||
raise ValueError(f"Got a mask of shape {list(mask.shape)}, expected [b, q, k] or [q, k]")
|
||||
txt_tokens = mask.shape[1] - (hi * wi)
|
||||
# quadrants of the mask
|
||||
txt_to_txt = mask[:, :txt_tokens, :txt_tokens]
|
||||
txt_to_img = mask[:, :txt_tokens, txt_tokens:]
|
||||
img_to_img = mask[:, txt_tokens:, txt_tokens:]
|
||||
img_to_txt = mask[:, txt_tokens:, :txt_tokens]
|
||||
|
||||
# convert to 1d x 2d, interpolate, then back to 1d x 1d
|
||||
txt_to_img = rearrange (txt_to_img, "b t (h w) -> b t h w", h=hi, w=wi)
|
||||
txt_to_img = interpolate(txt_to_img, size=img_size_out, mode="bilinear")
|
||||
txt_to_img = rearrange (txt_to_img, "b t h w -> b t (h w)")
|
||||
# this one is hard because we have to do it twice
|
||||
# convert to 1d x 2d, interpolate, then to 2d x 1d, interpolate, then 1d x 1d
|
||||
img_to_img = rearrange (img_to_img, "b hw (h w) -> b hw h w", h=hi, w=wi)
|
||||
img_to_img = interpolate(img_to_img, size=img_size_out, mode="bilinear")
|
||||
img_to_img = rearrange (img_to_img, "b (hk wk) hq wq -> b (hq wq) hk wk", hk=hi, wk=wi)
|
||||
img_to_img = interpolate(img_to_img, size=img_size_out, mode="bilinear")
|
||||
img_to_img = rearrange (img_to_img, "b (hq wq) hk wk -> b (hk wk) (hq wq)", hq=ho, wq=wo)
|
||||
# convert to 2d x 1d, interpolate, then back to 1d x 1d
|
||||
img_to_txt = rearrange (img_to_txt, "b (h w) t -> b t h w", h=hi, w=wi)
|
||||
img_to_txt = interpolate(img_to_txt, size=img_size_out, mode="bilinear")
|
||||
img_to_txt = rearrange (img_to_txt, "b t h w -> b (h w) t")
|
||||
|
||||
# reassemble the mask from blocks
|
||||
out = torch.cat([
|
||||
torch.cat([txt_to_txt, txt_to_img], dim=2),
|
||||
torch.cat([img_to_txt, img_to_img], dim=2)],
|
||||
dim=1
|
||||
)
|
||||
return out
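A shape-only usage sketch for upscale_dit_mask (illustrative, not part of this diff); it assumes a mask over 3 text tokens plus a 4x4 image grid being rescaled to an 8x8 grid:
import torch
txt, hi, wi, ho, wo = 3, 4, 4, 8, 8
mask = torch.rand(1, txt + hi * wi, txt + hi * wi)
out = upscale_dit_mask(mask, (hi, wi), (ho, wo))
assert out.shape == (1, txt + ho * wo, txt + ho * wo)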
|
||||
|
||||
39
comfy_execution/validation.py
Normal file
@@ -0,0 +1,39 @@
from __future__ import annotations


def validate_node_input(
    received_type: str, input_type: str, strict: bool = False
) -> bool:
    """
    received_type and input_type are both strings of the form "T1,T2,...".

    If strict is True, the input_type must contain the received_type.
    For example, if received_type is "STRING" and input_type is "STRING,INT",
    this will return True. But if received_type is "STRING,INT" and input_type is
    "INT", this will return False.

    If strict is False, the input_type must have overlap with the received_type.
    For example, if received_type is "STRING,BOOLEAN" and input_type is "STRING,INT",
    this will return True.

    Supports pre-union type extension behaviour of ``__ne__`` overrides.
    """
    # If the types are exactly the same, we can return immediately
    # Use pre-union behaviour: inverse of `__ne__`
    if not received_type != input_type:
        return True

    # Not equal, and not strings
    if not isinstance(received_type, str) or not isinstance(input_type, str):
        return False

    # Split the type strings into sets for comparison
    received_types = set(t.strip() for t in received_type.split(","))
    input_types = set(t.strip() for t in input_type.split(","))

    if strict:
        # In strict mode, all received types must be in the input types
        return received_types.issubset(input_types)
    else:
        # In non-strict mode, there must be at least one type in common
        return len(received_types.intersection(input_types)) > 0
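A brief usage sketch (not part of this diff) showing strict versus non-strict matching:
assert validate_node_input("STRING", "STRING,INT", strict=True)
assert not validate_node_input("STRING,INT", "INT", strict=True)
assert validate_node_input("STRING,BOOLEAN", "STRING,INT")  # overlap is enough in non-strict mode
assert not validate_node_input("BOOLEAN", "STRING,INT")     # no overlap at all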
@@ -2,8 +2,7 @@ import comfy.samplers
|
||||
import comfy.utils
|
||||
import torch
|
||||
import numpy as np
|
||||
from tqdm.auto import trange, tqdm
|
||||
import math
|
||||
from tqdm.auto import trange
|
||||
|
||||
|
||||
@torch.no_grad()
|
||||
|
||||
@@ -8,6 +8,7 @@ import json
|
||||
import struct
|
||||
import random
|
||||
import hashlib
|
||||
import node_helpers
|
||||
from comfy.cli_args import args
|
||||
|
||||
class EmptyLatentAudio:
|
||||
@@ -29,6 +30,27 @@ class EmptyLatentAudio:
|
||||
latent = torch.zeros([batch_size, 64, length], device=self.device)
|
||||
return ({"samples":latent, "type": "audio"}, )
|
||||
|
||||
class ConditioningStableAudio:
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {"required": {"positive": ("CONDITIONING", ),
|
||||
"negative": ("CONDITIONING", ),
|
||||
"seconds_start": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1000.0, "step": 0.1}),
|
||||
"seconds_total": ("FLOAT", {"default": 47.0, "min": 0.0, "max": 1000.0, "step": 0.1}),
|
||||
}}
|
||||
|
||||
RETURN_TYPES = ("CONDITIONING","CONDITIONING")
|
||||
RETURN_NAMES = ("positive", "negative")
|
||||
|
||||
FUNCTION = "append"
|
||||
|
||||
CATEGORY = "conditioning"
|
||||
|
||||
def append(self, positive, negative, seconds_start, seconds_total):
|
||||
positive = node_helpers.conditioning_set_values(positive, {"seconds_start": seconds_start, "seconds_total": seconds_total})
|
||||
negative = node_helpers.conditioning_set_values(negative, {"seconds_start": seconds_start, "seconds_total": seconds_total})
|
||||
return (positive, negative)
|
||||
|
||||
class VAEEncodeAudio:
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
@@ -225,4 +247,5 @@ NODE_CLASS_MAPPINGS = {
|
||||
"SaveAudio": SaveAudio,
|
||||
"LoadAudio": LoadAudio,
|
||||
"PreviewAudio": PreviewAudio,
|
||||
"ConditioningStableAudio": ConditioningStableAudio,
|
||||
}
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
import torch
|
||||
from nodes import MAX_RESOLUTION
|
||||
|
||||
class CLIPTextEncodeSDXLRefiner:
|
||||
@@ -17,21 +16,21 @@ class CLIPTextEncodeSDXLRefiner:
|
||||
|
||||
def encode(self, clip, ascore, width, height, text):
|
||||
tokens = clip.tokenize(text)
|
||||
cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
|
||||
return ([[cond, {"pooled_output": pooled, "aesthetic_score": ascore, "width": width,"height": height}]], )
|
||||
return (clip.encode_from_tokens_scheduled(tokens, add_dict={"aesthetic_score": ascore, "width": width, "height": height}), )
|
||||
|
||||
class CLIPTextEncodeSDXL:
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {"required": {
|
||||
"clip": ("CLIP", ),
|
||||
"width": ("INT", {"default": 1024.0, "min": 0, "max": MAX_RESOLUTION}),
|
||||
"height": ("INT", {"default": 1024.0, "min": 0, "max": MAX_RESOLUTION}),
|
||||
"crop_w": ("INT", {"default": 0, "min": 0, "max": MAX_RESOLUTION}),
|
||||
"crop_h": ("INT", {"default": 0, "min": 0, "max": MAX_RESOLUTION}),
|
||||
"target_width": ("INT", {"default": 1024.0, "min": 0, "max": MAX_RESOLUTION}),
|
||||
"target_height": ("INT", {"default": 1024.0, "min": 0, "max": MAX_RESOLUTION}),
|
||||
"text_g": ("STRING", {"multiline": True, "dynamicPrompts": True}), "clip": ("CLIP", ),
|
||||
"text_l": ("STRING", {"multiline": True, "dynamicPrompts": True}), "clip": ("CLIP", ),
|
||||
"text_g": ("STRING", {"multiline": True, "dynamicPrompts": True}),
|
||||
"text_l": ("STRING", {"multiline": True, "dynamicPrompts": True}),
|
||||
}}
|
||||
RETURN_TYPES = ("CONDITIONING",)
|
||||
FUNCTION = "encode"
|
||||
@@ -47,8 +46,7 @@ class CLIPTextEncodeSDXL:
|
||||
tokens["l"] += empty["l"]
|
||||
while len(tokens["l"]) > len(tokens["g"]):
|
||||
tokens["g"] += empty["g"]
|
||||
cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
|
||||
return ([[cond, {"pooled_output": pooled, "width": width, "height": height, "crop_w": crop_w, "crop_h": crop_h, "target_width": target_width, "target_height": target_height}]], )
|
||||
return (clip.encode_from_tokens_scheduled(tokens, add_dict={"width": width, "height": height, "crop_w": crop_w, "crop_h": crop_h, "target_width": target_width, "target_height": target_height}), )
|
||||
|
||||
NODE_CLASS_MAPPINGS = {
|
||||
"CLIPTextEncodeSDXLRefiner": CLIPTextEncodeSDXLRefiner,
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
import numpy as np
|
||||
import torch
|
||||
import comfy.utils
|
||||
from enum import Enum
|
||||
|
||||
@@ -18,10 +18,7 @@ class CLIPTextEncodeFlux:
|
||||
tokens = clip.tokenize(clip_l)
|
||||
tokens["t5xxl"] = clip.tokenize(t5xxl)["t5xxl"]
|
||||
|
||||
output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
|
||||
cond = output.pop("cond")
|
||||
output["guidance"] = guidance
|
||||
return ([[cond, output]], )
|
||||
return (clip.encode_from_tokens_scheduled(tokens, add_dict={"guidance": guidance}), )
|
||||
|
||||
class FluxGuidance:
|
||||
@classmethod
|
||||
|
||||
744
comfy_extras/nodes_hooks.py
Normal file
744
comfy_extras/nodes_hooks.py
Normal file
@@ -0,0 +1,744 @@
|
||||
from __future__ import annotations
|
||||
from typing import TYPE_CHECKING, Union
|
||||
import torch
|
||||
from collections.abc import Iterable
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from comfy.sd import CLIP
|
||||
|
||||
import comfy.hooks
|
||||
import comfy.sd
|
||||
import comfy.utils
|
||||
import folder_paths
|
||||
|
||||
###########################################
|
||||
# Mask, Combine, and Hook Conditioning
|
||||
#------------------------------------------
|
||||
class PairConditioningSetProperties:
|
||||
NodeId = 'PairConditioningSetProperties'
|
||||
NodeName = 'Cond Pair Set Props'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"positive_NEW": ("CONDITIONING", ),
|
||||
"negative_NEW": ("CONDITIONING", ),
|
||||
"strength": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 10.0, "step": 0.01}),
|
||||
"set_cond_area": (["default", "mask bounds"],),
|
||||
},
|
||||
"optional": {
|
||||
"mask": ("MASK", ),
|
||||
"hooks": ("HOOKS",),
|
||||
"timesteps": ("TIMESTEPS_RANGE",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("CONDITIONING", "CONDITIONING")
|
||||
RETURN_NAMES = ("positive", "negative")
|
||||
CATEGORY = "advanced/hooks/cond pair"
|
||||
FUNCTION = "set_properties"
|
||||
|
||||
def set_properties(self, positive_NEW, negative_NEW,
|
||||
strength: float, set_cond_area: str,
|
||||
mask: torch.Tensor=None, hooks: comfy.hooks.HookGroup=None, timesteps: tuple=None):
|
||||
final_positive, final_negative = comfy.hooks.set_conds_props(conds=[positive_NEW, negative_NEW],
|
||||
strength=strength, set_cond_area=set_cond_area,
|
||||
mask=mask, hooks=hooks, timesteps_range=timesteps)
|
||||
return (final_positive, final_negative)
|
||||
|
||||
class PairConditioningSetPropertiesAndCombine:
|
||||
NodeId = 'PairConditioningSetPropertiesAndCombine'
|
||||
NodeName = 'Cond Pair Set Props Combine'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"positive": ("CONDITIONING", ),
|
||||
"negative": ("CONDITIONING", ),
|
||||
"positive_NEW": ("CONDITIONING", ),
|
||||
"negative_NEW": ("CONDITIONING", ),
|
||||
"strength": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 10.0, "step": 0.01}),
|
||||
"set_cond_area": (["default", "mask bounds"],),
|
||||
},
|
||||
"optional": {
|
||||
"mask": ("MASK", ),
|
||||
"hooks": ("HOOKS",),
|
||||
"timesteps": ("TIMESTEPS_RANGE",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("CONDITIONING", "CONDITIONING")
|
||||
RETURN_NAMES = ("positive", "negative")
|
||||
CATEGORY = "advanced/hooks/cond pair"
|
||||
FUNCTION = "set_properties"
|
||||
|
||||
def set_properties(self, positive, negative, positive_NEW, negative_NEW,
|
||||
strength: float, set_cond_area: str,
|
||||
mask: torch.Tensor=None, hooks: comfy.hooks.HookGroup=None, timesteps: tuple=None):
|
||||
final_positive, final_negative = comfy.hooks.set_conds_props_and_combine(conds=[positive, negative], new_conds=[positive_NEW, negative_NEW],
|
||||
strength=strength, set_cond_area=set_cond_area,
|
||||
mask=mask, hooks=hooks, timesteps_range=timesteps)
|
||||
return (final_positive, final_negative)
|
||||
|
||||
class ConditioningSetProperties:
|
||||
NodeId = 'ConditioningSetProperties'
|
||||
NodeName = 'Cond Set Props'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"cond_NEW": ("CONDITIONING", ),
|
||||
"strength": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 10.0, "step": 0.01}),
|
||||
"set_cond_area": (["default", "mask bounds"],),
|
||||
},
|
||||
"optional": {
|
||||
"mask": ("MASK", ),
|
||||
"hooks": ("HOOKS",),
|
||||
"timesteps": ("TIMESTEPS_RANGE",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("CONDITIONING",)
|
||||
CATEGORY = "advanced/hooks/cond single"
|
||||
FUNCTION = "set_properties"
|
||||
|
||||
def set_properties(self, cond_NEW,
|
||||
strength: float, set_cond_area: str,
|
||||
mask: torch.Tensor=None, hooks: comfy.hooks.HookGroup=None, timesteps: tuple=None):
|
||||
(final_cond,) = comfy.hooks.set_conds_props(conds=[cond_NEW],
|
||||
strength=strength, set_cond_area=set_cond_area,
|
||||
mask=mask, hooks=hooks, timesteps_range=timesteps)
|
||||
return (final_cond,)
|
||||
|
||||
class ConditioningSetPropertiesAndCombine:
|
||||
NodeId = 'ConditioningSetPropertiesAndCombine'
|
||||
NodeName = 'Cond Set Props Combine'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"cond": ("CONDITIONING", ),
|
||||
"cond_NEW": ("CONDITIONING", ),
|
||||
"strength": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 10.0, "step": 0.01}),
|
||||
"set_cond_area": (["default", "mask bounds"],),
|
||||
},
|
||||
"optional": {
|
||||
"mask": ("MASK", ),
|
||||
"hooks": ("HOOKS",),
|
||||
"timesteps": ("TIMESTEPS_RANGE",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("CONDITIONING",)
|
||||
CATEGORY = "advanced/hooks/cond single"
|
||||
FUNCTION = "set_properties"
|
||||
|
||||
def set_properties(self, cond, cond_NEW,
|
||||
strength: float, set_cond_area: str,
|
||||
mask: torch.Tensor=None, hooks: comfy.hooks.HookGroup=None, timesteps: tuple=None):
|
||||
(final_cond,) = comfy.hooks.set_conds_props_and_combine(conds=[cond], new_conds=[cond_NEW],
|
||||
strength=strength, set_cond_area=set_cond_area,
|
||||
mask=mask, hooks=hooks, timesteps_range=timesteps)
|
||||
return (final_cond,)
|
||||
|
||||
class PairConditioningCombine:
|
||||
NodeId = 'PairConditioningCombine'
|
||||
NodeName = 'Cond Pair Combine'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"positive_A": ("CONDITIONING",),
|
||||
"negative_A": ("CONDITIONING",),
|
||||
"positive_B": ("CONDITIONING",),
|
||||
"negative_B": ("CONDITIONING",),
|
||||
},
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("CONDITIONING", "CONDITIONING")
|
||||
RETURN_NAMES = ("positive", "negative")
|
||||
CATEGORY = "advanced/hooks/cond pair"
|
||||
FUNCTION = "combine"
|
||||
|
||||
def combine(self, positive_A, negative_A, positive_B, negative_B):
|
||||
final_positive, final_negative = comfy.hooks.set_conds_props_and_combine(conds=[positive_A, negative_A], new_conds=[positive_B, negative_B],)
|
||||
return (final_positive, final_negative,)
|
||||
|
||||
class PairConditioningSetDefaultAndCombine:
|
||||
NodeId = 'PairConditioningSetDefaultCombine'
|
||||
NodeName = 'Cond Pair Set Default Combine'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"positive": ("CONDITIONING",),
|
||||
"negative": ("CONDITIONING",),
|
||||
"positive_DEFAULT": ("CONDITIONING",),
|
||||
"negative_DEFAULT": ("CONDITIONING",),
|
||||
},
|
||||
"optional": {
|
||||
"hooks": ("HOOKS",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("CONDITIONING", "CONDITIONING")
|
||||
RETURN_NAMES = ("positive", "negative")
|
||||
CATEGORY = "advanced/hooks/cond pair"
|
||||
FUNCTION = "set_default_and_combine"
|
||||
|
||||
def set_default_and_combine(self, positive, negative, positive_DEFAULT, negative_DEFAULT,
|
||||
hooks: comfy.hooks.HookGroup=None):
|
||||
final_positive, final_negative = comfy.hooks.set_default_conds_and_combine(conds=[positive, negative], new_conds=[positive_DEFAULT, negative_DEFAULT],
|
||||
hooks=hooks)
|
||||
return (final_positive, final_negative)
|
||||
|
||||
class ConditioningSetDefaultAndCombine:
|
||||
NodeId = 'ConditioningSetDefaultCombine'
|
||||
NodeName = 'Cond Set Default Combine'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"cond": ("CONDITIONING",),
|
||||
"cond_DEFAULT": ("CONDITIONING",),
|
||||
},
|
||||
"optional": {
|
||||
"hooks": ("HOOKS",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("CONDITIONING",)
|
||||
CATEGORY = "advanced/hooks/cond single"
|
||||
FUNCTION = "set_default_and_combine"
|
||||
|
||||
def set_default_and_combine(self, cond, cond_DEFAULT,
|
||||
hooks: comfy.hooks.HookGroup=None):
|
||||
(final_conditioning,) = comfy.hooks.set_default_conds_and_combine(conds=[cond], new_conds=[cond_DEFAULT],
|
||||
hooks=hooks)
|
||||
return (final_conditioning,)
|
||||
|
||||
class SetClipHooks:
|
||||
NodeId = 'SetClipHooks'
|
||||
NodeName = 'Set CLIP Hooks'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"clip": ("CLIP",),
|
||||
"apply_to_conds": ("BOOLEAN", {"default": True}),
|
||||
"schedule_clip": ("BOOLEAN", {"default": False})
|
||||
},
|
||||
"optional": {
|
||||
"hooks": ("HOOKS",)
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("CLIP",)
|
||||
CATEGORY = "advanced/hooks/clip"
|
||||
FUNCTION = "apply_hooks"
|
||||
|
||||
def apply_hooks(self, clip: 'CLIP', schedule_clip: bool, apply_to_conds: bool, hooks: comfy.hooks.HookGroup=None):
|
||||
if hooks is not None:
|
||||
clip = clip.clone()
|
||||
if apply_to_conds:
|
||||
clip.apply_hooks_to_conds = hooks
|
||||
clip.patcher.forced_hooks = hooks.clone()
|
||||
clip.use_clip_schedule = schedule_clip
|
||||
if not clip.use_clip_schedule:
|
||||
clip.patcher.forced_hooks.set_keyframes_on_hooks(None)
|
||||
clip.patcher.register_all_hook_patches(hooks.get_dict_repr(), comfy.hooks.EnumWeightTarget.Clip)
|
||||
return (clip,)
|
||||
|
||||
class ConditioningTimestepsRange:
|
||||
NodeId = 'ConditioningTimestepsRange'
|
||||
NodeName = 'Timesteps Range'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"start_percent": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
|
||||
"end_percent": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.001})
|
||||
},
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("TIMESTEPS_RANGE", "TIMESTEPS_RANGE", "TIMESTEPS_RANGE")
|
||||
RETURN_NAMES = ("TIMESTEPS_RANGE", "BEFORE_RANGE", "AFTER_RANGE")
|
||||
CATEGORY = "advanced/hooks"
|
||||
FUNCTION = "create_range"
|
||||
|
||||
def create_range(self, start_percent: float, end_percent: float):
|
||||
return ((start_percent, end_percent), (0.0, start_percent), (end_percent, 1.0))
|
||||
#------------------------------------------
|
||||
###########################################
|
||||
|
||||
|
||||
###########################################
|
||||
# Create Hooks
|
||||
#------------------------------------------
|
||||
class CreateHookLora:
|
||||
NodeId = 'CreateHookLora'
|
||||
NodeName = 'Create Hook LoRA'
|
||||
def __init__(self):
|
||||
self.loaded_lora = None
|
||||
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"lora_name": (folder_paths.get_filename_list("loras"), ),
|
||||
"strength_model": ("FLOAT", {"default": 1.0, "min": -20.0, "max": 20.0, "step": 0.01}),
|
||||
"strength_clip": ("FLOAT", {"default": 1.0, "min": -20.0, "max": 20.0, "step": 0.01}),
|
||||
},
|
||||
"optional": {
|
||||
"prev_hooks": ("HOOKS",)
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOKS",)
|
||||
CATEGORY = "advanced/hooks/create"
|
||||
FUNCTION = "create_hook"
|
||||
|
||||
def create_hook(self, lora_name: str, strength_model: float, strength_clip: float, prev_hooks: comfy.hooks.HookGroup=None):
|
||||
if prev_hooks is None:
|
||||
prev_hooks = comfy.hooks.HookGroup()
|
||||
prev_hooks.clone()
|
||||
|
||||
if strength_model == 0 and strength_clip == 0:
|
||||
return (prev_hooks,)
|
||||
|
||||
lora_path = folder_paths.get_full_path("loras", lora_name)
|
||||
lora = None
|
||||
if self.loaded_lora is not None:
|
||||
if self.loaded_lora[0] == lora_path:
|
||||
lora = self.loaded_lora[1]
|
||||
else:
|
||||
temp = self.loaded_lora
|
||||
self.loaded_lora = None
|
||||
del temp
|
||||
|
||||
if lora is None:
|
||||
lora = comfy.utils.load_torch_file(lora_path, safe_load=True)
|
||||
self.loaded_lora = (lora_path, lora)
|
||||
|
||||
hooks = comfy.hooks.create_hook_lora(lora=lora, strength_model=strength_model, strength_clip=strength_clip)
|
||||
return (prev_hooks.clone_and_combine(hooks),)
|
||||
|
||||
class CreateHookLoraModelOnly(CreateHookLora):
|
||||
NodeId = 'CreateHookLoraModelOnly'
|
||||
NodeName = 'Create Hook LoRA (MO)'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"lora_name": (folder_paths.get_filename_list("loras"), ),
|
||||
"strength_model": ("FLOAT", {"default": 1.0, "min": -20.0, "max": 20.0, "step": 0.01}),
|
||||
},
|
||||
"optional": {
|
||||
"prev_hooks": ("HOOKS",)
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOKS",)
|
||||
CATEGORY = "advanced/hooks/create"
|
||||
FUNCTION = "create_hook_model_only"
|
||||
|
||||
def create_hook_model_only(self, lora_name: str, strength_model: float, prev_hooks: comfy.hooks.HookGroup=None):
|
||||
return self.create_hook(lora_name=lora_name, strength_model=strength_model, strength_clip=0, prev_hooks=prev_hooks)
|
||||
|
||||
class CreateHookModelAsLora:
|
||||
NodeId = 'CreateHookModelAsLora'
|
||||
NodeName = 'Create Hook Model as LoRA'
|
||||
|
||||
def __init__(self):
|
||||
# when not None, will be in following format:
|
||||
# (ckpt_path: str, weights_model: dict, weights_clip: dict)
|
||||
self.loaded_weights = None
|
||||
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"ckpt_name": (folder_paths.get_filename_list("checkpoints"), ),
|
||||
"strength_model": ("FLOAT", {"default": 1.0, "min": -20.0, "max": 20.0, "step": 0.01}),
|
||||
"strength_clip": ("FLOAT", {"default": 1.0, "min": -20.0, "max": 20.0, "step": 0.01}),
|
||||
},
|
||||
"optional": {
|
||||
"prev_hooks": ("HOOKS",)
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOKS",)
|
||||
CATEGORY = "advanced/hooks/create"
|
||||
FUNCTION = "create_hook"
|
||||
|
||||
def create_hook(self, ckpt_name: str, strength_model: float, strength_clip: float,
|
||||
prev_hooks: comfy.hooks.HookGroup=None):
|
||||
if prev_hooks is None:
|
||||
prev_hooks = comfy.hooks.HookGroup()
|
||||
prev_hooks.clone()
|
||||
|
||||
ckpt_path = folder_paths.get_full_path("checkpoints", ckpt_name)
|
||||
weights_model = None
|
||||
weights_clip = None
|
||||
if self.loaded_weights is not None:
|
||||
if self.loaded_weights[0] == ckpt_path:
|
||||
weights_model = self.loaded_weights[1]
|
||||
weights_clip = self.loaded_weights[2]
|
||||
else:
|
||||
temp = self.loaded_weights
|
||||
self.loaded_weights = None
|
||||
del temp
|
||||
|
||||
if weights_model is None:
|
||||
out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
|
||||
weights_model = comfy.hooks.get_patch_weights_from_model(out[0])
|
||||
weights_clip = comfy.hooks.get_patch_weights_from_model(out[1].patcher if out[1] else out[1])
|
||||
self.loaded_weights = (ckpt_path, weights_model, weights_clip)
|
||||
|
||||
hooks = comfy.hooks.create_hook_model_as_lora(weights_model=weights_model, weights_clip=weights_clip,
|
||||
strength_model=strength_model, strength_clip=strength_clip)
|
||||
return (prev_hooks.clone_and_combine(hooks),)
|
||||
|
||||
class CreateHookModelAsLoraModelOnly(CreateHookModelAsLora):
|
||||
NodeId = 'CreateHookModelAsLoraModelOnly'
|
||||
NodeName = 'Create Hook Model as LoRA (MO)'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"ckpt_name": (folder_paths.get_filename_list("checkpoints"), ),
|
||||
"strength_model": ("FLOAT", {"default": 1.0, "min": -20.0, "max": 20.0, "step": 0.01}),
|
||||
},
|
||||
"optional": {
|
||||
"prev_hooks": ("HOOKS",)
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOKS",)
|
||||
CATEGORY = "advanced/hooks/create"
|
||||
FUNCTION = "create_hook_model_only"
|
||||
|
||||
def create_hook_model_only(self, ckpt_name: str, strength_model: float,
|
||||
prev_hooks: comfy.hooks.HookGroup=None):
|
||||
return self.create_hook(ckpt_name=ckpt_name, strength_model=strength_model, strength_clip=0.0, prev_hooks=prev_hooks)
|
||||
#------------------------------------------
|
||||
###########################################
|
||||
|
||||
|
||||
###########################################
|
||||
# Schedule Hooks
|
||||
#------------------------------------------
|
||||
class SetHookKeyframes:
|
||||
NodeId = 'SetHookKeyframes'
|
||||
NodeName = 'Set Hook Keyframes'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"hooks": ("HOOKS",),
|
||||
},
|
||||
"optional": {
|
||||
"hook_kf": ("HOOK_KEYFRAMES",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOKS",)
|
||||
CATEGORY = "advanced/hooks/scheduling"
|
||||
FUNCTION = "set_hook_keyframes"
|
||||
|
||||
def set_hook_keyframes(self, hooks: comfy.hooks.HookGroup, hook_kf: comfy.hooks.HookKeyframeGroup=None):
|
||||
if hook_kf is not None:
|
||||
hooks = hooks.clone()
|
||||
hooks.set_keyframes_on_hooks(hook_kf=hook_kf)
|
||||
return (hooks,)
|
||||
|
||||
class CreateHookKeyframe:
|
||||
NodeId = 'CreateHookKeyframe'
|
||||
NodeName = 'Create Hook Keyframe'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"strength_mult": ("FLOAT", {"default": 1.0, "min": -20.0, "max": 20.0, "step": 0.01}),
|
||||
"start_percent": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
|
||||
},
|
||||
"optional": {
|
||||
"prev_hook_kf": ("HOOK_KEYFRAMES",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOK_KEYFRAMES",)
|
||||
RETURN_NAMES = ("HOOK_KF",)
|
||||
CATEGORY = "advanced/hooks/scheduling"
|
||||
FUNCTION = "create_hook_keyframe"
|
||||
|
||||
def create_hook_keyframe(self, strength_mult: float, start_percent: float, prev_hook_kf: comfy.hooks.HookKeyframeGroup=None):
|
||||
if prev_hook_kf is None:
|
||||
prev_hook_kf = comfy.hooks.HookKeyframeGroup()
|
||||
prev_hook_kf = prev_hook_kf.clone()
|
||||
keyframe = comfy.hooks.HookKeyframe(strength=strength_mult, start_percent=start_percent)
|
||||
prev_hook_kf.add(keyframe)
|
||||
return (prev_hook_kf,)
|
||||
|
||||
class CreateHookKeyframesInterpolated:
|
||||
NodeId = 'CreateHookKeyframesInterpolated'
|
||||
NodeName = 'Create Hook Keyframes Interp.'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"strength_start": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 10.0, "step": 0.001}, ),
|
||||
"strength_end": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 10.0, "step": 0.001}, ),
|
||||
"interpolation": (comfy.hooks.InterpolationMethod._LIST, ),
|
||||
"start_percent": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
|
||||
"end_percent": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.001}),
|
||||
"keyframes_count": ("INT", {"default": 5, "min": 2, "max": 100, "step": 1}),
|
||||
"print_keyframes": ("BOOLEAN", {"default": False}),
|
||||
},
|
||||
"optional": {
|
||||
"prev_hook_kf": ("HOOK_KEYFRAMES",),
|
||||
},
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOK_KEYFRAMES",)
|
||||
RETURN_NAMES = ("HOOK_KF",)
|
||||
CATEGORY = "advanced/hooks/scheduling"
|
||||
FUNCTION = "create_hook_keyframes"
|
||||
|
||||
def create_hook_keyframes(self, strength_start: float, strength_end: float, interpolation: str,
|
||||
start_percent: float, end_percent: float, keyframes_count: int,
|
||||
print_keyframes=False, prev_hook_kf: comfy.hooks.HookKeyframeGroup=None):
|
||||
if prev_hook_kf is None:
|
||||
prev_hook_kf = comfy.hooks.HookKeyframeGroup()
|
||||
prev_hook_kf = prev_hook_kf.clone()
|
||||
percents = comfy.hooks.InterpolationMethod.get_weights(num_from=start_percent, num_to=end_percent, length=keyframes_count,
|
||||
method=comfy.hooks.InterpolationMethod.LINEAR)
|
||||
strengths = comfy.hooks.InterpolationMethod.get_weights(num_from=strength_start, num_to=strength_end, length=keyframes_count, method=interpolation)
|
||||
|
||||
is_first = True
|
||||
for percent, strength in zip(percents, strengths):
|
||||
guarantee_steps = 0
|
||||
if is_first:
|
||||
guarantee_steps = 1
|
||||
is_first = False
|
||||
prev_hook_kf.add(comfy.hooks.HookKeyframe(strength=strength, start_percent=percent, guarantee_steps=guarantee_steps))
|
||||
if print_keyframes:
|
||||
print(f"Hook Keyframe - start_percent:{percent} = {strength}")
|
||||
return (prev_hook_kf,)
|
||||
|
||||
class CreateHookKeyframesFromFloats:
|
||||
NodeId = 'CreateHookKeyframesFromFloats'
|
||||
NodeName = 'Create Hook Keyframes From Floats'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"floats_strength": ("FLOATS", {"default": -1, "min": -1, "step": 0.001, "forceInput": True}),
|
||||
"start_percent": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
|
||||
"end_percent": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.001}),
|
||||
"print_keyframes": ("BOOLEAN", {"default": False}),
|
||||
},
|
||||
"optional": {
|
||||
"prev_hook_kf": ("HOOK_KEYFRAMES",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOK_KEYFRAMES",)
|
||||
RETURN_NAMES = ("HOOK_KF",)
|
||||
CATEGORY = "advanced/hooks/scheduling"
|
||||
FUNCTION = "create_hook_keyframes"
|
||||
|
||||
def create_hook_keyframes(self, floats_strength: Union[float, list[float]],
|
||||
start_percent: float, end_percent: float,
|
||||
prev_hook_kf: comfy.hooks.HookKeyframeGroup=None, print_keyframes=False):
|
||||
if prev_hook_kf is None:
|
||||
prev_hook_kf = comfy.hooks.HookKeyframeGroup()
|
||||
prev_hook_kf = prev_hook_kf.clone()
|
||||
if type(floats_strength) in (float, int):
|
||||
floats_strength = [float(floats_strength)]
|
||||
elif isinstance(floats_strength, Iterable):
|
||||
pass
|
||||
else:
|
||||
raise Exception(f"floats_strength must be either an iterable input or a float, but was{type(floats_strength).__repr__}.")
|
||||
percents = comfy.hooks.InterpolationMethod.get_weights(num_from=start_percent, num_to=end_percent, length=len(floats_strength),
|
||||
method=comfy.hooks.InterpolationMethod.LINEAR)
|
||||
|
||||
is_first = True
|
||||
for percent, strength in zip(percents, floats_strength):
|
||||
guarantee_steps = 0
|
||||
if is_first:
|
||||
guarantee_steps = 1
|
||||
is_first = False
|
||||
prev_hook_kf.add(comfy.hooks.HookKeyframe(strength=strength, start_percent=percent, guarantee_steps=guarantee_steps))
|
||||
if print_keyframes:
|
||||
print(f"Hook Keyframe - start_percent:{percent} = {strength}")
|
||||
return (prev_hook_kf,)
|
||||
#------------------------------------------
|
||||
###########################################
|
||||
|
||||
|
||||
class SetModelHooksOnCond:
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
"conditioning": ("CONDITIONING",),
|
||||
"hooks": ("HOOKS",),
|
||||
},
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("CONDITIONING",)
|
||||
CATEGORY = "advanced/hooks/manual"
|
||||
FUNCTION = "attach_hook"
|
||||
|
||||
def attach_hook(self, conditioning, hooks: comfy.hooks.HookGroup):
|
||||
return (comfy.hooks.set_hooks_for_conditioning(conditioning, hooks),)
|
||||
|
||||
|
||||
###########################################
|
||||
# Combine Hooks
|
||||
#------------------------------------------
|
||||
class CombineHooks:
|
||||
NodeId = 'CombineHooks2'
|
||||
NodeName = 'Combine Hooks [2]'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
},
|
||||
"optional": {
|
||||
"hooks_A": ("HOOKS",),
|
||||
"hooks_B": ("HOOKS",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOKS",)
|
||||
CATEGORY = "advanced/hooks/combine"
|
||||
FUNCTION = "combine_hooks"
|
||||
|
||||
def combine_hooks(self,
|
||||
hooks_A: comfy.hooks.HookGroup=None,
|
||||
hooks_B: comfy.hooks.HookGroup=None):
|
||||
candidates = [hooks_A, hooks_B]
|
||||
return (comfy.hooks.HookGroup.combine_all_hooks(candidates),)
|
||||
|
||||
class CombineHooksFour:
|
||||
NodeId = 'CombineHooks4'
|
||||
NodeName = 'Combine Hooks [4]'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
},
|
||||
"optional": {
|
||||
"hooks_A": ("HOOKS",),
|
||||
"hooks_B": ("HOOKS",),
|
||||
"hooks_C": ("HOOKS",),
|
||||
"hooks_D": ("HOOKS",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOKS",)
|
||||
CATEGORY = "advanced/hooks/combine"
|
||||
FUNCTION = "combine_hooks"
|
||||
|
||||
def combine_hooks(self,
|
||||
hooks_A: comfy.hooks.HookGroup=None,
|
||||
hooks_B: comfy.hooks.HookGroup=None,
|
||||
hooks_C: comfy.hooks.HookGroup=None,
|
||||
hooks_D: comfy.hooks.HookGroup=None):
|
||||
candidates = [hooks_A, hooks_B, hooks_C, hooks_D]
|
||||
return (comfy.hooks.HookGroup.combine_all_hooks(candidates),)
|
||||
|
||||
class CombineHooksEight:
|
||||
NodeId = 'CombineHooks8'
|
||||
NodeName = 'Combine Hooks [8]'
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {
|
||||
"required": {
|
||||
},
|
||||
"optional": {
|
||||
"hooks_A": ("HOOKS",),
|
||||
"hooks_B": ("HOOKS",),
|
||||
"hooks_C": ("HOOKS",),
|
||||
"hooks_D": ("HOOKS",),
|
||||
"hooks_E": ("HOOKS",),
|
||||
"hooks_F": ("HOOKS",),
|
||||
"hooks_G": ("HOOKS",),
|
||||
"hooks_H": ("HOOKS",),
|
||||
}
|
||||
}
|
||||
|
||||
EXPERIMENTAL = True
|
||||
RETURN_TYPES = ("HOOKS",)
|
||||
CATEGORY = "advanced/hooks/combine"
|
||||
FUNCTION = "combine_hooks"
|
||||
|
||||
def combine_hooks(self,
|
||||
hooks_A: comfy.hooks.HookGroup=None,
|
||||
hooks_B: comfy.hooks.HookGroup=None,
|
||||
hooks_C: comfy.hooks.HookGroup=None,
|
||||
hooks_D: comfy.hooks.HookGroup=None,
|
||||
hooks_E: comfy.hooks.HookGroup=None,
|
||||
hooks_F: comfy.hooks.HookGroup=None,
|
||||
hooks_G: comfy.hooks.HookGroup=None,
|
||||
hooks_H: comfy.hooks.HookGroup=None):
|
||||
candidates = [hooks_A, hooks_B, hooks_C, hooks_D, hooks_E, hooks_F, hooks_G, hooks_H]
|
||||
return (comfy.hooks.HookGroup.combine_all_hooks(candidates),)
|
||||
#------------------------------------------
|
||||
###########################################
|
||||
|
||||
node_list = [
|
||||
# Create
|
||||
CreateHookLora,
|
||||
CreateHookLoraModelOnly,
|
||||
CreateHookModelAsLora,
|
||||
CreateHookModelAsLoraModelOnly,
|
||||
# Scheduling
|
||||
SetHookKeyframes,
|
||||
CreateHookKeyframe,
|
||||
CreateHookKeyframesInterpolated,
|
||||
CreateHookKeyframesFromFloats,
|
||||
# Combine
|
||||
CombineHooks,
|
||||
CombineHooksFour,
|
||||
CombineHooksEight,
|
||||
# Attach
|
||||
ConditioningSetProperties,
|
||||
ConditioningSetPropertiesAndCombine,
|
||||
PairConditioningSetProperties,
|
||||
PairConditioningSetPropertiesAndCombine,
|
||||
ConditioningSetDefaultAndCombine,
|
||||
PairConditioningSetDefaultAndCombine,
|
||||
PairConditioningCombine,
|
||||
SetClipHooks,
|
||||
# Other
|
||||
ConditioningTimestepsRange,
|
||||
]
|
||||
NODE_CLASS_MAPPINGS = {}
|
||||
NODE_DISPLAY_NAME_MAPPINGS = {}
|
||||
|
||||
for node in node_list:
|
||||
NODE_CLASS_MAPPINGS[node.NodeId] = node
|
||||
NODE_DISPLAY_NAME_MAPPINGS[node.NodeId] = node.NodeName
|
||||
@@ -1,3 +1,8 @@
|
||||
import nodes
|
||||
import torch
|
||||
import comfy.model_management
|
||||
|
||||
|
||||
class CLIPTextEncodeHunyuanDiT:
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
@@ -15,11 +20,25 @@ class CLIPTextEncodeHunyuanDiT:
|
||||
tokens = clip.tokenize(bert)
|
||||
tokens["mt5xl"] = clip.tokenize(mt5xl)["mt5xl"]
|
||||
|
||||
output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
|
||||
cond = output.pop("cond")
|
||||
return ([[cond, output]], )
|
||||
return (clip.encode_from_tokens_scheduled(tokens), )
|
||||
|
||||
class EmptyHunyuanLatentVideo:
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {"required": { "width": ("INT", {"default": 848, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
|
||||
"height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
|
||||
"length": ("INT", {"default": 25, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
|
||||
"batch_size": ("INT", {"default": 1, "min": 1, "max": 4096})}}
|
||||
RETURN_TYPES = ("LATENT",)
|
||||
FUNCTION = "generate"
|
||||
|
||||
CATEGORY = "latent/video"
|
||||
|
||||
def generate(self, width, height, length, batch_size=1):
|
||||
latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
|
||||
return ({"samples":latent}, )
|
||||
|
||||
NODE_CLASS_MAPPINGS = {
|
||||
"CLIPTextEncodeHunyuanDiT": CLIPTextEncodeHunyuanDiT,
|
||||
"EmptyHunyuanLatentVideo": EmptyHunyuanLatentVideo,
|
||||
}
|
||||
|
||||
@@ -35,8 +35,6 @@ class HyperTile:
|
||||
CATEGORY = "model_patches/unet"
|
||||
|
||||
def patch(self, model, tile_size, swap_size, max_depth, scale_depth):
|
||||
model_channels = model.model.model_config.unet_config["model_channels"]
|
||||
|
||||
latent_tile_size = max(32, tile_size) // 8
|
||||
self.temp = None
|
||||
|
||||
|
||||
124
comfy_extras/nodes_load_3d.py
Normal file
124
comfy_extras/nodes_load_3d.py
Normal file
@@ -0,0 +1,124 @@
|
||||
import nodes
|
||||
import folder_paths
|
||||
import os
|
||||
|
||||
def normalize_path(path):
|
||||
return path.replace('\\', '/')
|
||||
|
||||
class Load3D():
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
input_dir = os.path.join(folder_paths.get_input_directory(), "3d")
|
||||
|
||||
os.makedirs(input_dir, exist_ok=True)
|
||||
|
||||
files = [normalize_path(os.path.join("3d", f)) for f in os.listdir(input_dir) if f.endswith(('.gltf', '.glb', '.obj', '.mtl', '.fbx', '.stl'))]
|
||||
|
||||
return {"required": {
|
||||
"model_file": (sorted(files), {"file_upload": True}),
|
||||
"image": ("LOAD_3D", {}),
|
||||
"width": ("INT", {"default": 1024, "min": 1, "max": 4096, "step": 1}),
|
||||
"height": ("INT", {"default": 1024, "min": 1, "max": 4096, "step": 1}),
|
||||
"show_grid": ([True, False],),
|
||||
"camera_type": (["perspective", "orthographic"],),
|
||||
"view": (["front", "right", "top", "isometric"],),
|
||||
"material": (["original", "normal", "wireframe", "depth"],),
|
||||
"bg_color": ("STRING", {"default": "#000000", "multiline": False}),
|
||||
"light_intensity": ("INT", {"default": 10, "min": 1, "max": 20, "step": 1}),
|
||||
"up_direction": (["original", "-x", "+x", "-y", "+y", "-z", "+z"],),
|
||||
}}
|
||||
|
||||
RETURN_TYPES = ("IMAGE", "MASK", "STRING")
|
||||
RETURN_NAMES = ("image", "mask", "mesh_path")
|
||||
|
||||
FUNCTION = "process"
|
||||
EXPERIMENTAL = True
|
||||
|
||||
CATEGORY = "3d"
|
||||
|
||||
def process(self, model_file, image, **kwargs):
|
||||
imagepath = folder_paths.get_annotated_filepath(image)
|
||||
|
||||
load_image_node = nodes.LoadImage()
|
||||
|
||||
output_image, output_mask = load_image_node.load_image(image=imagepath)
|
||||
|
||||
return output_image, output_mask, model_file,
|
||||
|
||||
class Load3DAnimation():
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
input_dir = os.path.join(folder_paths.get_input_directory(), "3d")
|
||||
|
||||
os.makedirs(input_dir, exist_ok=True)
|
||||
|
||||
files = [normalize_path(os.path.join("3d", f)) for f in os.listdir(input_dir) if f.endswith(('.gltf', '.glb', '.fbx'))]
|
||||
|
||||
return {"required": {
|
||||
"model_file": (sorted(files), {"file_upload": True}),
|
||||
"image": ("LOAD_3D_ANIMATION", {}),
|
||||
"width": ("INT", {"default": 1024, "min": 1, "max": 4096, "step": 1}),
|
||||
"height": ("INT", {"default": 1024, "min": 1, "max": 4096, "step": 1}),
|
||||
"show_grid": ([True, False],),
|
||||
"camera_type": (["perspective", "orthographic"],),
|
||||
"view": (["front", "right", "top", "isometric"],),
|
||||
"material": (["original", "normal", "wireframe", "depth"],),
|
||||
"bg_color": ("STRING", {"default": "#000000", "multiline": False}),
|
||||
"light_intensity": ("INT", {"default": 10, "min": 1, "max": 20, "step": 1}),
|
||||
"up_direction": (["original", "-x", "+x", "-y", "+y", "-z", "+z"],),
|
||||
"animation_speed": (["0.1", "0.5", "1", "1.5", "2"], {"default": "1"}),
|
||||
}}
|
||||
|
||||
RETURN_TYPES = ("IMAGE", "MASK", "STRING")
|
||||
RETURN_NAMES = ("image", "mask", "mesh_path")
|
||||
|
||||
FUNCTION = "process"
|
||||
EXPERIMENTAL = True
|
||||
|
||||
CATEGORY = "3d"
|
||||
|
||||
def process(self, model_file, image, **kwargs):
|
||||
imagepath = folder_paths.get_annotated_filepath(image)
|
||||
|
||||
load_image_node = nodes.LoadImage()
|
||||
|
||||
output_image, output_mask = load_image_node.load_image(image=imagepath)
|
||||
|
||||
return output_image, output_mask, model_file,
|
||||
|
||||
class Preview3D():
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {"required": {
|
||||
"model_file": ("STRING", {"default": "", "multiline": False}),
|
||||
"show_grid": ([True, False],),
|
||||
"camera_type": (["perspective", "orthographic"],),
|
||||
"view": (["front", "right", "top", "isometric"],),
|
||||
"material": (["original", "normal", "wireframe", "depth"],),
|
||||
"bg_color": ("STRING", {"default": "#000000", "multiline": False}),
|
||||
"light_intensity": ("INT", {"default": 10, "min": 1, "max": 20, "step": 1}),
|
||||
"up_direction": (["original", "-x", "+x", "-y", "+y", "-z", "+z"],),
|
||||
}}
|
||||
|
||||
OUTPUT_NODE = True
|
||||
RETURN_TYPES = ()
|
||||
|
||||
CATEGORY = "3d"
|
||||
|
||||
FUNCTION = "process"
|
||||
EXPERIMENTAL = True
|
||||
|
||||
def process(self, model_file, **kwargs):
|
||||
return {"ui": {"model_file": [model_file]}, "result": ()}
|
||||
|
||||
NODE_CLASS_MAPPINGS = {
|
||||
"Load3D": Load3D,
|
||||
"Load3DAnimation": Load3DAnimation,
|
||||
"Preview3D": Preview3D
|
||||
}
|
||||
|
||||
NODE_DISPLAY_NAME_MAPPINGS = {
|
||||
"Load3D": "Load 3D",
|
||||
"Load3DAnimation": "Load 3D - Animation",
|
||||
"Preview3D": "Preview 3D"
|
||||
}
|
||||
@@ -32,7 +32,9 @@ class LTXVImgToVideo:
|
||||
"width": ("INT", {"default": 768, "min": 64, "max": nodes.MAX_RESOLUTION, "step": 32}),
|
||||
"height": ("INT", {"default": 512, "min": 64, "max": nodes.MAX_RESOLUTION, "step": 32}),
|
||||
"length": ("INT", {"default": 97, "min": 9, "max": nodes.MAX_RESOLUTION, "step": 8}),
|
||||
"batch_size": ("INT", {"default": 1, "min": 1, "max": 4096})}}
|
||||
"batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
|
||||
"image_noise_scale": ("FLOAT", {"default": 0.15, "min": 0, "max": 1.0, "step": 0.01, "tooltip": "Amount of noise to apply on conditioning image latent."})
|
||||
}}
|
||||
|
||||
RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
|
||||
RETURN_NAMES = ("positive", "negative", "latent")
|
||||
@@ -40,12 +42,12 @@ class LTXVImgToVideo:
|
||||
CATEGORY = "conditioning/video_models"
|
||||
FUNCTION = "generate"
|
||||
|
||||
def generate(self, positive, negative, image, vae, width, height, length, batch_size):
|
||||
def generate(self, positive, negative, image, vae, width, height, length, batch_size, image_noise_scale):
|
||||
pixels = comfy.utils.common_upscale(image.movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
|
||||
encode_pixels = pixels[:, :, :, :3]
|
||||
t = vae.encode(encode_pixels)
|
||||
positive = node_helpers.conditioning_set_values(positive, {"guiding_latent": t})
|
||||
negative = node_helpers.conditioning_set_values(negative, {"guiding_latent": t})
|
||||
positive = node_helpers.conditioning_set_values(positive, {"guiding_latent": t, "guiding_latent_noise_scale": image_noise_scale})
|
||||
negative = node_helpers.conditioning_set_values(negative, {"guiding_latent": t, "guiding_latent_noise_scale": image_noise_scale})
|
||||
|
||||
latent = torch.zeros([batch_size, 128, ((length - 1) // 8) + 1, height // 32, width // 32], device=comfy.model_management.intermediate_device())
|
||||
latent[:, :, :t.shape[2]] = t
|
||||
@@ -109,6 +111,7 @@ class ModelSamplingLTXV:
|
||||
model_sampling = ModelSamplingAdvanced(model.model.model_config)
|
||||
model_sampling.set_parameters(shift=shift)
|
||||
m.add_object_patch("model_sampling", model_sampling)
|
||||
|
||||
return (m, )
|
||||
|
||||
|
||||
|
||||
41
comfy_extras/nodes_mahiro.py
Normal file
41
comfy_extras/nodes_mahiro.py
Normal file
@@ -0,0 +1,41 @@
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
|
||||
class Mahiro:
|
||||
@classmethod
|
||||
def INPUT_TYPES(s):
|
||||
return {"required": {"model": ("MODEL",),
|
||||
}}
|
||||
RETURN_TYPES = ("MODEL",)
|
||||
RETURN_NAMES = ("patched_model",)
|
||||
FUNCTION = "patch"
|
||||
CATEGORY = "_for_testing"
|
||||
DESCRIPTION = "Modify the guidance to scale more on the 'direction' of the positive prompt rather than the difference between the negative prompt."
|
||||
def patch(self, model):
|
||||
m = model.clone()
|
||||
def mahiro_normd(args):
|
||||
scale: float = args['cond_scale']
|
||||
cond_p: torch.Tensor = args['cond_denoised']
|
||||
uncond_p: torch.Tensor = args['uncond_denoised']
|
||||
#naive leap
|
||||
leap = cond_p * scale
|
||||
#sim with uncond leap
|
||||
u_leap = uncond_p * scale
|
||||
cfg = args["denoised"]
|
||||
merge = (leap + cfg) / 2
|
||||
normu = torch.sqrt(u_leap.abs()) * u_leap.sign()
|
||||
normm = torch.sqrt(merge.abs()) * merge.sign()
|
||||
sim = F.cosine_similarity(normu, normm).mean()
|
||||
simsc = 2 * (sim+1)
|
||||
wm = (simsc*cfg + (4-simsc)*leap) / 4
|
||||
return wm
|
||||
m.set_model_sampler_post_cfg_function(mahiro_normd)
|
||||
return (m, )
|
||||
|
||||
NODE_CLASS_MAPPINGS = {
|
||||
"Mahiro": Mahiro
|
||||
}
|
||||
|
||||
NODE_DISPLAY_NAME_MAPPINGS = {
|
||||
"Mahiro": "Mahiro is so cute that she deserves a better guidance function!! (。・ω・。)",
|
||||
}
|
||||
@@ -1,4 +1,3 @@
|
||||
import folder_paths
|
||||
import comfy.sd
|
||||
import comfy.model_sampling
|
||||
import comfy.latent_formats
|
||||
@@ -241,7 +240,6 @@ class ModelSamplingContinuousV:
|
||||
def patch(self, model, sampling, sigma_max, sigma_min):
|
||||
m = model.clone()
|
||||
|
||||
latent_format = None
|
||||
sigma_data = 1.0
|
||||
if sampling == "v_prediction":
|
||||
sampling_type = comfy.model_sampling.V_PREDICTION
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
import torch
|
||||
import comfy.utils
|
||||
|
||||
class PatchModelAddDownscale:
|
||||
|
||||
@@ -16,6 +16,7 @@ VISION_CONFIG_DICT = {
|
||||
"patch_size": 14,
|
||||
"projection_dim": 768,
|
||||
"hidden_act": "quick_gelu",
|
||||
"model_type": "clip_vision_model",
|
||||
}
|
||||
|
||||
class MLP(nn.Module):
|
||||
|
||||
@@ -82,8 +82,7 @@ class CLIPTextEncodeSD3:
|
||||
tokens["l"] += empty["l"]
|
||||
while len(tokens["l"]) > len(tokens["g"]):
|
||||
tokens["g"] += empty["g"]
|
||||
cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
|
||||
return ([[cond, {"pooled_output": pooled}]], )
|
||||
return (clip.encode_from_tokens_scheduled(tokens), )
|
||||
|
||||
|
||||
class ControlNetApplySD3(nodes.ControlNetApplyAdvanced):
|
||||
|
||||
@@ -16,7 +16,8 @@ class SkipLayerGuidanceDiT:
|
||||
"single_layers": ("STRING", {"default": "7, 8, 9", "multiline": False}),
|
||||
"scale": ("FLOAT", {"default": 3.0, "min": 0.0, "max": 10.0, "step": 0.1}),
|
||||
"start_percent": ("FLOAT", {"default": 0.01, "min": 0.0, "max": 1.0, "step": 0.001}),
|
||||
"end_percent": ("FLOAT", {"default": 0.15, "min": 0.0, "max": 1.0, "step": 0.001})
|
||||
"end_percent": ("FLOAT", {"default": 0.15, "min": 0.0, "max": 1.0, "step": 0.001}),
|
||||
"rescaling_scale": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 10.0, "step": 0.01}),
|
||||
}}
|
||||
RETURN_TYPES = ("MODEL",)
|
||||
FUNCTION = "skip_guidance"
|
||||
@@ -26,7 +27,7 @@ class SkipLayerGuidanceDiT:
|
||||
|
||||
CATEGORY = "advanced/guidance"
|
||||
|
||||
def skip_guidance(self, model, scale, start_percent, end_percent, double_layers="", single_layers=""):
|
||||
def skip_guidance(self, model, scale, start_percent, end_percent, double_layers="", single_layers="", rescaling_scale=0):
|
||||
# check if layer is comma separated integers
|
||||
def skip(args, extra_args):
|
||||
return args
|
||||
@@ -65,6 +66,11 @@ class SkipLayerGuidanceDiT:
|
||||
if scale > 0 and sigma_ >= sigma_end and sigma_ <= sigma_start:
|
||||
(slg,) = comfy.samplers.calc_cond_batch(model, [cond], x, sigma, model_options)
|
||||
cfg_result = cfg_result + (cond_pred - slg) * scale
|
||||
if rescaling_scale != 0:
|
||||
factor = cond_pred.std() / cfg_result.std()
|
||||
factor = rescaling_scale * factor + (1 - rescaling_scale)
|
||||
cfg_result *= factor
|
||||
|
||||
return cfg_result
|
||||
|
||||
m = model.clone()
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
import os
|
||||
import logging
|
||||
from spandrel import ModelLoader, ImageModelDescriptor
|
||||
from comfy import model_management
|
||||
|
||||
@@ -1,7 +1,5 @@
|
||||
from PIL import Image, ImageOps
|
||||
from io import BytesIO
|
||||
from PIL import Image
|
||||
import numpy as np
|
||||
import struct
|
||||
import comfy.utils
|
||||
import time
|
||||
|
||||
|
||||
20
execution.py
20
execution.py
@@ -16,7 +16,7 @@ import comfy.model_management
|
||||
from comfy_execution.graph import get_input_info, ExecutionList, DynamicPrompt, ExecutionBlocker
|
||||
from comfy_execution.graph_utils import is_link, GraphBuilder
|
||||
from comfy_execution.caching import HierarchicalCache, LRUCache, CacheKeySetInputSignature, CacheKeySetID
|
||||
from comfy.cli_args import args
|
||||
from comfy_execution.validation import validate_node_input
|
||||
|
||||
class ExecutionResult(Enum):
|
||||
SUCCESS = 0
|
||||
@@ -144,11 +144,16 @@ def _map_node_over_list(obj, input_data_all, func, allow_interrupt=False, execut
|
||||
return {k: v[i if len(v) > i else -1] for k, v in d.items()}
|
||||
|
||||
results = []
|
||||
def process_inputs(inputs, index=None):
|
||||
def process_inputs(inputs, index=None, input_is_list=False):
|
||||
if allow_interrupt:
|
||||
nodes.before_node_execution()
|
||||
execution_block = None
|
||||
for k, v in inputs.items():
|
||||
if input_is_list:
|
||||
for e in v:
|
||||
if isinstance(e, ExecutionBlocker):
|
||||
v = e
|
||||
break
|
||||
if isinstance(v, ExecutionBlocker):
|
||||
execution_block = execution_block_cb(v) if execution_block_cb else v
|
||||
break
|
||||
@@ -160,7 +165,7 @@ def _map_node_over_list(obj, input_data_all, func, allow_interrupt=False, execut
|
||||
results.append(execution_block)
|
||||
|
||||
if input_is_list:
|
||||
process_inputs(input_data_all, 0)
|
||||
process_inputs(input_data_all, 0, input_is_list=input_is_list)
|
||||
elif max_len_input == 0:
|
||||
process_inputs({})
|
||||
else:
|
||||
@@ -480,7 +485,7 @@ class PromptExecutor:
|
||||
if self.caches.outputs.get(node_id) is not None:
|
||||
cached_nodes.append(node_id)
|
||||
|
||||
comfy.model_management.cleanup_models(keep_clone_weights_loaded=True)
|
||||
comfy.model_management.cleanup_models_gc()
|
||||
self.add_message("execution_cached",
|
||||
{ "nodes": cached_nodes, "prompt_id": prompt_id},
|
||||
broadcast=False)
|
||||
@@ -527,7 +532,6 @@ class PromptExecutor:
|
||||
comfy.model_management.unload_all_models()
|
||||
|
||||
|
||||
|
||||
def validate_inputs(prompt, item, validated):
|
||||
unique_id = item
|
||||
if unique_id in validated:
|
||||
@@ -589,8 +593,8 @@ def validate_inputs(prompt, item, validated):
|
||||
r = nodes.NODE_CLASS_MAPPINGS[o_class_type].RETURN_TYPES
|
||||
received_type = r[val[1]]
|
||||
received_types[x] = received_type
|
||||
if 'input_types' not in validate_function_inputs and received_type != type_input:
|
||||
details = f"{x}, {received_type} != {type_input}"
|
||||
if 'input_types' not in validate_function_inputs and not validate_node_input(received_type, type_input):
|
||||
details = f"{x}, received_type({received_type}) mismatch input_type({type_input})"
|
||||
error = {
|
||||
"type": "return_type_mismatch",
|
||||
"message": "Return type mismatch between linked nodes",
|
||||
@@ -761,7 +765,7 @@ def validate_prompt(prompt):
|
||||
if 'class_type' not in prompt[x]:
|
||||
error = {
|
||||
"type": "invalid_prompt",
|
||||
"message": f"Cannot execute because a node is missing the class_type property.",
|
||||
"message": "Cannot execute because a node is missing the class_type property.",
|
||||
"details": f"Node ID '#{x}'",
|
||||
"extra_info": {}
|
||||
}
|
||||
|
||||
36
fix_torch.py
36
fix_torch.py
@@ -5,20 +5,24 @@ import ctypes
|
||||
import logging
|
||||
|
||||
|
||||
torch_spec = importlib.util.find_spec("torch")
|
||||
for folder in torch_spec.submodule_search_locations:
|
||||
lib_folder = os.path.join(folder, "lib")
|
||||
test_file = os.path.join(lib_folder, "fbgemm.dll")
|
||||
dest = os.path.join(lib_folder, "libomp140.x86_64.dll")
|
||||
if os.path.exists(dest):
|
||||
break
|
||||
|
||||
with open(test_file, 'rb') as f:
|
||||
contents = f.read()
|
||||
if b"libomp140.x86_64.dll" not in contents:
|
||||
def fix_pytorch_libomp():
|
||||
"""
|
||||
Fix PyTorch libomp DLL issue on Windows by copying the correct DLL file if needed.
|
||||
"""
|
||||
torch_spec = importlib.util.find_spec("torch")
|
||||
for folder in torch_spec.submodule_search_locations:
|
||||
lib_folder = os.path.join(folder, "lib")
|
||||
test_file = os.path.join(lib_folder, "fbgemm.dll")
|
||||
dest = os.path.join(lib_folder, "libomp140.x86_64.dll")
|
||||
if os.path.exists(dest):
|
||||
break
|
||||
try:
|
||||
mydll = ctypes.cdll.LoadLibrary(test_file)
|
||||
except FileNotFoundError as e:
|
||||
logging.warning("Detected pytorch version with libomp issue, patching.")
|
||||
shutil.copyfile(os.path.join(lib_folder, "libiomp5md.dll"), dest)
|
||||
|
||||
with open(test_file, "rb") as f:
|
||||
contents = f.read()
|
||||
if b"libomp140.x86_64.dll" not in contents:
|
||||
break
|
||||
try:
|
||||
ctypes.cdll.LoadLibrary(test_file)
|
||||
except FileNotFoundError:
|
||||
logging.warning("Detected pytorch version with libomp issue, patching.")
|
||||
shutil.copyfile(os.path.join(lib_folder, "libiomp5md.dll"), dest)
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user