
Commit b297189

Bump version, add changelog
Also updated some parts of the README. Other parts still need updating.
1 parent 164152c

5 files changed (+37, -34)

CHANGELOG.md (5 additions, 0 deletions)

@@ -1,4 +1,9 @@
 ## Changelog
+### 0.4.3 video processing tab
+* Added an option to process videos directly from a video file. This leads to better results than batch-processing the individual frames of a video, and allows generating depthmap videos that can then be used as custom depthmap videos in further generations.
+* UI improvements.
+* Extra stereoimage generation modes - enable them in the extension settings if you want to use them.
+* New stereoimage generation parameter - offset exponent. Setting it to 1 may produce more realistic outputs.
 ### 0.4.2
 * Added UI options for 2 additional rembg models.
 * Heatmap generation UI option is hidden - if you want to use it, please activate it in the extension settings.

README.md (25 additions, 27 deletions)

@@ -1,13 +1,13 @@
 # High Resolution Depth Maps for Stable Diffusion WebUI
-This script is an addon for [AUTOMATIC1111's Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that creates `depth maps`, and now also `3D stereo image pairs` as side-by-side or anaglyph from a single image. The result can be viewed on 3D or holographic devices like VR headsets or [Looking Glass](https://lookingglassfactory.com/) displays, used in Render- or Game- Engines on a plane with a displacement modifier, and maybe even 3D printed.
+This program is an addon for [AUTOMATIC1111's Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that creates depth maps. Using either generated or custom depth maps, it can also create 3D stereo image pairs (as side-by-side or anaglyph), normalmaps and 3D meshes. The outputs can be viewed directly or used as assets for a 3D engine; see the [wiki](https://github.com/thygate/stable-diffusion-webui-depthmap-script/wiki/Viewing-Results) to learn more. The program integrates with [Rembg](https://github.com/danielgatis/rembg), supports batch processing and video processing, and can also run in standalone mode, without Stable Diffusion WebUI.

-To generate realistic depth maps `from a single image`, this script uses code and models from the [MiDaS](https://github.com/isl-org/MiDaS) and [ZoeDepth](https://github.com/isl-org/ZoeDepth) repositories by Intel ISL, or LeReS from the [AdelaiDepth](https://github.com/aim-uofa/AdelaiDepth) repository by Advanced Intelligent Machines. Multi-resolution merging as implemented by [BoostingMonocularDepth](https://github.com/compphoto/BoostingMonocularDepth) is used to generate high resolution depth maps.
+To generate realistic depth maps from individual images, this script uses code and models from the [MiDaS](https://github.com/isl-org/MiDaS) and [ZoeDepth](https://github.com/isl-org/ZoeDepth) repositories by Intel ISL, or LeReS from the [AdelaiDepth](https://github.com/aim-uofa/AdelaiDepth) repository by Advanced Intelligent Machines. Multi-resolution merging as implemented by [BoostingMonocularDepth](https://github.com/compphoto/BoostingMonocularDepth) is used to generate high-resolution depth maps.

-3D stereo, and red/cyan anaglyph images are generated using code from the [stereo-image-generation](https://github.com/m5823779/stereo-image-generation) repository. Thanks to [@sina-masoud-ansari](https://github.com/sina-masoud-ansari) for the tip! Discussion [here](https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/45). Improved techniques for generating stereo images and balancing distortion between eyes by [@semjon00](https://github.com/semjon00), see [here](https://github.com/thygate/stable-diffusion-webui-depthmap-script/pull/51) and [here](https://github.com/thygate/stable-diffusion-webui-depthmap-script/pull/56).
+Stereoscopic images are created using a custom-written algorithm.

-3D Photography using Context-aware Layered Depth Inpainting by Virginia Tech Vision and Learning Lab , or [3D-Photo-Inpainting](https://github.com/vt-vl-lab/3d-photo-inpainting) is used to generate a `3D inpainted mesh` and render `videos` from said mesh.
+3D Photography using Context-aware Layered Depth Inpainting by Virginia Tech Vision and Learning Lab, or [3D-Photo-Inpainting](https://github.com/vt-vl-lab/3d-photo-inpainting), is used to generate a `3D inpainted mesh` and render `videos` from said mesh.

-[Rembg](https://github.com/danielgatis/rembg) by [@DanielGatis](https://github.com/danielgatis) support added by [@graemeniedermayer](https://github.com/graemeniedermayer), using [U-2-Net](https://github.com/xuebinqin/U-2-Net) by [@xuebinqin](https://github.com/xuebinqin) to remove backgrounds.
+Rembg uses [U-2-Net](https://github.com/xuebinqin/U-2-Net) and [IS-Net](https://github.com/xuebinqin/DIS).

 ## Depthmap Examples
 [![screenshot](examples.png)](https://raw.githubusercontent.com/thygate/stable-diffusion-webui-depthmap-script/main/examples.png)
@@ -20,32 +20,30 @@ video by [@graemeniedermayer](https://github.com/graemeniedermayer), more exampl
 ![](https://user-images.githubusercontent.com/54073010/210012661-ef07986c-2320-4700-bc54-fad3899f0186.png)
 images generated by [@semjon00](https://github.com/semjon00) from CC0 photos, more examples [here](https://github.com/thygate/stable-diffusion-webui-depthmap-script/pull/56#issuecomment-1367596463).

-
 ## Install instructions
-The script is now also available to install from the `Available` subtab under the `Extensions` tab in the WebUI.
+### As extension
+The extension can be installed directly from the WebUI. Navigate to the `Extensions` tab, click `Available`, then `Load from`, and install the `Depth Maps` extension. Alternatively, the extension can be installed from URL: `https://github.com/thygate/stable-diffusion-webui-depthmap-script`.

 ### Updating
 In the WebUI, in the `Extensions` tab, in the `Installed` subtab, click `Check for Updates` and then `Apply and restart UI`.

-### Automatic installation
-In the WebUI, in the `Extensions` tab, in the `Install from URL` subtab, enter this repository
-`https://github.com/thygate/stable-diffusion-webui-depthmap-script`
-and click install and restart.
+### Standalone
+Clone the repository, install the requirements from `requirements.txt`, and launch using `main.py`.

->Model `weights` will be downloaded automatically on first use and saved to /models/midas, /models/leres and /models/pix2pix
+>Model weights will be downloaded automatically on first use and saved to /models/midas, /models/leres and /models/pix2pix. ZoeDepth models are stored in the torch cache folder.


 ## Usage
-Select the "DepthMap vX.X.X" script from the script selection box in either txt2img or img2img, or go to the Depth tab when using existing images.
+Select the "DepthMap" script from the script selection box in either txt2img or img2img, or go to the Depth tab when using existing images.
 ![screenshot](options.png)

-The models can `Compute on` GPU and CPU, use CPU if low on VRAM.
+The models can `Compute on` GPU or CPU; use CPU if low on VRAM.

-There are seven models available from the `Model` dropdown. For the first model, res101, see [AdelaiDepth/LeReS](https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS) for more info. The others are the midas models: dpt_beit_large_512, dpt_beit_large_384, dpt_large_384, dpt_hybrid_384, midas_v21, and midas_v21_small. See the [MiDaS](https://github.com/isl-org/MiDaS) repository for more info. The newest dpt_beit_large_512 model was trained on a 512x512 dataset but is VERY VRAM hungry.
+There are ten models available from the `Model` dropdown. For the first model, res101, see [AdelaiDepth/LeReS](https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS) for more info. The next six are the MiDaS models: dpt_beit_large_512, dpt_beit_large_384, dpt_large_384, dpt_hybrid_384, midas_v21, and midas_v21_small. See the [MiDaS](https://github.com/isl-org/MiDaS) repository for more info. The newest dpt_beit_large_512 model was trained on a 512x512 dataset but is VERY VRAM hungry. The last three models are [ZoeDepth](https://github.com/isl-org/ZoeDepth) models.

 Net size can be set with `net width` and `net height`, or will be the same as the input image when `Match input size` is enabled. There is a trade-off between structural consistency and high-frequency details with respect to net size (see [observations](https://github.com/compphoto/BoostingMonocularDepth#observations)).

-`Boost` will enable multi-resolution merging as implemented by [BoostingMonocularDepth](https://github.com/compphoto/BoostingMonocularDepth) and will significantly improve the results. Mitigating the observations mentioned above. Net size is ignored when enabled. Best results with res101.
+`Boost` will enable multi-resolution merging as implemented by [BoostingMonocularDepth](https://github.com/compphoto/BoostingMonocularDepth) and will significantly improve the results, mitigating the observations mentioned above, at the cost of much longer compute time. Best results with res101.

 `Clip and renormalize` allows for clipping the depthmap on the `near` and `far` side, the values in between will be renormalized to fit the available range. Set both values equal to get a b&w mask of a single depth plane at that value. This option works on the 16-bit depthmap and allows for 1000 steps to select the clip values.
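
For intuition, the clip-and-renormalize operation described above amounts to the following (a minimal NumPy sketch of the concept, not the extension's actual code; the function name and the handling of the equal-values case are assumptions):

```python
import numpy as np

def clip_and_renormalize(depth: np.ndarray, near: int, far: int) -> np.ndarray:
    """Clip a 16-bit depthmap to [near, far], then stretch that range back to full scale."""
    if near == far:
        # Degenerate case: a black-and-white mask of the single depth plane at that value.
        return np.where(depth >= near, 65535, 0).astype(np.uint16)
    clipped = np.clip(depth.astype(np.float64), near, far)
    return ((clipped - near) / (far - near) * 65535).astype(np.uint16)
```
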
@@ -55,8 +53,6 @@ Regardless of global settings, `Save DepthMap` will always save the depthmap in

 To see the generated output in the webui `Show DepthMap` should be enabled. When using Batch img2img this option should also be enabled.

-To make the depthmap easier to analyze for human eyes, `Show HeatMap` shows an extra image in the WebUI that has a color gradient applied. It is not saved.
-
 When `Combine into one image` is enabled, the depthmap will be combined with the original image, the orientation can be selected with `Combine axis`. When disabled, the depthmap will be saved as a 16 bit single channel PNG as opposed to a three channel (RGB), 8 bit per channel image when the option is enabled.

 When either `Generate Stereo` or `Generate anaglyph` is enabled, a stereo image pair will be generated. `Divergence` sets the amount of 3D effect that is desired. `Balance between eyes` determines where the (inevitable) distortion from filling up gaps will end up, -1 Left, +1 Right, and 0 balanced.
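
The `Divergence` and `Balance between eyes` parameters can be pictured with a deliberately naive sketch of depth-based pixel shifting (illustrative only, not the extension's algorithm; the function and its conventions are hypothetical):

```python
import numpy as np

def naive_stereo_shift(image: np.ndarray, depth: np.ndarray, divergence: float) -> np.ndarray:
    """Shift each pixel horizontally in proportion to its depth.

    image: HxWx3 uint8; depth: HxW floats normalized to [0, 1], 1 = nearest.
    divergence: strength of the 3D effect, as a percentage of image width.
    """
    h, w, _ = image.shape
    out = np.zeros_like(image)
    max_shift = divergence / 100.0 * w  # nearer pixels move the most
    for y in range(h):
        for x in range(w):
            nx = x + int(depth[y, x] * max_shift)
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
    # The shift leaves gaps (disocclusions); filling them distorts the image.
    # `Balance between eyes` decides which eye's view absorbs that distortion.
    return out
```

The other eye's view would use a negated divergence; a real implementation also resolves pixel collisions by depth and fills the gaps.
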
@@ -78,17 +74,19 @@ If you often get out of memory errors when computing a depthmap on GPU while usi
 ## FAQ

 * `Can I use this on existing images ?`
-  - Yes, you can now use the Depth tab to easily process existing images.
-  - Yes, in img2img, set denoising strength to 0. This will effectively skip stable diffusion and use the input image. You will still have to set the correct size, and need to select `Crop and resize` instead of `Just resize` when the input image resolution does not match the set size perfectly.
-* `Can I run this on google colab ?`
+  - Yes, you can use the Depth tab to easily process existing images.
+  - Another way of doing this is to use img2img with denoising strength set to 0. This will effectively skip stable diffusion and use the input image. You will still have to set the correct size, and need to select `Crop and resize` instead of `Just resize` when the input image resolution does not match the set size perfectly.
+* `Can I run this on Google Colab?`
   - You can run the MiDaS network on their colab linked here https://pytorch.org/hub/intelisl_midas_v2/
   - You can run BoostingMonocularDepth on their colab linked here : https://colab.research.google.com/github/compphoto/BoostingMonocularDepth/blob/main/Boostmonoculardepth.ipynb
-
-## Forks and Related
-
-* Several scripts by [@Extraltodeus](https://github.com/Extraltodeus) using depth maps : https://github.com/Extraltodeus?tab=repositories
-
-### More updates soon .. Feel free to comment and share in the discussions.
+  - Running this program on Colab is not officially supported, but it may work. Consider more suitable ways of running it; if you still decide to try, a standalone installation may be easier to manage.
+* `What other depth-related projects could I check out?`
+  - Several [scripts](https://github.com/Extraltodeus?tab=repositories) by [@Extraltodeus](https://github.com/Extraltodeus) using depth maps.
+  - Geo11 and [Depth3D](https://github.com/BlueSkyDefender/Depth3D) for playing existing games in 3D.
+* `How can I know what changed in the new version of the script?`
+  - You can view the git log or refer to the `CHANGELOG.md` file.
+
+### Feel free to comment and share in the discussions!

 ## Acknowledgements

main.py (2 additions, 3 deletions)

@@ -1,6 +1,4 @@
 # This launches DepthMap without the AUTOMATIC1111/stable-diffusion-webui
-# If DepthMap is installed as an extension,
-# you may want to change the working directory to the stable-diffusion-webui root.

 import argparse
 import os
@@ -11,7 +9,8 @@

 def maybe_chdir():
     """Detects if DepthMap was installed as a stable-diffusion-webui script, but run without current directory set to
-    the stable-diffusion-webui root. Changes current directory if needed, to aviod clutter."""
+    the stable-diffusion-webui root. Changes current directory if needed.
+    This is to avoid re-downloading models and putting results into the wrong folder."""
     try:
         file_path = pathlib.Path(__file__)
         path = file_path.parts
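
The body of `maybe_chdir` is truncated in this diff. One plausible shape of the detection it describes, sketched as an illustration rather than taken from the repository (the `extensions` path check and the error handling are assumptions):

```python
import os
import pathlib

def maybe_chdir():
    """Illustrative sketch: if this file lives under <webui-root>/extensions/<repo>/,
    change the working directory to <webui-root>."""
    try:
        path = pathlib.Path(__file__).parts
        if 'extensions' in path:
            # Everything before 'extensions' is assumed to be the webui root.
            os.chdir(pathlib.Path(*path[:path.index('extensions')]))
    except Exception:
        pass  # best effort: never block startup over a failed chdir
```
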

src/common_ui.py (4 additions, 3 deletions)

@@ -222,9 +222,10 @@ def open_folder_action():


 def depthmap_mode_video(inp):
-    gr.HTML(value="Single video mode allows generating videos from videos. Every frame of the video is processed, "
-                  "please adjust generation settings, so that generation is not too slow. For the best results, "
-                  "Use a zoedepth model, since they provide the highest level of temporal coherency.")
+    gr.HTML(value="Single video mode allows generating videos from videos. Please "
+                  "keep in mind that all the frames of the video need to be processed - therefore it is important to "
+                  "pick settings so that the generation is not too slow. For the best results, "
+                  "use a zoedepth model, since they provide the highest level of coherency between frames.")
     inp += gr.File(elem_id='depthmap_vm_input', label="Video or animated file",
                    file_count="single", interactive=True, type="file")
     inp += gr.Dropdown(elem_id="depthmap_vm_smoothening_mode", label="Smoothening", type="value", choices=['none'])

src/misc.py (1 addition, 1 deletion)

@@ -24,7 +24,7 @@ def call_git(dir):

 REPOSITORY_NAME = "stable-diffusion-webui-depthmap-script"
 SCRIPT_NAME = "DepthMap"
-SCRIPT_VERSION = "v0.4.2"
+SCRIPT_VERSION = "v0.4.3"
 SCRIPT_FULL_NAME = f"{SCRIPT_NAME} {SCRIPT_VERSION} ({get_commit_hash()})"
