Commit 47e7807

committed: add benchmark and edit propagation. update colab
1 parent 182d4d7 commit 47e7807

16 files changed: +1476 -88 lines changed

.gitignore

Lines changed: 10 additions & 0 deletions
@@ -161,3 +161,13 @@ cython_debug/
 
 */.DS_Store
 .DS_Store
+
+guided-diffusion/
+davis_results_sd/
+davis_results_adm/
+superpoint-1k/
+hpatches_results/
+superpoint-1k.zip
+SPair-71k.tar.gz
+SPair-71k/
+./guided-diffusion/models/256x256_diffusion_uncond.pt

README.md

Lines changed: 171 additions & 6 deletions
@@ -1,7 +1,7 @@
 # Diffusion Features (DIFT)
-This repository contains code for paper "Emergent Correspondence from Image Diffusion".
+This repository contains code for our NeurIPS 2023 paper "Emergent Correspondence from Image Diffusion".
 
-### [Project Page](https://diffusionfeatures.github.io/) | [Paper](https://arxiv.org/abs/2306.03881) | [Colab Demo](https://colab.research.google.com/drive/1tUTJ3UJxbqnfvUMvYH5lxcqt0UdUjdq6?usp=sharing)
+### [Project Page](https://diffusionfeatures.github.io/) | [Paper](https://arxiv.org/abs/2306.03881) | [Colab Demo](https://colab.research.google.com/drive/1km6MGafhAvbPOouD3oo64aUXgLlWM6L1?usp=sharing)
 
 ![video](./assets/teaser.gif)
 

@@ -11,7 +11,7 @@ If you have a Linux machine, you could either set up the python environment usin
 conda env create -f environment.yml
 conda activate dift
 ```
-or create a new conda environment and install the packages manually using the
+or create a new conda environment and install the packages manually using the 
 shell commands in [setup_env.sh](setup_env.sh).
 
 ## Interactive Demo: Give it a Try!

@@ -24,7 +24,7 @@ We provide an interative jupyter notebook [demo.ipynb](demo.ipynb) to demonstrat
 </tr>
 </table>
 
-If you don't have a local GPU, you can also use the provided [Colab Demo](https://colab.research.google.com/drive/1tUTJ3UJxbqnfvUMvYH5lxcqt0UdUjdq6?usp=sharing).
+If you don't have a local GPU, you can also use the provided [Colab Demo](https://colab.research.google.com/drive/1km6MGafhAvbPOouD3oo64aUXgLlWM6L1?usp=sharing).
 
 ## Extract DIFT for a given image
 You could use the following [command](extract_dift.sh) to extract DIFT from a given image, and save it as a torch tensor. These arguments are set to the same as in the semantic correspondence tasks by default.

@@ -47,10 +47,175 @@ Here're the explanation for each argument:
 - `prompt`: the prompt used in the diffusion model.
 - `ensemble_size`: the number of repeated images in each batch used to get features. `ensemble_size=8` by default. You can reduce this value if encountering memory issue.
 
-The output DIFT tensor spatial size is determined by both `img_size` and `up_ft_index`. If `up_ft_index=0`, the output size would be 1/32 of `img_size`; if `up_ft_index=1`, it would be 1/16; if `up_ft_index=2 or 3`, it would be 1/8.
+The output DIFT tensor spatial size is determined by both `img_size` and `up_ft_index`. If `up_ft_index=0`, the output size would be 1/32 of `img_size`; if `up_ft_index=1`, it would be 1/16; if `up_ft_index=2 or 3`, it would be 1/8.
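As a quick check of the sizes above, here is a minimal sketch (not part of the repository; the helper name is made up for illustration) that computes the expected feature-map resolution from `img_size` and `up_ft_index`, assuming the 1/32, 1/16, 1/8 ratios just stated:

```python
# Illustrative helper (hypothetical, not in the repo): expected DIFT output spatial size.
def dift_feature_size(img_size, up_ft_index):
    """Return (h, w) of the output DIFT tensor, assuming the strides stated above."""
    stride = {0: 32, 1: 16, 2: 8, 3: 8}[up_ft_index]
    h, w = img_size
    return h // stride, w // stride

# Default semantic-correspondence setting: a 768x768 input with up_ft_index=2 gives 96x96 features.
print(dift_feature_size((768, 768), 2))  # (96, 96)
```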
 
 ## Application: Edit Propagation
 Using DIFT, we can propagate edits in one image to others that share semantic correspondences, even cross categories and domains:
 <img src="./assets/edit_cat.gif" alt="edit cat" style="width:90%;">
+More implementation details are in the notebook [edit_propagation.ipynb](edit_propagation.ipynb).
 
-Check out more videos and visualizations in the [project page](https://diffusionfeatures.github.io/).
+## Get Benchmark Evaluation Results
+First, run the following commands to enable the use of DIFT_adm:
+```
+git clone git@github.com:openai/guided-diffusion.git
+cd guided-diffusion && mkdir models && cd models
+wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt
+```
+
+### SPair-71k
+
+First, download the SPair-71k data:
+```
+wget https://cvlab.postech.ac.kr/research/SPair-71k/data/SPair-71k.tar.gz
+tar -xzvf SPair-71k.tar.gz
+```
+Run the following script to get the PCK (both per point and per image) of DIFT_sd on SPair-71k:
+```
+# --save_path is a path to save the extracted features
+python eval_spair.py \
+    --dataset_path ./SPair-71k \
+    --save_path ./spair_ft \
+    --dift_model sd \
+    --img_size 768 768 \
+    --t 261 \
+    --up_ft_index 2 \
+    --ensemble_size 8
+```
+Run the following script to get the PCK (both per point and per image) of DIFT_adm on SPair-71k:
+```
+# --save_path is a path to save the extracted features
+python eval_spair.py \
+    --dataset_path ./SPair-71k \
+    --save_path ./spair_ft \
+    --dift_model adm \
+    --img_size 512 512 \
+    --t 101 \
+    --up_ft_index 4 \
+    --ensemble_size 8
+```
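As a reference for the metric itself: a transferred keypoint is usually counted as correct if it lands within `alpha * max(bbox_h, bbox_w)` of the ground truth, with `alpha = 0.1` being the common SPair-71k setting. The sketch below is a generic illustration of that computation, not the exact code in `eval_spair.py`:

```python
import numpy as np

def pck_per_point(pred_kpts, gt_kpts, bbox_hw, alpha=0.1):
    """Generic PCK@alpha sketch for one image pair (illustrative, not eval_spair.py itself).

    pred_kpts, gt_kpts: (N, 2) arrays of predicted / ground-truth (x, y) keypoints.
    bbox_hw: (height, width) of the target object's bounding box.
    Returns the per-image PCK and the per-point correctness mask.
    """
    pred_kpts = np.asarray(pred_kpts, dtype=float)
    gt_kpts = np.asarray(gt_kpts, dtype=float)
    threshold = alpha * max(bbox_hw)                      # threshold relative to bbox size
    dists = np.linalg.norm(pred_kpts - gt_kpts, axis=1)   # pixel error per keypoint
    correct = dists <= threshold
    return correct.mean(), correct

# Example: with a 100x120 bbox the threshold is 12 px, so only the first point counts.
print(pck_per_point([[10, 10], [50, 80]], [[12, 11], [90, 80]], bbox_hw=(100, 120)))
```

Per-image PCK averages correctness within each image first, while per-point PCK pools all keypoints across the dataset before averaging.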
+
+### HPatches
+
+First, prepare the HPatches data:
+```
+cd $HOME
+git clone git@github.com:mihaidusmanu/d2-net.git && cd d2-net/hpatches_sequences/
+chmod u+x download.sh
+./download.sh
+```
+
+Then, download the 1k SuperPoint keypoints:
+```
+wget "https://www.dropbox.com/scl/fi/1mxy3oycnz7m2acd92u2x/superpoint-1k.zip?rlkey=fic30gr2tlth3cmsyyywcg385&dl=1" -O superpoint-1k.zip
+unzip superpoint-1k.zip
+rm superpoint-1k.zip
+```
+
+Run the following script to get the homography estimation accuracy of DIFT_sd on HPatches:
+```
+python eval_hpatches.py \
+    --hpatches_path ../d2-net/hpatches_sequences/hpatches-sequences-release \
+    --kpts_path ./superpoint-1k \
+    --save_path ./hpatches_results \
+    --dift_model sd \
+    --img_size 768 768 \
+    --t 0 \
+    --up_ft_index 2 \
+    --ensemble_size 8
+
+python eval_homography.py \
+    --hpatches_path ../d2-net/hpatches_sequences/hpatches-sequences-release \
+    --save_path ./hpatches_results \
+    --feat dift_sd \
+    --metric cosine \
+    --mode lmeds
+```
+
+Run the following script to get the homography estimation accuracy of DIFT_adm on HPatches:
+```
+python eval_hpatches.py \
+    --hpatches_path ../d2-net/hpatches_sequences/hpatches-sequences-release \
+    --kpts_path ./superpoint-1k \
+    --save_path ./hpatches_results \
+    --dift_model adm \
+    --img_size 768 768 \
+    --t 41 \
+    --up_ft_index 11 \
+    --ensemble_size 4
+
+python eval_homography.py \
+    --hpatches_path ../d2-net/hpatches_sequences/hpatches-sequences-release \
+    --save_path ./hpatches_results \
+    --feat dift_adm \
+    --metric l2 \
+    --mode ransac
+```
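For intuition, `eval_hpatches.py` extracts DIFT descriptors at the pre-extracted SuperPoint keypoints, and `eval_homography.py` matches them (cosine distance for DIFT_sd, L2 for DIFT_adm above) and fits a homography with OpenCV (LMEDS or RANSAC). The sketch below is a generic mutual-nearest-neighbour matching plus `cv2.findHomography` pipeline, shown for illustration only; the exact logic lives in those two scripts:

```python
import cv2
import numpy as np

def match_and_estimate_homography(desc1, desc2, kpts1, kpts2, metric="cosine", mode="ransac"):
    """Illustrative sketch: mutual-NN matching of per-keypoint descriptors, then a homography fit.

    desc1, desc2: (N1, D), (N2, D) feature descriptors sampled at the keypoints.
    kpts1, kpts2: (N1, 2), (N2, 2) keypoint (x, y) locations in the two images.
    """
    if metric == "cosine":
        d1 = desc1 / np.linalg.norm(desc1, axis=1, keepdims=True)
        d2 = desc2 / np.linalg.norm(desc2, axis=1, keepdims=True)
        dist = 1.0 - d1 @ d2.T                                        # cosine distance matrix
    else:
        dist = np.linalg.norm(desc1[:, None] - desc2[None], axis=-1)  # L2 distance matrix

    nn12 = dist.argmin(axis=1)                                        # best match in image 2 for each kpt in image 1
    nn21 = dist.argmin(axis=0)                                        # best match in image 1 for each kpt in image 2
    mutual = np.where(nn21[nn12] == np.arange(len(nn12)))[0]          # keep only mutual nearest neighbours

    src = kpts1[mutual].astype(np.float32)
    dst = kpts2[nn12[mutual]].astype(np.float32)
    method = cv2.LMEDS if mode == "lmeds" else cv2.RANSAC
    H, inliers = cv2.findHomography(src, dst, method, 3.0)            # robust homography estimate
    return H, inliers
```

Accuracy on HPatches is then typically reported as the fraction of image pairs whose corner reprojection error under the estimated homography falls below a pixel threshold.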
+
+### DAVIS
+
+We follow the evaluation protocol as in DINO's [implementation](https://github.com/facebookresearch/dino#evaluation-davis-2017-video-object-segmentation).
+
+First, prepare the DAVIS 2017 data and evaluation tools:
+```
+cd $HOME
+git clone https://github.com/davisvideochallenge/davis-2017 && cd davis-2017
+./data/get_davis.sh
+cd $HOME
+git clone https://github.com/davisvideochallenge/davis2017-evaluation
+```
+
+Then, get the segmentation results using DIFT_sd:
+```
+python eval_davis.py \
+    --dift_model sd \
+    --t 51 \
+    --up_ft_index 2 \
+    --temperature 0.2 \
+    --topk 15 \
+    --n_last_frames 28 \
+    --ensemble_size 8 \
+    --size_mask_neighborhood 15 \
+    --data_path $HOME/davis-2017/DAVIS/ \
+    --output_dir ./davis_results_sd/
+```
+
+and the results using DIFT_adm:
+```
+python eval_davis.py \
+    --dift_model adm \
+    --t 51 \
+    --up_ft_index 7 \
+    --temperature 0.1 \
+    --topk 10 \
+    --n_last_frames 28 \
+    --ensemble_size 4 \
+    --size_mask_neighborhood 15 \
+    --data_path $HOME/davis-2017/DAVIS/ \
+    --output_dir ./davis_results_adm/
+```
+
+Finally, evaluate the results:
+```
+python $HOME/davis2017-evaluation/evaluation_method.py \
+    --task semi-supervised \
+    --results_path ./davis_results_sd/ \
+    --davis_path $HOME/davis-2017/DAVIS/
+
+python $HOME/davis2017-evaluation/evaluation_method.py \
+    --task semi-supervised \
+    --results_path ./davis_results_adm/ \
+    --davis_path $HOME/davis-2017/DAVIS/
+```
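For context on what `eval_davis.py` does: following the DINO protocol, the first-frame mask is propagated frame by frame. Each pixel of the current frame compares its DIFT feature against features from the first frame and the previous `n_last_frames` frames within a window of radius `size_mask_neighborhood`, keeps the `topk` most similar locations, and averages their labels with a softmax at the given `temperature`. The sketch below illustrates the core idea for a single reference frame, without the multi-frame queue and spatial restriction; it is not the repository's implementation:

```python
import torch
import torch.nn.functional as F

def propagate_labels(feat_ref, labels_ref, feat_cur, topk=15, temperature=0.2):
    """Single-reference-frame sketch of nearest-neighbour label propagation.

    feat_ref, feat_cur: (C, H, W) feature maps of the reference / current frame.
    labels_ref: (K, H, W) soft one-hot masks of the reference frame (K classes).
    Returns (K, H, W) propagated soft masks for the current frame.
    Illustrative only: the benchmark additionally restricts matches to a local
    neighbourhood and pools several past frames.
    """
    C, H, W = feat_ref.shape
    K = labels_ref.shape[0]
    ref = F.normalize(feat_ref.reshape(C, -1), dim=0)       # (C, HW_ref)
    cur = F.normalize(feat_cur.reshape(C, -1), dim=0)       # (C, HW_cur)
    sim = cur.t() @ ref                                      # (HW_cur, HW_ref) cosine similarity

    topk_sim, topk_idx = sim.topk(topk, dim=1)               # keep top-k reference pixels per query
    weights = F.softmax(topk_sim / temperature, dim=1)       # temperature-scaled attention weights

    ref_labels = labels_ref.reshape(K, -1).t()               # (HW_ref, K)
    out = (weights.unsqueeze(-1) * ref_labels[topk_idx]).sum(dim=1)  # weighted label average
    return out.t().reshape(K, H, W)
```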
+
+# Misc.
+If you find our code or paper useful to your research work, please consider citing our work using the following bibtex:
+```
+@inproceedings{
+tang2023emergent,
+title={Emergent Correspondence from Image Diffusion},
+author={Luming Tang and Menglin Jia and Qianqian Wang and Cheng Perng Phoo and Bharath Hariharan},
+booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
+year={2023},
+url={https://openreview.net/forum?id=ypOiXjdfnU}
+}
+```

assets/cartoon.png

19.3 KB

assets/guitar_cat.jpg

121 KB

assets/painting_cat.jpg

103 KB

demo.ipynb

Lines changed: 12 additions & 54 deletions
Large diffs are not rendered by default.

edit_propagation.ipynb

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3306ccce-4b17-41a9-831d-add6cccddc0e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import gc\n",
    "import imageio\n",
    "from PIL import Image\n",
    "from torchvision.transforms import PILToTensor\n",
    "import os\n",
    "import json\n",
    "from PIL import Image, ImageDraw\n",
    "import torch.nn.functional as F\n",
    "import cv2\n",
    "import glob\n",
    "from torchvision.transforms import PILToTensor\n",
    "from src.models.dift_sd import SDFeaturizer4Eval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "081cd585-9d9d-4ffe-8c9b-6c6360d2e4ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Coordinate helpers: gen_grid builds an [h, w, 2] grid of (x, y) pixel positions;\n",
    "# normalize_coords maps pixel coordinates into [-1, 1] for F.grid_sample.\n",
    "def gen_grid(h, w, device, normalize=False, homogeneous=False):\n",
    "    if normalize:\n",
    "        lin_y = torch.linspace(-1., 1., steps=h, device=device)\n",
    "        lin_x = torch.linspace(-1., 1., steps=w, device=device)\n",
    "    else:\n",
    "        lin_y = torch.arange(0, h, device=device)\n",
    "        lin_x = torch.arange(0, w, device=device)\n",
    "    grid_y, grid_x = torch.meshgrid((lin_y, lin_x))\n",
    "    grid = torch.stack((grid_x, grid_y), -1)\n",
    "    if homogeneous:\n",
    "        grid = torch.cat([grid, torch.ones_like(grid[..., :1])], dim=-1)\n",
    "    return grid  # [h, w, 2 or 3]\n",
    "\n",
    "\n",
    "def normalize_coords(coords, h, w, no_shift=False):\n",
    "    assert coords.shape[-1] == 2\n",
    "    if no_shift:\n",
    "        return coords / torch.tensor([w-1., h-1.], device=coords.device) * 2\n",
    "    else:\n",
    "        return coords / torch.tensor([w-1., h-1.], device=coords.device) * 2 - 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a13b459-4698-4a9c-803f-d7ba8adb6962",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the Stable Diffusion featurizer with the 'cat' prompt category\n",
    "cat = 'cat'\n",
    "dift = SDFeaturizer4Eval(cat_list=['cat'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0606e9dd-9e51-49ec-bf37-1f2bc9f78a84",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the source / target images and the RGBA sticker edit drawn on the source image\n",
    "src_img = Image.open('./assets/guitar_cat.jpg').convert('RGB')\n",
    "trg_img = Image.open('./assets/painting_cat.jpg').convert('RGB')\n",
    "sticker = imageio.imread('./assets/cartoon.png')\n",
    "sticker_color, sticker_mask = sticker[..., :3], sticker[..., 3]\n",
    "\n",
    "assert np.array(src_img).shape[:2] == sticker.shape[:2]\n",
    "h_src, w_src = sticker.shape[:2]\n",
    "h_trg, w_trg = np.array(trg_img).shape[:2]\n",
    "\n",
    "# Extract DIFT features for both images and L2-normalize them along the channel dimension\n",
    "sd_feat_src = dift.forward(src_img, cat)\n",
    "sd_feat_trg = dift.forward(trg_img, cat)\n",
    "\n",
    "sd_feat_src = F.normalize(sd_feat_src.squeeze(), p=2, dim=0)\n",
    "sd_feat_trg = F.normalize(sd_feat_trg.squeeze(), p=2, dim=0)\n",
    "feat_dim = sd_feat_src.shape[0]\n",
    "\n",
    "# Sample up to 1000 source pixels inside the sticker mask and normalize their coordinates\n",
    "grid_src = gen_grid(h_src, w_src, device='cuda')\n",
    "grid_trg = gen_grid(h_trg, w_trg, device='cuda')\n",
    "\n",
    "coord_src = grid_src[sticker_mask > 0]\n",
    "coord_src = coord_src[torch.randperm(len(coord_src))][:1000]\n",
    "coord_src_normed = normalize_coords(coord_src, h_src, w_src)\n",
    "grid_trg_normed = normalize_coords(grid_trg, h_trg, w_trg)\n",
    "\n",
    "# For each sampled source feature, find its nearest neighbor in the target feature map\n",
    "feat_src = F.grid_sample(sd_feat_src[None], coord_src_normed[None, None], align_corners=True).squeeze().T\n",
    "feat_trg = F.grid_sample(sd_feat_trg[None], grid_trg_normed[None], align_corners=True).squeeze()\n",
    "feat_trg_flattened = feat_trg.permute(1, 2, 0).reshape(-1, feat_dim)\n",
    "\n",
    "distances = torch.cdist(feat_src, feat_trg_flattened)\n",
    "_, indices = torch.min(distances, dim=1)\n",
    "\n",
    "# Fit a homography from the matched points and warp the sticker onto the target image\n",
    "src_pts = coord_src.reshape(-1, 2).cpu().numpy()\n",
    "trg_pts = grid_trg.reshape(-1, 2)[indices].cpu().numpy()\n",
    "\n",
    "M, mask = cv2.findHomography(src_pts, trg_pts, cv2.RANSAC, 5.0)\n",
    "sticker_out = cv2.warpPerspective(sticker, M, (w_trg, h_trg))\n",
    "\n",
    "# Alpha-composite the original and warped stickers onto the source / target images\n",
    "sticker_out_alpha = sticker_out[..., 3:] / 255\n",
    "sticker_alpha = sticker[..., 3:] / 255\n",
    "\n",
    "trg_img_with_sticker = sticker_out_alpha * sticker_out[..., :3] + (1 - sticker_out_alpha) * trg_img\n",
    "src_img_with_sticker = sticker_alpha * sticker[..., :3] + (1 - sticker_alpha) * src_img"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88723600-c18f-4eb1-aec7-feb4112e2610",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the source / target images before and after the edit is propagated\n",
    "fig, axs = plt.subplots(2, 2, figsize=(10, 10))\n",
    "\n",
    "axs[0, 0].imshow(src_img)\n",
    "axs[0, 0].set_title(\"Source Image\")\n",
    "axs[0, 0].axis('off')\n",
    "\n",
    "axs[0, 1].imshow(src_img_with_sticker.astype(np.uint8))\n",
    "axs[0, 1].set_title(\"Source Image with Edits\")\n",
    "axs[0, 1].axis('off')\n",
    "\n",
    "axs[1, 0].imshow(trg_img)\n",
    "axs[1, 0].set_title(\"Target Image\")\n",
    "axs[1, 0].axis('off')\n",
    "\n",
    "axs[1, 1].imshow(trg_img_with_sticker.astype(np.uint8))\n",
    "axs[1, 1].set_title(\"Target Image with Propagated Edits\")\n",
    "axs[1, 1].axis('off')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
