Commit fa37e39

v0.2.0
1 parent 097cb3f commit fa37e39

3 files changed: +54 -23 lines changed

README.md

Lines changed: 53 additions & 22 deletions
@@ -34,6 +34,10 @@ I've heard good things about this deep learning stuff, so let's try that. I firs

 I had a look at the corresponding [Python example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py), [C++ example](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/examples/label_image), and [Android example](https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation/android), and based on those, I first cobbled together a [Python demo](https://github.com/floe/deepbacksub/blob/master/deepseg.py). That was running at about 2.5 FPS, which is really excruciatingly slow, so I built a [C++ version](https://github.com/floe/deepbacksub/blob/master/deepseg.cc) which manages 10 FPS without too much hand optimization. Good enough.

+I've also tested a TFLite-converted version of the [Body-Pix model](https://blog.tensorflow.org/2019/11/updated-bodypix-2.html), but the results haven't been much different to DeepLab for this use case.
+
+More recently, Google has released a model specifically trained for [person segmentation that's used in Google Meet](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html). This has way better performance than DeepLab, both in terms of speed and of accuracy, so this is now the default. It needs one custom op from the MediaPipe framework, but that was quite easy to integrate. Thanks to @jiangjianping for pointing this out in the [corresponding issue](https://github.com/floe/deepbacksub/issues/28).
+
 ## Replace Background

 This is basically one line of code with OpenCV: `bg.copyTo(raw,mask);` Told you that's the easy part.
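
For readers who have not used masked copies before, here is a minimal sketch of what a call like `bg.copyTo(raw,mask)` does per element. It is plain C++ rather than OpenCV code. The helper `copyWithMask` and the one-byte-per-element buffers are hypothetical simplifications used only for illustration; a real OpenCV mask is single-channel and applies to every channel of a pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of deepseg.cc): copy src into dst wherever
// mask is non-zero, leaving dst untouched where mask is zero.
// Assumption: all three buffers hold one byte per element and have equal length.
void copyWithMask(const std::vector<std::uint8_t>& src,
                  std::vector<std::uint8_t>& dst,
                  const std::vector<std::uint8_t>& mask) {
    assert(src.size() == dst.size() && src.size() == mask.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (mask[i] != 0) {
            dst[i] = src[i];  // masked element: take the background value
        }                     // unmasked element: keep the camera value
    }
}
```

With `mask` non-zero on the background pixels, the camera frame keeps the person and takes everything else from the background image.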
@@ -48,46 +52,73 @@ The dataflow through the whole program is roughly as follows:

 - init
 - load background.png, convert to YUYV
-- load DeepLab v3+ network, initialize TFLite
+- initialize TFLite, register custom op
+- load Google Meet segmentation model
 - setup V4L2 Loopback device (w,h,YUYV)
 - loop
 - grab raw YUYV image from camera
-- extract square ROI in center
-- downscale ROI to 257 x 257 (*)
+- extract portrait ROI in center
+- downscale ROI to 144 x 256 (*)
 - convert to RGB (*)
-- run DeepLab v3+
-- convert result to binary mask for class "person"
+- run Google Meet segmentation model
+- convert result to binary mask using softmax
 - denoise mask using erode/dilate
 - upscale mask to raw image size
 - copy background over raw image with mask (see above)
 - `write()` data to virtual video device

-(*) these are required input parameters for DeepLab v3+
+(*) these are required input parameters for this model
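
Two of the loop steps above, extracting the centered portrait ROI and turning the model output into a binary mask, can be sketched in plain C++ as follows. This is a sketch under stated assumptions, not the code in deepseg.cc. The names `Roi`, `centeredRoi` and `backgroundMask` are hypothetical. The model output is assumed to hold two interleaved logits per pixel (background first, person second), and the 0.5 threshold is an arbitrary illustrative choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Roi { int x, y, w, h; };

// Largest centered rectangle with aspect ratio aspectW:aspectH that fits
// into a frameW x frameH image (integer arithmetic, rounded down).
Roi centeredRoi(int frameW, int frameH, int aspectW, int aspectH) {
    assert(frameW > 0 && frameH > 0 && aspectW > 0 && aspectH > 0);
    int h = frameH;
    int w = frameH * aspectW / aspectH;
    if (w > frameW) {  // frame is narrower than the ROI: limit by width instead
        w = frameW;
        h = frameW * aspectH / aspectW;
    }
    return Roi{(frameW - w) / 2, (frameH - h) / 2, w, h};
}

// Two-class softmax per pixel, thresholded into a mask.
// Assumed layout: logits = [bg0, person0, bg1, person1, ...].
// Result: 255 where the pixel counts as background, 0 where it counts as person.
std::vector<std::uint8_t> backgroundMask(const std::vector<float>& logits,
                                         float threshold = 0.5f) {
    assert(logits.size() % 2 == 0);
    std::vector<std::uint8_t> mask(logits.size() / 2);
    for (std::size_t i = 0; i < mask.size(); ++i) {
        const float bg = logits[2 * i];
        const float person = logits[2 * i + 1];
        const float m = std::max(bg, person);  // subtract max for numerical stability
        const float eb = std::exp(bg - m);
        const float ep = std::exp(person - m);
        const float pPerson = ep / (eb + ep);  // softmax probability of "person"
        mask[i] = pPerson > threshold ? 0 : 255;
    }
    return mask;
}
```

For a 640x480 frame and a 144:256 aspect ratio, `centeredRoi(640, 480, 144, 256)` yields a 270x480 region starting at x = 185.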

 ## Requirements

 Tested with the following dependencies:
+
+- Ubuntu 20.04, x86-64
+- Linux kernel 5.6 (stock package)
+- OpenCV 4.2.0 (stock package)
+- V4L2-Loopback 0.12.5 (stock package)
+- Tensorflow Lite 2.4.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.4.0/tensorflow/lite))
 - Ubuntu 18.04.5, x86-64
-- Linux kernel 4.15 (stock package)
-- OpenCV 3.2.0 (stock package)
-- V4L2-Loopback 0.10.0 (stock package)
-- Tensorflow Lite 2.1.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.1.0/tensorflow/lite))
-- Ultra-short build guide for Tensorflow Lite C++ library: clone repo above, then...
-- run `./tensorflow/lite/tools/make/download_dependencies.sh`
-- run `./tensorflow/lite/tools/make/build_lib.sh`
+- Linux kernel 4.15 (stock package)
+- OpenCV 3.2.0 (stock package)
+- V4L2-Loopback 0.10.0 (stock package)
+- Tensorflow Lite 2.1.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.1.0/tensorflow/lite))

 Tested with the following software:
+
 - Firefox
-- 74.0.1 (works)
+- 84.0 (works)
 - 76.0.1 (works)
+- 74.0.1 (works)
 - Skype
-- 8.58.0.93 (works)
+- 8.67.0.96 (works)
 - 8.60.0.76 (works)
-- guvcview 2.0.5 (works with parameter `-c read`)
-- Microsoft Teams 1.3.00.5153 (works)
-- Chrome 81.0.4044.138 (works)
-- Zoom 5.0.403652.0509 (works - yes, I'm a hypocrite, I tested it with Zoom after all :-)
-
+- 8.58.0.93 (works)
+- guvcview
+- 2.0.6 (works with parameter `-c read`)
+- 2.0.5 (works with parameter `-c read`)
+- Microsoft Teams
+- 1.3.00.30857 (works)
+- 1.3.00.5153 (works)
+- Chrome
+- 87.0.4280.88 (works)
+- 81.0.4044.138 (works)
+- Zoom - yes, I'm a hypocrite, I tested it with Zoom after all :-)
+- 5.4.54779.1115 (works)
+- 5.0.403652.0509 (works)
+
+## Building
+
+Install dependencies (`sudo apt install libopencv-dev build-essential v4l2loopback-dkms`).
+
+Run `make` to build everything (should also clone and build Tensorflow Lite).
+
+If the first part doesn't work:
+- Clone https://github.com/tensorflow/tensorflow/ repo into tensorflow/ folder
+- Checkout tag v2.4.0
+- run ./tensorflow/lite/tools/make/download_dependencies.sh
+- run ./tensorflow/lite/tools/make/build_lib.sh
+
 ## Usage

 First, load the v4l2loopback module (extra settings needed to make Chrome work):
@@ -106,13 +137,13 @@ As usual: pull requests welcome.
 - Resolution is currently hardcoded to 640x480 (lowest common denominator).
 - Only works with Linux, because that's what I use.
 - Needs a webcam that can produce raw YUYV data (but extending to the common YUV420 format should be trivial)
-- CPU hog: maxes out two cores on my 2.7 GHz i5 machine for just VGA @ 10 FPS.
-- Uses stock Deeplab v3+ network. Maybe re-training with only "person" and "background" classes could improve performance?
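
To make the YUYV requirement in the list above more concrete, here is a minimal sketch of how packed YUYV data maps to RGB. It is not taken from deepseg.cc. The helper names are hypothetical, and the full-range BT.601 (JFIF) coefficients are an assumption; the conversion actually used by a camera driver or by OpenCV may use a different range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round to nearest integer and clamp to the byte range [0, 255].
static std::uint8_t clampToByte(float v) {
    return static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, v + 0.5f)));
}

// YUYV packs two horizontally adjacent pixels into four bytes: Y0 U Y1 V.
// Both pixels share the same U and V samples.
// Assumption: full-range BT.601 (JFIF) coefficients.
std::vector<std::uint8_t> yuyvToRgb(const std::vector<std::uint8_t>& yuyv) {
    assert(yuyv.size() % 4 == 0);
    std::vector<std::uint8_t> rgb;
    rgb.reserve(yuyv.size() / 4 * 6);  // 2 pixels x 3 channels per 4 input bytes
    for (std::size_t i = 0; i < yuyv.size(); i += 4) {
        const float u = yuyv[i + 1] - 128.0f;
        const float v = yuyv[i + 3] - 128.0f;
        for (std::size_t j = 0; j < 2; ++j) {
            const float y = yuyv[i + 2 * j];
            rgb.push_back(clampToByte(y + 1.402f * v));                  // R
            rgb.push_back(clampToByte(y - 0.344136f * u - 0.714136f * v));  // G
            rgb.push_back(clampToByte(y + 1.772f * u));                  // B
        }
    }
    return rgb;
}
```

In YUV 4:2:0, each U/V pair is shared by a 2x2 block of pixels instead of two horizontal neighbours, so the per-pixel arithmetic stays the same and only the indexing changes.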

 ## Fixed

 - Should probably do a erosion (+ dilation?) operation on the mask.
 - Background image size needs to match camera resolution (see issue #1).
+- CPU hog: maxes out two cores on my 2.7 GHz i5 machine for just VGA @ 10 FPS. Fixed via Google Meet segmentation model.
+- Uses stock Deeplab v3+ network. Maybe re-training with only "person" and "background" classes could improve performance? Fixed via Google Meet segmentation model.
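
The erosion and dilation mentioned in the list above can be sketched in plain C++ as shown below. This is only an illustration, not the OpenCV calls used in the project. The names `morph3x3` and `denoise`, the 3x3 neighbourhood, and the erode-then-dilate order are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Morph { Erode, Dilate };

// 3x3 erosion (minimum) or dilation (maximum) on a w x h single-channel mask
// stored row by row. Neighbours outside the image are ignored.
std::vector<std::uint8_t> morph3x3(const std::vector<std::uint8_t>& mask,
                                   int w, int h, Morph op) {
    assert(w > 0 && h > 0 && mask.size() == static_cast<std::size_t>(w) * h);
    std::vector<std::uint8_t> out(mask.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            std::uint8_t acc = mask[y * w + x];
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx;
                    const int ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    const std::uint8_t v = mask[ny * w + nx];
                    acc = (op == Morph::Erode) ? std::min(acc, v) : std::max(acc, v);
                }
            }
            out[y * w + x] = acc;
        }
    }
    return out;
}

// Erode, then dilate: removes specks smaller than the 3x3 neighbourhood
// while restoring larger blobs to roughly their original extent.
std::vector<std::uint8_t> denoise(const std::vector<std::uint8_t>& mask, int w, int h) {
    return morph3x3(morph3x3(mask, w, h, Morph::Erode), w, h, Morph::Dilate);
}
```

A single stray pixel disappears under this operation, while a solid 3x3 blob survives unchanged.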

 ## Other links

deepseg.cc

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ void *grab_thread(void *arg) {

 int main(int argc, char* argv[]) {

-printf("deepseg v0.1.0\n");
+printf("deepseg v0.2.0\n");
 printf("(c) 2020 by [email protected]\n");
 printf("https://github.com/floe/deepseg\n");

File renamed without changes.
