Digital Image Processing Program Framework


This project originated in a "Digital Image Processing" course and implements a basic program framework. Curve data point extraction was added later, along with support for more file formats and a number of commonly used extras. Function list:

Basic functions
- Basic image processing program framework
- Supported image file formats: *.png, *.jpg, *.bmp, *.raw, *.data
- Image Fourier transform and filtering
- Fourier descriptor representation of images
- Edge detection
- Color high-pass/low-pass filtering
Extract curve data points
- Box cropping
- Separation by color
- Ordered point selection
- Fitting or interpolation
UI design improvements
- Extended file format support: PDF files are converted to images automatically in the background
- Optimized file cache that resolves possible conflicts; cached files are stored in the Roaming/digitalImage directory
- The basic functions work with PDF input (image reconstruction from the Fourier transform and from Fourier descriptors is not yet supported)
- Improved interface design, with buttons for switching, closing, and moving items between input and output
Extensions
- Image to PDF function
- Image/PDF to GIF function
- Remove watermark function
- Composite large image function
- Horizontal split function

Program description

Directory Structure

main.py: The main entry point of the program; run python main.py to start the PyQt5 interface
main.ui: Interface design created with Qt Designer
Ui_main.py: Interface code generated by compiling the .ui file; it is called by the main program
util.py: Utility functions, including histogram, Fourier transform, and the edge detection algorithms
LineExtractor: Classes and functions for extracting curve data points
- extractor.py: Select points in order and perform fitting or interpolation
- seperator.py: Separate by color
- tailor.py: Box cropping
dist: Packaged libraries and dependencies plus the executable (not uploaded because it is too large; still under development)
- To run the UI program directly, open dist/main/main.exe
- Packaging command: pyinstaller -D -i test.ico main.py --noconfirm
doc: Related documents

Install the dependencies before running python main.py: pip install -r requirements.txt

Program interface

After the program starts, as shown in the figure below, it displays two panes, input and output, each with a fixed size of 700*700. The window can be resized, but the size of the panes stays fixed.

The menu bar contains the main menus listed below. Their functions map one-to-one onto the requirements of the assignments described later, and some functions have shortcut keys.

Files: Open, close, and export images (and PDFs); opened files are displayed in the input pane
Basic operation: Takes the active picture in the input pane as input, performs basic image operations, and displays the result in the output pane
FFT: Takes the active picture in the input pane as input, converts it to grayscale automatically, performs FFT-related operations, and displays the result in the output pane
- The "Fourier Transform" function outputs 3 images: the FFT, the enhanced spectrum, and the inverse transform
Edge detection: Takes the active picture in the input pane as input, converts it to grayscale automatically, performs edge detection, and displays the result in the output pane
Extraction: The individual steps of extracting curve data points; running the "whole process of extracting curve data points" in one go is not yet supported
Practical tool: Mainly operates on the PDF in the input pane; if the current input is not a PDF, it operates on all input images and outputs a PDF

When running the program:

The console shows the output of the running program. If an exception occurs, the program exits; see the console output for the reason.
Opened images are scaled for display: the larger of width and height is set to 700 and the other dimension is scaled proportionally. The actual size of the picture is unchanged and is shown as numbers in the lower right corner, together with the name of the picture. For a PDF, the page number is shown as well
Input and output each have multiple tabs that you can switch between. To close a tab, click Menu>File>Close, or click the × button at the bottom of the window
The content currently displayed in each pane is the active tab
Use the < and > buttons at the bottom of the pane to switch the active tab. For a PDF, they switch pages of the PDF instead of switching tabs
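
The display scaling rule described above can be sketched as follows. This is a minimal illustration, not the program's own code: the function name fit_to_pane and the rounding behavior are assumptions.

```python
def fit_to_pane(width, height, pane=700):
    """Return the display size whose larger side equals `pane` pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = pane / max(width, height)
    # The smaller side is scaled by the same factor to keep the aspect ratio.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_to_pane(1400, 700))  # (700, 350)
```

Only the displayed size changes; the stored image keeps its original dimensions.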

Basic functions

Image Processing Program Framework

Design the framework of the digital image processing program on a VC-based multiple document interface (MDI)
Implement reading and display of BMP image files
Optionally implement reading and display of JPG and RAW files, as well as conversion to and from BMP
Implement the basic image operations: addition, negation, geometric transformation
Implement histogram equalization of the image

File reading and writing

Four image formats and the PDF format can be opened and exported. This provides conversion between the image formats as well as conversion from PDF to image.

Basic operations on images

Addition sums all open input images (or the current PDF). The result takes the maximum width and the maximum height over all inputs as its size.

Note: Do not add images with different numbers of channels; this crashes the program. Convert them to grayscale images first.
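
A rough sketch of this addition with NumPy is given below. The function name add_images is hypothetical, and two details are assumptions rather than documented behavior: images are aligned at the top-left corner, and sums above 255 are saturated instead of wrapping around.

```python
import numpy as np


def add_images(images):
    """Add uint8 images of different sizes on a canvas of the maximum size."""
    if len({img.shape[2:] for img in images}) != 1:
        # Mixed grayscale/color input is rejected instead of crashing later.
        raise ValueError("all images must have the same number of channels")
    height = max(img.shape[0] for img in images)
    width = max(img.shape[1] for img in images)
    total = np.zeros((height, width) + images[0].shape[2:], dtype=np.int32)
    for img in images:
        h, w = img.shape[:2]
        total[:h, :w] += img  # assumption: top-left alignment
    return np.clip(total, 0, 255).astype(np.uint8)  # assumption: saturate
```

The explicit channel check turns the crash mentioned in the note into a readable error message.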

After scaling, the rendered picture looks the same because it is still fitted to the pane, but the actual size has changed; see the size shown in the upper right corner.

Histogram equalization

Colors are visibly more vibrant:
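
For reference, a minimal histogram equalization of a single 8-bit channel can be written with NumPy as below. This is a sketch under assumptions: the name equalize_histogram is hypothetical, and how the program treats color images (per channel or on a luminance channel) is not stated here.

```python
import numpy as np


def equalize_histogram(gray):
    """Equalize a 2-D uint8 image using its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest used level
    if cdf[-1] == cdf_min:
        return gray.copy()  # constant image: nothing to spread out
    # Map each gray level so the cumulative distribution becomes linear.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]
```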

Image Fourier transform and filtering

Implement the FFT of the image and display the result
Implement the inverse FFT

Observe the spectra of typical images after the FFT

First construct a black-and-white binary test image, for example a 4×4 white square at the center of a 128×128 black background. Then perform the following tests in sequence.
- DFT
- Translation, scaling
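
The test image and its spectrum can be produced with a few lines of NumPy. The sketch below is illustrative only (make_test_image is a hypothetical helper); it also checks the translation property with a circular shift, which changes the phase of the DFT but not its magnitude.

```python
import numpy as np


def make_test_image(size=128, square=4):
    """Black image with a white square of side `square` at the center."""
    img = np.zeros((size, size), dtype=np.uint8)
    start = (size - square) // 2
    img[start:start + square, start:start + square] = 255
    return img


image = make_test_image()
# fftshift moves the zero-frequency term to the center of the spectrum.
magnitude = np.abs(np.fft.fftshift(np.fft.fft2(image)))

# Translation test: a circular shift leaves the magnitude spectrum unchanged.
shifted = np.roll(image, shift=(10, 20), axis=(0, 1))
shifted_magnitude = np.abs(np.fft.fftshift(np.fft.fft2(shifted)))
print(np.allclose(magnitude, shifted_magnitude))  # True
```

The zero-frequency value equals the sum of all pixels, here 16 × 255 = 4080.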

FFT

Menu bar>FFT>Fourier transform outputs 3 pictures during execution:

Spectrum in the complex domain (casting it directly to uint8 distorts the picture)
2-D DFT spectrum after dynamic range compression (the point with the highest value is mapped to brightness 255)
Output of the inverse FFT
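
A possible way to produce the second and third pictures is sketched below. The logarithmic mapping is an assumption: the text only says that the dynamic range is compressed and that the highest value maps to 255. The function names are hypothetical.

```python
import numpy as np


def spectrum_to_uint8(gray):
    """Log-compress the centered FFT magnitude so its peak maps to 255."""
    magnitude = np.abs(np.fft.fftshift(np.fft.fft2(gray)))
    compressed = np.log1p(magnitude)  # assumption: log(1 + |F|) compression
    peak = compressed.max()
    if peak == 0:
        return np.zeros(gray.shape, dtype=np.uint8)
    return np.round(compressed / peak * 255).astype(np.uint8)


def inverse_fft(gray):
    """Transform forward and back; the real part restores the image."""
    restored = np.fft.ifft2(np.fft.fft2(gray)).real
    return np.clip(np.round(restored), 0, 255).astype(np.uint8)
```

Casting the raw magnitude to uint8 without compression lets the few very large low-frequency values dominate, which is why the first picture looks distorted.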

High-pass/low-pass filtering

The filter radius is customizable and is entered interactively. The results of high-pass and low-pass filtering are as follows:
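
One common way to filter with a radius is an ideal circular mask in the frequency domain. The sketch below assumes that approach; the filter shape actually used by the program is not stated, and ideal_filter is a hypothetical name.

```python
import numpy as np


def ideal_filter(gray, radius, mode="low"):
    """Keep frequencies inside (low) or outside (high) a centered circle."""
    if mode not in ("low", "high"):
        raise ValueError("mode must be 'low' or 'high'")
    rows, cols = gray.shape
    y, x = np.ogrid[:rows, :cols]
    # After fftshift the zero frequency sits at (rows // 2, cols // 2).
    inside = np.hypot(y - rows // 2, x - cols // 2) <= radius
    mask = inside if mode == "low" else ~inside
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
```

Because the two masks are complementary, the low-pass and high-pass outputs for the same radius add up to the original image. For a color image the same function can be applied to each channel separately.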

Representation of Fourier descriptors

For the boundary on the XY plane in Figure 1, represent it with Fourier descriptors and reconstruct it with different numbers of terms

A Fourier descriptor is an image feature that describes the characteristic parameters of a contour.

The basic idea is as follows. Treat the outline of the object as a closed curve and let a point p(l) move along it, with complex coordinate x(l) + jy(l). Because the curve is closed, this coordinate is a periodic function of l whose period is the perimeter of the curve, so it can be expanded as a Fourier series. The coefficients z(k) of that series depend directly on the shape of the closed boundary and are defined as the Fourier descriptors. With enough coefficients z(k), the descriptors capture the shape information completely and the shape of the object can be restored from them. Fourier descriptors are simple and efficient, and they are one of the important methods for recognizing the shape of an object.

Put simply, Fourier descriptors represent a contour as a vector of numbers, which makes different contours easier to tell apart and so supports object recognition.

As shown in the figure above, a small number of Fourier descriptors is enough to capture the general character of the boundary. This property is useful because these coefficients carry the shape information.

The whole process is as follows:

Edge detection: extract the edges with an edge detection algorithm, then apply a morphological closing to make the edges clearer and remove small black spots
Select the largest of all contours as the target contour and draw it
Compute the Fourier descriptors of the contour and show the first 32 descriptors in the pane
Choose the number of descriptor terms to use for reconstruction
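
The last two steps can be sketched with NumPy alone. This is an illustration under assumptions, not the program's implementation: the contour is taken as an ordered (N, 2) array of (x, y) points, and "the first M terms" is interpreted as the M coefficients of lowest absolute frequency. Both function names are hypothetical.

```python
import numpy as np


def fourier_descriptors(contour):
    """DFT of the boundary points written as complex numbers x + jy."""
    pts = np.asarray(contour, dtype=float)
    return np.fft.fft(pts[:, 0] + 1j * pts[:, 1])


def reconstruct(descriptors, terms):
    """Rebuild the contour from the `terms` lowest-frequency descriptors."""
    n = len(descriptors)
    freqs = np.fft.fftfreq(n, d=1.0 / n)  # integer frequencies 0, 1, ..., -1
    order = np.argsort(np.abs(freqs), kind="stable")
    kept = np.zeros(n, dtype=complex)
    kept[order[:terms]] = descriptors[order[:terms]]  # zero out the rest
    z = np.fft.ifft(kept)
    return np.column_stack([z.real, z.imag])
```

For a circle sampled at equal angles, two terms (the centroid and the first harmonic) already reproduce the contour exactly; more complex shapes need more terms.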

Edge detection

Implement image edge extraction based on typical differential operators (at least Roberts, Sobel, Prewitt, and Laplacian): read the content of an image file and output the edge detection result
Analyze and compare the characteristics of different operators

Operator	Advantages and disadvantages
Roberts	Works well on low-noise images with steep edges, but the extracted edges are relatively thick, so edge localization is not very accurate
Sobel	Works well on images with gradual grayscale changes and more noise, and localizes edges more accurately
Scharr	Differs from the Sobel operator in the smoothing part: the smoothing kernel is 1/16 [3, 10, 3] instead of 1/4 [1, 2, 1], so the central element carries more weight. This suits images that behave like a strongly random signal with little correlation between neighboring pixels
Prewitt	Works well on images with gradual grayscale changes and more noise
Laplacian	Localizes step edge points accurately, but is very sensitive to noise and loses part of the edge direction information, which produces some discontinuous edges
LoG	The LoG operator often produces double-pixel edges and is sensitive to noise, so it is rarely used to detect edges; instead it is used to judge whether an edge pixel lies in the bright or the dark area of the image
Canny	Not easily disturbed by noise and able to detect truly weak edges. Among the edge detection methods listed, Canny is the most effective. It uses two different thresholds to detect strong and weak edges separately, and a weak edge is included in the output only when it is connected to a strong edge. As a result, the output is less likely to be "filled" with noise and real weak edges are easier to detect
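
To make the comparison concrete, the derivative kernels of the first four operators and the Laplacian can be applied with a small NumPy correlation. This is a sketch with the standard textbook kernels; the helper names are hypothetical and the border handling (edge replication) is an assumption.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SCHARR_X = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]])
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])

# Each entry holds the two derivative kernels of one operator.
KERNELS = {
    "roberts": (np.array([[1, 0], [0, -1]]), np.array([[0, 1], [-1, 0]])),
    "prewitt": (PREWITT_X, PREWITT_X.T),
    "sobel": (SOBEL_X, SOBEL_X.T),
    "scharr": (SCHARR_X, SCHARR_X.T),
}
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])


def correlate2d(image, kernel):
    """Slide `kernel` over `image`; borders are replicated (an assumption)."""
    kh, kw = kernel.shape
    pad = ((kh // 2, kh - 1 - kh // 2), (kw // 2, kw - 1 - kw // 2))
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0],
                                         j:j + image.shape[1]]
    return out


def edge_magnitude(gray, operator="sobel"):
    """Gradient magnitude from the two kernels of a first-order operator."""
    k1, k2 = KERNELS[operator]
    return np.hypot(correlate2d(gray, k1), correlate2d(gray, k2))


def laplacian_response(gray):
    """Absolute response of the 4-neighbor Laplacian kernel."""
    return np.abs(correlate2d(gray, LAPLACIAN))
```

On a vertical step from 0 to 255, Sobel responds with 4 × 255 = 1020 and Scharr with 16 × 255 = 4080, which reflects the different kernel weights rather than a difference in edge position.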

Extract Curve Data Points

Example of program running:

Requirements

Input: .jpg/.png images of thermogravimetric test curves of materials, collected from the literature

Output: data points corresponding to the curve

The image types involved can be mainly divided into the following three categories:

Simple lines

Lines with interference (i.e. the blue lines)

Complex lines (there are multiple target curves)

Problem analysis

In the examples above, the inputs share clear characteristics, which makes extraction easier than for arbitrary curves. Input features include:

Most curves are surrounded by a black box, with the abscissa below the box and the ordinate on its left
If a graph contains multiple curves, their colors are mostly different
The curves are mostly smooth, continuous from the far left to the right of the box, and monotonically decreasing
A curve can be solid or dashed
The curve label is generally at the upper right of the line
The background of the picture is always white

Extraction process:

First find the position of the box or the coordinate axes and crop out the region to analyze
Distinguish the lines by color and output as many lines as there are colors, in descending order of the number of points they contain. The case where two lines share a color is ignored
Select points on the line from left to right, each time taking the first point from top to bottom, subject to the requirement that the curve decreases monotonically
Restore the original function by interpolation or fitting and output 100 equally spaced interpolation points

Box cropping

See tailor.py. The core idea is to find the x/y coordinate axes:

Identify horizontal and vertical lines from the row/column pixel averages
Among these lines, the coordinate axes are generally the leftmost and the bottommost in the picture
Because an axis must have tick labels on its outer side, exclude vertical lines with only white to their left and horizontal lines with only white below them
Determine the crop box from the coordinate axes and crop
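
The four steps can be approximated as follows. This is not the code in tailor.py, only a sketch of the idea under assumptions: a pixel darker than 128 counts as ink, and a row or column is a line when at least half of its pixels are ink. The names find_axes and crop_plot_area and both thresholds are hypothetical.

```python
import numpy as np


def find_axes(gray, line_ratio=0.5, dark=128):
    """Return (column of the vertical axis, row of the horizontal axis)."""
    ink = np.asarray(gray) < dark
    # A long line makes the ink average of its row or column large.
    cols = np.flatnonzero(ink.mean(axis=0) >= line_ratio)
    rows = np.flatnonzero(ink.mean(axis=1) >= line_ratio)
    # An axis has tick labels on its outer side, so drop lines without any.
    cols = [int(c) for c in cols if ink[:, :c].any()]
    rows = [int(r) for r in rows if ink[r + 1:, :].any()]
    if not cols or not rows:
        raise ValueError("no coordinate axis found")
    return min(cols), max(rows)  # leftmost vertical, bottommost horizontal


def crop_plot_area(gray, **kwargs):
    """Crop to the region to the right of and above the two axes."""
    axis_col, axis_row = find_axes(gray, **kwargs)
    return gray[:axis_row, axis_col + 1:]
```

This sketch only bounds the crop on the left and at the bottom; a full implementation would also use the top and right sides of the box when they exist.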

Separate lines by color

Each line has its own color and the colors differ considerably, so different lines can be told apart by color. The following metrics measure the difference between lines:

Hue: after conversion to HSV space, the hue is a value between 0 and 1
Brightness, i.e. the grayscale of the pixel
Color space distance, i.e. the sum of the absolute differences of the RGB components

Note: To make it easier to generate a new image, the image is inverted before the separation is performed. Gray values below 50 are treated as black and gray values above 235 as white.
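
A simplified version of the separation, using only the color space distance and the black/white thresholds, might look like the sketch below. It is not the code in seperator.py: the function names and the max_distance value are assumptions, the gray value is taken as the mean of the three channels, and the thresholds are applied to the image as given rather than to an inverted copy.

```python
import numpy as np


def dominant_colors(rgb, black=50, white=235):
    """List the non-black, non-white colors, most frequent first."""
    pixels = rgb.reshape(-1, 3)
    gray = pixels.mean(axis=1)  # assumption: gray = mean of R, G, B
    colored = pixels[(gray >= black) & (gray <= white)]
    colors, counts = np.unique(colored, axis=0, return_counts=True)
    order = np.argsort(-counts, kind="stable")
    return [tuple(int(v) for v in colors[i]) for i in order]


def color_mask(rgb, target, max_distance=60):
    """True where the summed absolute RGB difference to `target` is small."""
    diff = np.abs(rgb.astype(int) - np.asarray(target, dtype=int))
    return diff.sum(axis=2) <= max_distance
```

Sorting by pixel count matches the rule that lines are output in descending order of the number of points they contain.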

round	number of colors	color band
0	215
1	1647
2	6986	——
3	9998	——

Ordered point selection

Select points on the line from left to right, each time taking the first point from top to bottom, subject to the requirement that the curve decreases monotonically. Selecting points this way avoids picking up the curve labels, and because of the monotonic-decrease constraint the jitter cannot become too large.
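
One way to read this rule is sketched below, assuming the separated line is available as a boolean mask. In image coordinates rows grow downward, so a monotonically decreasing curve never moves to a smaller row index. The function name select_points is hypothetical.

```python
import numpy as np


def select_points(mask):
    """Pick one (column, row) per column, never moving upward."""
    points = []
    last_row = -1
    for col in range(mask.shape[1]):
        rows = np.flatnonzero(mask[:, col])
        # Pixels above the previous point (for example a label at the
        # upper right) would break the monotonic decrease and are skipped.
        rows = rows[rows >= last_row]
        if rows.size == 0:
            continue
        last_row = int(rows[0])  # first remaining pixel from the top
        points.append((col, last_row))
    return points
```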

Fit or interpolate

Fit

When fitting to the sample points, the curve does not have to pass through them. The difficulty lies in choosing the functional form: it is hard to find a function that fits the curve well. The figure below shows the result of fitting a polynomial with terms of power -3 to +3; the seven coefficients are:

[-5.81978492e+01  2.23481625e+02 -1.84351082e+02  1.34665445e+02
 -2.35943787e+00  1.71681831e-02 -1.61449275e-05]

It can be seen that this method does not represent the curve well.
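
A least-squares fit with powers -3 to +3 can be reproduced along these lines. The sketch uses np.linalg.lstsq and hypothetical function names; the solver and the coefficient order used for the numbers above are not known, so here the coefficients are simply ordered from the lowest power to the highest.

```python
import numpy as np


def fit_powers(x, y, low=-3, high=3):
    """Least-squares fit of y = sum_k c_k * x**k for k = low..high."""
    x = np.asarray(x, dtype=float)  # x must not contain zeros
    powers = np.arange(low, high + 1)
    design = x[:, None] ** powers[None, :]  # one column per power
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float),
                                 rcond=None)
    return powers, coeffs


def evaluate(x, powers, coeffs):
    """Evaluate the fitted function at the points x."""
    x = np.asarray(x, dtype=float)
    return (x[:, None] ** powers[None, :]) @ coeffs
```

With only seven free coefficients the fitted function cannot follow every bend of a measured curve, which is consistent with the poor result reported above.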

Interpolation

Interpolation is much easier and more accurate than finding a fitting function, but the result may jitter or lose smoothness: at the four places circled in the figure, the interpolated curve jitters.
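
As a point of reference, resampling the selected points to 100 equally spaced positions takes only a few lines. Linear interpolation with np.interp is an assumption here, since the interpolation scheme is not named; resample_curve is a hypothetical helper and expects the x values in increasing order.

```python
import numpy as np


def resample_curve(points, count=100):
    """Interpolate (x, y) samples at `count` equally spaced x positions."""
    pts = np.asarray(points, dtype=float)
    xs, ys = pts[:, 0], pts[:, 1]  # xs must be increasing
    grid = np.linspace(xs[0], xs[-1], count)
    return grid, np.interp(grid, xs, ys)
```

Because the interpolated curve passes through every sample, any noisy sample shows up directly as jitter.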

Ways to deal with the jitter:

Where the curve jitters, for example where the absolute value of a higher-order derivative is large, delete the sample points in the neighborhood and interpolate again. The threshold needs manual tuning
Reconstruct the contour with Fourier descriptors and smooth it by adjusting the number of terms M
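
The first approach could be prototyped as below. The sketch assumes equally spaced samples and uses the second difference as the higher-order derivative; the function name, the radius parameter, and the use of linear re-interpolation are all assumptions.

```python
import numpy as np


def remove_jitter(xs, ys, threshold, radius=1):
    """Drop samples near large second differences, then re-interpolate."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    second = np.zeros_like(ys)
    # Discrete second derivative (valid for equally spaced xs).
    second[1:-1] = ys[2:] - 2 * ys[1:-1] + ys[:-2]
    keep = np.ones(len(ys), dtype=bool)
    for i in np.flatnonzero(np.abs(second) > threshold):
        keep[max(i - radius, 0):i + radius + 1] = False
    keep[0] = keep[-1] = True  # always keep the end points
    return np.interp(xs, xs[keep], ys[keep])
```

A threshold that is too low removes genuine bends of the curve, which is why it has to be tuned by hand.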

This step can be further optimized

Extensions

Image to PDF function

Image/PDF to GIF function

Remove watermark function

Watermarks come in different forms. Common ones are:

Pure gray watermark: the three RGB values are equal, and the watermark is clearly lighter (lower gray value) than the other colors
Colored watermark: its hue clearly differs from the other colors
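
For the pure gray case, a simple removal could look like the sketch below. It rests on several assumptions that the text does not confirm: the page background is white, "lighter" means closer to white on the 0 to 255 scale, and the two thresholds are placeholder values. The function name is hypothetical.

```python
import numpy as np


def remove_gray_watermark(rgb, min_gray=180, tolerance=8):
    """Paint light, colorless pixels white (assumes a white background)."""
    img = rgb.astype(int)
    spread = img.max(axis=2) - img.min(axis=2)  # 0 when R = G = B
    brightness = img.mean(axis=2)
    mask = (spread <= tolerance) & (brightness >= min_gray)
    cleaned = rgb.copy()
    cleaned[mask] = 255
    return cleaned
```

Dark text and colored content are left untouched because they fail either the brightness test or the gray test.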

Composite large image function

Horizontal split function

The goal is to split a sheet of music into separate pictures, one per group.

Sum the pixels of each row of the picture; clear gaps appear between the lines of the score. The 12 lines in 6 groups correspond to the 12 peaks in the picture on the right, and each peak contains 5 sub-peaks, namely the five lines of the staff

Treat every row whose grayscale value is > thresh as a row with content and merge adjacent rows. Rows that are more than distance apart are split into separate pieces. With thresh=80 and distance=30, the output is as follows:

{0: [171, 177, 182, 183, 188, 194], 1: [230, 231, 236, 242, 247, 248, 253], 2: [305, 327, 328, 333, 334, 339, 345, 350, 351], 3: [387, 388, 393, 398, 399, 404, 410, 411], 4: [484, 485, 490, 495, 496, 501, 502, 507], 5: [544, 549, 550, 555, 561, 566, 567], 6: [624, 625, 641, 646, 647, 652, 653, 658, 663, 664], 7: [700, 701, 706, 712, 717, 718, 723], 8: [797, 798, 803, 804, 809, 814, 815, 820, 821], 9: [861, 866, 867, 872, 878, 883, 884], 10: [958, 963, 964, 969, 970, 975, 980, 981], 11: [1017, 1018, 1023, 1029, 1034, 1035, 1040]}
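
The grouping step that produces a dictionary of this shape can be sketched as follows. The input is assumed to be a one-dimensional row profile (one value per image row); how the program computes that profile is not shown here, and group_rows is a hypothetical name.

```python
def group_rows(profile, thresh=80, distance=30):
    """Group rows whose profile value exceeds `thresh` into blocks."""
    groups = {}
    last = None
    for row, value in enumerate(profile):
        if value <= thresh:
            continue  # row without content
        if last is None or row - last > distance:
            groups[len(groups)] = []  # gap too large: start a new block
        groups[len(groups) - 1].append(row)
        last = row
    return groups
```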

Visualization:

This method finds all valid staves. The cuts can be placed at the center of each gap. For the first and the last group, use the mean of all the gaps.
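
A possible reading of the cutting rule is sketched below: inner cuts go to the middle of each gap, and the outer margins of the first and last block are half of the mean gap. The treatment of the outer margins is an interpretation, and cut_positions is a hypothetical name.

```python
def cut_positions(groups, height):
    """Return the row indices at which the picture is cut."""
    blocks = [groups[key] for key in sorted(groups)]
    # (last row of one block, first row of the next block)
    gaps = [(a[-1], b[0]) for a, b in zip(blocks, blocks[1:])]
    inner = [(end + start) // 2 for end, start in gaps]
    if gaps:
        mean_gap = sum(start - end for end, start in gaps) / len(gaps)
        margin = round(mean_gap / 2)  # assumption: half the mean gap
    else:
        margin = 0
    top = max(blocks[0][0] - margin, 0)
    bottom = min(blocks[-1][-1] + margin, height)
    return [top] + inner + [bottom]
```

Consecutive pairs of the returned positions delimit one output picture each.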

Error log

When the program starts: recursion is detected during loading of cv2 binary extensions. check opencv installation

Cause: the installed version of opencv is wrong. Fix: pip install opencv-python==4.5.3.56

Image rendering failed:

blocked by CORS policy: The request client is not a secure context and the resource is in more-private address space local.

sunieee/DigitalImage
