mupdf.git
22 months agoUpdate version number and CHANGES. 1.22.x 1.22.2
Robin Watts [Fri, 16 Jun 2023 13:58:17 +0000 (14:58 +0100)]
Update version number and CHANGES.

22 months agoBug 706738: OSS Fuzz 59207. Avoid using invalid pdf_xref_entry.
Robin Watts [Mon, 29 May 2023 14:32:07 +0000 (15:32 +0100)]
Bug 706738: OSS Fuzz 59207. Avoid using invalid pdf_xref_entry.

The problem here is that we end up accessing a pdf_xref_entry
that has just been freed due to an intervening operation
(an xref solidification).

The bug report notes that this has only happened since
https://git.ghostscript.com/?p=mupdf.git;a=commitdiff;h=27a069e786d36bc9d22d2fe9c02493612d2f5ef8

In the code before that commit, the solidification would happen
earlier and hence we were getting away with it. This is really
a separate issue.

Essentially, any call to pdf_cache_object can cause a
solidification (or repair). Similarly, any call to
pdf_get_xref_entry (or pdf_get_xref_entry_no_null, but not
pdf_get_xref_entry_no_change) can cause changes.

pdf_load_object (and pdf_load_obj_stm) call pdf_cache_object, so
the same limitations apply to that.

As such, we must not hold pdf_xref_entry pointers over such
calls.

We were breaking this rule in pdf_cache_object itself (where it
recurses to handle object streams), and in pdf_load_obj_stm
where it would call pdf_get_xref_entry_no_null.

Fix both those cases here, and add some comments to try to
avoid this happening again in future.
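
The rule above can be sketched with a toy table. This is only an illustration under stated assumptions: toy_xref, toy_entry, toy_solidify and toy_offset_after_solidify are hypothetical names, not MuPDF's real structures or functions. The point is that the entry pointer is fetched after the call that may rebuild the table, never before it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT MuPDF's real structures. */
typedef struct { int num; long offset; } toy_entry;
typedef struct { toy_entry *table; int len; } toy_xref;

/* Plays the role of a solidification: the table is rebuilt in a new
 * allocation, so every pointer into the old table becomes invalid. */
static int toy_solidify(toy_xref *x)
{
	toy_entry *fresh;
	assert(x != NULL && x->len > 0);
	fresh = malloc(sizeof(*fresh) * (size_t)x->len);
	if (fresh == NULL)
		return -1;
	memcpy(fresh, x->table, sizeof(*fresh) * (size_t)x->len);
	free(x->table);
	x->table = fresh;
	return 0;
}

/* Safe pattern: carry the object number across the call, and only
 * look the entry up once the call that may rebuild the table is done. */
static long toy_offset_after_solidify(toy_xref *x, int num)
{
	toy_entry *entry;
	assert(num >= 0 && num < x->len);
	if (toy_solidify(x) != 0)
		return -1;
	entry = &x->table[num]; /* fetched AFTER the call, so it cannot be stale */
	return entry->offset;
}
```

Holding an index (or object number) rather than a pointer costs one extra lookup, but it stays valid however the table is reallocated.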

22 months agoBug 706694: Fix missing layer names from PDF.
Robin Watts [Fri, 9 Jun 2023 18:10:39 +0000 (19:10 +0100)]
Bug 706694: Fix missing layer names from PDF.

I was interpreting these as 'Names' when at least some of the
time (maybe always?) they are strings.

22 months agoRemove bad return from inside try block.
Tor Andersson [Mon, 29 May 2023 08:52:52 +0000 (10:52 +0200)]
Remove bad return from inside try block.

22 months agoBug 706581 continued: Fix SEGV for pages with no text.
Robin Watts [Fri, 12 May 2023 10:29:38 +0000 (11:29 +0100)]
Bug 706581 continued: Fix SEGV for pages with no text.

Doing text extraction on pages with no text could result in
flush text dereferencing NULL. Guard against this.

22 months agoBug 706642: Text extraction; maybe add spaces when prepending lines.
Robin Watts [Thu, 11 May 2023 13:37:28 +0000 (14:37 +0100)]
Bug 706642: Text extraction; maybe add spaces when prepending lines.

When we prepend a line to another one, if there is a suitable gap
between it and the line we are prepending it to, insert a space.

Logic matches the addition of spaces when simply adding chars.

22 months agoBug 706718: Don't prepend text extracted lines if vertically shifted.
Robin Watts [Thu, 11 May 2023 11:14:56 +0000 (12:14 +0100)]
Bug 706718: Don't prepend text extracted lines if vertically shifted.

The bugfix for 706426 was incorrect, in that it did not check for
text extracted lines being vertically shifted when considering them
for prepending.

Fixed here.

22 months agoBug 706581: Flush pending text when starting/finishing a layer.
Robin Watts [Thu, 11 May 2023 10:21:48 +0000 (11:21 +0100)]
Bug 706581: Flush pending text when starting/finishing a layer.

This ensures that text does not end up in the wrong layer.

22 months agoFurther speed up chinese-example.pdf.
Robin Watts [Wed, 3 May 2023 19:14:44 +0000 (20:14 +0100)]
Further speed up chinese-example.pdf.

This file has a series of incremental updates, but the original
xref section in the file is highly fragmented. The time taken
to search through this highly fragmented subsection is significant.

So, add an optimisation whereby we solidify the earliest xref
in the file on loading.

22 months agoAttempt to speed up "copy: chinese-example.pdf" PyMuPDF test.
Robin Watts [Wed, 3 May 2023 14:40:16 +0000 (15:40 +0100)]
Attempt to speed up "copy: chinese-example.pdf" PyMuPDF test.

In order to copy the file, we first open it, then save it.

As part of opening it, we perform tests on each of the
objects in the file. This takes a long time.

This file is strangely constructed in that, of the 4 xrefs in
the file, the most basic one is extremely fragmented, making
looking up an object in the xref almost an O(n) process.

We look up every object as part of the tests, making this an
O(n^2) process.

Here we move to a process whereby we 'map' the checks across
the objects in the file, moving it back to O(n).

23 months agoUpdate version number post release.
Tor Andersson [Fri, 12 May 2023 10:58:37 +0000 (12:58 +0200)]
Update version number post release.

23 months agoUpdate version number to 1.22.1. 1.22.1
Tor Andersson [Fri, 12 May 2023 10:40:19 +0000 (12:40 +0200)]
Update version number to 1.22.1.

23 months agoUpdate CHANGES and README.
Tor Andersson [Thu, 11 May 2023 17:38:49 +0000 (19:38 +0200)]
Update CHANGES and README.

23 months agoBug 706719: Fix redaction of unmasked images.
Robin Watts [Thu, 11 May 2023 15:17:41 +0000 (16:17 +0100)]
Bug 706719: Fix redaction of unmasked images.

The fix for bug 706114 contained a logical error, meaning that
we were never redacting unmasked images.

We correct the logic here.

23 months agoBug 706667: Add missing limits.h include for UINT_MAX.
Raphaël Mélotte [Mon, 24 Apr 2023 15:27:35 +0000 (17:27 +0200)]
Bug 706667: Add missing limits.h include for UINT_MAX.

encode-basic.c uses 'UINT_MAX', which is available in the 'limits.h'
header.

In some configurations that build with zlib from [1], by chance
limits.h gets indirectly included when including 'z-imp.h' (the
includes are: 'z-imp.h' -> 'zlib.h' -> 'zconf.h' -> 'limits.h'), so
the build succeeds.

When using other zlib implementations however (for example from [2]),
limits.h is not necessarily included indirectly, which leads to the
build failing in the following way:

source/fitz/encode-basic.c: In function 'deflate_write':
source/fitz/encode-basic.c:343:27: error: 'UINT_MAX' undeclared (first use in this function)
  343 |         newbufsize = n >= UINT_MAX ? UINT_MAX : deflateBound(&state->z, n);
      |                           ^~~~~~~~
source/fitz/encode-basic.c:26:1: note: 'UINT_MAX' is defined in header '<limits.h>'; did you forget to '#include <limits.h>'?

Add the missing include, so that the build succeeds whether or not zlib
indirectly includes 'limits.h'.

Similarly, also add it in output-ps.c where it's also missing.

[1]: https://zlib.net/
[2]: https://github.com/zlib-ng/zlib-ng

Signed-off-by: Raphaël Mélotte <[email protected]>
23 months agoBug 706591: Don't write uninitialized data in error message.
Tor Andersson [Fri, 5 May 2023 09:32:45 +0000 (11:32 +0200)]
Bug 706591: Don't write uninitialized data in error message.

23 months agoBug 706703: Avoid stream 'Length' overflow.
Robin Watts [Thu, 4 May 2023 11:50:26 +0000 (12:50 +0100)]
Bug 706703: Avoid stream 'Length' overflow.

When we read 'Length' from the file, we were reading it
as an int. We then 'guessed' the length of the uncompressed
stream by multiplying by 3 - and this overflowed.

Swap to reading it as an int64_t (the best we can do),
converting to a size_t and then working in size_t's
throughout.
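
A minimal sketch of the idea, assuming hypothetical helper names (length_to_size and guess_uncompressed_size are not MuPDF functions): the value is held as an int64_t, converted to size_t with clamping, and the "times 3" guess saturates instead of wrapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not MuPDF's code: turn a 'Length' value read
 * from the file into a size_t, treating negative values as 0 and
 * clamping values that are too large for size_t. */
static size_t length_to_size(int64_t len)
{
	if (len < 0)
		return 0;
	if ((uint64_t)len > (uint64_t)SIZE_MAX)
		return SIZE_MAX;
	return (size_t)len;
}

/* Guess the uncompressed size as 3 * len, saturating rather than
 * wrapping when the product would not fit in a size_t. */
static size_t guess_uncompressed_size(int64_t len)
{
	size_t n = length_to_size(len);
	if (n > SIZE_MAX / 3)
		return SIZE_MAX;
	return n * 3;
}
```

With a plain int, a Length of 1000000000 would already overflow when multiplied by 3; here the same input gives 3000000000.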

23 months agoUpdate jbig2dec.
Sebastian Rasmussen [Wed, 3 May 2023 11:08:43 +0000 (13:08 +0200)]
Update jbig2dec.

23 months agoBug 706582: Revamp mobi parser for third time.
Sebastian Rasmussen [Mon, 17 Apr 2023 14:56:42 +0000 (16:56 +0200)]
Bug 706582: Revamp mobi parser for third time.

  * Never append uninitialized buffer data to output.

  * Ensure parsing each record never reads outside the record.

  * Throw exception if reading/skipping past record header fields fails.

  * Print warnings when running out of data in a text record.

  * Limit record offsets to the range between end of header and EOF.

  * Require record offsets to be increasing.

23 months agoBug 706583: Do not dereference NULL pointer for empty font entries.
Sebastian Rasmussen [Sun, 16 Apr 2023 19:39:41 +0000 (21:39 +0200)]
Bug 706583: Do not dereference NULL pointer for empty font entries.

Previously if search_by_family() was asked to look up "" it would get
to the NotoSansChorasmian font where the font family would match and
the code would try to dereference the size pointer which was NULL
resulting in a crash.

23 months agoTweak bounds on loop to avoid crash with NULL pointer.
Robin Watts [Fri, 28 Apr 2023 16:06:29 +0000 (17:06 +0100)]
Tweak bounds on loop to avoid crash with NULL pointer.

If s == NULL, e = s + 0, d = s+2 then:

testing "s < d && d < e - 9" is bad, because e - 9 is unsigned.

Better to test "s < d && d + 9 < e".
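
The difference between the two tests can be sketched with unsigned offsets standing in for the pointers (arithmetic on a NULL pointer is not well defined in C, so the sketch avoids it). The function names are hypothetical, and the "s < d" half of the test is omitted because it is the same in both forms.

```c
#include <stddef.h>

/* Offsets stand in for the pointers so the example stays well defined. */

/* Buggy form: when e < 9, the unsigned subtraction e - 9 wraps round
 * to a huge value, so the test passes when it should fail. */
static int has_room_buggy(size_t d, size_t e)
{
	return d < e - 9;
}

/* Fixed form: move the 9 to the other side, so nothing is subtracted. */
static int has_room_fixed(size_t d, size_t e)
{
	return d + 9 < e;
}
```

For d = 2 and e = 0 (the case in the commit message), the buggy form reports room and the fixed form does not; for ordinary inputs such as d = 2, e = 20 both agree.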

2 years agoEnsure that succeeding record offsets are less than mobi size. 1.22.0
Sebastian Rasmussen [Wed, 12 Apr 2023 20:47:04 +0000 (22:47 +0200)]
Ensure that succeeding record offsets are less than mobi size.

2 years agohtml: Handle nodes with an empty string as the tag name.
Tor Andersson [Thu, 13 Apr 2023 12:22:22 +0000 (14:22 +0200)]
html: Handle nodes with an empty string as the tag name.

The HTML5 parsing algorithm can generate such nodes when given
junk input.

Example file:

<!DOCTYPE html>
</><x>test

2 years agoBug 706569: Ensure that offsets are smaller than mobi size.
Sebastian Rasmussen [Tue, 11 Apr 2023 19:20:45 +0000 (21:20 +0200)]
Bug 706569: Ensure that offsets are smaller than mobi size.

2 years agotiff: Better boundary checking for JPEG embedded in TIFF.
Sebastian Rasmussen [Sat, 8 Apr 2023 00:19:40 +0000 (02:19 +0200)]
tiff: Better boundary checking for JPEG embedded in TIFF.

2 years agotiff: Require a minimum number of samples based on photometric.
Sebastian Rasmussen [Sat, 8 Apr 2023 00:19:10 +0000 (02:19 +0200)]
tiff: Require a minimum number of samples based on photometric.

2 years agohtml: Inherit background-color in table cells to pick up row backgrounds.
Tor Andersson [Thu, 6 Apr 2023 13:12:33 +0000 (15:12 +0200)]
html: Inherit background-color in table cells to pick up row backgrounds.

2 years agohtml: Set all table cell heights to match the table row height.
Tor Andersson [Thu, 6 Apr 2023 13:08:58 +0000 (15:08 +0200)]
html: Set all table cell heights to match the table row height.

2 years agoBug 706541: Retry FT_Load_Glyph without hinting if we get an error.
Tor Andersson [Thu, 6 Apr 2023 10:45:46 +0000 (12:45 +0200)]
Bug 706541: Retry FT_Load_Glyph without hinting if we get an error.

Recover if embedded font files have invalid opcodes or other bytecode
interpretation errors during hinting.

2 years agoMSVC: Fix dom.c/story.c compilation problems.
Robin Watts [Fri, 7 Apr 2023 11:42:40 +0000 (12:42 +0100)]
MSVC: Fix dom.c/story.c compilation problems.

Should be excluded from x64 builds as well as Win32 ones.

2 years agoUpdate CHANGES with changes since last release. 1.22.0-rc1
Tor Andersson [Tue, 14 Mar 2023 12:08:21 +0000 (13:08 +0100)]
Update CHANGES with changes since last release.

2 years agohtml: Don't compensate for box margins twice when sizing images.
Tor Andersson [Thu, 30 Mar 2023 15:26:50 +0000 (17:26 +0200)]
html: Don't compensate for box margins twice when sizing images.

Horizontal margins are already taken into account when sizing the
flow box; we can limit the image size to the width of the flow box
directly.

Vertical margins of surrounding blocks also don't matter because they
don't add extra space at the beginning/end of a page break - only the
page margins which are already accounted for in page_h.

2 years agoCheck that we have a color image in fz_fill_image.
Tor Andersson [Mon, 3 Apr 2023 15:28:39 +0000 (17:28 +0200)]
Check that we have a color image in fz_fill_image.

We can't draw alpha-only images; the user must convert these to grayscale
or visualize them in some other way before trying to draw such images.

2 years agotiff: Limit bits per sample to 16 until MuPDF supports more.
Sebastian Rasmussen [Tue, 4 Apr 2023 16:39:33 +0000 (18:39 +0200)]
tiff: Limit bits per sample to 16 until MuPDF supports more.

This fixes Coverity CID 313480.

2 years agoFix PDF mark list handling of 0-numbered objects.
Robin Watts [Wed, 5 Apr 2023 10:37:48 +0000 (11:37 +0100)]
Fix PDF mark list handling of 0-numbered objects.

In the recent commit for Bug 706506, I incorrectly optimised
pdf_mark_list_push to just exit if called with an object
with num == 0. We didn't check for it, so why store it?

Well, if we don't store it, we have nothing to pop, and hence
we can get out of sync when popping stuff off the list.

Revert that optimisation here, but keep the actual fix (which
was the memcpy bit).

2 years agoFix bad format string in fz_warn.
Robin Watts [Mon, 3 Apr 2023 17:37:26 +0000 (18:37 +0100)]
Fix bad format string in fz_warn.

2 years agotiff: Support embedded JPEG images.
Sebastian Rasmussen [Wed, 15 Mar 2023 07:15:03 +0000 (08:15 +0100)]
tiff: Support embedded JPEG images.

2 years agotiff: Take byte order into account for palettized images.
Robin Watts [Thu, 30 Mar 2023 13:38:22 +0000 (15:38 +0200)]
tiff: Take byte order into account for palettized images.

2 years agotiff: Rename confusing x/y variable names.
Sebastian Rasmussen [Wed, 29 Mar 2023 03:40:44 +0000 (05:40 +0200)]
tiff: Rename confusing x/y variable names.

2 years agotiff: Do not initialize samples to 0x55, use 0x00.
Sebastian Rasmussen [Wed, 29 Mar 2023 14:23:10 +0000 (16:23 +0200)]
tiff: Do not initialize samples to 0x55, use 0x00.

The reason is that for tiled TIFFs the tile is ORed into
place in the samples array. So if the sample array already
contains set bits, then those will interfere with the ORed
tile samples.
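
A small sketch of why the initial fill value matters when tiles are ORed into place. or_tile is a hypothetical helper, not the MuPDF tiff code.

```c
#include <stddef.h>

/* Hypothetical helper: paste tile bytes into the sample buffer by
 * ORing, as the commit message describes for tiled TIFFs. */
static void or_tile(unsigned char *samples, const unsigned char *tile, size_t n)
{
	size_t i;
	for (i = 0; i < n; i++)
		samples[i] |= tile[i];
}
```

Starting from 0x00, a tile byte of 0xA0 lands as 0xA0. Starting from 0x55, the same byte becomes 0xF5, because the bits already set in the buffer survive the OR.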

2 years agotiff: Fix check for source stride when expanding colormap.
Sebastian Rasmussen [Thu, 30 Mar 2023 14:20:36 +0000 (16:20 +0200)]
tiff: Fix check for source stride when expanding colormap.

2 years agotiff: Harden implementation against fuzzing.
Sebastian Rasmussen [Mon, 13 Mar 2023 14:27:58 +0000 (15:27 +0100)]
tiff: Harden implementation against fuzzing.

* Process tiff tags twice: First process all tags that can only
  have a single value (count must be 1) and store their value
  into the internal state, if tags that can have multiple values
  (count can be > 1) are encountered, store the count in the
  internal state. Next, use the read out values to compute upper
  limits on, e.g. colormap size, number of tiles and number of
  strips. Finally, process all tags again, but read out only
  those with multiple values, ensuring that their data do not
  exceed the previously computed limits.

* Clamp tag reading to remaining data. Before this commit the
  tiff parser would read all remaining data and then reinterpret
  EOF as all bits set for byte/short/long/etc reads until the
  count reached zero. Since the count may be ridiculously large
  this may take a long time. After this commit the reads are
  clamped to the remaining data. Any unused part of the
  destination buffer is zeroed.

* Require samples per pixel to be larger than extra samples.

* Limit bits per sample to 32.

* Limit TileOffsets/TileByteCounts lengths depending on
  ImageWidth/ImageLength/TileWidth/TileLength.

* Limit StripOffsets/StripByteCounts lengths depending on
  ImageLength/RowsPerStrip.

* Limit ColorMap length depending on BitsPerSample.

* Error out if having to read tiles/strips but no data available.

* When pasting tiles into the decoded samples, index into the
  correct source/destination byte and ensure to shift the source
  sample if needed.

* Sort the tags numerically in the switch blocks in the parsing
  functions.
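
The "clamp tag reading to remaining data" item above can be sketched as follows. read_clamped is a hypothetical helper written for this illustration, and it assumes the destination buffer has room for count bytes (the other items in the list are what bound count in the first place).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not MuPDF's tiff code: copy up to 'count' bytes
 * of tag data, but never more than remains in the input. The unused
 * tail of the destination is zeroed. Returns the bytes actually read. */
static size_t read_clamped(unsigned char *dst, size_t count,
	const unsigned char *src, size_t remaining)
{
	size_t n = count < remaining ? count : remaining;
	if (n > 0)
		memcpy(dst, src, n);
	if (count > n)
		memset(dst + n, 0, count - n);
	return n;
}
```

The work done is bounded by the size of the destination, and only the bytes that actually exist in the input are copied; nothing is synthesised from EOF.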

2 years agoRemove unused function.
Sebastian Rasmussen [Sun, 2 Apr 2023 02:36:32 +0000 (04:36 +0200)]
Remove unused function.

2 years agoInclude header declaring mkdir and set mode when creating directory.
Sebastian Rasmussen [Sun, 2 Apr 2023 02:34:54 +0000 (04:34 +0200)]
Include header declaring mkdir and set mode when creating directory.

2 years agoBug 706485: Fix mutool clean to cope with XRef missing from ObjStm.
Robin Watts [Fri, 31 Mar 2023 17:21:27 +0000 (18:21 +0100)]
Bug 706485: Fix mutool clean to cope with XRef missing from ObjStm.

If we try to load an obj from an objstm and it's not there,
trigger a repair. If the obj is still not found after the
repair, we'll still throw.

Cope with ignoring such errors in preloadobjstms.

2 years agoBug 706480 followup: Further tweak gridfitting.
Robin Watts [Fri, 31 Mar 2023 11:46:25 +0000 (12:46 +0100)]
Bug 706480 followup: Further tweak gridfitting.

The draw device maintains a flag so that it knows when it is
rendering a type3 font. We use this flag to inhibit gridfitting
of images.

If, however, we are using a pattern within a type3 font that
happens to have an image in it, we'd rather NOT inhibit
gridfitting for that.

So, tweak the draw device to clear the type3 flag during
patterns.

This only produces 1 diff in our tests, and that diff is a
cure for exactly the 1 test that I thought was a regression
when the original fix for bug 706480 went in.

tests_private/pdf/PDF_2.0_FTS/fts_23_2311.pdf to pgm @ 200dpi
now has no breakup in the fill of the Type3 d1 triangle glyph.

2 years agoBug 706484: Remove limits on page tree depth.
Robin Watts [Thu, 30 Mar 2023 16:11:52 +0000 (17:11 +0100)]
Bug 706484: Remove limits on page tree depth.

When we search the page tree for resources, we currently
use a recursive algorithm. To avoid stack overflows we
limit recursive depth to 100.

Use an alternative scheme, suggested by Anatoly Vorobey,
whereby we use a modified version of "Floyd's cycle detection
algorithm".

This does away both with the need for recursion, and for the
need to search an extending list at each step, hence moving
from quadratic time to linear. Very clever!
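
A minimal sketch of the tortoise-and-hare idea applied to a parent chain. This is a generic variant written for illustration, not the modified algorithm MuPDF actually uses; toy_node and find_resources are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical node type standing in for a page tree node. */
typedef struct toy_node
{
	struct toy_node *parent;
	int has_resources;
} toy_node;

/* Walk up the parent chain looking for a node with resources.
 * 'slow' advances one step for every two steps of 'node'. On a chain
 * that ends, 'node' stays strictly ahead of 'slow'; on a chain that
 * loops, the two must eventually meet, and we give up. */
static toy_node *find_resources(toy_node *node)
{
	toy_node *slow = node;
	int odd = 0;
	while (node != NULL)
	{
		if (node->has_resources)
			return node;
		node = node->parent;
		if (odd)
			slow = slow->parent;
		odd = !odd;
		if (node != NULL && node == slow)
			return NULL; /* cycle detected */
	}
	return NULL;
}
```

No recursion and no list of visited nodes is needed: the only state is two pointers and a flag, and the number of steps is linear in the length of the chain.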

2 years agoBug 705866: Move accelerator files.
Robin Watts [Thu, 23 Mar 2023 11:42:13 +0000 (11:42 +0000)]
Bug 705866: Move accelerator files.

Currently, MuPDF puts accelerator files into the directory given
by the environment variables for TEMP, or TMP, or failing that
'/var/tmp' or '/tmp/'.

It has been suggested that this is a security risk because people
can cause access to block by using a fifo.

On windows machines, we try to save in %USERPROFILE%\.config\mupdf.
If USERPROFILE is not defined, we drop back to TEMP and TMP as
before - these are generally user specific on modern Windows boxes.

On other machines, we try $XDG_CACHE_HOME/mupdf, or $HOME/.cache/mupdf.
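
The non-Windows lookup order can be sketched like this. choose_cache_dir is a hypothetical helper, not MuPDF's code; the caller is assumed to pass in the values of XDG_CACHE_HOME and HOME (for example from getenv), and treating an empty string as unset is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not MuPDF's code. Either argument may be NULL.
 * Returns 0 on success, -1 if no directory could be chosen or the
 * buffer is too small. */
static int choose_cache_dir(char *buf, size_t size,
	const char *xdg_cache_home, const char *home)
{
	int n;
	if (xdg_cache_home != NULL && xdg_cache_home[0] != '\0')
		n = snprintf(buf, size, "%s/mupdf", xdg_cache_home);
	else if (home != NULL && home[0] != '\0')
		n = snprintf(buf, size, "%s/.cache/mupdf", home);
	else
		return -1;
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Passing the environment values as parameters keeps the function easy to test; a caller would typically invoke it as choose_cache_dir(buf, sizeof buf, getenv("XDG_CACHE_HOME"), getenv("HOME")).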

2 years agoBug 706480: Don't grid fit images during Type3 font rendering.
Robin Watts [Thu, 30 Mar 2023 17:08:20 +0000 (18:08 +0100)]
Bug 706480: Don't grid fit images during Type3 font rendering.

2 years agoBug 706506: Fix infinite recursion in mutool info.
Robin Watts [Thu, 30 Mar 2023 11:35:43 +0000 (04:35 -0700)]
Bug 706506: Fix infinite recursion in mutool info.

2 separate problems here.

Firstly, pdf_mark_list_push fails to copy the pushed data
from local_list to the newly malloced block when we first
move from local storage to malloced storage.

(Also, while we are here, there is no point in storing
0 entries in the list as they will never be checked!)

Secondly, while gatherresourceinfo is supposed to be
called with indirected objects, it can (as is the case
with the given fuzzed file) be called with direct
references. These don't put anything sane in the cycle
checker.

Thus by using (legal) indirections from (say) the Font
resource entry back to something that contains the
original resource entry, we can get cycles that we don't
detect.

Fix this by pushing such entries onto the mark list.
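
The first problem (losing the entries when moving from local to malloced storage) can be sketched with a toy list. The toy_mark_list type and its functions are hypothetical and much simpler than the real pdf_mark_list; the memcpy marks the step the commit message says was missing.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_LOCAL 4

/* Hypothetical list: starts in a small inline array and moves to the
 * heap once that array is full. */
typedef struct
{
	int len, max;
	int *list;
	int local_list[TOY_LOCAL];
} toy_mark_list;

static void toy_init(toy_mark_list *m)
{
	m->len = 0;
	m->max = TOY_LOCAL;
	m->list = m->local_list;
}

static int toy_push(toy_mark_list *m, int num)
{
	if (m->len == m->max)
	{
		int newmax = m->max * 2;
		int *grown;
		if (m->list == m->local_list)
		{
			grown = malloc(sizeof(int) * (size_t)newmax);
			if (grown == NULL)
				return -1;
			/* Without this copy, the entries pushed so far are lost. */
			memcpy(grown, m->local_list, sizeof(int) * (size_t)m->len);
		}
		else
		{
			grown = realloc(m->list, sizeof(int) * (size_t)newmax);
			if (grown == NULL)
				return -1;
		}
		m->list = grown;
		m->max = newmax;
	}
	m->list[m->len++] = num;
	return 0;
}

static void toy_free(toy_mark_list *m)
{
	if (m->list != m->local_list)
		free(m->list);
}
```

The sketch stores every value it is given, including 0, so that each push has a matching entry to pop.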

2 years agoAdd some clarifying comments to page/document structures.
Robin Watts [Tue, 28 Mar 2023 16:48:48 +0000 (17:48 +0100)]
Add some clarifying comments to page/document structures.

2 years agoSimplify overflow-wrap: break-word handling.
Tor Andersson [Fri, 24 Mar 2023 11:26:59 +0000 (12:26 +0100)]
Simplify overflow-wrap: break-word handling.

Split the word nodes at each cluster, and flag the fragments as
atomic (can't be broken down further) and overflow-wrap (can be taken
as a wrapping point if in desperate need).

This lets us more easily avoid infinite looping at the beginning
of a line, when even the tiniest split fragments can't fit.

This fixes bug 706481.

2 years agoUpdate MuJS submodule.
Tor Andersson [Tue, 21 Mar 2023 11:39:14 +0000 (12:39 +0100)]
Update MuJS submodule.

2 years agoFall back to glyph names if ToUnicode table is obviously broken.
Tor Andersson [Mon, 20 Mar 2023 21:39:30 +0000 (22:39 +0100)]
Fall back to glyph names if ToUnicode table is obviously broken.

2 years agoscripts/wrap/: added diagnostics activated by -b -d <name>.
Julian Smith [Thu, 23 Mar 2023 16:00:13 +0000 (16:00 +0000)]
scripts/wrap/: added diagnostics activated by -b -d <name>.

2 years agoscripts/wrap/cpp.py: also use pdf_new_foo_*() fns as constructors of fz_* structs.
Julian Smith [Thu, 23 Mar 2023 15:57:14 +0000 (15:57 +0000)]
scripts/wrap/cpp.py: also use pdf_new_foo_*() fns as constructors of fz_* structs.

We previously only considered fz_new_() fns as constructors of fz_* structs,
but the fz_/pdf_ prefix is determined by the args, not the return type, so this
was missing possible constructors.

E.g. this picks up pdf_new_stext_page_from_annot() as a constructor of
fz_stext_page.

2 years agoscripts/wrap/__main__.py: use new wdev.py to find C# compiler csc.exe on windows.
Julian Smith [Fri, 17 Mar 2023 13:12:42 +0000 (13:12 +0000)]
scripts/wrap/__main__.py: use new wdev.py to find C# compiler csc.exe on windows.

This avoids using the previous hard-coded csc.exe path.

2 years agoscripts/wdev.py: new, support for finding windows command-line dev tools.
Julian Smith [Fri, 17 Mar 2023 13:11:19 +0000 (13:11 +0000)]
scripts/wdev.py: new, support for finding windows command-line dev tools.

For example this successfully finds cl.exe, link.exe and csc.exe on Github
Windows machines.

2 years ago.github/workflows/test_csharp.yml: new, for testing C# bindings on github.
Julian Smith [Fri, 17 Mar 2023 11:10:13 +0000 (11:10 +0000)]
.github/workflows/test_csharp.yml: new, for testing C# bindings on github.

2 years agoscripts/wrap/cpp.py: fix handling of fns that return const fz_foo*.
Julian Smith [Fri, 10 Mar 2023 23:36:40 +0000 (23:36 +0000)]
scripts/wrap/cpp.py: fix handling of fns that return const fz_foo*.

2 years agoscripts/wrap/cpp.py: removed SWIG out-param preprocessor code in generated output.
Julian Smith [Fri, 10 Mar 2023 23:35:08 +0000 (23:35 +0000)]
scripts/wrap/cpp.py: removed SWIG out-param preprocessor code in generated output.

This is unnecessary now we provide explicit alternative fns for out-param
handling.

2 years agoscripts/wrap/cpp.py: fix for use with libclang-16.0.0.
Julian Smith [Fri, 24 Mar 2023 12:45:14 +0000 (12:45 +0000)]
scripts/wrap/cpp.py: fix for use with libclang-16.0.0.

With libclang-16.0.0, typedefs for fnptrs seem to appear as
clang.cindex.TypeKind.ELABORATED instead of clang.cindex.TypeKind.TYPEDEF.

This was leading us to omission of fnptr wrappers for generated Director
classes such as FzOutput2.

2 years agoBug 706497: Avoid overflow in XPS clean_path method.
Robin Watts [Thu, 23 Mar 2023 16:49:35 +0000 (16:49 +0000)]
Bug 706497: Avoid overflow in XPS clean_path method.

Thanks to Chamal Desilva for the report.

2 years agoBug 706498: Increase maximum number of chars in an MRange CMAP entry.
Robin Watts [Thu, 23 Mar 2023 16:12:24 +0000 (16:12 +0000)]
Bug 706498: Increase maximum number of chars in an MRange CMAP entry.

This is to cope with PDF files that map single font chars to
long strings, like "[free-action]".

Someone will undoubtedly complain that 32 is not large enough at some
point in future...

Thanks to Anatoly Vorobey for the report and patch.

2 years agoBug 706493: Avoid buffer overrun in xps_parse_color.
Robin Watts [Thu, 23 Mar 2023 16:32:29 +0000 (16:32 +0000)]
Bug 706493: Avoid buffer overrun in xps_parse_color.

Thanks to Chamal Desilva for the report.

2 years agoMSVC: Fix Memento mupdf-gl Debug builds to not be optimised.
Robin Watts [Wed, 15 Mar 2023 18:47:52 +0000 (18:47 +0000)]
MSVC: Fix Memento mupdf-gl Debug builds to not be optimised.

Makes debugging hard!

2 years agoFix github code analysis issue #65.
Robin Watts [Thu, 23 Mar 2023 11:17:16 +0000 (11:17 +0000)]
Fix github code analysis issue #65.

Multiplication result converted to larger type.

Don't multiply int * int and then cast to size_t, as any overflow
in the original multiplication will already have been lost. Instead,
cast to size_t before the multiplication.
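
As a short illustration (the function name is hypothetical, and non-negative inputs are assumed):

```c
#include <stddef.h>

/* Cast first: the multiplication is then carried out in size_t.
 * Writing (size_t)(w * h) instead would multiply in int, where the
 * overflow happens before the cast can help.
 * Assumes w and h are non-negative. */
static size_t area_in_size_t(int w, int h)
{
	return (size_t)w * (size_t)h;
}
```

On a platform with a 64-bit size_t, area_in_size_t(65536, 65536) gives 4294967296, a value that does not fit in a 32-bit int.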

2 years agoMemento additions.
Julian Smith [Mon, 20 Mar 2023 22:03:59 +0000 (22:03 +0000)]
Memento additions.

Memento_startLeaking():
    Call Memento_init() if necessary, otherwise memento.leaking will be reset
    to 0 by later call of Memento_init().

Memento_addBacktraceLimitFnname():
    New, terminate backtraces at given function name(s); useful when MuPDF is
    being run by Python, e.g.:
        Memento_addBacktraceLimitFnname("cfunction_call");

Environment variable MEMENTO_HIDE_REF_CHANGE_BACKTRACES:
    If 1, we exclude backtraces for takeRef/dropRef events.

Environment variable MEMENTO_ATEXIT_FIN:
    If 0, we don't call Memento_fin() in an atexit() handler.

Improved info about known leaks:
    Omit backtraces for known leaked blocks.
    Added number/size block stats excluding known leaks.
    Added number/size block stats for known leaks.

2 years agoAdd 'pdf_pin_document' API to future proof us.
Robin Watts [Wed, 22 Mar 2023 10:30:02 +0000 (10:30 +0000)]
Add 'pdf_pin_document' API to future proof us.

In the next version, we plan to rewrite the internal
pdf_obj -> pdf_doc reference handling to use a "weak
reference" system. This will protect us more against
the document being dropped while references are still
held to pdf_objs.

The largest potential for this is with systems which
wrap MuPDF's C API into garbage collected objects
(such as Java and Python).

Accordingly, people should move their code to use
pdf_pin_document instead of pdf_get_bound_document
and pdf_get_indirect_document.

2 years agobmp: Move check earlier to avoid pixmap allocation then throwing an exception.
Sebastian Rasmussen [Tue, 14 Mar 2023 22:48:54 +0000 (23:48 +0100)]
bmp: Move check earlier to avoid pixmap allocation then throwing an exception.

2 years agopnm: Check buffer boundaries when parsing.
Sebastian Rasmussen [Mon, 13 Mar 2023 23:28:17 +0000 (00:28 +0100)]
pnm: Check buffer boundaries when parsing.

2 years agoEnsure MuPDF compiles with FZ_ENABLE_PDF=0.
Sebastian Rasmussen [Mon, 13 Mar 2023 02:01:00 +0000 (03:01 +0100)]
Ensure MuPDF compiles with FZ_ENABLE_PDF=0.

This commit resolves:

 * pdf_border_style is an incomplete type
 * ffi_to_quad defined but not used
 * implicit declaration of ffi_tobuffer

2 years agoSave and restore gstate when calling a Type3 CharProc.
Tor Andersson [Fri, 17 Mar 2023 13:38:18 +0000 (14:38 +0100)]
Save and restore gstate when calling a Type3 CharProc.

Clear the current font before calling the CharProc, so that we
don't end up with infinite recursion if the CharProc draws text
without setting its own font explicitly (...which would mean drawing
text using itself).

This fixes OSS-fuzz issue 57066.

2 years agosource/html/html-parse.c: fix use of uninitialised fz_html_box::list_item.
Julian Smith [Mon, 20 Mar 2023 12:31:06 +0000 (12:31 +0000)]
source/html/html-parse.c: fix use of uninitialised fz_html_box::list_item.

Spotted by valgrind.

new_box(): explicitly initialise fields .is_first_flow, .structure and
.list_item. This doesn't actually fix the problem, but may avoid potential
problems in future.

xml_to_boxes(): explicitly initialise `struct genstate g = {0}` so that
later use of `this_box->list_item = ++g->list_counter` does not propagate
uninitialised value. In the following explicit initialisations, added some
missing items and reorder to match `struct genstate` definition.

2 years agojs: Make setPageLabels/deletePageLabels match Java API.
Tor Andersson [Thu, 9 Mar 2023 14:40:05 +0000 (15:40 +0100)]
js: Make setPageLabels/deletePageLabels match Java API.

2 years agoAdd fz_new_document_writer_with_buffer function.
Tor Andersson [Tue, 7 Mar 2023 15:11:33 +0000 (16:11 +0100)]
Add fz_new_document_writer_with_buffer function.

2 years agoAdd fz_open_document_with_buffer function.
Tor Andersson [Fri, 3 Mar 2023 15:10:18 +0000 (16:10 +0100)]
Add fz_open_document_with_buffer function.

This is very often wanted by various language bindings.

Provide it!

2 years agoSetting annotation file specification should not take ownership.
Sebastian Rasmussen [Mon, 6 Mar 2023 02:47:51 +0000 (03:47 +0100)]
Setting annotation file specification should not take ownership.

2 years agogl: Show both stream and string encryption methods in info dialog if not same.
Sebastian Rasmussen [Sat, 11 Mar 2023 01:59:26 +0000 (02:59 +0100)]
gl: Show both stream and string encryption methods in info dialog if not same.

2 years agoAdd APIs to expose both stream and string encryption methods.
Sebastian Rasmussen [Sat, 11 Mar 2023 01:59:01 +0000 (02:59 +0100)]
Add APIs to expose both stream and string encryption methods.

2 years agoBug 699212: Prefer default method/key length based on crypt ver/rev.
Sebastian Rasmussen [Fri, 10 Mar 2023 06:24:45 +0000 (07:24 +0100)]
Bug 699212: Prefer default method/key length based on crypt ver/rev.

Previously if crypt filters' method/key lengths were inconsistent
with the expectations of the crypt version/revision MuPDF would prefer
those given by the crypt filter.

The file in the bug specifies version 5 revision 6 encryption.
According to spec revision 6 implies version 5, and version 5
is always a 256 bit key with the AES encryption method. The crypt
filters in the file specify AESV2. But according to PDF 1.7
extension level 3 AESV2 is only valid for version 4 revision 4
encryption!

Since MuPDF preferred the crypt filters' encryption method AESV2
and key length 128 bits, an exception was triggered later when
the key length was found not to be 256 bits.

After this commit, the key length and method implied by ver/rev in
the crypt dictionary will be assumed.

2 years agoBug 706114: Be smarter about redacting masked images.
Robin Watts [Mon, 13 Mar 2023 18:40:23 +0000 (18:40 +0000)]
Bug 706114: Be smarter about redacting masked images.

A fairly common construct in PDF files is a 1x1 image, masked
with a much more detailed softmask.

If a portion of such an image is redacted, we should redact
only the mask, not the image, otherwise we remove all the
content that's outside the area too.

2 years agoBug 706473: Loosen controls on when we can embed fonts.
Robin Watts [Fri, 10 Mar 2023 17:57:34 +0000 (17:57 +0000)]
Bug 706473: Loosen controls on when we can embed fonts.

Embed fonts in all cases except those where we are really told
not to.

2 years agoLimit OCG parsing recursion when handling marked content.
Sebastian Rasmussen [Fri, 10 Mar 2023 15:15:45 +0000 (16:15 +0100)]
Limit OCG parsing recursion when handling marked content.

This fixes OSS-fuzz 56869.

2 years agogl: Drop draw device upon exception rendering page contents/annots/widgets.
Sebastian Rasmussen [Thu, 9 Mar 2023 13:42:17 +0000 (14:42 +0100)]
gl: Drop draw device upon exception rendering page contents/annots/widgets.

2 years agoUpdate extract with latest changes.
Sebastian Rasmussen [Mon, 6 Mar 2023 14:58:21 +0000 (15:58 +0100)]
Update extract with latest changes.

2 years agoscripts/wrap/swig.py: removed use of %cstring_output_allocate.
Julian Smith [Mon, 6 Mar 2023 16:38:12 +0000 (16:38 +0000)]
scripts/wrap/swig.py: removed use of %cstring_output_allocate.

This seems to cause problems when building the C# bindings on some machines,
and these days we don't really use swig's out-param support.

2 years agoQuiet a few compiler warnings.
Sebastian Rasmussen [Fri, 24 Feb 2023 15:43:54 +0000 (16:43 +0100)]
Quiet a few compiler warnings.

2 years agodocs: Update list of thirdparty libraries.
Sebastian Rasmussen [Thu, 2 Mar 2023 18:32:51 +0000 (19:32 +0100)]
docs: Update list of thirdparty libraries.

2 years agogl: Catch error while saving accelerator file.
Sebastian Rasmussen [Thu, 2 Mar 2023 18:24:41 +0000 (19:24 +0100)]
gl: Catch error while saving accelerator file.

2 years agoUpdate harfbuzz to version 6.0.0.
Sebastian Rasmussen [Thu, 2 Mar 2023 17:29:32 +0000 (18:29 +0100)]
Update harfbuzz to version 6.0.0.

2 years agoBug 706402: Downgrade problems while loading PDF jpx's to minor.
Robin Watts [Wed, 1 Mar 2023 14:26:57 +0000 (14:26 +0000)]
Bug 706402: Downgrade problems while loading PDF jpx's to minor.

Introduce fz_morph_error to allow us to change the code of an
error. Use this to replace 'GENERIC' errors thrown when loading
jpxs from images with 'MINOR' ones. This allows others (such as
TRYLATER and MEMORY) to continue to be passed as is.

2 years agojava: Fix typo in Rect.isValid.
Tor Andersson [Thu, 2 Mar 2023 16:47:08 +0000 (17:47 +0100)]
java: Fix typo in Rect.isValid.

2 years agoUpdate freetype to version 2.13.0.
Sebastian Rasmussen [Wed, 1 Mar 2023 20:04:09 +0000 (21:04 +0100)]
Update freetype to version 2.13.0.

2 years agoOnly attempt to free resource stack entry if it exists.
Sebastian Rasmussen [Wed, 1 Mar 2023 19:01:44 +0000 (20:01 +0100)]
Only attempt to free resource stack entry if it exists.

This fixes OSS-fuzz 56153.

2 years agoOnly try to free forward page map entries if a map exists.
Sebastian Rasmussen [Wed, 1 Mar 2023 18:51:23 +0000 (19:51 +0100)]
Only try to free forward page map entries if a map exists.

2 years agoChange forward page map to be a list of object references.
Robin Watts [Wed, 1 Mar 2023 11:09:16 +0000 (11:09 +0000)]
Change forward page map to be a list of object references.

Currently, the forward page map is a list of object numbers.
This means that when we pdf_lookup_page_obj, we resolve the
object to get a borrowed ref (the xref reference) to the
required object.

Unfortunately, a lot of code that calls this assumes that
we'll get a borrowed reference to an indirection object that
points to the page object (i.e. they want a page ref, not the
page object).

To fix that, we move the array to be object references (as
loaded from the Page tree, so indirection objects) rather than
object numbers.

This should fix the problem where PyMuPDF's xref member for
the Page object was always returning 0.

2 years agoDo forward page map boundary check before using the index.
Sebastian Rasmussen [Tue, 28 Feb 2023 23:34:04 +0000 (00:34 +0100)]
Do forward page map boundary check before using the index.

Otherwise this results in a segmentation fault.

This fixes oss-fuzz issue 56449.

2 years agoFix three division by zero problems.
Robin Watts [Tue, 28 Feb 2023 19:34:20 +0000 (19:34 +0000)]
Fix three division by zero problems.

Spotted by Krzysztof Kowalczyk of SumatraPDF fame. Many thanks!

2 years agoMute libjpeg messages.
Tor Andersson [Fri, 24 Feb 2023 12:47:16 +0000 (13:47 +0100)]
Mute libjpeg messages.