Sebastian Rasmussen [Thu, 13 Mar 2025 17:23:49 +0000 (18:23 +0100)]
Clear the in-doc flag when removing a page from the opened page list.
When the last reference to a page is dropped, there is a check to see if it
belongs to a document. If it does not, the page object is freed immediately.
If the page belongs to a document, then freeing of the page object is delayed
until the last reference of the document is dropped.
Deleting a page requires syncing the document's list of opened pages.
If the page is no longer part of a document, it will be nuked from this list.
The assumption is that either there are no more references to the page, causing
the page to be freed immediately, or the caller holds all remaining references,
causing the page to be freed when the last of its references is dropped.
When a page is removed from the document, its in-document flag must be cleared.
Otherwise, when dropping the page reference from the list, the page object will
not be freed, even if this is the last reference to the page. Instead, since the
in-document flag is set, the page assumes that the document owning it will
free the page object, but this will never happen, leading to a leak.
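A minimal sketch of the rule described above, using a simplified page object.
All names here (page, drop_page, remove_from_open_list, the freed field) are
hypothetical and do not mirror MuPDF's actual structures; the freed field only
stands in for the real free so the outcome can be inspected.

```c
/* Hypothetical, simplified page object; not MuPDF's real fz_page. */
typedef struct page {
	int refs;    /* reference count */
	int in_doc;  /* set while the page is on the document's opened page list */
	int freed;   /* stands in for free() so the result can be checked */
} page;

/* Drop one reference. The page is freed immediately only when no
 * document claims ownership of it. */
static void drop_page(page *p)
{
	if (--p->refs == 0 && !p->in_doc)
		p->freed = 1;
}

/* Remove a page from the opened page list: clear the in-document flag
 * first, then drop the reference that the list was holding. */
static void remove_from_open_list(page *p)
{
	p->in_doc = 0;
	drop_page(p);
}
```

If the flag were left set, the final drop_page() call would defer to a document
that no longer knows about the page, which is the leak this commit describes.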
Tor Andersson [Tue, 14 Jan 2025 15:17:46 +0000 (16:17 +0100)]
Sync open page numbers after undo has swapped the xrefs, not before!!!
Tor Andersson [Wed, 5 Mar 2025 12:54:07 +0000 (13:54 +0100)]
Fix const warning.
Sebastian Rasmussen [Wed, 5 Mar 2025 21:57:21 +0000 (22:57 +0100)]
Change initializer to one not causing clang warnings in release build.
Sebastian Rasmussen [Wed, 5 Mar 2025 21:55:41 +0000 (22:55 +0100)]
Remove unused variables revealed by a release build using clang.
Jamie Lemon [Thu, 6 Mar 2025 12:42:59 +0000 (12:42 +0000)]
Documentation: Adds note in Multi-threading section.
Sebastian Rasmussen [Tue, 4 Mar 2025 14:42:34 +0000 (15:42 +0100)]
Consistently use uint32_t for color in stext device.
This fixes Coverity issue 430581.
Sebastian Rasmussen [Tue, 4 Mar 2025 14:57:15 +0000 (15:57 +0100)]
Check whether the argument list is NULL, when arguments are said to exist.
Sebastian Rasmussen [Tue, 4 Mar 2025 12:59:07 +0000 (13:59 +0100)]
Check whether opts is NULL when cleaning a PDF file.
This resolves Coverity issue 486335 where a stray NULL-check
points to all later uses of opts as locations where NULL may
be dereferenced. The fix is to use default options if options
are passed as NULL.
Sebastian Rasmussen [Fri, 28 Feb 2025 09:30:32 +0000 (10:30 +0100)]
Bug 708304: Handle PDF objects numbered outside xref range.
The PDF audit tool would trigger an ASAN warning when encountering a
PDF object outside of the xref range, since the audit tool allocates
internal state based on the number of objects specified by the xref.
Later on when encountering an object that resides outside of that
range, the audit tool did no bounds checking.
The fix is to add bounds checking to avoid accessing outside the
internal state.
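The kind of bounds check described above might look like the following sketch.
The audit_state type and audit_mark() are hypothetical names for illustration,
not the audit tool's real code.

```c
/* Hypothetical per-object audit state, sized from the xref length. */
typedef struct {
	int *usage; /* one counter per object number */
	int len;    /* number of objects the xref declares */
} audit_state;

/* Record a use of object `num`. Returns 0 (and touches nothing) when the
 * object number lies outside the range the state was allocated for. */
static int audit_mark(audit_state *st, int num)
{
	if (num < 0 || num >= st->len)
		return 0;
	st->usage[num]++;
	return 1;
}
```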
Sebastian Rasmussen [Fri, 28 Feb 2025 09:29:49 +0000 (10:29 +0100)]
Report error in audit tool, otherwise it counts as unhandled.
Sebastian Rasmussen [Fri, 28 Feb 2025 06:58:47 +0000 (07:58 +0100)]
Fix typo in LZW compressed inline image dictionary.
Tor Andersson [Fri, 28 Feb 2025 13:41:02 +0000 (14:41 +0100)]
Remove unused variables.
Robin Watts [Thu, 27 Feb 2025 17:09:22 +0000 (17:09 +0000)]
Update gumbo-parser with some warning fixes.
Tor Andersson [Thu, 27 Feb 2025 13:23:38 +0000 (14:23 +0100)]
Dance around const (to silence warnings about freeing const pointer).
Sebastian Rasmussen [Fri, 21 Feb 2025 10:04:00 +0000 (11:04 +0100)]
Remove redundant memset.
Sebastian Rasmussen [Fri, 21 Feb 2025 01:51:06 +0000 (02:51 +0100)]
Bug 708293: Allocate xml root node in pool.
Because of the recent commit that uses flexible array members for
the xml tag name the root node no longer contains a 1 byte '\0' to
indicate the empty string for the root node tag name.
Yet, if close_tag() is called after the last tag has been closed,
then the root tag name will be accessed, causing an ASAN complaint.
A simple fix is to allocate the root node of the correct size including
a trailing '\0' for the tag name. Instead of allocating this on the
stack, we now allocate it in the parser pool, which means it will be
dropped whenever the parser and its pool are dropped.
Sebastian Rasmussen [Thu, 20 Feb 2025 10:06:20 +0000 (11:06 +0100)]
Ensure that cfb archive entry names are null-terminated.
Previously open_cfb_archive_with_stream() depended on the input data
from the file containing a null-terminator in order for the UTF-8
decoded string in its internal archive state to be null-terminated.
The parser should always null-terminate its own strings, and not depend
on input data to be well-formed.
This fixes oss-fuzz issue 391935579.
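As an illustration of a parser terminating its own strings, here is a small
sketch. copy_name() is a hypothetical helper, not the function used in the cfb
archive code, and it copies raw bytes rather than decoding UTF-16 as the real
parser would.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most n bytes of untrusted input into dst (capacity cap) and
 * always terminate dst, whether or not the input contained a '\0'.
 * Returns the number of bytes copied, excluding the terminator. */
static size_t copy_name(char *dst, size_t cap, const unsigned char *src, size_t n)
{
	size_t len;
	if (cap == 0)
		return 0;
	len = n < cap - 1 ? n : cap - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return len;
}
```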
Tor Andersson [Wed, 19 Feb 2025 13:24:16 +0000 (14:24 +0100)]
Change capitalization in mutool usage to be consistent.
Robin Watts [Tue, 18 Feb 2025 12:46:22 +0000 (12:46 +0000)]
PDF saving: Perform a pre-pass to load objects before saving.
With some files, we can trigger a repair during the writing out
of objects (seen with the test file from bug 708289).
We'd rather the repair happened BEFORE we started writing objects.
So perform a pre-pass at the start of the save operation to cache
objects. We would have cached all these objects at some point during
the save operation, so it's not really costing us anything extra.
Sebastian Rasmussen [Tue, 18 Feb 2025 16:24:39 +0000 (17:24 +0100)]
Do not create bad write options if encrypt option was set to unknown value.
Julian Smith [Mon, 17 Feb 2025 17:20:25 +0000 (17:20 +0000)]
scripts/wrap/cpp.py: improve Director exception handling.
Previous conversion of C++ exception to Fitz exception leaked the
std::exception instance. Thanks to Sebastian for spotting this problem in a
review of some unrelated C++ code.
Julian Smith [Mon, 17 Feb 2025 11:05:25 +0000 (11:05 +0000)]
include/mupdf/fitz/geometry.h: fix matrix inversion comments.
Robin Watts [Tue, 18 Feb 2025 13:03:20 +0000 (13:03 +0000)]
Fix access after free.
For pages that are in the document, page can be freed by
the call to fz_drop_document, meaning we access page->in_doc
illegally.
The fix is just to test this case first, as we know that
fz_drop_document won't access page in this case.
Robin Watts [Mon, 17 Feb 2025 10:07:47 +0000 (10:07 +0000)]
Free unopened pages instead of waiting for document to reap them.
fz_drop_page() decrements a page's reference count; if it reaches
zero, the page is marked dead (by setting its document pointer to
NULL) and left to be freed when the document is freed.
fz_drop_document() iterates over its list of opened pages, and
any pages marked dead (by their document pointer being NULL) are
freed.
fz_load_chapter_page() calls a document type-specific callback to
load a page, before adding the loaded page to the document's list
of opened pages.
The document type-specific callback for PDF, pdf_load_page_imp(),
allocates a pdf_page and then proceeds to load the page's
annotations. This page will only be added to the document's list
of opened pages when pdf_load_page_imp() has finished.
If an exception is thrown while processing the page's
annotations, the page object is dropped.
In this case fz_drop_page() decrements the page's reference count
to zero and, assuming that the page will be freed when its document
is freed, marks the page dead as usual. However, since the
page was never added to any document, the page will never be
freed, leading to a memory leak.
The fix is for fz_drop_page() to be able to identify pages that
have never been added to their document's list of opened pages,
and in those cases, instead of deferring to fz_drop_document() to
reap the pages, just free the pages immediately.
(Problem and solution found by Sebastian!)
Sebastian Rasmussen [Sun, 16 Feb 2025 22:37:06 +0000 (23:37 +0100)]
Avoid double drop of fz_html_tree upon exception in xml_to_boxes().
Upon exception xml_to_boxes() drops the fz_html_tree passed to it.
Consider the case where write_rich_content() calls fz_new_story()
to create a fz_story (which contains fz_html_tree). Upon
exception write_rich_content() drops the fz_story, but it also
calls fz_place_story() which ends up in convert_to_boxes(), which
calls xml_to_boxes(). If an exception is thrown here, then first
xml_to_boxes() will drop the fz_story's fz_html_tree, and later
write_rich_content() will also drop the same fz_story and its
fz_html_tree.
The only other function that calls xml_to_boxes() is
fz_parse_html_tree() which is only called by fz_parse_html()
which upon exception also drops the fz_html_tree it has created.
The conclusion is that the functions, write_rich_content() and
fz_parse_html_tree(), retain ownership of the fz_html_tree they
created, and consequently drop their fz_html_tree's upon
exception, while xml_to_boxes() erroneously assumes that it takes
ownership of the fz_html_tree passed to it.
The fix is to remove the drop from xml_to_boxes().
This fixes oss-fuzz issue 396958483.
Robin Watts [Fri, 14 Feb 2025 19:06:45 +0000 (19:06 +0000)]
Fix Makefiles to test/alter CFLAGS, not XCFLAGS.
XCFLAGS are supposed to be sent in by the user, and as such
are probably not mutable at this point.
Sebastian Rasmussen [Tue, 11 Feb 2025 18:06:05 +0000 (19:06 +0100)]
Add support archive script to create commercial tarballs.
Tor Andersson [Fri, 31 Jan 2025 15:08:20 +0000 (16:08 +0100)]
Silence warnings.
Julian Smith [Tue, 4 Feb 2025 14:20:13 +0000 (14:20 +0000)]
scripts/wrap/: Use `explicit` in some C++ constructors to avoid unsafe usage.
scripts/wrap/cpp.py
Use `explicit` for wrapper class constructors that take a pointer to a
refcounted C struct. This avoids silent generation of code that can cause a
drop on destruction without corresponding keep on construction, which can
cause segv.
scripts/wrap/swig.py
page_merge_helper(): avoid implicit construction of PdfObj from pdf_obj*.
scripts/wrap/__main__.py:
Removed -l from mutool.py command because linearization support was
removed.
Julian Smith [Tue, 11 Feb 2025 21:37:55 +0000 (21:37 +0000)]
scripts/wrap/swig.py: removed ll_fz_pixmap_copy().
Was incorrect, and is not required by latest pymupdf.
Sebastian Rasmussen [Mon, 10 Feb 2025 16:36:26 +0000 (17:36 +0100)]
docs: Remove pdf-trace.js callback arguments that are not passed.
Robin Watts [Thu, 6 Feb 2025 13:06:55 +0000 (13:06 +0000)]
Bug 708157: Fix redaction problem with form transforms.
When filtering a page, we were applying the form transform in
the wrong order. In cases with no translation, this didn't
matter, but when translations are present, this caused a shift
in the area that was actually redacted.
Robin Watts [Wed, 5 Feb 2025 19:48:35 +0000 (19:48 +0000)]
Bug 708170: Use ULL rather than Ui64 in windows specific time funcs
This works with both MSVC and Mingw.
Robin Watts [Wed, 5 Feb 2025 19:42:07 +0000 (19:42 +0000)]
Bug 708176: Cope with "undersized" cross-reference streams.
Cross reference streams contain a 'Size' entry. This is supposed
to be 1 larger than the largest object 'used' in the file.
The example given is not counting the cross reference stream
itself when calculating this value.
Fix our code to tolerate this.
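One way to tolerate an undersized 'Size' entry is sketched below. This is an
assumption about the general approach, not the actual change made in MuPDF, and
effective_xref_size() is a hypothetical name.

```c
/* Return the xref length to work with, given the declared Size and the
 * highest object number actually encountered. Size is supposed to be one
 * larger than the highest object number, so grow it when it is too small. */
static int effective_xref_size(int declared_size, int highest_obj_num)
{
	int needed = highest_obj_num + 1;
	return declared_size < needed ? needed : declared_size;
}
```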
Julian Smith [Mon, 3 Feb 2025 17:19:47 +0000 (17:19 +0000)]
scripts/wrap/ docs/: added swig-friendly wrapper for pdf_set_annot_callout_line().
Sebastian Rasmussen [Mon, 3 Feb 2025 01:28:28 +0000 (02:28 +0100)]
win32: Let Windows handle unhandled ALT-key combinations.
Since commit b7131bef3a57b369fe6e1a7d54dc5d171f15712c, MuPDF for
Windows has accidentally ignored ALT-F4 instead of passing that key
combination to the default message handler.
Passing the key combination to that handler will allow
Windows to emit WM_CLOSE which will close the program.
Sebastian Rasmussen [Thu, 23 Jan 2025 23:02:13 +0000 (00:02 +0100)]
Fix bug ignoring last entry in UAX 14 line-breaking table.
Robin Watts [Thu, 30 Jan 2025 15:18:35 +0000 (15:18 +0000)]
Bug 708274 followup: Fix thinko in previous fix commit.
In the case where we detect a fixup is necessary, we've already
overwritten the fy1 value, so we can't recalculate from it.
Simplify the code in both x and y so that we just add 1 to the upper
bound if required.
Thanks to Julian for helpful comments here.
Robin Watts [Tue, 28 Jan 2025 15:50:08 +0000 (15:50 +0000)]
Bug 708274: Tweak antidropout code in the non-AA rasterizer.
In all the normal paths through this code, we apply floorf() to
the coords given to convert to int. Just in the rectangle
case (the specific case that is only used for anti-dropout)
we are currently using a mix of floorf() and ceilf().
This is causing the problem referenced in the bug.
The reason for using floorf and ceilf together is to avoid the
case where the vertical extent becomes 0 - but this is not
actually a problem in this case.
So, we amend the code to use just floorf initially, and we only
resort to using ceilf() if the extent would otherwise be 0.
Thus we still get anti-dropout, but don't affect cases where we
don't need it.
Robin Watts [Wed, 5 Feb 2025 18:21:53 +0000 (18:21 +0000)]
Bug 708254: Fix issue in Fax decoder.
It looks like the handling for "EncodedByteAlign && EndOfLine"
is wrong. Changing it doesn't affect any files in the cluster,
so we move to match gs by doing nothing in this case.
Tor Andersson [Tue, 14 Jan 2025 15:15:48 +0000 (16:15 +0100)]
Allow pdf_lookup_page_number_slow on deleted pages.
If a page cannot be found in the page tree, return -1 instead of
throwing an error.
This is needed if we disable the fast case for page number lookups
when syncing open pages (where we need to recompute the page number
for currently held pdf_page objects, even those that have been deleted
and removed from the page tree).
Sebastian Rasmussen [Sat, 25 Jan 2025 21:11:09 +0000 (22:11 +0100)]
Bump patch release.
Tor Andersson [Thu, 23 Jan 2025 14:21:24 +0000 (15:21 +0100)]
Fix type cast comparisons in fz_atoz.
signed/unsigned comparison warning.
Tor Andersson [Wed, 22 Jan 2025 13:52:36 +0000 (14:52 +0100)]
Add common Noto font name lookup function.
Robin Watts [Wed, 22 Jan 2025 11:33:13 +0000 (11:33 +0000)]
Bug 708266: Improve font Ascent/Descent handling.
In a recent commit (d0b843e675a6e1f77207f525413f8a0e14bcc4b1),
we changed to use the Ascent/Descent values from the font
descriptor in preference to those from the font itself.
This has produced some issues, because different PDF producers
sometimes write the Descent value as positive, despite it clearly
being specified in the PDF 1.7 spec that it should be negative.
This has in fact been clarified in the 2.0 spec.
Accordingly, if we use this value, and we find a positive
one, we negate it upon use.
Also, we were looking at fontdesc->ascent or descent == 0 to
mean "not present" (it's a compulsory field, it really should
be present!). So rejig to allow for this.
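The sign fix-up described above amounts to something like this sketch.
normalize_descent() is a hypothetical name, and the sketch covers only the
negation, not the handling of a zero or absent value.

```c
/* A font descriptor's Descent is expected to be negative. If a producer
 * wrote it as a positive number, negate it before use. */
static float normalize_descent(float descent)
{
	return descent > 0 ? -descent : descent;
}
```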
Robin Watts [Tue, 21 Jan 2025 09:09:49 +0000 (09:09 +0000)]
Java bindings: Allow fz_store_size to be customised.
If FZ_JAVA_STORE_SIZE is predefined at build time, then this
will override the default store size.
Also, if the environment variable FZ_JAVA_STORE_SIZE is defined
at startup, that value will be used instead.
Original version of this patch from Max Kammerer.
Robin Watts [Tue, 21 Jan 2025 11:54:49 +0000 (11:54 +0000)]
Add fz_atoz.
Simple routine to read size_t's from strings.
Limitation: Any unsigned number with bit 63 set will be
read as 0. I think we can live with that!
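A rough sketch of a routine with the same limitation, assuming it parses a
decimal string through a signed 64-bit intermediate. atoz_sketch() is a
hypothetical name and this is not fz_atoz's actual implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a decimal string into a size_t via a signed 64-bit intermediate.
 * Negative values, and values too large for the signed intermediate
 * (that is, with bit 63 set), are returned as 0. */
static size_t atoz_sketch(const char *s)
{
	long long v;
	if (s == NULL)
		return 0;
	errno = 0;
	v = strtoll(s, NULL, 10);
	if (errno == ERANGE || v < 0)
		return 0;
	if ((unsigned long long)v > (size_t)-1)
		return 0; /* only reachable where size_t is narrower than 64 bits */
	return (size_t)v;
}
```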
Tor Andersson [Tue, 14 Jan 2025 15:17:24 +0000 (16:17 +0100)]
Add missing fz_report_error.
Julian Smith [Thu, 9 Jan 2025 22:26:47 +0000 (22:26 +0000)]
scripts/wrap/cpp.py: use virtual destructor in all SWIG Director classes.
Fixes C++ compile warnings and errors from latest valgrind.
Tor Andersson [Tue, 7 Jan 2025 19:03:41 +0000 (20:03 +0100)]
Silence coverity issue.
Tor Andersson [Tue, 7 Jan 2025 18:50:48 +0000 (19:50 +0100)]
Add PyMuPDF temporary build files to .gitignore.
Tor Andersson [Tue, 7 Jan 2025 18:50:35 +0000 (19:50 +0100)]
Bug 708118: Add and use convenience function for loading user CSS.
Robin Watts [Wed, 8 Jan 2025 13:04:17 +0000 (13:04 +0000)]
Fix valgrind error seen with saving pdfs with garbage collection.
This was exposed by the recent commit to improve garbage collection.
When we call renumberobjs, previously this might have shortened the
lists without updating the record of how long the lists are.
Now we use expand_lists so that all the lists stay the same length,
never get shorter, and the record is always updated. This also keeps
the +3 nastiness in just one place.
Sebastian Rasmussen [Fri, 22 Nov 2024 18:05:59 +0000 (19:05 +0100)]
Process both widgets and annotations when rewriting images.
Robin Watts [Mon, 6 Jan 2025 18:43:52 +0000 (18:43 +0000)]
Tweak vector handling in page segmentation.
Allow a 1 point margin around vectors when considering usage for
page segmentation.
Without this we can occasionally see hairline gaps between adjacent
table columns due to the vagaries of floating point.
Tor Andersson [Tue, 7 Jan 2025 15:11:04 +0000 (16:11 +0100)]
Only include latest object versions when gathering object streams.
See bug 708035 for an example file that triggers this error when
deleting page 1 and saving with object streams.
Tor Andersson [Thu, 21 Nov 2024 16:14:58 +0000 (17:14 +0100)]
Bug 708144: Add PDF_NAME(AFRelationship).
This property is defined in PDF reference 2.0.
Provide it for convenience of library users, and possible future use by
mupdf itself.
Robin Watts [Tue, 24 Dec 2024 12:36:59 +0000 (12:36 +0000)]
Bug 708210: Minimise size of softmasks before rendering.
Robin Watts [Mon, 23 Dec 2024 00:07:41 +0000 (00:07 +0000)]
Bug 708211: Mutool clean produces object 0 with invalid gen num.
The spec says it should always be 65535, but we're updating it...
Robin Watts [Fri, 15 Nov 2024 18:13:53 +0000 (18:13 +0000)]
Fix bbox calculation in segmentation.
Julian Smith [Tue, 3 Dec 2024 10:29:08 +0000 (10:29 +0000)]
Improved tesseract exceptions.
fz_new_pdfocr_band_writer(): propagate original exception instead of
creating a new one.
ocr_init(): use different exception messages for different errors.
Sebastian Rasmussen [Mon, 2 Dec 2024 15:35:57 +0000 (16:35 +0100)]
Bug 708171: When deleting widget fields, compare the objects, not their pointers.
Robin Watts [Sat, 23 Nov 2024 17:29:18 +0000 (17:29 +0000)]
Fix for JBIG2 data not having the correct filter attached.
Credit to Sebastian for spotting this.
Sebastian Rasmussen [Wed, 20 Nov 2024 02:24:07 +0000 (03:24 +0100)]
Bug 708114: Only accept encryption dictionaries if within xref.
Previously the encryption dictionary reference might be beyond
the xref triggering ASAN.
Sebastian Rasmussen [Thu, 23 Jan 2025 00:41:37 +0000 (01:41 +0100)]
Bump patch release.
Sebastian Rasmussen [Wed, 22 Jan 2025 17:25:20 +0000 (18:25 +0100)]
Add PDF_CLEAN_STRUCTURE_* documentation.
Julian Smith [Wed, 22 Jan 2025 16:48:07 +0000 (16:48 +0000)]
scripts/wrap/cpp.py: update pdf_rearrange_pages2() with new `structure` arg.
Sebastian Rasmussen [Wed, 22 Jan 2025 14:09:17 +0000 (15:09 +0100)]
jni: Add new final argument to updated pdf_rearrange_pages call.
Robin Watts [Tue, 14 Jan 2025 12:49:43 +0000 (12:49 +0000)]
pdf_rearrange_pages: option for keeping/dropping structure.
Add a new option to pdf_clean_options, and a new argument to
pdf_rearrange_pages. This controls whether structure trees
are kept or dropped during page rearrangements. By default,
structure trees are dropped.
Robin Watts [Tue, 10 Dec 2024 13:09:59 +0000 (13:09 +0000)]
Bump patch release.
Robin Watts [Thu, 5 Dec 2024 19:17:29 +0000 (19:17 +0000)]
Bug 708128: Fix sanitisation of clipping paths.
The PDF operator stream in this file does:
W <path> n /Image Do
The filter code therefore currently tries to deal with the 'W'
without a path being defined. Currently it bounds it, finds it
is an empty rect, and therefore suppresses the image.
According to the PDF spec though, W and W* are not supposed to
look at the path at the point they are called. Instead they
merely set it up so that a side effect of the next 'path painting
operator' (in this case 'n') is to intersect the clip path
AFTER that operator has taken place.
This commit therefore changes the behaviour of the filter to
follow this. All we do when we get a 'W' or a 'W*' is to set a
new clip_op field. Whenever we then process a path painting
operator, we look at this field, and process W or W* as
appropriate.
We are at pains to forward the W or W* operator AFTER the path
has been processed, but before the path painting operator
is sent. This should keep bugged renderers such as Preview
happy.
This has meant that the culler handling has changed slightly
in that the culler now doesn't deal with CLIP paths separately
to FILL or STROKE paths, but rather now as a combination (e.g.
CLIP_FILL_PATH). Trying to deal differently with CLIP and FILL
paths (for example) would probably have gone wrong in the past
so there is no real downside here.
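The deferred handling of W and W* can be sketched as follows. The filter
struct, its clip_op field values, and the op_W()/op_paint() helpers are
simplified stand-ins for illustration, not the real filter processor code;
output is collected in a small string buffer so the ordering can be seen.

```c
#include <string.h>

enum { CLIP_NONE, CLIP_W, CLIP_WSTAR };

/* Hypothetical, simplified filter state. */
typedef struct {
	int clip_op;    /* pending clip operator, set by W or W* */
	char out[128];  /* filtered operator stream */
} filter;

static void emit(filter *f, const char *s)
{
	strncat(f->out, s, sizeof f->out - strlen(f->out) - 1);
}

/* W only records that a clip is pending; it does not look at the path. */
static void op_W(filter *f)
{
	f->clip_op = CLIP_W;
}

/* A path painting operator: forward the path, then the pending clip
 * operator (if any), and only then the painting operator itself. */
static void op_paint(filter *f, const char *path, const char *paint_op)
{
	emit(f, path);
	if (f->clip_op == CLIP_W)
		emit(f, " W");
	else if (f->clip_op == CLIP_WSTAR)
		emit(f, " W*");
	f->clip_op = CLIP_NONE;
	emit(f, " ");
	emit(f, paint_op);
}
```

With an input of the shape `W <path> n`, this sketch emits `<path> W n`, so the
clip operator follows the path but precedes the painting operator.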
Julian Smith [Mon, 9 Dec 2024 16:44:43 +0000 (16:44 +0000)]
scripts/wrap/cpp.py: fix leak in C++ wrappers of fz_convert_pixmap().
fz_convert_pixmap() returns a kept reference, but the C++ wrappers
did not know this so did an additional keep, which results in a leak.
The fix is to add name 'convert' in function_name_implies_kept_references().
Robin Watts [Wed, 4 Dec 2024 20:40:34 +0000 (20:40 +0000)]
SText: Add bits to char flags word indicating filled/stroked/clipped.
This enables people to spot 'invisible' text.
Robin Watts [Thu, 5 Dec 2024 16:20:53 +0000 (16:20 +0000)]
Bump patch release.
Tor Andersson [Thu, 28 Nov 2024 22:53:12 +0000 (23:53 +0100)]
Fix error in stext-to-html color conversion!
Sebastian Rasmussen [Fri, 29 Nov 2024 04:00:27 +0000 (05:00 +0100)]
Bump patch release.
Robin Watts [Wed, 20 Nov 2024 16:40:55 +0000 (16:40 +0000)]
Fix double dropping of fz_stream in document handlers.
This only affects document handlers that insist on file backed
streams (i.e. sodochandler, currently).
If we ask to convert a stream to be file backed, we may get
the same stream back instantly (indicating that it is already
file backed). Do NOT drop it in this case.
Sebastian Rasmussen [Fri, 29 Mar 2024 19:30:52 +0000 (03:30 +0800)]
Bug 707704: Represent JBIG2 segment length by uint32_t as per spec.
Previously the segment length was represented by a signed integer,
so when pdf_parse_jbig2_segment_header() parsed the segment length
the value read from the file could be interpreted as negative. Later
on in pdf_copy_jbig2_segments() a data pointer is incremented by
the segment length, which could lead to an out of bounds pointer and
a subsequent crash.
The fix is to represent the parsed value as a 32-bit unsigned integer
as mentioned in the specification.
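A sketch of why the unsigned representation matters. read_u32_be() and
skip_segment() are hypothetical helpers, not the functions named above; the
sketch assumes the length field is stored big-endian and that the caller keeps
the position within the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value as unsigned, so a set top bit yields a
 * large length rather than a negative one. */
static uint32_t read_u32_be(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Advance *pos past a segment of seg_len bytes, but only if the segment
 * fits in the remaining data. Assumes *pos <= size on entry. */
static int skip_segment(size_t *pos, size_t size, uint32_t seg_len)
{
	if (seg_len > size - *pos)
		return 0;
	*pos += seg_len;
	return 1;
}
```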
Tor Andersson [Tue, 19 Nov 2024 12:15:07 +0000 (13:15 +0100)]
Close output before dropping in fz_office_to_html.
Robin Watts [Fri, 15 Nov 2024 15:09:21 +0000 (15:09 +0000)]
Fix SText device when creating structure from tags.
I had failed to update the code that calculates bboxes
and does bidirectional ordering in the presence of
structure tags.
Sebastian Rasmussen [Sat, 16 Nov 2024 12:12:47 +0000 (13:12 +0100)]
jni: Fix bug where return value from search was of the wrong type.
Document.search(), Page.search(), and DisplayList.search() all suffered
from the same bug where they are declared to return Quad[][] but the
last call in the JNI wrappers called ArrayList.toArray() which returns
Object[].
Desktop Java accepted this type of casting without any compilation or runtime
complaints. Compiling for Android worked equally well, but searching in
the app, the app crashed at runtime with:
JNI DETECTED ERROR IN APPLICATION: attempt to return an instance of
java.lang.Object[] from fitz.Quad[][] fitz.Page.search(java.lang.String).
The fix for this problem is to at the end of the JNI wrappers call
<T> T[] ArrayList.toArray(T[]). But even with this code change the app
still crashes at runtime with:
JNI DETECTED ERROR IN APPLICATION: the return type of CallVoidMethod
does not match boolean java.util.ArrayList.add(java.lang.Object)
Which was resolved by calling the function using CallBooleanMethod and
handling the return value suitably.
With these two changes searching now works well in both desktop Java
and Android.
Julian Smith [Fri, 15 Nov 2024 15:19:31 +0000 (15:19 +0000)]
scripts/wrap/: Allow fix of broken auto-dependencies in PyMuPDF builds.
PyMuPDF builds use build directories containing `Py_LIMITED_API=*` which
results in auto-dependency files containing things like:
build/PyMuPDF-amd64-shared-tesseract-Py_LIMITED_API=0x030a0000-release/source/fitz/context.o: foo.h
Unfortunately the `=` breaks things because it causes make to treat this
as setting variable `build/PyMuPDF-amd64-shared-tesseract-Py_LIMITED_API`
to `0x030a0000-release/source/fitz/context.o: foo.h`, not as a `<target>:
<prerequisites>` rule.
PyMuPDF will be changed to use build directories without the `=` such as
build/PyMuPDF-amd64-shared-tesseract-Py_LIMITED_API_0x030a0000-release/, which
we now accept.
Sebastian Rasmussen [Fri, 15 Nov 2024 22:57:37 +0000 (23:57 +0100)]
docs: Update list of important changes for 1.25.0.
Tor Andersson [Fri, 15 Nov 2024 13:19:32 +0000 (14:19 +0100)]
Update CHANGES with API change to structured text color/argb field.
Tor Andersson [Mon, 4 Nov 2024 16:24:37 +0000 (17:24 +0100)]
mutool run: Add insertEmbeddedFile and deleteEmbeddedFile functions.
Sebastian Rasmussen [Thu, 14 Nov 2024 14:28:30 +0000 (15:28 +0100)]
ttf: Change unused variable usage so that clang does not warn.
Robin Watts [Tue, 29 Oct 2024 14:49:09 +0000 (14:49 +0000)]
Move stext 'color' to be 'argb'.
Move vectors into line with chars. We now return a consistent
representation.
'color' has been deliberately changed to 'argb' as a) it's
clearer, and b) it forces people to update their code and not
suddenly see failures with alpha being unexpectedly in the top
8 bits.
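For code being updated to the new field, a packed value with alpha in the top
8 bits can be taken apart as in this sketch. The helper names are hypothetical,
and the byte order below alpha (red, then green, then blue) is assumed from the
name 'argb'.

```c
#include <stdint.h>

/* Pack four 8-bit channels with alpha in the top 8 bits. */
static uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
	return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Extract the alpha channel. */
static uint8_t argb_alpha(uint32_t argb)
{
	return (uint8_t)(argb >> 24);
}

/* Strip alpha, leaving a 24-bit rgb value like the old 'color' field held. */
static uint32_t argb_rgb(uint32_t argb)
{
	return argb & 0xFFFFFFu;
}
```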
Sebastian Rasmussen [Thu, 14 Nov 2024 06:56:28 +0000 (07:56 +0100)]
cff: Check boundaries of values parsed out of FDSelect.
This fixes Coverity issue 430251.
Sebastian Rasmussen [Wed, 13 Nov 2024 23:02:50 +0000 (00:02 +0100)]
cff: Increase charstring stack so we can handle mislabelled CFF2 fonts.
Sebastian Rasmussen [Wed, 13 Nov 2024 13:36:24 +0000 (14:36 +0100)]
ttf: The number of glyphs is a 16-bit unsigned integer.
Sebastian Rasmussen [Mon, 11 Nov 2024 04:18:47 +0000 (05:18 +0100)]
ttf: Add additional unicode cmaps.
2201_-_transparent_image_covers_background.pdf object 68 0 R has
only a single (0,1) cmap using format 4. Now we can cope with that!
Sebastian Rasmussen [Sat, 9 Nov 2024 13:11:24 +0000 (14:11 +0100)]
ttf: Handle 3,0 cmaps for symbolic fonts.
Both 1487_-_right_margin_cut_off_when_printing.pdf font objects
7 0 R and 14 0 R and PDFIA1.7_SUBSET/CATX1641.pdf font objects
93 0 R, 99 0 R and 105 0 R contain symbolic fonts that all contain
a single (3,0) cmap, while MuPDF previously only accepted (1,0)
cmaps.
The fix is to simply add a check for (3,0) cmaps and these two
files can now be subset correctly.
Sebastian Rasmussen [Sun, 10 Nov 2024 23:37:42 +0000 (00:37 +0100)]
ttf: Avoid out of bounds access of GID renumbering array when subsetting hmtx table.
Sebastian Rasmussen [Fri, 8 Nov 2024 15:33:33 +0000 (16:33 +0100)]
ttf: Remember to update second half of hmtx table.
A hmtx table consists of two parts:
* an array of entries with both advance widths and left bearing values
* an array of entries with only left bearing values
Combined, both arrays are meant to provide horizontal metrics for all
the glyphs. The number of glyphs is parsed out from the font file's
maxp table.
The length of the first array is parsed out from the font file's hhea
table. The length of the second array is computed by taking the number of
glyphs and subtracting the number of entries in the first array.
Previously, the first array length, stored in orig_num_long_hor_metrics,
was capped by the hmtx table size, while the second array length,
stored in max16, was capped both by the hmtx table size and the number of
glyphs. This meant that max16 could end up being a lower value than
orig_num_long_hor_metrics, in which case the second array entries were
never updated.
Three changes are provided in this commit:
* Rename array lengths in the code to long_metrics and short_metrics
respectively.
* Use separate counter to count the number of entries in the first and
the second arrays.
* Make sure to cap both array lengths by the number of glyphs in the
font.
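The capping described above could be sketched like this. The hmtx_layout type
and hmtx_lengths() are hypothetical, and the exact capping order in the real
code may differ; the sketch relies on long entries being 4 bytes (advance width
plus left bearing) and short entries being 2 bytes (left bearing only).

```c
#include <stddef.h>

typedef struct {
	int long_metrics;  /* entries with advance width and left bearing */
	int short_metrics; /* entries with left bearing only */
} hmtx_layout;

/* Compute both array lengths, capping each by the number of glyphs and by
 * the space actually available in the hmtx table. Inputs are assumed to be
 * non-negative. */
static hmtx_layout hmtx_lengths(int num_glyphs, int num_h_metrics, size_t table_size)
{
	hmtx_layout l;
	size_t room;

	l.long_metrics = num_h_metrics < num_glyphs ? num_h_metrics : num_glyphs;
	if ((size_t)l.long_metrics > table_size / 4)
		l.long_metrics = (int)(table_size / 4);

	l.short_metrics = num_glyphs - l.long_metrics;
	room = (table_size - (size_t)l.long_metrics * 4) / 2;
	if ((size_t)l.short_metrics > room)
		l.short_metrics = (int)room;

	return l;
}
```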
Sebastian Rasmussen [Tue, 5 Nov 2024 13:57:33 +0000 (14:57 +0100)]
ttf: Correct offset bugs in the loca table subsetting.
The offsets in the loca table must be increasing, all offsets
must be filled in, and the length of the glyph data is the
difference between an offset and the next offset.
The previous code didn't fill in the offsets of unused glyphs
between used glyphs' offsets, this caused complaints when opening
the font in freetype or fontforge.
In addition the length computed for one glyph's data was not
applied to the remapped glyph id, but to the glyph id after that
one, causing the glyph appearance to be applied to another random
glyph.
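The loca invariants described above can be illustrated with a short sketch.
build_loca() is a hypothetical helper, not the subsetter's real code; it
assumes unused glyphs are given a length of zero, so their offsets repeat the
previous value instead of being left unfilled.

```c
#include <stdint.h>

/* lengths[i] is the byte length of glyph i's data in the subset font
 * (0 for unused glyphs). loca must have room for n + 1 entries.
 * Every offset is filled in, offsets never decrease, and the length of
 * glyph i is always loca[i + 1] - loca[i]. */
static void build_loca(const uint32_t *lengths, int n, uint32_t *loca)
{
	uint32_t off = 0;
	int i;
	for (i = 0; i < n; i++)
	{
		loca[i] = off;
		off += lengths[i];
	}
	loca[n] = off;
}
```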
Sebastian Rasmussen [Sun, 3 Nov 2024 03:33:46 +0000 (04:33 +0100)]
ttf: Correct format of post table.
Sebastian Rasmussen [Sat, 2 Nov 2024 02:10:27 +0000 (03:10 +0100)]
ttf: When doing bounds checking for cmap4 size, account for table offset.
Sebastian Rasmussen [Sat, 2 Nov 2024 02:09:30 +0000 (03:09 +0100)]
ttf: Zero out unused cmap entries, not just the first gid in the cmap.
Sebastian Rasmussen [Mon, 14 Oct 2024 01:01:58 +0000 (03:01 +0200)]
ttf: Fix computation of searchRange in TTF header.
searchRange is 16 * (max pow2 <= NumTables).
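The formula can be written out as in this sketch. ttf_search_range() is a
hypothetical name; the sketch assumes a realistic table count, since the result
would not fit in 16 bits for counts of 4096 or more.

```c
#include <stdint.h>

/* searchRange = 16 * (largest power of two <= num_tables). */
static uint16_t ttf_search_range(uint16_t num_tables)
{
	uint16_t pow2 = 1;
	if (num_tables == 0)
		return 0;
	while ((uint32_t)pow2 * 2 <= num_tables)
		pow2 = (uint16_t)(pow2 * 2);
	return (uint16_t)(pow2 * 16);
}
```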