PHP package to read metadata and extract covers from eBooks (.epub, .cbz, .cbr, .cb7, .cbt, .pdf) and audiobooks (.mp3, .m4a, .m4b, .flac, .ogg).
Supports Linux, macOS and Windows.
Note
This package favors eBooks in open formats such as .epub or .cbz, which can be parsed with native PHP. For the best possible experience, we recommend converting the eBooks you use to these formats. If you want to know more about the eBook ecosystem, you can read the documentation.
This package was built for bookshelves-project/bookshelves, a web app to handle eBooks.
- PHP version >= 8.1
- PHP extensions:
| Type | Supported | Requirement | Uses |
|---|---|---|---|
| .epub, .cbz | ✅ | N/A | N/A |
| .cbt | ✅ | N/A | N/A |
| .cbr | ✅ | rar PHP extension or p7zip binary | PHP rar or p7zip |
| .cb7 | ✅ | p7zip binary | p7zip binary |
| .pdf | ✅ | imagick PHP extension (optional, for extraction) | smalot/pdfparser |
| .mp3, .m4a, .m4b, .flac, .ogg | ✅ | N/A | kiwilan/php-audio |
Warning
Archives are handled with kiwilan/php-archive. For some formats (.cbr and .cb7), the rar PHP extension or the p7zip binary may be necessary. Guides to install these requirements are available on kiwilan/php-archive.
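Because .cbr and .cb7 support depends on optional extensions and binaries, it can help to check what the current machine provides before reading those files. The sketch below is not part of kiwilan/php-ebook: findBinary() and optionalRequirements() are hypothetical helpers, and the sketch assumes the p7zip executable is named 7z and is reachable through PATH.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: look for an executable on PATH.
 * Checks "<dir>/<name>" and, on Windows, "<dir>/<name>.exe" first.
 */
function findBinary(string $name): ?string
{
    $path = getenv('PATH');
    if ($path === false || $path === '') {
        return null;
    }
    $suffixes = PHP_OS_FAMILY === 'Windows' ? ['.exe', ''] : [''];
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue;
        }
        foreach ($suffixes as $suffix) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name . $suffix;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }

    return null;
}

/**
 * Report which optional requirements from the table are available.
 * Assumption: p7zip installs an executable called "7z".
 *
 * @return array<string, bool>
 */
function optionalRequirements(): array
{
    return [
        'rar' => extension_loaded('rar'),
        'imagick' => extension_loaded('imagick'),
        'p7zip' => findBinary('7z') !== null,
    ];
}

// Print one availability line per requirement
foreach (optionalRequirements() as $name => $available) {
    echo str_pad($name, 8), $available ? 'available' : 'missing', PHP_EOL;
}
```

The result depends entirely on the environment, so treat it as a quick diagnostic rather than a guarantee that a given file will open.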
- 🔎 Read metadata from eBooks and audiobooks
- 🖼️ Extract covers from eBooks and audiobooks
- 📚 Support metadata
  - eBooks:
  - Comics: CBAM (Comic Book Archive Metadata): ComicInfo.xml format from ComicRack and maintained by anansi-project
  - PDF with smalot/pdfparser
  - Audiobooks: ID3, vorbis and flac tags with kiwilan/php-audio
- 🔖 Chapters extraction (EPUB only)
- Add .mobi, .azw, .azw3 support
- Add .djvu support
- Add .fb2, .lrf, .pdb, .snb support
- Add .epub creation support
- Add .epub metadata update support
You can install the package via composer:
composer require kiwilan/php-ebook
Read eBook files (.epub, .cbz, .cba, .cbr, .cb7, .cbt, .pdf) or audiobook files (.mp3, .m4a, .m4b, .flac, .ogg) with Ebook::read():
use Kiwilan\Ebook\Ebook;

$ebook = Ebook::read('path/to/ebook.epub');
$ebook->path(); // string => path to ebook
$ebook->filename(); // string => filename of ebook
$ebook->extension(); // string => extension of ebook
$ebook->title(); // string
$ebook->authors(); // BookAuthor[] (`name`: string, `role`: string)
$ebook->authorMain(); // ?BookAuthor => First BookAuthor (`name`: string, `role`: string)
$ebook->description(); // ?string
$ebook->copyright(); // ?string
$ebook->publisher(); // ?string
$ebook->identifiers(); // BookIdentifier[] (`value`: string, `scheme`: string)
$ebook->publishDate(); // ?DateTime
$ebook->language(); // ?string
$ebook->tags(); // string[] => `subject` in EPUB, `keywords` in PDF, `genres` in CBA
$ebook->series(); // ?string => `calibre:series` in EPUB, `series` in CBA
$ebook->volume(); // ?int => `calibre:series_index` in EPUB, `number` in CBA
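The series() and volume() comments refer to Calibre's custom OPF metadata. To show where those values live in an EPUB, here is a standalone sketch that reads them with PHP's SimpleXML. calibreSeries() is a hypothetical helper, not the package's parser, and it assumes the OPF uses the standard http://www.idpf.org/2007/opf namespace.

```php
<?php

declare(strict_types=1);

/**
 * Read the Calibre series name and index from an OPF document.
 * Calibre writes them as <meta name="calibre:series" content="..."/>
 * and <meta name="calibre:series_index" content="..."/>.
 *
 * @return array{series: ?string, volume: ?int}
 */
function calibreSeries(string $opfXml): array
{
    $opf = simplexml_load_string($opfXml);
    if ($opf === false) {
        return ['series' => null, 'volume' => null];
    }
    $opf->registerXPathNamespace('opf', 'http://www.idpf.org/2007/opf');

    $series = null;
    $volume = null;
    foreach ($opf->xpath('//opf:meta[@name]') ?: [] as $meta) {
        $name = (string) $meta['name'];
        $content = (string) $meta['content'];
        if ($name === 'calibre:series') {
            $series = $content;
        } elseif ($name === 'calibre:series_index' && is_numeric($content)) {
            // Calibre often writes "1.0"; cast because volume() is typed ?int
            $volume = (int) $content;
        }
    }

    return ['series' => $series, 'volume' => $volume];
}
```

EPUB 3 files may describe series with belongs-to-collection metadata instead, which this sketch ignores.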
For pages count, you can use these methods:
$ebook->pagesCount(); // ?int => estimated pages count (250 words per page) in `EPUB`, `pageCount` in PDF, `pageCount` in CBA
$ebook->wordsCount(); // ?int => words count in `EPUB`
Note
For performance reasons, with EPUB, pagesCount and wordsCount are only available on demand. If you use var_dump to check an eBook, these properties will be null.
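The 250-words-per-page estimate is simple arithmetic. A minimal sketch, assuming words are counted over the text of the EPUB's HTML files and the result is rounded up so that a short text still counts as one page; estimatePages() and countWords() are hypothetical helpers, and the package's own counting and rounding may differ.

```php
<?php

declare(strict_types=1);

/**
 * Estimate pages from a word count, rounding up.
 * 250 words per page is the figure quoted for pagesCount().
 */
function estimatePages(int $wordsCount, int $wordsPerPage = 250): int
{
    if ($wordsCount <= 0) {
        return 0;
    }

    return (int) ceil($wordsCount / $wordsPerPage);
}

/** Naive word count over HTML: strip tags, decode entities, split on whitespace. */
function countWords(string $html): int
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    return count(preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}
```

For example, a book of 1,000 words gives estimatePages(1000) = 4 pages.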
Some metadata is stored, untyped and taken directly from the source metadata, in the array returned by the extras() method.
$ebook->extras(); // array<string, mixed> => additional data for book
$ebook->extra(string $key); // mixed => safely extract data from `extras` array
To get additional data, you can use these methods:
$ebook->metadata(); // ?EbookMetadata => metadata with parsers
$ebook->metaTitle(); // ?MetaTitle, with slug and sort properties for `title` and `series`
$ebook->format(); // ?EbookFormatEnum => `epub`, `pdf`, `cba`
$ebook->cover(); // ?EbookCover => cover of book
And to test if some data exists:
$ebook->isArchive(); // bool => `true` if `EPUB`, `CBA`
$ebook->isAudio(); // bool => `true` if `mp3`, `m4a`, `m4b`, `flac`, `ogg`
$ebook->hasMetadata(); // bool => `true` if metadata exists
$ebook->hasCover(); // bool => `true` if cover exists
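isArchive() and isAudio() come down to which family a file belongs to. A hypothetical sketch of that classification by extension, based only on the extension lists in this README; it is not the package's EbookFormatEnum, and the group names used here are assumptions.

```php
<?php

declare(strict_types=1);

/** Map a file path to a format group by extension, or null if unsupported. */
function formatFromPath(string $path): ?string
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return match ($extension) {
        'epub' => 'epub',
        'cbz', 'cbr', 'cb7', 'cbt' => 'cba',
        'pdf' => 'pdf',
        'mp3', 'm4a', 'm4b', 'flac', 'ogg' => 'audiobook',
        default => null,
    };
}

/** Mirrors the isArchive() comment: true for EPUB and CBA. */
function isArchiveFormat(?string $format): bool
{
    return in_array($format, ['epub', 'cba'], true);
}

/** Mirrors the isAudio() comment: true for audiobook extensions. */
function isAudioFormat(?string $format): bool
{
    return $format === 'audiobook';
}
```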
Ebook::class exposes most of the information directly, but if you want to access the raw metadata, the metadata() method is available.
$ebook = Ebook::read('path/to/ebook.epub');
$metadata = $ebook->metadata();
$metadata->module(); // Module used for parsing, can be any `EbookModule::class`
$metadata->epub(); // `EpubMetadata::class`
$metadata->pdf(); // `PdfMetadata::class`
$metadata->cba(); // `CbaMetadata::class`
$metadata->audiobook(); // `AudiobookMetadata::class`
$metadata->isEpub(); // bool
$metadata->isPdf(); // bool
$metadata->isCba(); // bool
$metadata->isAudiobook(); // bool
metaTitle() is only set if the book's title is not null.
$ebook = Ebook::read('path/to/ebook.epub');
$metaTitle = $ebook->metaTitle(); // ?MetaTitle
$metaTitle->slug(); // string => slugify title, like `the-clan-of-the-cave-bear`
$metaTitle->slugSort(); // string => slugify title without determiners, like `clan-of-the-cave-bear`
$metaTitle->slugLang(); // string => slugify title with language and type, like `the-clan-of-the-cave-bear-epub-en`
$metaTitle->serieSlug(); // ?string => slugify series title, like `earths-children`
$metaTitle->serieSort(); // ?string => slugify series title without determiners, like `earths-children`
$metaTitle->serieLang(); // ?string => slugify series title with language and type, like `earths-children-epub-en`
$metaTitle->slugSortWithSerie(); // string => slugify title with series title and volume, like `earths-children-01_clan-of-the-cave-bear`
$metaTitle->uniqueFilename(); // string => unique filename for storage, like `jean-m-auel-earths-children-01-clan-of-the-cave-bear-en-epub`
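The slug values above follow a predictable pattern. The sketch below reproduces the example outputs for an English title; slugify(), slugSort() and slugSortWithSerie() are hypothetical helpers that assume ASCII text and treat only the leading articles "the", "a" and "an" as determiners, so they are not the package's own slug logic.

```php
<?php

declare(strict_types=1);

/**
 * Minimal slugify: lowercase, drop apostrophes, then turn every other run
 * of non-alphanumeric characters into one hyphen. ASCII-only sketch:
 * accented characters would need transliteration first.
 */
function slugify(string $text): string
{
    $text = strtolower($text);
    $text = str_replace(["'", '’'], '', $text);
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);

    return trim($text, '-');
}

/** Slug without a leading English determiner (assumed: the, a, an). */
function slugSort(string $text): string
{
    return slugify(preg_replace('/^(the|a|an)\s+/i', '', trim($text)));
}

/** Series sort slug, two-digit volume, then title sort slug. */
function slugSortWithSerie(string $title, string $series, int $volume): string
{
    return sprintf('%s-%02d_%s', slugSort($series), $volume, slugSort($title));
}
```

With "The Clan of the Cave Bear", "Earth's Children" and volume 1, these return the-clan-of-the-cave-bear, clan-of-the-cave-bear and earths-children-01_clan-of-the-cave-bear.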
The cover can be extracted from an eBook.
$ebook = Ebook::read('path/to/ebook.epub');
$cover = $ebook->cover(); // ?EbookCover
$cover->path(); // ?string => path to cover
$cover->content(bool $toBase64 = false); // ?string => content of cover, if `$toBase64` is true, return base64 encoded content
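content() returns the raw image bytes (or base64 with $toBase64 = true). To embed a cover in an HTML page you also need its MIME type. A small sketch, assuming the cover is JPEG, PNG, GIF or WebP, guesses the type from the file signature and builds a data URI; guessImageMime() and coverDataUri() are hypothetical helpers.

```php
<?php

declare(strict_types=1);

/**
 * Guess an image MIME type from its leading "magic" bytes.
 * Handles JPEG, PNG, GIF and WebP only; returns null otherwise.
 */
function guessImageMime(string $bytes): ?string
{
    return match (true) {
        str_starts_with($bytes, "\xFF\xD8\xFF") => 'image/jpeg',
        str_starts_with($bytes, "\x89PNG\r\n\x1A\n") => 'image/png',
        str_starts_with($bytes, 'GIF87a'), str_starts_with($bytes, 'GIF89a') => 'image/gif',
        substr($bytes, 0, 4) === 'RIFF' && substr($bytes, 8, 4) === 'WEBP' => 'image/webp',
        default => null,
    };
}

/** Turn raw cover bytes into a data URI, or null if the format is unknown. */
function coverDataUri(string $bytes): ?string
{
    $mime = guessImageMime($bytes);

    return $mime === null ? null : 'data:' . $mime . ';base64,' . base64_encode($bytes);
}
```

With the package, the raw bytes would come from $ebook->cover()?->content(); check for null before passing them to coverDataUri().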
Note
- For PDF, the cover can only be extracted with the imagick PHP extension.
- For audiobooks, the cover can be extracted from mp3 files but not from other formats.
With EPUB, metadata is extracted from the OPF file and the META-INF/container.xml file, and you can access this raw metadata directly. You can also get chapters from the NCX file: the chapters() method merges NCX and HTML chapters to get the full book chapters with label, source and content.
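For orientation, an EPUB reader finds the OPF file through META-INF/container.xml: the full-path attribute of its rootfile element points at it (EPUB Open Container Format). A standalone sketch with SimpleXML, plus a reader that assumes the zip PHP extension is installed; both functions are hypothetical and not how the package reads archives.

```php
<?php

declare(strict_types=1);

/**
 * Return the OPF path declared by the first rootfile in container.xml.
 * rootfile elements live in the OCF container namespace.
 */
function opfPathFromContainer(string $containerXml): ?string
{
    $container = simplexml_load_string($containerXml);
    if ($container === false) {
        return null;
    }
    $container->registerXPathNamespace('c', 'urn:oasis:names:tc:opendocument:xmlns:container');
    $rootfiles = $container->xpath('//c:rootfile[@full-path]') ?: [];

    return isset($rootfiles[0]) ? (string) $rootfiles[0]['full-path'] : null;
}

/** Read container.xml from an .epub (requires ext-zip), then resolve the OPF path. */
function opfPathFromEpub(string $epubPath): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($epubPath) !== true) {
        return null;
    }
    $xml = $zip->getFromName('META-INF/container.xml');
    $zip->close();

    return $xml === false ? null : opfPathFromContainer($xml);
}
```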
$ebook = Ebook::read('path/to/ebook.epub');
$epub = $ebook->metadata()?->epub();
$epub->container(); // ?EpubContainer => {`opfPath`: ?string, `version`: ?string, `xml`: array}
$epub->opf(); // ?OpfMetadata => {`metadata`: array, `manifest`: array, `spine`: array, `guide`: array, `epubVersion`: ?int, `filename`: ?string, `dcTitle`: ?string, `dcCreators`: BookAuthor[], `dcContributors`: BookContributor[], `dcDescription`: ?string, `dcPublisher`: ?string, `dcIdentifiers`: BookIdentifier[], `dcDate`: ?DateTime, `dcSubject`: string[], `dcLanguage`: ?string, `dcRights`: array, `meta`: BookMeta[], `coverPath`: ?string, `contentFile`: string[]}
$epub->ncx(); // ?NcxMetadata => {`head`: NcxMetadataHead[]|null, `docTitle`: ?string, `navPoints`: NcxMetadataNavPoint[]|null, `version`: ?string, `lang`: ?string}
$epub->chapters(); // EpubChapter[] => {`label`: string, `source`: string, `content`: string}[]
$epub->html(); // EpubHtml[] => {`filename`: string, `head`: ?string, `body`: ?string}[]
$epub->files(); // string[] => all files in EPUB
Note
For performance reasons, ncx, html and chapters are only available on demand. If you use var_dump to check metadata, these properties will be null.
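An NCX table of contents is a tree of navPoint elements, each with a navLabel/text label and a content element whose src is the source. A hypothetical sketch that flattens that tree in document order with SimpleXML; it covers only the NCX side, not the merge with HTML content that chapters() performs.

```php
<?php

declare(strict_types=1);

/**
 * Flatten NCX navPoints into label/source pairs, parents before children.
 *
 * @return list<array{label: string, source: string}>
 */
function ncxNavPoints(string $ncxXml): array
{
    $ns = 'http://www.daisy.org/z3986/2005/ncx/';
    $ncx = simplexml_load_string($ncxXml);
    if ($ncx === false) {
        return [];
    }
    $ncx->registerXPathNamespace('n', $ns);

    $chapters = [];
    foreach ($ncx->xpath('//n:navPoint') ?: [] as $navPoint) {
        // SimpleXML needs the prefix registered again on each element queried
        $navPoint->registerXPathNamespace('n', $ns);
        $label = $navPoint->xpath('n:navLabel/n:text') ?: [];
        $content = $navPoint->xpath('n:content') ?: [];
        $chapters[] = [
            'label' => isset($label[0]) ? trim((string) $label[0]) : '',
            'source' => isset($content[0]) ? (string) $content[0]['src'] : '',
        ];
    }

    return $chapters;
}
```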
composer test
Please see CHANGELOG for more information on what has changed recently.
- spatie for spatie/package-skeleton-php
- kiwilan for kiwilan/php-archive, kiwilan/php-audio, kiwilan/php-xml-reader
The MIT License (MIT). Please see License File for more information.