Ext4 Features

The document outlines the features and benefits of the Ext4 filesystem, including compatibility with Ext3, increased filesystem and file size limits, and improved performance through features like extents, multiblock allocation, and delayed allocation. It also discusses additional enhancements such as fast fsck, journal checksumming, and online defragmentation. Finally, it provides guidance on how to create, migrate, or mount existing Ext3 filesystems as Ext4 without losing data.

Uploaded by

nideham547

2. EXT4 features

2.1. Compatibility
Any existing Ext3 filesystem can be migrated to Ext4 with
an easy procedure which consists in running a couple of
commands in read-only mode (described in the next
section). This means that you can improve the performance,
storage limits and features of your current filesystems
without reformatting and/or reinstalling your OS and
software environment. If you need the advantages of Ext4
on a production system, you can upgrade the filesystem.
The procedure is safe and doesn't risk your data (obviously,
a backup of critical data is recommended, even if you aren't
updating your filesystem :). Ext4 will use the new data
structures only on new data; the old structures will remain
untouched, and it will still be possible to read and modify
them when needed. This means, of course, that once you
convert your filesystem to Ext4 you won't be able to go back
to Ext3 again. (There is an alternative, described in the
next section: mounting an Ext3 filesystem with Ext4 without
using the new disk format. You'll then be able to mount it
with Ext3 again, but you lose many of the advantages of
Ext4.)

2.2. Bigger filesystem/file sizes


Currently, Ext3 supports a maximum filesystem size of 16 TB
and a maximum file size of 2 TB. Ext4 adds 48-bit block
addressing, so it has a maximum filesystem size of 1 EB
and a maximum file size of 16 TB. 1 EB = 1,048,576 TB (1 EB
= 1024 PB, 1 PB = 1024 TB, 1 TB = 1024 GB). Why 48-bit
and not 64-bit? There are some limitations that would need
to be fixed before making Ext4 fully 64-bit capable, and
they have not been addressed in Ext4. The Ext4 data structures
have been designed with this in mind, so a future update
to Ext4 will implement full 64-bit support at some point. 1
EB will be enough (really) until that happens. (Note: the
code to create filesystems bigger than 16 TB is, at the time
of writing this article, not in any stable release of e2fsprogs.
It will be in future releases.)
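
The limits above follow from multiplying the number of addressable blocks by the block size. Here is a quick sketch of that arithmetic, assuming 4 KB blocks (the block size used in the examples later in this article) and 32-bit block addresses for Ext3. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# max_fs_tb ADDRESS_BITS BLOCK_BYTES
# Prints the largest addressable filesystem size in TB (1 TB = 1024 GB here).
# Hypothetical helper; the 4096-byte block size is an assumption.
max_fs_tb() {
    local bits=$1 block_bytes=$2
    echo $(( (1 << bits) * block_bytes / (1 << 40) ))
}

max_fs_tb 32 4096   # Ext3: 32-bit block addresses
max_fs_tb 48 4096   # Ext4: 48-bit block addresses
```

The two calls print 16 and 1048576, which match the 16 TB and 1 EB figures quoted above.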

2.3. Sub directory scalability


Right now the maximum possible number of sub directories
contained in a single directory in Ext3 is 32000. Ext4 breaks
that limit and allows an unlimited number of sub directories.

2.4. Extents
Traditional Unix-derived filesystems like Ext3 use an
indirect block mapping scheme to keep track of each block
that holds the data of a file. This
is inefficient for large files, especially on large file delete and
truncate operations, because the mapping keeps an entry for
every single block, and big files have many blocks, which means
huge mappings that are slow to handle. Modern filesystems use a
different approach called "extents". An extent is basically a
bunch of contiguous physical blocks. It basically says "The
data is in the next n blocks". For example, a 100 MB file can
be allocated into a single extent of that size, instead of
needing to create the indirect mapping for 25600 blocks (4
KB per block). Huge files are split into several extents.
Extents improve performance and also help to reduce
fragmentation, since an extent encourages contiguous
layouts on the disk.
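
To make the difference concrete, the sketch below collapses a per-block list (what an indirect mapping has to record) into "start length" pairs (what an extent records). It is a simplified illustration, not the Ext4 on-disk format: the function name and the block numbers are made up, and a real extent also records where the run sits inside the file.

```shell
#!/usr/bin/env bash
# blocks_to_extents BLOCK...
# Takes physical block numbers in ascending order and prints one
# "start length" line per run of contiguous blocks.
# Hypothetical helper for illustration only.
blocks_to_extents() {
    local start="" len=0 prev="" b
    for b in "$@"; do
        if [ -n "$prev" ] && [ "$b" -eq $((prev + 1)) ]; then
            # Block continues the current run.
            len=$((len + 1))
        else
            # Block starts a new run; emit the previous one, if any.
            if [ -n "$start" ]; then
                echo "$start $len"
            fi
            start=$b
            len=1
        fi
        prev=$b
    done
    if [ -n "$start" ]; then
        echo "$start $len"
    fi
}

# Five blocks, two contiguous runs.
blocks_to_extents 100 101 102 200 201
```

Five per-block entries become two lines here. For a fully contiguous file the output is a single line no matter how many blocks it has.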

2.5. Multiblock allocation


When Ext3 needs to write new data to the disk, there's a
block allocator that decides which free blocks will be used
to write the data. But the Ext3 block allocator only allocates
one block (4 KB) at a time. That means that if the system
needs to write the 100 MB of data mentioned in the previous
point, it will need to call the block allocator 25600 times
(and it was just 100 MB!). Not only is this inefficient, it
doesn't allow the block allocator to optimize the allocation
policy, because it doesn't know how much data in total is
being allocated; it only knows about a single block. Ext4
uses a "multiblock allocator" (mballoc) which allocates
many blocks in a single call, instead of a single block per
call, avoiding a lot of overhead. This improves
performance, and it's particularly useful with delayed
allocation and extents. This feature doesn't affect the disk
format. Also, note that the Ext4 block/inode allocator has
other improvements, described in detail in this paper.
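
The call-count argument can be checked with a few lines of shell arithmetic. This is only a sketch: the function name is hypothetical, the 4 KB block size is the one assumed in the text, and the batch size of 2048 blocks per call is an arbitrary illustrative value, not a figure taken from mballoc.

```shell
#!/usr/bin/env bash
# alloc_calls SIZE_MB BLOCKS_PER_CALL
# Prints how many allocator calls are needed to place SIZE_MB of data
# when each call hands out BLOCKS_PER_CALL blocks of 4 KB.
# Hypothetical helper; the batch sizes used below are illustrative.
alloc_calls() {
    local size_mb=$1 per_call=$2
    local blocks=$(( size_mb * 1024 / 4 ))
    # Round up: a partially filled last batch still costs one call.
    echo $(( (blocks + per_call - 1) / per_call ))
}

alloc_calls 100 1      # one block per call, as in Ext3
alloc_calls 100 2048   # an arbitrary multiblock batch size
```

The first call reproduces the 25600 allocator calls mentioned above. The second shows how quickly the count drops once each call can hand out many blocks.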

2.6. Delayed allocation


Delayed allocation is a performance feature (it doesn't
change the disk format) found in a few modern filesystems
such as XFS, ZFS, btrfs or Reiser 4. It consists of
delaying the allocation of blocks as much as possible,
contrary to what traditional filesystems (such as Ext3,
reiser3, etc.) do: allocate the blocks as soon as possible. For
example, if a process write()s, the filesystem code will
immediately allocate the blocks where the data will be
placed, even if the data is not being written to the disk
right now and is going to be kept in the cache for some time.
This approach has disadvantages. For example, when a
process is writing continually to a file that grows,
successive write()s allocate blocks for the data, but they
don't know if the file will keep growing. Delayed allocation,
on the other hand, does not allocate the blocks immediately
when the process write()s, rather, it delays the allocation of
the blocks while the file is kept in cache, until it is really
going to be written to the disk. This gives the block
allocator the opportunity to optimize the allocation in
situations where the old system couldn't. Delayed allocation
plays very nicely with the two previous features mentioned,
extents and multiblock allocation, because in many
workloads when the file is written finally to the disk it will
be allocated in extents whose block allocation is done with
the mballoc allocator. Performance is much better, and
fragmentation is much reduced in some workloads.

2.7. Fast fsck


Fsck is a very slow operation, especially the first step:
checking all the inodes in the filesystem. In Ext4, a list of
unused inodes is stored at the end of each group's inode
table (with a checksum, for safety), so fsck will not check
those inodes. The result is that total fsck time improves
by a factor of 2 to 20, depending on the number of used inodes
(http://kerneltrap.org/Linux/Improving_fsck_Speeds_in_Ext4).
Note that it is fsck, not Ext4, that builds the list of
unused inodes. This means that you must
run fsck to get the list of unused inodes built, and only the
next fsck run will be faster (you need to run fsck in order
to convert an Ext3 filesystem to Ext4 anyway). Another
feature also takes part in this fsck speedup, "flexible block
groups", which also speeds up filesystem operations.

2.8. Journal checksumming


The journal is the most used part of the disk, making the
blocks that form part of it more prone to hardware failure.
And recovering from a corrupted journal can lead to
massive corruption. Ext4 checksums the journal data to
know if the journal blocks are failing or corrupted. But
journal checksumming has a bonus: it allows one to convert
the two-phase commit system of Ext3's journaling to a
single phase, speeding up filesystem operation by up to 20%
in some cases, so reliability and performance are improved
at the same time. (Note: the part of the feature that
improves performance, asynchronous logging, is
turned off by default for now, and will be enabled in future
releases, when its reliability improves.)

2.9. "No Journaling" mode


Journaling ensures the integrity of the filesystem by keeping
a log of the ongoing disk changes. However, it is known to
have a small overhead. Some people with special
requirements and workloads can run without a journal and
its integrity advantages. In Ext4 the journaling feature can
be disabled, which provides a small performance
improvement.

2.10. Online defragmentation


(This feature is being developed and will be included in
future releases.) While delayed allocation, extents and
multiblock allocation help to reduce fragmentation,
filesystems can still fragment with use. For example: you write
three files in a directory, laid out contiguously on the disk. Some
day you need to update the file in the middle, but the
updated file has grown a bit, so there's not enough room for
it. You have no option but to place the excess data in
another part of the disk, which will cause a seek, or to
allocate the updated file contiguously in another place, far
from the other two files, resulting in seeks if an application
needs to read all the files in a directory (say, a file manager
generating thumbnails for a directory full of images). Besides, the
filesystem can only care about certain types of
fragmentation; it can't know, for example, that it must keep
all the boot-related files contiguous, because it doesn't know
which files are boot-related. To solve this issue, Ext4 will
support online defragmentation, and there's an e4defrag tool
which can defragment individual files or the whole
filesystem.

2.11. Inode-related features


Larger inodes, nanosecond timestamps, fast extended
attributes, inode reservation...
• Larger inodes: Ext3 supports configurable inode sizes
(via the -I mkfs parameter), but the default inode size
is 128 bytes. Ext4 will default to 256 bytes. This is
needed to accommodate some extra fields (like
nanosecond timestamps or inode versioning), and the
remaining space of the inode will be used to store
extended attributes that are small enough to fit in that
space. This will make access to those attributes
much faster, and improves the performance of
applications that use extended attributes by a factor of
3 to 7.
• Inode reservation consists of reserving several inodes
when a directory is created, expecting that they will be
used in the future. This improves performance,
because when new files are created in that directory
they'll be able to use the reserved inodes. File creation
and deletion are hence more efficient.
• Nanosecond timestamps mean that inode fields like
"modified time" will be able to use nanosecond
resolution instead of the one-second resolution of Ext3.
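
One way to look at the timestamp resolution from a shell is to print a file's modification time with its fractional part. This is a minimal sketch that assumes GNU coreutils stat (the -c option and the %y format are GNU-specific); the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# show_mtime FILE
# Prints the modification time of FILE, including the fractional seconds.
# Assumes GNU coreutils stat; other stat implementations use other flags.
show_mtime() {
    stat -c '%y' "$1"
}

tmpfile=$(mktemp)
show_mtime "$tmpfile"
rm -f "$tmpfile"
```

On a filesystem that stores nanosecond timestamps the fractional part is normally non-zero, while on a filesystem with one-second resolution the digits after the decimal point are all zeros.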

2.12. Persistent preallocation


This feature, available in Ext3 in the latest kernel versions,
and emulated by glibc on filesystems that don't support
it, allows applications to preallocate disk space:
applications tell the filesystem to preallocate the space, and
the filesystem preallocates the necessary blocks and data
structures, but there's no data in them until the application
really needs to write the data in the future. This is what P2P
applications do on their own when they "preallocate" the
necessary space for a download that will last hours or days,
but implemented much more efficiently by the filesystem
and with a generic API. This has several uses: first, to
avoid applications (like P2P apps) doing it themselves
inefficiently by filling a file with zeros. Second, to reduce
fragmentation, since the blocks will be allocated at one
time, as contiguously as possible. Third, to ensure that
applications always have the space they know they will need,
which is important for RT-ish applications, since without
preallocation the filesystem could get full in the middle of
an important operation. The feature is available via the libc
posix_fallocate() interface.
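
From a shell, the two approaches contrasted above can be sketched as follows. This is an illustration under assumptions, not part of the original article: it relies on the fallocate command-line tool from util-linux being installed (the article itself only mentions the libc interface), and it falls back to the zero-filling method when that tool is missing or the filesystem rejects the request. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# preallocate FILE BYTES
# Reserves BYTES of space for FILE and prints which method was used.
# Assumption: the util-linux 'fallocate' tool may or may not be present.
preallocate() {
    local file=$1 bytes=$2
    if command -v fallocate >/dev/null 2>&1 \
        && fallocate -l "$bytes" "$file" 2>/dev/null; then
        # The filesystem reserved the blocks without writing any data.
        echo "fallocate"
    else
        # Fallback: the inefficient way, writing BYTES of zeros.
        head -c "$bytes" /dev/zero > "$file"
        echo "zero-fill"
    fi
}

target=$(mktemp)
preallocate "$target" 1048576
rm -f "$target"
```

Either way the file ends up 1 MB long. The difference is that the first branch asks the filesystem to reserve the blocks, while the second has to write every byte.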

2.13. Barriers on by default


This is an option that improves the integrity of the
filesystem at the cost of some performance (you can disable
it with "mount -o barrier=0", which is worth trying if
you're benchmarking). From this LWN article: "The
filesystem code must, before writing the [journaling]
commit record, be absolutely sure that all of the
transaction's information has made it to the journal. Just
doing the writes in the proper order is insufficient;
contemporary drives maintain large internal caches and will
reorder operations for better performance. So the
filesystem must explicitly instruct the disk to get all of the
journal data onto the media before writing the commit
record; if the commit record gets written first, the journal
may be corrupted. The kernel's block I/O subsystem makes
this capability available through the use of barriers; in
essence, a barrier forbids the writing of any blocks after the
barrier until all blocks written before the barrier are
committed to the media. By using barriers, filesystems can
make sure that their on-disk structures remain consistent at
all times."
3. How to use Ext4
This is the first stable version of Ext4, so even if the whole
development and release of this filesystem has been
slowed down and delayed a lot to guarantee the same level
of stability that you'd expect from the current Ext3
implementation, the usual rules of any ".0" software apply.
One very important thing to keep in mind is that there is
NO Ext4 GRUB support. Well, that isn't exactly true:
there is GRUB support, but it's not widespread in the
currently available distros. There's ext4 support in the
GRUB2 development branch, but since GRUB2 is not stable,
most distros are using the 0.9x versions.
GRUB2 packages are available in Ubuntu and Debian-derived
distros as the grub-pc package. In the 0.9x branch,
there's no official support, but there's a Google SoC project
that developed support for it, and Google finds patches. So
choose for yourself. The next release of distros based on Linux
2.6.28 will probably have support in one way or another.
The safe option is to keep your /boot directory in a partition
formatted with Ext3.
You also need an updated e2fsprogs package, of course; the
latest stable version, 1.41.3, is recommended.
NOTE: At least in Debian-derived distros, including Ubuntu,
converting your filesystem to Ext4 when using an initramfs
results in a non-booting system, apparently even when
you enable the "ext4dev compatibility" option. The problem
is that the fstype klibc utility detects the ext4 filesystem as
ext3, tries to mount it as ext3, and fails. The fix is to
pass the "rootfstype=ext4" option (without the quotes) on
the kernel command line.
Switching to Ext4 is very easy. There are three different
ways you can use to switch:

3.1. Creating a new Ext4 filesystem from scratch


This is the easiest way, recommended for new installations.
Just update your e2fsprogs package to a version with Ext4
support, and create the filesystem with mkfs.ext4.

3.2. Migrate existing Ext3 filesystems to Ext4


You need to run the tune2fs and fsck tools on the filesystem,
and that filesystem needs to be unmounted. Run:
 tune2fs -O extents,uninit_bg,dir_index /dev/yourfilesystem
After running this command you MUST run fsck. If you
don't do it, Ext4 WILL NOT MOUNT your filesystem. This
fsck run is needed to return the filesystem to a consistent
state. It WILL tell you that it finds checksum errors in the
group descriptors. This is expected, and rebuilding them is
exactly what is needed to be able to mount the filesystem as
Ext4, so don't be surprised by them. Since it asks you what
to do each time it finds one of those
errors, always say YES. If you don't
want to be asked, add the "-p" parameter to the fsck
command; it means "automatic repair":
 fsck -pDf /dev/yourfilesystem
There's another thing that must be mentioned. All your
existing files will continue using the old indirect mapping to
map all the blocks of data. The online defrag tool will be
able to migrate each one of those files to the extent format
(using an ioctl that tells the filesystem to rewrite the file in
the extent format; you can use it safely while you're using
the filesystem normally).
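
The order of the migration steps can be captured in a small dry-run script. The sketch below only prints the tune2fs and fsck commands given in this section instead of running them, and refuses to do even that if the device appears in the mount table. The function names and the /dev/sdz1 device are hypothetical, and reading /proc/mounts to detect a mounted device is an assumption about the system, not something the article prescribes.

```shell
#!/usr/bin/env bash
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE is listed in the first column of the mount table.
# MOUNTS_FILE defaults to /proc/mounts (an assumption about the system).
is_mounted() {
    local dev=$1 mounts=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# ext3_to_ext4_plan DEVICE [MOUNTS_FILE]
# Dry run: prints the commands from this section in order, runs nothing.
ext3_to_ext4_plan() {
    local dev=$1 mounts=${2:-/proc/mounts}
    if is_mounted "$dev" "$mounts"; then
        echo "refusing: $dev is mounted, unmount it first" >&2
        return 1
    fi
    echo "tune2fs -O extents,uninit_bg,dir_index $dev"
    echo "fsck -pDf $dev"
}

# /dev/sdz1 is a placeholder device name.
ext3_to_ext4_plan /dev/sdz1
```

Printing the commands first makes it easy to review them, and to take a backup, before running anything that changes the filesystem.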

3.3. Mount an existing Ext3 filesystem with Ext4 without changing the format

You can mount an existing Ext3 filesystem with Ext4 but
without using features that change the disk format. This
means you will be able to mount your filesystem with Ext3
again. You can mount an existing Ext3 filesystem with
"mount -t ext4 /dev/yourpartition /mnt". Doing this without
having done the conversion process described in the
previous point will force Ext4 not to use the features that
change the disk format, such as extents; it will use only the
features that don't change the disk format, such as mballoc
or delayed allocation. You'll be able to mount your
filesystem as Ext3 again. But obviously you'll be losing the
advantages of the Ext4 features that don't get used...
