400 lines
13 KiB
Plaintext
400 lines
13 KiB
Plaintext
===============================================
|
|
CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
|
|
===============================================
|
|
|
|
Contents:
|
|
|
|
(*) Overview.
|
|
|
|
(*) Requirements.
|
|
|
|
(*) Configuration.
|
|
|
|
(*) Starting the cache.
|
|
|
|
(*) Things to avoid.
|
|
|
|
(*) Cache culling.
|
|
|
|
(*) Cache structure.
|
|
|
|
(*) Security model and SELinux.
|
|
|
|
|
|
========
|
|
OVERVIEW
|
|
========
|
|
|
|
CacheFiles is a caching backend that's meant to use as a cache a directory on
|
|
an already mounted filesystem of a local type (such as Ext3).
|
|
|
|
CacheFiles uses a userspace daemon to do some of the cache management - such as
|
|
reaping stale nodes and culling. This is called cachefilesd and lives in
|
|
/sbin.
|
|
|
|
The filesystem and data integrity of the cache are only as good as those of the
|
|
filesystem providing the backing services. Note that CacheFiles does not
|
|
attempt to journal anything since the journalling interfaces of the various
|
|
filesystems are very specific in nature.
|
|
|
|
CacheFiles creates a proc-file - "/proc/fs/cachefiles" - that is used for
|
|
communication with the daemon. Only one thing may have this open at once, and
|
|
whilst it is open, a cache is at least partially in existence. The daemon
|
|
opens this and sends commands down it to control the cache.
|
|
|
|
CacheFiles is currently limited to a single cache.
|
|
|
|
CacheFiles attempts to maintain at least a certain percentage of free space on
|
|
the filesystem, shrinking the cache by culling the objects it contains to make
|
|
space if necessary - see the "Cache Culling" section. This means it can be
|
|
placed on the same medium as a live set of data, and will expand to make use of
|
|
spare space and automatically contract when the set of data requires more
|
|
space.
|
|
|
|
|
|
============
|
|
REQUIREMENTS
|
|
============
|
|
|
|
The use of CacheFiles and its daemon requires the following features to be
|
|
available in the system and in the cache filesystem:
|
|
|
|
- dnotify.
|
|
|
|
- extended attributes (xattrs).
|
|
|
|
- openat() and friends.
|
|
|
|
- bmap() support on files in the filesystem (FIBMAP ioctl).
|
|
|
|
- The use of bmap() to detect a partial page at the end of the file.
|
|
|
|
It is strongly recommended that the "dir_index" option is enabled on Ext3
|
|
filesystems being used as a cache.
|
|
|
|
|
|
=============
|
|
CONFIGURATION
|
|
=============
|
|
|
|
The cache is configured by a script in /etc/cachefilesd.conf. These commands
|
|
set up cache ready for use. The following script commands are available:
|
|
|
|
(*) brun <N>%
|
|
(*) bcull <N>%
|
|
(*) bstop <N>%
|
|
(*) frun <N>%
|
|
(*) fcull <N>%
|
|
(*) fstop <N>%
|
|
|
|
Configure the culling limits. Optional. See the section on culling
|
|
The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
|
|
|
|
The commands beginning with a 'b' are file space (block) limits, those
|
|
beginning with an 'f' are file count limits.
|
|
|
|
(*) dir <path>
|
|
|
|
Specify the directory containing the root of the cache. Mandatory.
|
|
|
|
(*) tag <name>
|
|
|
|
Specify a tag to FS-Cache to use in distinguishing multiple caches.
|
|
Optional. The default is "CacheFiles".
|
|
|
|
(*) culltable <log2size>
|
|
|
|
Specify the size of the tables holding the lists of cullable objects in
|
|
the cache. The bigger the number, the faster and more smoothly that
|
|
culling can proceed when there are many objects in the cache, but the
|
|
more memory will be consumed by cachefilesd.
|
|
|
|
The quantity is specified as log2 of the size actually required, for
|
|
example 12 indicates a table of 4096 entries and 13 indicates 8192
|
|
entries. The permissible values are between 12 and 20, the latter
|
|
indicating 1048576 entries. The default is 12.
|
|
|
|
(*) resume_thresholds <blocks> <files>
|
|
|
|
Scanning to refill the cull table is suspended when all the objects in
|
|
a cache are pinned by a live network filesystem in the kernel and
|
|
there's nothing available to cull. Scanning is resumed when the kernel
|
|
releases sufficient objects that either the number of objects released
|
|
exceeds the files parameter here or the cumulative i_blocks values
|
|
exceed the blocks parameter. Either threshold can be disabled by
|
|
specifying it as "-".
|
|
|
|
The default is to ignore the block threshold and to resume when five or
|
|
more files have been released.
|
|
|
|
(*) debug <mask>
|
|
|
|
Specify a numeric bitmask to control debugging in the kernel module.
|
|
Optional. The default is zero (all off).
|
|
|
|
|
|
==================
|
|
STARTING THE CACHE
|
|
==================
|
|
|
|
The cache is started by running the daemon. The daemon opens the cache proc
|
|
file, configures the cache and tells it to begin caching. At that point the
|
|
cache binds to fscache and the cache becomes live.
|
|
|
|
The daemon is run as follows:
|
|
|
|
/sbin/cachefilesd [-d]* [-s] [-n] [-N] [-f <configfile>]
|
|
|
|
The flags are:
|
|
|
|
(*) -d
|
|
|
|
Increase the debugging level. This can be specified multiple times and
|
|
is cumulative with itself.
|
|
|
|
(*) -s
|
|
|
|
Send messages to stderr instead of syslog.
|
|
|
|
(*) -n
|
|
|
|
Don't daemonise and go into background.
|
|
|
|
(*) -N
|
|
|
|
Disable culling and scanning to fill the cull table.
|
|
|
|
(*) -f <configfile>
|
|
|
|
Use an alternative configuration file rather than the default one.
|
|
|
|
|
|
===============
|
|
THINGS TO AVOID
|
|
===============
|
|
|
|
Do not mount other things within the cache as this will cause problems. The
|
|
kernel module contains its own very cut-down path walking facility that ignores
|
|
mountpoints, but the daemon can't avoid them.
|
|
|
|
Do not create, rename or unlink files and directories in the cache whilst the
|
|
cache is active, as this may cause the state to become uncertain.
|
|
|
|
Renaming files in the cache might make objects appear to be other objects (the
|
|
filename is part of the lookup key).
|
|
|
|
Do not change or remove the extended attributes attached to cache files by the
|
|
cache as this will cause the cache state management to get confused.
|
|
|
|
Do not create files or directories in the cache, lest the cache get confused or
|
|
serve incorrect data.
|
|
|
|
Do not chmod files in the cache. The module creates things with minimal
|
|
permissions to prevent random users being able to access them directly.
|
|
|
|
|
|
=============
|
|
CACHE CULLING
|
|
=============
|
|
|
|
The cache may need culling occasionally to make space. This involves
|
|
discarding objects from the cache that have been used less recently than
|
|
anything else. Culling is based on the access time of data objects. Empty
|
|
directories are culled if not in use.
|
|
|
|
Cache culling is done on the basis of the percentage of blocks and the
|
|
percentage of files available in the underlying filesystem. There are six
|
|
"limits":
|
|
|
|
(*) brun
|
|
(*) frun
|
|
|
|
If the amount of free space and the number of available files in the cache
|
|
rises above both these limits, then culling is turned off.
|
|
|
|
(*) bcull
|
|
(*) fcull
|
|
|
|
If the amount of available space or the number of available files in the
|
|
cache falls below either of these limits, then culling is started.
|
|
|
|
(*) bstop
|
|
(*) fstop
|
|
|
|
If the amount of available space or the number of available files in the
|
|
cache falls below either of these limits, then no further allocation of
|
|
disk space or files is permitted until culling has raised things above
|
|
these limits again.
|
|
|
|
These must be configured thusly:
|
|
|
|
0 <= bstop < bcull < brun < 100
|
|
0 <= fstop < fcull < frun < 100
|
|
|
|
Note that these are percentages of available space and available files, and do
|
|
_not_ appear as 100 minus the percentage displayed by the "df" program.
|
|
|
|
The userspace daemon scans the cache to build up a table of cullable objects.
|
|
These are then culled in least recently used order. A new scan of the cache is
|
|
started as soon as space is made in the table. Objects will be skipped if
|
|
their atimes have changed or if the kernel module says it is still using them.
|
|
|
|
|
|
===============
|
|
CACHE STRUCTURE
|
|
===============
|
|
|
|
The CacheFiles module will create two directories in the directory it was
|
|
given:
|
|
|
|
(*) cache/
|
|
|
|
(*) graveyard/
|
|
|
|
The active cache objects all reside in the first directory. The CacheFiles
|
|
kernel module moves any retired or culled objects that it can't simply unlink
|
|
to the graveyard from which the daemon will actually delete them.
|
|
|
|
The daemon uses dnotify to monitor the graveyard directory, and will delete
|
|
anything that appears therein.
|
|
|
|
|
|
The module represents index objects as directories with the filename "I..." or
|
|
"J...". Note that the "cache/" directory is itself a special index.
|
|
|
|
Data objects are represented as files if they have no children, or directories
|
|
if they do. Their filenames all begin "D..." or "E...". If represented as a
|
|
directory, data objects will have a file in the directory called "data" that
|
|
actually holds the data.
|
|
|
|
Special objects are similar to data objects, except their filenames begin
|
|
"S..." or "T...".
|
|
|
|
|
|
If an object has children, then it will be represented as a directory.
|
|
Immediately in the representative directory are a collection of directories
|
|
named for hash values of the child object keys with an '@' prepended. Into
|
|
this directory, if possible, will be placed the representations of the child
|
|
objects:
|
|
|
|
INDEX INDEX INDEX DATA FILES
|
|
========= ========== ================================= ================
|
|
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
|
|
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
|
|
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
|
|
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
|
|
|
|
|
|
If the key is so long that it exceeds NAME_MAX with the decorations added on to
|
|
it, then it will be cut into pieces, the first few of which will be used to
|
|
make a nest of directories, and the last one of which will be the objects
|
|
inside the last directory. The names of the intermediate directories will have
|
|
'+' prepended:
|
|
|
|
J1223/@23/+xy...z/+kl...m/Epqr
|
|
|
|
|
|
Note that keys are raw data, and not only may they exceed NAME_MAX in size,
|
|
they may also contain things like '/' and NUL characters, and so they may not
|
|
be suitable for turning directly into a filename.
|
|
|
|
To handle this, CacheFiles will use a suitably printable filename directly and
|
|
"base-64" encode ones that aren't directly suitable. The two versions of
|
|
object filenames indicate the encoding:
|
|
|
|
OBJECT TYPE PRINTABLE ENCODED
|
|
=============== =============== ===============
|
|
Index "I..." "J..."
|
|
Data "D..." "E..."
|
|
Special "S..." "T..."
|
|
|
|
Intermediate directories are always "@" or "+" as appropriate.
|
|
|
|
|
|
Each object in the cache has an extended attribute label that holds the object
|
|
type ID (required to distinguish special objects) and the auxiliary data from
|
|
the netfs. The latter is used to detect stale objects in the cache and update
|
|
or retire them.
|
|
|
|
|
|
Note that CacheFiles will erase from the cache any file it doesn't recognise or
|
|
any file of an incorrect type (such as a FIFO file or a device file).
|
|
|
|
|
|
==========================
|
|
SECURITY MODEL AND SELINUX
|
|
==========================
|
|
|
|
CacheFiles is implemented to deal properly with the LSM security features of
|
|
the Linux kernel and the SELinux facility.
|
|
|
|
One of the problems that CacheFiles faces is that it is generally acting on
|
|
behalf of a process that is in a security context that is not appropriate for
|
|
accessing the cache - either because the files in the cache are inaccessible to
|
|
that process, or because if the process creates a file in the cache, it'll be
|
|
inaccessible to other processes.
|
|
|
|
The way CacheFiles works is to temporarily change the security context (fsuid,
|
|
fsgid and actor security label) that the process acts as - without changing the
|
|
security context of the process when it the target of an operation performed by
|
|
some other process (so signalling and suchlike still work correctly).
|
|
|
|
|
|
When the CacheFiles module is asked to bind to its cache, it:
|
|
|
|
(1) Finds the security label attached to the root cache directory and uses
|
|
that as the security label with which it will create files. By default,
|
|
this is:
|
|
|
|
cachefiles_var_t
|
|
|
|
(2) Finds the security label of the process which issued the bind request
|
|
(presumed to be the cachefilesd daemon), which by default will be:
|
|
|
|
cachefilesd_t
|
|
|
|
and asks LSM to supply a security ID as which it should act given the
|
|
daemon's label. By default, this will be:
|
|
|
|
cachefiles_kernel_t
|
|
|
|
SELinux transitions the daemon's security ID to the module's security ID
|
|
based on a rule of this form in the policy.
|
|
|
|
type_transition <daemon's-ID> kernel_t : process <module's-ID>;
|
|
|
|
For instance:
|
|
|
|
type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
|
|
|
|
|
|
The module's security ID gives it permission to create, move and remove files
|
|
and directories in the cache, to find and access directories and files in the
|
|
cache, to set and access extended attributes on cache objects, and to read and
|
|
write files in the cache.
|
|
|
|
The daemon's security ID gives it only a very restricted set of permissions: it
|
|
may scan directories, stat files and erase files and directories. It may
|
|
not read or write files in the cache, and so it is precluded from accessing the
|
|
data cached therein; nor is it permitted to create new files in the cache.
|
|
|
|
|
|
The policy source files are for reference installed as:
|
|
|
|
/usr/share/doc/cachefilesd/cachefilesd.te
|
|
/usr/share/doc/cachefilesd/cachefilesd.fc
|
|
/usr/share/doc/cachefilesd/cachefilesd.if
|
|
|
|
By default, the cache is located in /var/cache/fscache, but if it is desirable
|
|
that it should be elsewhere, than either the above policy files must be
|
|
altered, or an auxiliary policy must be installed to label the alternate
|
|
location of the cache.
|
|
|
|
For instructions on how to add an auxiliary policy to enable the cache to be
|
|
located elsewhere when SELinux is in enforcing mode, please see:
|
|
|
|
/usr/share/doc/cachefilesd/move-cache.txt
|
|
|
|
When the cachefilesd RPM is installed; alternatively, the document can be found
|
|
in the sources.
|