Fuse Over Amazon
Announcement: s3fs-fuse moved here from s3fs on Google Code after v1.74. Please submit usage and support questions to the Issues area rather than as comments on this Wiki page. Thanks!
- v1.79 fixed many bugs etc.
- v1.78 added support for SSE-C and fixed some bugs.
- v1.77 fixed curl SSL problems etc.
- v1.76 fixed some bugs.
- v1.75 fixed some bugs and the MacOSX build.
- v1.74 initial version on GitHub, same as v1.74 on Google Code.
Older versions are on Google Code; please refer to it for versions before v1.74.
s3fs is a FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem. It stores files natively and transparently in S3 (i.e., you can use other programs to access the same files). Maximum file size=64GB (limited by s3fs, not Amazon).
s3fs is stable and is being used in a number of production environments, e.g., rsync backup to s3.
Important Note: Your kernel must support FUSE. Kernels earlier than 2.6.18-164 may not have FUSE support (see issue #140). Virtual Private Servers (VPS) may not have FUSE support compiled into their kernels.
To use it:
- Get an Amazon S3 account! http://aws.amazon.com/s3/
- Download, compile and install (see [Installation Notes](Installation Notes))
- Specify your Security Credentials (Access Key ID & Secret Access Key) by one of the following methods:
  - using the passwd_file command line option
  - setting the AWSACCESSKEYID and AWSSECRETACCESSKEY environment variables
  - using a .passwd-s3fs file in your home directory
  - using the system-wide /etc/passwd-s3fs file
- do this:
/usr/bin/s3fs mybucket /mnt
That's it! The contents of your amazon bucket "mybucket" should now be accessible read/write in /mnt.
The s3fs password file has this format (use this format if you have only one set of credentials):
accessKeyId:secretAccessKey
If you have more than one set of credentials, you can keep default credentials as specified above, and this syntax will be recognized as well:
bucketName:accessKeyId:secretAccessKey
If you want to use an IAM account, you can get an AccessKey/secretAccessKey pair on the AWS S3 console.
Note: The credentials files must not have lax permissions, as this creates a security hole. ~/.passwd-s3fs must not have group or others permissions, and /etc/passwd-s3fs must not have others permissions. Set permissions on these files accordingly:
% chmod 600 ~/.passwd-s3fs
% sudo chmod 640 /etc/passwd-s3fs
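As a minimal sketch of the single-credential format and the permission requirement above, the following creates a password file and restricts it to its owner. The key values are placeholders, and the file is written to a temporary directory standing in for your home directory (both are assumptions made so the sketch is safe to run).

```shell
# Sketch only: writes a password file in the single-credential format
# (accessKeyId:secretAccessKey) and restricts it to the owner.
# The key values below are placeholders, not real credentials.
demo_dir=$(mktemp -d)                 # stand-in for $HOME
passwd_file="$demo_dir/.passwd-s3fs"

# Create the file inside a subshell so the tighter umask does not leak out
(
    umask 077                         # new files get no group/others bits
    printf '%s:%s\n' 'EXAMPLEACCESSKEYID' 'exampleSecretAccessKey' > "$passwd_file"
)
chmod 600 "$passwd_file"              # same mode the note above calls for

# Show the resulting mode bits
ls -l "$passwd_file"
```

A file stored somewhere other than the default locations can then be selected with the passwd_file option, e.g., /usr/bin/s3fs mybucket /mnt -opasswd_file=/path/to/file.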
s3fs supports mode (e.g., chmod), mtime (e.g., touch) and uid/gid (chown). s3fs stores the values in x-amz-meta custom meta headers, and uses x-amz-copy-source to efficiently change them.
s3fs has a caching mechanism: you can enable local file caching to minimize downloads, e.g.:
/usr/bin/s3fs mybucket /mnt -ouse_cache=/tmp
Hosting a cvsroot on s3 works! Although you probably don't really want to do it in practice. E.g., cvs -d /s3/cvsroot init. Incredibly, mysqld also works, although I doubt you really wanna do that in practice! =)
s3fs works with rsync (as of svn 43)! As of r152, s3fs uses x-amz-copy-source for efficient updates of mode, mtime and uid/gid.
s3fs will retry s3 transactions on certain error conditions. The default retry count is 2, i.e., s3fs will make 2 retries per s3 transaction (for a total of 3 attempts: 1st attempt + 2 retries) before giving up. You can set the retry count by using the "retries" option, e.g., "-oretries=2".
- default_acl (default="private")
  - the default canned acl to apply to all written s3 objects, e.g., "public-read"
  - see http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?RESTAccessPolicy.html "Canned Access Policies" for the full list of canned acls
  - any created files will have this canned acl
  - any updated files will also have this canned acl applied!
- prefix (default="") (coming soon!)
  - a prefix to add to all s3 objects
- retries (default="2")
  - number of times to retry a failed s3 transaction
- use_cache (default="" which means disabled)
  - local folder to use for local file cache
- use_rrs (default="" which means disabled)
  - use Amazon's Reduced Redundancy Storage
- use_sse (default is disable)
  - use_sse option not specified: the default is SSE-DISABLE.
  - "use_sse" or "use_sse=1" (old type parameter): uses Amazon S3-managed encryption keys.
  - "use_sse=custom:'filepath'" or "use_sse='filepath'" (old type parameter): uses customer-provided encryption keys. The custom key file must have 600 permissions. The file can contain several lines, each holding one SSE-C key. The first line is used as the customer-provided encryption key for uploading, changing headers, etc. Any keys after the first line are used for downloading objects that were encrypted with a key other than the first. This lets you keep all SSE-C keys in the file as an SSE-C key history.
  - "use_sse=custom": if you specify "custom" ("c") without a file path, you need to set the custom key with the load_sse_c option or the AWSSSECKEYS environment variable. (AWSSSECKEYS holds one or more SSE-C keys separated by ":".) This option decides the SSE type, so if you do not want to encrypt objects when uploading but need to decrypt encrypted objects when downloading, use the load_sse_c option instead of this option.
  - "use_sse=kmsid" or "use_sse=kmsid:'kms id'": uses the master key which you manage in AWS KMS. You can use "k" as short for "kmsid". If you can specify the SSE-KMS type with your 'kms id' in AWS KMS, set it after "kmsid:" (or "k:"). If you specify only "kmsid" ("k"), you need to set the AWSSSEKMSID environment variable to the 'kms id'.
  - notice: you cannot use a KMS id that is not in the same EC2 region.
- load_sse_c - specify SSE-C keys
  - Specify the path of the customer-provided encryption keys file used for decrypting when downloading. To use a customer-provided encryption key when uploading, specify it with "use_sse=custom". The file can contain many lines, one custom key per line, so you can keep all SSE-C keys in the file as an SSE-C key history. The AWSSSECKEYS environment variable has the same contents as this file.
- passwd_file (default="")
  - specify the path to the password file; overrides looking for the password in $HOME/.passwd-s3fs and /etc/passwd-s3fs
- ahbe_conf (default="" which means disabled)
  - specifies the path of the configuration file that defines additional HTTP headers by file (object) extension.
- public_bucket (default="" which means disabled)
  - anonymously mount a public bucket when set to 1; ignores the $HOME/.passwd-s3fs and /etc/passwd-s3fs files
- connect_timeout (default="10" seconds)
  - time to wait for connection before giving up
- readwrite_timeout (default="30" seconds)
  - time to wait between read/write activity before giving up
- max_stat_cache_size (default="10000" entries (about 4MB))
  - maximum number of entries in the stat cache
- url (default="http://s3.amazonaws.com")
  - sets the url to use to access amazon s3, e.g., if you want to use https then set url=https://s3.amazonaws.com
- stat_cache_expire (default is no expire)
  - specify the expiry time (in seconds) for entries in the stat cache.
- enable_noobj_cache (default is disable)
  - enable cache entries for objects which do not exist.
- nodnscache
  - s3fs always uses the DNS cache; this option disables it.
- nomultipart
  - disable multipart uploads.
- multireq_max (default="500")
  - maximum number of parallel requests for listing objects.
- parallel_count (default="5")
  - number of parallel requests for downloading/uploading large objects. s3fs uploads large objects (over 20MB) by multipart post request, and sends parallel requests. This option limits the number of parallel requests which s3fs issues at once.
- fd_page_size (default="52428800" (50MB))
  - size of the internal management page for each file descriptor.
- enable_content_md5 (default is disable)
  - verify uploaded data (uploads without multipart) using the content-md5 header.
- noxmlns
  - disable registering the xml name space for responses such as ListBucketResult and ListVersionsResult. The default name space is looked up from "http://s3.amazonaws.com/doc/2006-03-01".
- iam_role (default is no role)
  - set the IAM Role that will supply the credentials from the instance meta-data. Specify only the IAM role name.
- nocopyapi
  - for distributed object storage that is compatible with the S3 API but lacks PUT (copy api). If you set this option, s3fs does not use PUT with "x-amz-copy-source" (copy api).
- norenameapi
  - for distributed object storage that is compatible with the S3 API but lacks PUT (copy api). This option is a subset of the nocopyapi option.
- use_path_request_style
  - Enable compatibility with S3-like APIs which do not support the virtual-host request style, by using the older path request style.
- dbglevel (default="crit")
  - Set the debug message level to one of crit (critical), err (error), warn (warning), info (information) or dbg (debug). The default debug level is critical. If s3fs is run with the "-d" option, the debug level is set to information. When s3fs catches the SIGUSR2 signal, the debug level is bumped up.
- curldbg
  - Print debug messages from libcurl when this option is specified.
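The use_sse and load_sse_c entries above describe two ways of supplying SSE-C keys: a file with one key per line, or the AWSSSECKEYS environment variable with ":" as the separator. As a rough sketch of how the two forms relate, the following builds the environment value from a key file. The key strings and the file name are made-up placeholders, and the sketch assumes that the line order of the file carries over to the variable.

```shell
# Sketch only: builds the ":"-separated AWSSSECKEYS value from a key file
# that holds one SSE-C key per line. The keys below are made-up placeholders.
key_dir=$(mktemp -d)
key_file="$key_dir/sse-c.keys"        # hypothetical file name

# First line: current key (used for uploads); later lines: older keys
printf '%s\n' 'currentKeyPlaceholder' 'olderKeyPlaceholder1' 'olderKeyPlaceholder2' > "$key_file"
chmod 600 "$key_file"                 # the custom key file must be mode 600

# Join the lines with ":" to get the environment-variable form
AWSSSECKEYS=$(paste -s -d ':' "$key_file")
export AWSSSECKEYS

echo "$AWSSSECKEYS"
```

Keeping the file as the primary copy and deriving the variable from it means the key history lives in one place.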
If enabled via "use_cache" option, s3fs automatically maintains a local cache of files in the folder specified by use_cache. Whenever s3fs needs to read or write a file on s3 it first downloads the entire file locally to the folder specified by use_cache and operates on it. When fuse release() is called, s3fs will re-upload the file to s3 if it has been changed. s3fs uses md5 checksums to minimize downloads from s3. Note: this is different from the stat cache (see below).
Local file caching works by calculating and comparing md5 checksums (ETag HTTP header).
The folder specified by use_cache is just a local cache. It can be deleted at any time; s3fs rebuilds it on demand. Note: this directory grows without bound and can fill up a file system, depending on the bucket and the reads made against it. Take precautions by using a quota system, routinely clearing the cache, or some other method.
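One way to clear the cache routinely is a small script run from cron. The following is a sketch under stated assumptions, not something shipped with s3fs: purge_cache is a hypothetical helper, the 7-day threshold is arbitrary, and it assumes the file system holding the cache records access times. The demonstration runs against a throwaway directory rather than a real cache folder.

```shell
# Sketch only: removes cached files not accessed for more than N days.
# purge_cache is a hypothetical helper, not part of s3fs.
purge_cache() {
    cache_dir=$1
    max_age_days=$2
    # -type f: regular files only; -atime +N: last access more than N days ago
    find "$cache_dir" -type f -atime +"$max_age_days" -exec rm -f {} +
}

# Demonstration against a throwaway directory
demo_cache=$(mktemp -d)
touch "$demo_cache/fresh"
touch -a -m -t 200001010000 "$demo_cache/stale"   # pretend this file is old
purge_cache "$demo_cache" 7
ls "$demo_cache"
```

In real use the first argument would be the folder given to use_cache, e.g., purge_cache /tmp 7 for -ouse_cache=/tmp, although pointing it at a shared directory such as /tmp would also remove unrelated old files, so a dedicated cache folder is the safer choice.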
s3fs supports chmod (mode) and touch (mtime) by virtue of "x-amz-meta-mode" and "x-amz-meta-mtime" custom meta headers. As of r149 s3fs uses x-amz-copy-source, which means that s3fs no longer needs to operate in a brute-force manner and is much faster now (one minor performance-related corner case is left to solve: /usr/bin/touch).
The stat cache stores file information in memory and can improve performance. Its default setting is to store 10,000 entries, which can account for about 4 MB of memory usage. When the stat cache fills up, entries with a low hit count are deleted first. The size of the stat cache is controllable with an option.
s3fs uses /etc/mime.types to "guess" the "correct" content-type based on file name extension. This means that you can copy a website to s3 and serve it up directly from s3 with correct content-types. Unknown file types are assigned "application/octet-stream".
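To illustrate the kind of lookup described above, here is a small sketch that maps a file name extension to a content type using a mime.types-style file, with "application/octet-stream" as the fallback. guess_content_type is a hypothetical helper written for this page; it is not how s3fs itself is implemented, and the sample entries are written to a temporary file rather than read from /etc/mime.types.

```shell
# Sketch only: looks up a content type by file extension in a mime.types-style
# file, falling back to application/octet-stream. guess_content_type is a
# hypothetical helper, not part of s3fs.
guess_content_type() {
    file_name=$1
    types_file=$2
    case $file_name in
        *.*) ext=${file_name##*.} ;;   # text after the last dot
        *)   ext= ;;                   # no extension at all
    esac
    awk -v ext="$ext" '
        /^#/ { next }                              # skip comment lines
        { for (i = 2; i <= NF; i++)                # fields 2..NF are extensions
              if ($i == ext) { print $1; found = 1; exit } }
        END { if (!found) print "application/octet-stream" }
    ' "$types_file"
}

# Demonstration with a tiny sample file in the same layout as /etc/mime.types
demo_types=$(mktemp)
printf '%s\n' '# sample entries' 'text/html html htm' 'image/png png' > "$demo_types"
guess_content_type index.html "$demo_types"
guess_content_type data.xyz "$demo_types"
```

The match is case-sensitive in this sketch, so an upper-case extension would fall through to the default.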
Due to S3's "eventual consistency" limitations file creation can and will occasionally fail. Even after a successful create subsequent reads can fail for an indeterminate time, even after one or more successful reads. Create and read enough files and you will eventually encounter this failure. This is not a flaw in s3fs and it is not something a FUSE wrapper like s3fs can work around. The retries option does not address this issue. Your application must either tolerate or compensate for these failures, for example by retrying creates or reads. For more details, see Eventual Consistency
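As one possible way for an application to compensate, a create or read can be wrapped in a bounded retry loop. The helper below is a sketch only: retry_cmd is a hypothetical name, and the attempt count and delay in the usage comment are arbitrary values chosen for illustration.

```shell
# Sketch only: runs a command until it succeeds or the attempt limit is
# reached, pausing between attempts. retry_cmd is a hypothetical helper
# for application code, not part of s3fs.
retry_cmd() {
    max_attempts=$1
    delay_seconds=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0                  # the command succeeded
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1                  # give up after the last allowed attempt
        fi
        attempt=$((attempt + 1))
        sleep "$delay_seconds"
    done
}

# Example (not run here): try reading a file on the mount up to 5 times,
# waiting 2 seconds between attempts.
#   retry_cmd 5 2 cat /mnt/path/to/file > /dev/null
```

This is separate from the retries option, which only covers retries of individual s3 transactions inside s3fs.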
s3fs runs on libcurl. If your libcurl is built with libnss, s3fs requires libcurl version 7.21.5 or later; with an older libcurl (with libnss), s3fs leaks memory. The libcurl version does not matter when libcurl is linked against the OpenSSL library instead of libnss.
The list of older changes is on Google Code; please refer to it for versions before r501.
- What do I need to know?
  - /usr/bin/s3fs
  - /var/log/messages
  - an entry in /etc/fstab (optional - ** requires fuse to be fully installed ** issue #115)
  - the file $HOME/.passwd-s3fs or /etc/passwd-s3fs (optional)
  - the folder specified by use_cache (optional): a local file cache automatically maintained by s3fs, enabled with the "use_cache" option, e.g., -ouse_cache=/tmp
  - the file /etc/mime.types
    - map of file extensions to Content-types
    - on Fedora /etc/mime.types comes from mailcap, so you can either (a) create this file yourself or (b) do a yum install mailcap
  - stores files natively and transparently in amazon s3; you can access files with other tools, e.g., jets3t
- Why do I get "Input/output error"?
  - Does the bucket exist?
  - Are your credentials correct?
  - Is your local clock within 15 minutes of Amazon's? (RequestTimeTooSkewed)
- How do I troubleshoot it?
  - tail -f /var/log/messages
  - Use the fuse -f switch, e.g., /usr/bin/s3fs -f my_bucket /mnt
- It's still not working!
  - Try updating your version of libcurl: I've used 7.16 and 7.17
- Q: When I mount a bucket only the current user can see it; other users cannot. How do I allow other users to see it? Why do I see "d?????????" in directory listings?
  - A: use 'allow_other'
    - /usr/bin/s3fs -o allow_other mybucket /mnt
    - or from /etc/fstab: s3fs#mybucket /mnt fuse _netdev,allow_other 0 0
- Q: How does the local file cache work?
  - A: It is unbounded! If you want, you can use a cron job (e.g., a script in /etc/cron.daily) to periodically purge "~/.s3fs". Due to the reference nature of posix file systems, a periodic purge will not interfere with the normal operation of the s3fs local file cache.
- Q: s3fs uses x-amz-meta custom meta headers. Will s3fs clobber any existing x-amz-meta custom headers?
  - A: No!
- Q: I renamed a folder and now all of the files in that folder are gone! What the?!?
  - A: Rename it back and your files will be back. s3fs does not support deep directory rename and doesn't check for it either.
- Q: I have 'Check bucket failed' errors when trying to mount an S3-like storage, what now?
  - A: Try using the use_path_request_style option.
- Q: Could not connect at boot
  - A: Try adding the "_netdev" option to the s3fs entry in fstab; it delays mounting until the network is up.
  - A: Start the netfs service on your instance; it loads the fuse module into the system (by modprobe).
- server side copies are not possible - due to how FUSE orchestrates the low level instructions, the file must first be downloaded to the client and then uploaded to the new location
- permissions: using -o allow_other, even though files are owned by root 0755, another user can make changes
  - use default_permissions option?!?
- better error logging for troubleshooting.
  - need to parse response on, say, 403 and 404 errors, etc... and log 'em!
Here is a list of other Amazon S3 filesystems:
- ElasticDrive
- PersistentFS
- https://fedorahosted.org/s3fs/
- http://code.google.com/p/s3fs-fuse/
- http://s3backer.googlecode.com
- http://code.google.com/p/s3ql
- https://github.com/danilop/yas3fs
Other tools that combine with s3fs in useful ways:
- S3Proxy - allows applications using the S3 API to access other object stores, e.g., EMC Atmos, Microsoft Azure, OpenStack Swift