Thursday, October 30, 2014

XtreemFS in Docker Containers

Recently, we have been running the XtreemFS services in Docker containers for one of our current research projects and would like to share our experiences. Docker is a container-based virtualization solution that provides a certain level of isolation between applications running on the same machine.

Docker images are generated using a Dockerfile. Dockerfiles contain some metadata and a sequence of instructions that is executed to generate the image. Container images are derived from a base image, e.g. a standard Ubuntu Linux, and store only the changes made to this base image. As all XtreemFS services (DIR, MRC, and OSD) are shipped in a common binary file (XtreemFS.jar), we created an xtreemfs-common image that contains the binaries, and service-specific images that inherit from this common image. The service-specific images (xtreemfs-dir, xtreemfs-mrc, and xtreemfs-osd) contain only a service-specific call to start the respective service.
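
As a rough sketch of how such an image hierarchy might be built, assuming the Dockerfiles are organized in one subdirectory per image (the directory names here are just placeholders):
# Build the common image first; the service-specific images derive from it.
docker build -t xtreemfs-common ./common
docker build -t xtreemfs-dir ./dir
docker build -t xtreemfs-mrc ./mrc
docker build -t xtreemfs-osd ./osd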

An application running in a Docker container has to stay in the foreground for the lifetime of the container, otherwise the container terminates. For XtreemFS this means that we cannot use our service-specific init scripts to start the DIRs, MRCs, and OSDs. Instead, we extracted the relevant parts from the init scripts and created a CMD call, i.e. the command that is executed after starting a container. Since the XtreemFS logs are now written directly to stdout instead of a file, one can simply use docker logs to check what is happening in a container.

A critical part of running a distributed file system in containers is to ensure that all file system contents are stored persistently, even beyond the lifetime of the container. Our Dockerfiles make use of Docker volumes to store file system contents. A volume is simply a directory on the host machine that is mapped into the container. The CMD call of our containers expects the service configuration to be placed in /xtreemfs_data, which has to be mapped as a volume into the container. Besides the configuration file, this volume can also be used to store the file system contents, although any other location is possible.

Mapping the XtreemFS configuration files into a container via a volume also has the advantage that our Docker images are generic and reusable. Since a user specifies the volumes and ports that are mapped to a container when starting it, one can create arbitrary XtreemFS service configuration files, named dirconfig.properties, mrcconfig.properties, or osdconfig.properties, and map all affected directories and ports at container start time, as shown in the example below.
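
As an illustration, a DIR container could be started roughly like this (container name, host path, and port number are placeholders and have to match your dirconfig.properties):
# Map the host directory containing dirconfig.properties to /xtreemfs_data
# and publish the DIR port on the host.
docker run -d --name dir \
    -v /srv/xtreemfs/dir:/xtreemfs_data \
    -p 32638:32638 \
    xtreemfs-dir
# Follow the service output written to stdout.
docker logs -f dir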

After mapping network ports to a container, the underlying service is reachable via the IP address of the host. The XtreemFS services register themselves at the directory service (DIR) and propagate their own addresses. When running in containers, however, the services are not aware of the host address under which they are reachable; each container only knows its address on an internal virtual network. We can work around this problem by setting the hostname parameter in the MRC and OSD configurations to the public address or host name. This workaround has previously been used to run services that are reachable via a NAT.
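
For example, a minimal addition to osdconfig.properties (or mrcconfig.properties) could look like this, where the host name is a placeholder for the public address of your Docker host:
# Public address under which the containerized OSD is reachable;
# this is the address the OSD announces to the DIR.
hostname = storage1.example.com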

We provide the described Dockerfiles on Github. The repository contains a README file with usage instructions. We may publish them in the Docker index after additional testing and evaluation. The containers are currently derived from an Ubuntu base image and use the latest XtreemFS version from our Git repository. The Dockerfiles can easily be adapted to other Linux distributions or XtreemFS releases. We would be happy to get any feedback.

Monday, August 11, 2014

Mounting XtreemFS Volumes using Autofs

Autofs is a useful tool to mount networked file systems automatically on access, for instance on machines without permanent network connectivity, such as notebooks. We prepared a short tutorial that describes how to use the automounter with XtreemFS volumes.

This assumes you'd like a shared directory called /scratch/xtfs/shared across all of your machines, to which anyone can read and write. While I use /scratch in this example, the more traditional /net could be used instead.
  • Assume all of XtreemFS is installed, set up properly, volumes are created...
  • Have autofs installed (started or not).
  • Create an /etc/auto.master with these contents:
# All xtreemfs volumes will be automounted in /scratch/xtfs
/scratch/xtfs   /etc/auto.xtfs
#
# Include /etc/auto.master.d/*.autofs
#
+dir:/etc/auto.master.d
#
# Include central master map if it can be found using
# nsswitch sources.
#
# Note that if there are entries for /net or /misc (as
# above) in the included master map any keys that are the
# same will not be seen as the first read key seen takes
# precedence.
#
+auto.master
  • Then create an /etc/auto.xtfs (which you'll have to modify for your MRC).
shared -fstype=fuse,allow_other :mount.xtreemfs#mrc.example.com/volume-0
  • Restart autofs (with a command similar to this):
sudo /etc/init.d/autofs restart
  • Do this for each machine on which you'd like to use autofs.
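  • To verify the setup, simply access the shared directory; autofs should mount the volume on demand (the first access may take a moment):
ls /scratch/xtfs/shared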
Thanks to Pete for contributing this tutorial!

Tuesday, June 3, 2014

XtreemFS moved to Github

We moved our Git repository from Google Code to Github. The new project page is available at https://github.com/xtreemfs/xtreemfs. All tickets from the issue tracker have been migrated and keep their issue numbers. Other services like the public mailing list or the binary package repositories are not affected.

We are looking forward to your feedback and contributions.

Thursday, March 27, 2014

Public demo server updated to XtreemFS 1.5

We updated our public demo server to XtreemFS 1.5. To try out XtreemFS without setting up your own server, just install the client and mount our volume:

mkdir ~/xtreemfs_demo 
mount.xtreemfs demo.xtreemfs.org/demo ~/xtreemfs_demo 
cd ~/xtreemfs_demo

For testing, you can create directories and files as you like. Please do not upload anything illegal or any copyrighted material. For legal reasons, every file create/write is logged with the IP address and a timestamp. Files are automatically deleted every hour.

Wednesday, March 12, 2014

XtreemFS 1.5 released: Improved support for Hadoop and SSDs

Berlin, Germany. Today, we released a new stable version of the cloud file system XtreemFS.
XtreemFS 1.5 (Codename "Wonderful Waffles") comes with the following major changes:

  • Improved Hadoop Support: Read and write buffers were added to improve the performance for small requests. We also implemented support for multiple volumes, e.g. to store input and output on volumes with different replication policies.
  • SSD support: So far, an OSD was optimized for rotating disks by using a single thread for disk accesses. Solid State Disks (SSDs) cope well with simultaneous requests and show higher throughput with increased parallelism. To achieve more parallelism when using SSDs, an OSD can now run multiple storage threads.
  • Multi-Homing Support: XtreemFS can be made available for multiple networks and clients will pick the correct address automatically.
  • Multiple OSDs per Machine: Machines with multiple disks have to run one OSD per disk. We simplified this setup with the new xtreemfs-osd-farm init.d script.
  • Bugfixes for Read/Write and Read-Only Replication: We fixed a problem which prevented read/write replicated files from failing over correctly. Another problem caused the on-demand read-only replication to hang and stall access.
  • Replication Status Page: The DIR status page now visualizes the current replica status of open files, for example which replica is the current primary or whether a replica is unavailable.

Replication Status Page: "osd0" is the backup replica for the open file, "osd1" the primary and "osd2" is currently unavailable.
Tutorial for Read/Write Replication Fail-Over
Do you want to see the new replication status page in action? We prepared a tutorial which walks you through the setup of a read/write replicated XtreemFS volume on a single machine. 

The tutorial lets you stream a video from the volume and simulate the outage of a replica. You'll learn about the details of the XtreemFS replication protocol and why the video stalls for a few seconds before playback resumes.

XtreemFS in a Briefcase
Our friends at AlmereGrid took the tutorial to the next level: they created a setup of eight Raspberry Pi mini-computers running XtreemFS - packaged in a briefcase! Check their website CloudCase.eu for more details. Here's their video which shows the briefcase and the demonstrated fail-over:


CloudCase - XtreemFS Cloud file system demonstration from contrail-project.

Developing for XtreemFS
Did you know that you can use XtreemFS directly in your application with our C++ and Java client libraries? This way you avoid the overhead of Fuse and can access advanced XtreemFS features, e.g. adding replicas, which are otherwise only available through the maintenance tool "xtfsutil".
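
For comparison, adding a replica with the maintenance tool looks roughly like this (the path is a placeholder and the exact options may differ between releases; see xtfsutil --help):
# Add another replica to a file on a mounted volume and let XtreemFS choose the OSD.
xtfsutil --add-replica AUTO /mnt/xtreemfs/path/to/file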

From using XtreemFS it is only a small step to diving into the XtreemFS source code itself. We collected several introductory documents for novices in the Google Drive folder "XtreemFS Public". For example, have a look at how to set up the XtreemFS server Java projects in Eclipse. Have fun!

Friday, May 17, 2013

Processing an MRC metadata dump with XSLT


TL;DR We describe how to dump the metadata of an XtreemFS installation to an XML file. The XML dump is then filtered for files located on a specific OSD using XSLT. You can use this example as a starting point for your own analyses of your file system's metadata.

At our institute we run an XtreemFS installation for scientific users. The installation spans 16 OSDs which are hosted at our site and are regularly accessed by three other institutes throughout Germany. During recent maintenance work we lost all chunks of one OSD due to human error: I accidentally deleted all chunks of that OSD because I mistook the directory for a backup, when in fact it was the last remaining copy. Since the installation is meant for temporary scientific data, we decided against replication and backups at deployment time to maximize the available capacity. (Single-disk failures are covered by the underlying RAID5 used on each OSD.)


Nonetheless, it was necessary to inform all users about their deleted files. Therefore, I had to find out which files had been placed on the affected OSD. XtreemFS stores the list of replicas per file at the MRC (Metadata and Replica Catalog). The MRC allows dumping and restoring its metadata in XML format. To find the affected files, I filtered the XML dump using XSLT. This blog post details the required steps. You can use the provided example to run your own analyses on your file system's metadata.

Create an MRC database dump
You can use the XtreemFS tool xtfs_mrcdbtool to dump or restore the MRC database. The MRC will write/read the dump locally. Therefore, you have to specify where the MRC should write the dump on its machine:
xtfs_mrcdbtool -mrc mrc-host.example.com dump /tmp/dump.xml
This command will tell the MRC to write the database dump to the file /tmp/dump.xml. Make sure that the MRC has write permission for the given path. If you configured an "admin_password" for the MRC, you have to set the option --admin_password as well.
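
If an admin password is configured, the call could look like this (the password value and the exact option placement are only illustrative; check the tool's help output):
xtfs_mrcdbtool -mrc mrc-host.example.com --admin_password mysecret dump /tmp/dump.xml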

Filter the XML database dump using XSLT 
The MRC database dump is in XML format. The XML tree in the dump contains the file system tree of each volume.

You can use XSLT (Extensible Stylesheet Language Transformations) to filter the dump and transform it into an even more human-readable form. I've added an example file to our code repository: filter_files.xslt. You have to use an XSLT processor to transform the original XML dump, for example xsltproc:
xsltproc -o filtered_files_output.txt filter_files.xslt /tmp/dump.xml
The resulting file filtered_files_output.txt will have the following output format:
volume name/path on volume|creation time|file size|file's owner name
Modify the filter_files.xslt file to include or exclude other file attributes. This example handles only files which are (at least partially) placed on an OSD with the UUID "zib.mosgrid.osd15". This is realized by the following instruction in the XSLT file which limits the set of selected "file" elements:
<xsl:template match="file[xlocList/xloc/osd/@location='zib.mosgrid.osd15']">
Write your own XPath expression to realize your own filters. If you want to select all files, just write match="file" without the bracketed predicate.
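
As a small follow-up, the pipe-separated output lends itself to post-processing with standard shell tools, e.g. to list the affected owners together with the number of their lost files (field 4 is the owner, as described above):
cut -d'|' -f4 filtered_files_output.txt | sort | uniq -c | sort -rn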

Tuesday, November 13, 2012

XtreemFS 1.4 released at Supercomputing 2012

Salt Lake City, Utah. Today we released XtreemFS 1.4, a new stable release of the cloud file system XtreemFS. This release is the result of almost one thousand changes ("commits") to the code repository, and extensive testing throughout the year. We worked both on major improvements to the existing code and new features:

  • Improved stability: Clients and servers are rock solid now. In particular, we fixed client crashes due to network timeouts and issues with the Read/Write file replication.
  • Asynchronous writes: Once enabled (mount option "--enable-async-writes"), write() requests are executed in the background. This improves write throughput without weakening the semantics. We recommend enabling async writes (see the example after this list).
  • Windows Client (beta): Complete rewrite based on the stable C++ libxtreemfs and using the Dokan alternative Callback File System by EldoS corporation. Try it by mounting our public demo server!
  • Hadoop support: Use XtreemFS as a replacement for HDFS in your Hadoop setup. This version of XtreemFS comes with a rewritten Hadoop client based on the Java libxtreemfs, which also provides data locality information to Hadoop.
  • libxtreemfs for Java: Access XtreemFS directly from your Java application. See the user guide for more information.
  • Vivaldi integration: The Vivaldi replica placement and selection policies enable clients to select close-by replicas based on actual network latencies. These latencies are estimated using virtual network coordinates which are also visualized in the DIR web-interface. Check out the demonstration on the web-interface of our public demo server.
  • Extended OSD Selection: Now you can assign custom attributes to OSDs and limit the placement of files on OSDs based on those attributes.
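
For example, asynchronous writes are switched on with the mount option mentioned above (DIR address, volume name, and mount point are placeholders):
mount.xtreemfs --enable-async-writes dir.example.com/myVolume /mnt/xtreemfs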

This version also includes an updated version of the DIR/MRC replication and adds fail-over support for DIR replicas. As DIR/MRC replication is still at a very early stage, this feature is intended as a technology preview for more experimental users.

We are currently at the Supercomputing 2012 exhibition where we present XtreemFS at the Contrail booth #2535 as part of the Contrail project. Since the event takes place in Salt Lake City, Utah, we decided for "Salty Sticks" as release name for the 1.4 version.

Request for Contributions
As XtreemFS is an open source project, we are always looking forward to external contributions, and we believe that this release serves as an ideal starting point for that. Here's an incomplete list of things you might be interested in contributing:
  • Chef recipe or Puppet configuration for automatic deployment
  • a fancy Qt GUI for the client
  • S3-compatible interface based on the client library libxtreemfs
  • direct integration with Qemu/KVM using the C++ libxtreemfs

XtreemFS Survey
Finally, do not forget to fill out our survey if you use, have used, or plan to use XtreemFS.