Thursday, October 30, 2014

XtreemFS in Docker Containers

Recently, we have been running the XtreemFS services in Docker containers for one of our current research projects and would like to share our experiences. Docker is a container-based virtualization solution that provides a certain level of isolation between applications running on the same machine.

Docker images are generated using a Dockerfile. Dockerfiles contain some metadata and a sequence of instructions that is executed to generate the image. Container images are derived from a base image, e.g. a standard Ubuntu Linux, and store only the changes made to this base image. As all XtreemFS services (DIR, MRC, and OSD) are shipped in a common binary file (XtreemFS.jar), we created an xtreemfs-common image that contains the binaries, and service-specific images that inherit from this common image. The service-specific images (xtreemfs-dir, xtreemfs-mrc, and xtreemfs-osd) contain only a service-specific call to start the respective service.
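
As a rough sketch of how such an image hierarchy might be built, assuming the Dockerfiles are organized in one subdirectory per image (the directory names here are just placeholders):
# Build the common image first; the service-specific images derive from it.
docker build -t xtreemfs-common ./common
docker build -t xtreemfs-dir ./dir
docker build -t xtreemfs-mrc ./mrc
docker build -t xtreemfs-osd ./osd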

An application running in a Docker container has to stay in the foreground for the lifetime of the container, otherwise the container terminates. For XtreemFS this means that we cannot use our service-specific init scripts to start the DIRs, MRCs, and OSDs. Instead, we extracted the relevant parts from the init scripts and created a CMD call, i.e. the command that is executed after starting a container. Since the XtreemFS logs are now written directly to stdout instead of a file, one can simply use docker logs to check what is happening in a container.

A critical part of running a distributed file system in containers is to ensure that all file system contents are stored persistently, even beyond the lifetime of the container. Our Dockerfiles make use of Docker volumes to store file system contents. A volume is simply a directory on the host machine that is mapped into the container. The CMD call of our containers expects the service configuration to be placed in /xtreemfs_data, which has to be mapped as a volume into the container. Besides the configuration file, this volume can also be used to store the file system contents, although any other location is possible.

Mapping the XtreemFS configuration files into a container via a volume also has the advantage that our Docker images are generic and reusable. Since a user specifies the volumes and ports that are mapped to a container when starting it, one can create arbitrary XtreemFS service configuration files, named dirconfig.properties, mrcconfig.properties, or osdconfig.properties, and map all affected directories and ports at container start time, as shown in the example below.
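
As an illustration, a DIR container could be started roughly like this (container name, host path, and port number are placeholders and have to match your dirconfig.properties):
# Map the host directory containing dirconfig.properties to /xtreemfs_data
# and publish the DIR port on the host.
docker run -d --name dir \
    -v /srv/xtreemfs/dir:/xtreemfs_data \
    -p 32638:32638 \
    xtreemfs-dir
# Follow the service output written to stdout.
docker logs -f dir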

After mapping network ports to a container, the underlying service is reachable via the IP address of the host. The XtreemFS services register themselves at the directory service (DIR) and propagate their own addresses. When running in containers, however, the services are not aware of the host address under which they are reachable; each container only knows its address on an internal virtual network. We can work around this problem by setting the hostname parameter in the MRC and OSD configurations to the public address or host name. This workaround has previously been used to run services that are reachable via a NAT.
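
For example, a minimal addition to osdconfig.properties (or mrcconfig.properties) could look like this, where the host name is a placeholder for the public address of your Docker host:
# Public address under which the containerized OSD is reachable;
# this is the address the OSD announces to the DIR.
hostname = storage1.example.com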

We provide the described Dockerfiles on Github. The repository contains a README file with usage instructions. We may publish them in the Docker index after additional testing and evaluation. The containers are currently derived from an Ubuntu base image and use the latest XtreemFS version from our Git repository. The Dockerfiles can easily be adapted to other Linux distributions or XtreemFS releases. We would be happy to get any feedback.

Monday, August 11, 2014

Mounting XtreemFS Volumes using Autofs

Autofs is a useful tool to mount networked file systems automatically on access, for instance on machines without permanent network connectivity, such as notebooks. We prepared a short tutorial that describes how to use the automounter with XtreemFS volumes.

This assumes you'd like a shared directory called /scratch/xtfs/shared across all of your machines, to which anyone can read and write. While I use /scratch in this example, the more traditional /net could be used instead.
  • Assume all of XtreemFS is installed, set up properly, volumes are created...
  • Have autofs installed (started or not).
  • Create an /etc/auto.master with these contents:
# All xtreemfs volumes will be automounted in /scratch/xtfs
/scratch/xtfs   /etc/auto.xtfs
#
# Include /etc/auto.master.d/*.autofs
#
+dir:/etc/auto.master.d
#
# Include central master map if it can be found using
# nsswitch sources.
#
# Note that if there are entries for /net or /misc (as
# above) in the included master map any keys that are the
# same will not be seen as the first read key seen takes
# precedence.
#
+auto.master
  • Then create an /etc/auto.xtfs (which you'll have to modify for your MRC).
shared -fstype=fuse,allow_other :mount.xtreemfs#mrc.example.com/volume-0
  • Restart autofs (with a command similar to this):
sudo /etc/init.d/autofs restart
  • Do this for each machine on which you'd like to use autofs.
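  • To verify the setup, simply access the shared directory; autofs should mount the volume on demand (the first access may take a moment):
ls /scratch/xtfs/shared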
Thanks to Pete for contributing this tutorial!

Tuesday, June 3, 2014

XtreemFS moved to Github

We moved our Git repository from Google Code to Github. The new project page is available at https://github.com/xtreemfs/xtreemfs. All tickets from the issue tracker have been migrated and keep their issue numbers. Other services like the public mailing list or the binary package repositories are not affected.

We are looking forward to your feedback and contributions.

Thursday, March 27, 2014

Public demo server updated to XtreemFS 1.5

We updated our public demo server to XtreemFS 1.5. To try out XtreemFS without setting up your own server, just install the client and mount our volume:

mkdir ~/xtreemfs_demo 
mount.xtreemfs demo.xtreemfs.org/demo ~/xtreemfs_demo 
cd ~/xtreemfs_demo

For testing, you can create directories and files as you like. Please do not upload anything illegal or any copyrighted material. For legal reasons, every file create/write is logged with the IP address and a timestamp. Files are automatically deleted every hour.

Wednesday, March 12, 2014

XtreemFS 1.5 released: Improved support for Hadoop and SSDs

Berlin, Germany. Today, we released a new stable version of the cloud file system XtreemFS.
XtreemFS 1.5 (Codename "Wonderful Waffles") comes with the following major changes:

  • Improved Hadoop Support: Read and write buffers were added to improve the performance for small requests. We also implemented support for multiple volumes, e.g. to store input and output on volumes with different replication policies.
  • SSD support: So far, an OSD was optimized for rotating disks by using a single thread for disk accesses. Solid State Disks (SSDs) cope well with simultaneous requests and show higher throughput with increased parallelism. To achieve more parallelism when using SSDs, an OSD can now run multiple storage threads.
  • Multi-Homing Support: XtreemFS can be made available for multiple networks and clients will pick the correct address automatically.
  • Multiple OSDs per Machine: Machines with multiple disks have to run one OSD per disk. We simplified this setup with the new xtreemfs-osd-farm init.d script.
  • Bugfixes for Read/Write and Read-Only Replication: We fixed a problem which prevented read/write replicated files from failing over correctly. Another problem caused the on-demand read-only replication to hang and stall access.
  • Replication Status Page: The DIR status page now visualizes the current replica status of open files, for example which replica is the current primary or whether a replica is unavailable.

Replication Status Page: "osd0" is the backup replica for the open file, "osd1" the primary and "osd2" is currently unavailable.
Tutorial for Read/Write Replication Fail-Over
Do you want to see the new replication status page in action? We prepared a tutorial which walks you through the setup of a read/write replicated XtreemFS volume on a single machine. 

The tutorial lets you stream a video from the volume and simulate the outage of a replica. You'll learn about the details of the XtreemFS replication protocol and why the video stalls for a few seconds before playback resumes.

XtreemFS in a Briefcase
Our friends at AlmereGrid took the tutorial to the next level: they created a setup of eight Raspberry Pi mini-computers running XtreemFS - packaged in a briefcase! Check their website CloudCase.eu for more details. Here's their video which shows the briefcase and the demonstrated fail-over:


CloudCase - XtreemFS Cloud file system demonstration from contrail-project.

Developing for XtreemFS
Did you know that you can use XtreemFS directly in your application with our C++ and Java client libraries? This way you avoid the overhead of Fuse and can access advanced XtreemFS features, e.g. adding replicas, which are otherwise only available through the maintenance tool "xtfsutil".
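
For comparison, adding a replica with the maintenance tool looks roughly like this (the path is a placeholder and the exact options may differ between releases; see xtfsutil --help):
# Add another replica to a file on a mounted volume and let XtreemFS choose the OSD.
xtfsutil --add-replica AUTO /mnt/xtreemfs/path/to/file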

From using XtreemFS it is only a small step to diving into the XtreemFS source code itself. We collected several introductory documents for novices in the Google Drive folder "XtreemFS Public". For example, have a look at how to set up the XtreemFS server Java projects in Eclipse. Have fun!

Friday, May 17, 2013

Processing an MRC metadata dump with XSLT


TL;DR We describe how to dump the metadata of an XtreemFS installation to an XML file. The XML dump is then filtered for files located on a specific OSD using XSLT. You can use this example as a starting point for your own analyses of your file system's metadata.

At our institute we run an XtreemFS installation for scientific users. The installation spans 16 OSDs which are hosted at our site and are regularly accessed by three other institutes throughout Germany. During recent maintenance work we lost all chunks of one OSD due to human error: I accidentally deleted all chunks of that OSD because I mistook the directory for a backup, when in fact it was the last remaining copy. Since the installation is meant for temporary scientific data, we decided against replication and backups at deployment time to maximize the available capacity. (Single-disk failures are covered by the underlying RAID5 used on each OSD.)


Nonetheless, it was necessary to inform all users about their deleted files. Therefore, I had to find out which files had been placed on the affected OSD. XtreemFS stores the list of replicas per file at the MRC (Metadata and Replica Catalog). The MRC allows dumping and restoring its metadata in XML format. To find the affected files, I filtered the XML dump using XSLT. This blog post details the required steps. You can use the provided example to run your own analyses on your file system's metadata.

Create an MRC database dump
You can use the XtreemFS tool xtfs_mrcdbtool to dump or restore the MRC database. The MRC will write/read the dump locally. Therefore, you have to specify where the MRC should write the dump on its machine:
xtfs_mrcdbtool -mrc mrc-host.example.com dump /tmp/dump.xml
This command will tell the MRC to write the database dump to the file /tmp/dump.xml. Make sure that the MRC has write permission for the given path. If you configured an "admin_password" for the MRC, you have to set the option --admin_password as well.
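
If an admin password is configured, the call could look like this (the password value and the exact option placement are only illustrative; check the tool's help output):
xtfs_mrcdbtool -mrc mrc-host.example.com --admin_password mysecret dump /tmp/dump.xml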

Filter the XML database dump using XSLT 
The MRC database dump is in XML format. The XML tree in the dump contains the file system tree of each volume.

You can use XSLT (Extensible Stylesheet Language Transformations) to filter the dump and transform it into an even more human-readable form. I've added an example file to our code repository: filter_files.xslt. You have to use an XSLT processor to transform the original XML dump, for example xsltproc:
xsltproc -o filtered_files_output.txt filter_files.xslt /tmp/dump.xml
The resulting file filtered_files_output.txt will have the following output format:
volume name/path on volume|creation time|file size|file's owner name
Modify the filter_files.xslt file to include or exclude other file attributes. This example handles only files which are (at least partially) placed on an OSD with the UUID "zib.mosgrid.osd15". This is realized by the following instruction in the XSLT file which limits the set of selected "file" elements:
<xsl:template match="file[xlocList/xloc/osd/@location='zib.mosgrid.osd15']">
Write your own XPath expression to realize your own filters. If you want to select all files, just write match="file" without the bracketed predicate.
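
As a small follow-up, the pipe-separated output lends itself to post-processing with standard shell tools, e.g. to list the affected owners together with the number of their lost files (field 4 is the owner, as described above):
cut -d'|' -f4 filtered_files_output.txt | sort | uniq -c | sort -rn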

Tuesday, November 13, 2012

XtreemFS 1.4 released at Supercomputing 2012

Salt Lake City, Utah. Today we released XtreemFS 1.4, a new stable release of the cloud file system XtreemFS. This release is the result of almost one thousand changes ("commits") to the code repository, and extensive testing throughout the year. We worked both on major improvements to the existing code and new features:

  • Improved stability: Clients and servers are rock solid now. In particular, we fixed client crashes due to network timeouts and issues with the Read/Write file replication.
  • Asynchronous writes: Once enabled (mount option "--enable-async-writes"), write() requests are executed in the background. This improves write throughput without weakening the semantics. We recommend enabling async writes (see the example after this list).
  • Windows Client (beta): Complete rewrite based on the stable C++ libxtreemfs and using the Dokan alternative Callback File System by EldoS corporation. Try it by mounting our public demo server!
  • Hadoop support: Use XtreemFS as a replacement for HDFS in your Hadoop setup. This version of XtreemFS comes with a rewritten Hadoop client based on the Java libxtreemfs, which also provides data locality information to Hadoop.
  • libxtreemfs for Java: Access XtreemFS directly from your Java application. See the user guide for more information.
  • Vivaldi integration: The Vivaldi replica placement and selection policies enable clients to select close-by replicas based on actual network latencies. These latencies are estimated using virtual network coordinates which are also visualized in the DIR web-interface. Check out the demonstration on the web-interface of our public demo server.
  • Extended OSD Selection: Now you can assign custom attributes to OSDs and limit the placement of files on OSDs based on those attributes.
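
For example, asynchronous writes are switched on with the mount option mentioned above (DIR address, volume name, and mount point are placeholders):
mount.xtreemfs --enable-async-writes dir.example.com/myVolume /mnt/xtreemfs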

This version also includes an updated version of the DIR/MRC replication and adds fail-over support for DIR replicas. As DIR/MRC replication is still at a very early stage, this feature is intended as a technology preview for more experimental users.

We are currently at the Supercomputing 2012 exhibition where we present XtreemFS at the Contrail booth #2535 as part of the Contrail project. Since the event takes place in Salt Lake City, Utah, we decided for "Salty Sticks" as release name for the 1.4 version.

Request for Contributions
As XtreemFS is an open source project, we are always looking forward to external contributions, and we believe that this release serves as an ideal starting point for that. Here's an incomplete list of things you might be interested in contributing:
  • Chef recipe or Puppet configuration for automatic deployment
  • a fancy Qt GUI for the client
  • S3-compatible interface based on the client library libxtreemfs
  • direct integration with Qemu/KVM using the C++ libxtreemfs

XtreemFS Survey
Finally, do not forget to fill out our survey if you use, have used, or plan to use XtreemFS.