Thursday, March 27, 2014

Public demo server updated to XtreemFS 1.5

We updated our public demo server to XtreemFS 1.5. To try out XtreemFS without setting up your own server, just install the client and mount our volume:

mkdir ~/xtreemfs_demo 
mount.xtreemfs demo.xtreemfs.org/demo ~/xtreemfs_demo 
cd ~/xtreemfs_demo
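
When you are done, you can unmount the volume again; since the client is a regular FUSE file system, the standard FUSE unmount command should do:

fusermount -u ~/xtreemfs_demo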

For testing, you can create directories and files as you like. Please do not upload anything illegal or any copyrighted material. For legal reasons, every file creation and write is logged with the client's IP address and a timestamp. Files are automatically deleted every hour.

Wednesday, March 12, 2014

XtreemFS 1.5 released: Improved support for Hadoop and SSDs

Berlin, Germany. Today, we released a new stable version of the cloud file system XtreemFS.
XtreemFS 1.5 (Codename "Wonderful Waffles") comes with the following major changes:

  • Improved Hadoop Support: Read and write buffers were added to improve the performance for small requests. We also implemented support for multiple volumes, e.g., to store input and output on volumes with different replication policies.
  • SSD support: So far, an OSD was optimized for rotating disks by using a single thread for disk accesses. Solid State Disks (SSDs) cope well with simultaneous requests and show a higher throughput with increased parallelism. To achieve more parallelism per OSD when using SSDs, multiple storage threads are now supported.
  • Multi-Homing Support: XtreemFS can be made available for multiple networks and clients will pick the correct address automatically.
  • Multiple OSDs per Machine: Machines with multiple disks need to run a separate OSD for each disk. The new xtreemfs-osd-farm init.d script simplifies running several OSD instances on one machine (see the sketch right after this list).
  • Bugfixes for Read/Write and Read-Only Replication: We fixed a problem that prevented read/write replicated files from failing over correctly. Another problem caused the on-demand read-only replication to hang and stall access.
  • Replication Status Page: The DIR status page now visualizes the current replica status of open files. For example, it shows which replica is the current primary and whether a replica is unavailable.
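The new xtreemfs-osd-farm script is used like any other init.d service; here is a minimal sketch (how the individual OSD instances and their configuration files are set up is described in the 1.5 documentation and may differ between distributions):

/etc/init.d/xtreemfs-osd-farm start    # start all configured OSD instances on this machine
/etc/init.d/xtreemfs-osd-farm stop     # stop them again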

Replication Status Page: "osd0" is the backup replica for the open file, "osd1" is the primary, and "osd2" is currently unavailable.
Tutorial for Read/Write Replication Fail-Over
Do you want to see the new replication status page in action? We prepared a tutorial which walks you through the setup of a read/write replicated XtreemFS volume on a single machine. 

The tutorial lets you stream a video from the volume and simulate the outage of a replica. You'll learn about the details of the XtreemFS replication protocol and why the video stalls for a few seconds before playback resumes.
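
If you want to experiment beyond the tutorial, the outage of a replica can be simulated by simply stopping one of the OSD services, for example via its init script (service names may vary slightly between distributions and package versions):

/etc/init.d/xtreemfs-osd stop     # simulate the failure of this OSD
/etc/init.d/xtreemfs-osd start    # bring it back after the fail-over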

XtreemFS in a Briefcase
Our friends at AlmereGrid took the tutorial to the next level: they created a setup of eight Raspberry Pi mini-computers running XtreemFS - packaged in a briefcase! Check their website CloudCase.eu for more details. Here's their video showing the briefcase and the demonstrated fail-over:


CloudCase - XtreemFS Cloud file system demonstration from contrail-project.

Developing for XtreemFS
Did you know that you can use XtreemFS directly in your application with our C++ and Java client libraries? This way you avoid the overhead of FUSE and can access advanced XtreemFS features, such as adding replicas, that are otherwise only available through the maintenance tool "xtfsutil".
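
For comparison, adding a replica with the maintenance tool looks roughly like this (an illustrative sketch with a made-up file path; check xtfsutil --help for the exact options of your version):

xtfsutil --add-replica AUTO /xtreemfs_demo/some_file    # let XtreemFS pick a suitable OSD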

From using XtreemFS it is only a small step to diving into the XtreemFS source code itself. We collected several introductory documents for newcomers in the Google Drive folder "XtreemFS Public". For example, have a look at how to set up the XtreemFS server Java projects in Eclipse. Have fun!

Friday, May 17, 2013

Processing an MRC metadata dump with XSLT


TL;DR We describe how to dump the metadata of an XtreemFS installation to an XML file and how to filter the dump with XSLT for files located on a specific OSD. You can use this example as a starting point for your own analyses of your file system's metadata.

At our institute we run an XtreemFS installation for scientific users. The installation spans 16 OSDs, which are hosted at our site and are regularly accessed by three other institutes throughout Germany. During recent maintenance work we lost all chunks of one OSD by human error: I accidentally deleted all chunks of that OSD because I mistook the directory for a backup when it was in fact the last remaining copy. Since the installation is meant for temporary scientific data, we decided against replication and backups at deployment time to maximize the available capacity. (Single-disk failures are covered by the underlying RAID5 used on each OSD.)


Nonetheless, it was necessary to inform all users about their deleted files. Therefore, I had to find out which files had been placed on the affected OSD. XtreemFS stores the list of replicas per file at the MRC (Metadata and Replica Catalog). The MRC allows you to dump and restore its metadata in XML format. To find the affected files, I filtered the XML dump using XSLT. This blog post details the required steps. You can use the provided example to run your own analyses on your file system's metadata.

Create an MRC database dump
You can use the XtreemFS tool xtfs_mrcdbtool to dump or restore the MRC database. The MRC writes and reads the dump locally, so you have to specify where the MRC should write the dump on its own machine:
xtfs_mrcdbtool -mrc mrc-host.example.com dump /tmp/dump.xml
This command will tell the MRC to write the database dump to the file /tmp/dump.xml. Make sure that the MRC has write permission for the given path. If you configured an "admin_password" for the MRC, you have to set the option --admin_password as well.
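
The same tool can restore a previously created dump, for example when recovering or migrating an MRC (again, the path refers to the local file system of the MRC):

xtfs_mrcdbtool -mrc mrc-host.example.com restore /tmp/dump.xml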

Filter the XML database dump using XSLT 
The MRC database dump is in XML format. The XML tree in the dump contains the file system tree of each volume.

You can use XSLT (Extensible Stylesheet Language Transformations) to filter the dump and transform the output into a more human-readable form. I've added an example file to our code repository: filter_files.xslt. You have to use an XSLT processor to transform the original XML dump. For example, use xsltproc:
xsltproc -o filtered_files_output.txt filter_files.xslt /tmp/dump.xml
The resulting file filtered_files_output.txt will have the following output format:
volume name/path on volume|creation time|file size|file's owner name
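An illustrative (made-up) output line might look like this:
myvolume/projects/run42/result.dat|1368783000|52428800|alice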
Modify the filter_files.xslt file to include or exclude other file attributes. This example handles only files which are (at least partially) placed on an OSD with the UUID "zib.mosgrid.osd15". This is realized by the following instruction in the XSLT file which limits the set of selected "file" elements:
<xsl:template match="file[xlocList/xloc/osd/@location='zib.mosgrid.osd15']">
Write your own XPath expressions to realize other filters. If you want to match all files, just write match="file" without the bracketed predicate.

Tuesday, November 13, 2012

XtreemFS 1.4 released at Supercomputing 2012

Salt Lake City, Utah. Today we released XtreemFS 1.4, a new stable release of the cloud file system XtreemFS. This release is the result of almost one thousand changes ("commits") to the code repository, and extensive testing throughout the year. We worked both on major improvements to the existing code and new features:

  • Improved stability: Clients and servers are rock solid now. In particular, we fixed client crashes due to network timeouts and issues with the Read/Write file replication.
  • Asynchronous writes: Once enabled (mount option "--enable-async-writes"), write() requests are executed in the background. This improves write throughput without weakening the semantics. We recommend enabling asynchronous writes (see the example after this list).
  • Windows Client (beta): Complete rewrite based on the stable C++ libxtreemfs and using the Dokan alternative Callback File System by EldoS corporation. Try it by mounting our public demo server!
  • Hadoop support: Use XtreemFS as a replacement for HDFS in your Hadoop setup. This version of XtreemFS comes with a rewritten Hadoop client based on libxtreemfs for Java, which also provides data locality information to Hadoop.
  • libxtreemfs for Java: Access XtreemFS directly from your Java application. See the user guide for more information.
  • Vivaldi integration: The Vivaldi replica placement and selection policies enable clients to select close-by replicas based on actual network latencies. These latencies are estimated using virtual network coordinates which are also visualized in the DIR web-interface. Check out the demonstration on the web-interface of our public demo server.
  • Extended OSD Selection: Now you can assign custom attributes to OSDs and limit the placement of files on OSDs based on those attributes.
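As mentioned in the list above, asynchronous writes are enabled at mount time. A minimal example, shown here with our public demo volume (any volume works the same way):

mount.xtreemfs --enable-async-writes demo.xtreemfs.org/demo ~/xtreemfs_demo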

This version also includes an updated version of the DIR/MRC replication and adds fail-over support for DIR replicas. As DIR/MRC replication is still at a very early stage, this feature is intended as a technology preview for more experimental users.

We are currently at the Supercomputing 2012 exhibition, where we present XtreemFS at the Contrail booth #2535 as part of the Contrail project. Since the event takes place in Salt Lake City, Utah, we chose "Salty Sticks" as the release name for version 1.4.

Request for Contributions
As XtreemFS is an open source project, we always look forward to external contributions, and we believe that this release serves as an ideal starting point for that. Here's an incomplete list of things you might be interested in contributing:
  • chef recipe or puppet configuration for automatic deployment
  • a fancy Qt GUI for the client
  • S3-compatible interface based on the client library libxtreemfs
  • direct integration with Qemu/KVM using the C++ libxtreemfs

XtreemFS Survey
Finally, do not forget to fill out our survey if you use, have used, or plan to use XtreemFS.

Thursday, October 11, 2012

XtreemFS User Survey

XtreemFS is free software with an anonymous download, and therefore we
only know a fraction of our users. If you are using, have been using,
or plan to use XtreemFS, we would love to hear from you!

To that end, we ask you to fill out this year's XtreemFS survey:
https://docs.google.com/spreadsheet/viewform?formkey=dFNzSnQyb2VqTXZOSXJhVnlkc1FPQlE6MQ

We know that this will take a few minutes of your time, but your
responses will help us tremendously.

If you feel uncomfortable sharing specific information, just skip the
question. But be assured that your information will not be shared with
anyone.

For any questions or direct feedback, write to felix@xtreemfs.org

Wednesday, July 11, 2012

What is object-based storage (and what it is not)

TL;DR Object-based storage is a term that categorizes the internal architecture of a file system; it is not a particular feature set or interface. While the internal architecture of a file system has many implications for its performance and features, its outer appearance remains that of a file system.

We have often stressed the fact that XtreemFS is an object-based file system. While talking to our users, however, we have realized that this term causes more confusion than enlightenment. I blame this poor choice on our academic ignorance, and I hope I can clear up the confusion a bit. In the end, object is not a very descriptive term, and most people associate it with object-oriented programming (totally unrelated) or the objects in Amazon's S3 system (only somewhat related).

In storage, an object is a variable-sized, but limited, container of bytes. You probably wonder why this trivial concept deserves its own term and became relevant to the storage community at all. Well, there are mainly two aspects to it - first the name itself, then its main property, namely the fact that it is variable-sized.

Blocks, Blocks, Blocks

While storage hardware stores a series of bytes, no storage hardware exports a byte-level interface (disks, tapes, flash, even RAM). The reason is efficiency: addressing single bytes would require many long addresses (metadata overhead), and reading and writing single bytes is inefficient (think checksums, latency, seeking, etc.). The unit that is actually used is the block, a fixed-size container of bytes.

File systems organize blocks into larger, variable-sized containers. This is also true for distributed file systems. As many distributed file systems do not run on bare hardware, they can actually choose a certain block size. There is a wide range of file systems where the block size is fixed for all files in the file system. Such a system is not very flexible: you need to choose a block size that fits all, which in turn means all your files should have a similar size. There was a saying about Google's GFS (a block-based file system with a 64MB block size): it can hold any set of files, as long as they're large and not too many.

There is a second aspect of blocks shared between local and distributed file systems. Blocks are agnostic about files, i.e., a block does not know which file it belongs to. While that's a no-brainer for local file systems, storage servers of block-based distributed file systems are somewhat degraded because they only store anonymous blocks. Only the metadata server knows how the blocks make up files.

Here come the objects

You can imagine the joy in the storage community when systems and standards arrived that allowed choosing a block size per file. This innovation deserved a new term: object. Objects also have a second aspect that makes them great for distributed file system architectures: they raise the abstraction level a bit by making the storage server aware of which file an object belongs to. Objects are not addressed by a file-agnostic block identifier, as blocks are, but by a file identifier and a sequential object number. This has many advantages for the architecture, as storage servers can actually host file system logic (e.g., for replication), which they are well equipped for since they run on commodity hardware.
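
To make the addressing concrete: a byte offset within a file maps to a sequential object number plus an offset within that object. A small illustrative sketch, assuming a made-up object size of 128 KiB (in XtreemFS the object size is defined by the file's striping policy):

OBJECT_SIZE=$((128 * 1024))                           # 131072 bytes, illustrative only
OFFSET=300000                                         # byte offset within the file
echo "object number:    $((OFFSET / OBJECT_SIZE))"    # -> 2
echo "offset in object: $((OFFSET % OBJECT_SIZE))"    # -> 37856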

As I hinted earlier: objects are not super-relevant for the user, because all file systems make you work with files (XtreemFS even with POSIX files). And Amazon's S3 objects are not the objects we are talking about here, because they are not size-limited. They are rather files without a hierarchical namespace.

Wednesday, November 2, 2011

XtreemFS 1.3.1 Available Now!

XtreemFS 1.3.1 is available now. It adds some minor features and fixes a wide range of bugs that were discovered since the previous stable release.

Bug fixes mainly relate to the new client and the replication infrastructure. We included the following features:
  • Metadata replication. Although XtreemFS 1.3.0 already provided some preliminary support for MRC and DIR replication, the feature turned out to be fairly unstable. With XtreemFS 1.3.1, we took a big step forward in this respect. The MRC replication in particular has now been thoroughly tested and offers automatic fail-over on the client side.
  • Asynchronous writes. We enhanced the client with preliminary support for asynchronous writes. However, the feature is currently limited to non-replicated and failure-free scenarios, as it has not yet been integrated with the client's internal retry mechanism.
  • Monitoring. We added a service monitoring infrastructure. It is based on SNMP and provides information about the internal state of an XtreemFS service, such as the current memory consumption, I/O throughput, the number of stored files, etc. We also added a corresponding Ganglia plug-in.
  • OSD drain tool. We included a utility to remove OSDs from an XtreemFS installation. The tool relocates all files from the respective OSD to other OSDs and gracefully shuts down the OSD.
  • Gentoo overlay. To simplify the use of XtreemFS for our Gentoo users, we added a Gentoo overlay for installing XtreemFS.
For a more detailed overview of the changes, please refer to our change log. We also updated the list of known issues and moved them to the issue tracker.

We further noticed that we caused some confusion with the 1.3.0 release, as it was sometimes referred to as a release candidate. The website as well as the servers and clients themselves always stated "1.3.0 RC1" as the version. However, the packaging did not allow us to release a "1.3.0-RC1" version, and therefore we ended up publishing "1.3.0" packages. Blogs and news websites also referred to the released version as "1.3.0". So we'll leave it that way: XtreemFS version 1.3.0 RC1 is regarded as 1.3.0, and now we're releasing the next version, 1.3.1.

Since we released XtreemFS 1.3.0 in August, we got quite a lot of feedback on the mailing list - thanks a lot to our user community for consistently helping us to improve XtreemFS!

To be able to quickly respond to the needs of our users, we decided to establish a separate repository with unstable packages, which we update frequently. Unstable packages are less thoroughly tested than stable releases, but they allow us to fix bugs and provide new features on short notice. A link to the unstable repository can now be found on our website at http://www.xtreemfs.org/download.php#unstable.