Tuesday, February 6, 2007

OpenAFS

Yesterday, I attended a local Ubuntu user group meeting about OpenAFS. OpenAFS is a network file system like NFS. AFS was originally developed at Carnegie Mellon University and later became a commercial product owned by IBM, which open sourced it a few years ago. The Swedish university KTH has developed its own implementation called Arla.
Several major universities use AFS, including KTH, Stanford and CMU.

The first question you may ask yourself is: why AFS instead of NFS?
NFS is normally run as version 2 or 3. These versions work poorly over the Internet or a large WAN. With the normal Unix security mechanism, where the server trusts the user ID the client sends, NFS is insecure. Finally, if a file area moves from one file server to another, you have to remount it on every client that mounts it. AFS solves these problems.

Are there any drawbacks? Unfortunately, yes. You must have a Kerberos server, and AFS is more complex to set up than NFS or SMB.

An AFS system consists of three components: file servers, database servers and clients. All three can run on nearly any operating system. Clients are the computers that mount the file systems. All AFS file systems are mounted under /afs. You can then use symbolic links or bind mounts to relocate them in your file system tree.

The file servers are where the files are stored. An AFS system can have several file servers.

Finally, there are the database servers. This is where the magic happens. The database servers know which file servers exist and on which file server each data area is stored. There can be several database servers, which replicate the information among themselves.
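
The lookup described above can be sketched as a toy model in Python. This is only an illustration of the idea, not the OpenAFS API: the class names, the function client_read and the example names home.alice and fs1 are all made up. AFS itself calls a data area a volume, so the sketch uses that word.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseServer:
    """Toy location database: volume name -> name of the file server holding it."""
    locations: dict = field(default_factory=dict)

    def register(self, volume, file_server):
        self.locations[volume] = file_server

    def locate(self, volume):
        return self.locations[volume]


@dataclass
class FileServer:
    """Toy file server: stores the files of each volume it holds."""
    name: str
    volumes: dict = field(default_factory=dict)  # volume -> {path: contents}

    def read(self, volume, path):
        return self.volumes[volume][path]


def client_read(db, servers, volume, path):
    """Ask the database server where the volume lives, then read from that server."""
    server_name = db.locate(volume)
    return servers[server_name].read(volume, path)


# Hypothetical setup: one file server holding one volume.
fs1 = FileServer("fs1", {"home.alice": {"notes.txt": "hello"}})
db = DatabaseServer()
db.register("home.alice", "fs1")
print(client_read(db, {"fs1": fs1}, "home.alice", "notes.txt"))
```

The point of the indirection is that the client never needs to know in advance which file server holds its data; it only needs to be able to reach a database server.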

AFS works well over WANs and the Internet for two reasons. First, the clients have a large cache. Second, if that is not enough, a data area can be replicated read-only to a file server close to the client. The client then reads data from the nearby file server but writes to the original one far away. Where to read from and write to is handled by the database servers.
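
As a rough illustration of the read/write split, here is a small Python sketch. It assumes the client somehow knows a latency figure for each server and simply picks the lowest one for reads; that selection rule, the function pick_server and the site names are assumptions for the example, not how OpenAFS actually ranks servers.

```python
def pick_server(sites, operation, latency_ms):
    """Choose a file server for one operation on a replicated data area.

    sites      -- {"rw": name of the read-write site, "ro": [read-only replica names]}
    operation  -- "read" or "write"
    latency_ms -- {server name: assumed round-trip time in milliseconds}
    """
    # Writes always go to the single read-write original.
    if operation == "write" or not sites["ro"]:
        return sites["rw"]
    # Reads go to whichever read-only replica is closest to this client.
    return min(sites["ro"], key=lambda name: latency_ms[name])


# Hypothetical example: the original is far away, one replica is nearby.
sites = {"rw": "far-server", "ro": ["far-server", "near-server"]}
latency_ms = {"far-server": 250, "near-server": 5}

print(pick_server(sites, "read", latency_ms))   # near-server
print(pick_server(sites, "write", latency_ms))  # far-server
```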

Security is handled by Kerberos.

If one file server becomes full, you can move some of the data to another file server. The database servers will then tell the clients where to find the moved data.
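
A toy Python sketch of such a move, to show why the clients do not have to remount anything. Everything here is hypothetical (the dictionaries, move_area, read_file and the server names); it models the idea only, not the real OpenAFS tools.

```python
# Location database: data area -> name of the file server holding it.
location_db = {"projects": "fs-old"}

# File servers: server name -> {data area -> {path: contents}}.
file_servers = {
    "fs-old": {"projects": {"plan.txt": "ship it"}},
    "fs-new": {},
}


def read_file(area, path):
    """Client side: always ask the location database first, then read."""
    server = location_db[area]
    return file_servers[server][area][path]


def move_area(area, target):
    """Admin side: move the data, then update the location database."""
    source = location_db[area]
    file_servers[target][area] = file_servers[source].pop(area)
    location_db[area] = target


before = read_file("projects", "plan.txt")
move_area("projects", "fs-new")
after = read_file("projects", "plan.txt")

# The client used the same area name and path both times.
print(before == after, location_db["projects"])
```

With NFS version 2 or 3, the server name is part of what the client mounts, which is why a move there means touching every client.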

I have been running Kerberos at home for a long time. I am now considering whether I want to run AFS at home as well, but I do not know if it is worth the extra complexity. If I had to manage file servers for a complex, distributed organization, I would take a closer look at OpenAFS.

I think I will prefer NFSv4, which solves most of these problems.

You can find OpenAFS packages in the Ubuntu universe repository.

1 comment:

Erik Forsberg said...

Apart from the complex setup required to run AFS (I configured OpenAFS a few years ago, and it was not a walk in the park), there are also other things to think about:

*) AFS is not POSIX. This means, for example, that when you do close() on a file, you can't be sure that it is synced to persistent storage on disk. It may be in the client cache. Also, I don't think there are any guarantees that other clients will see the changes immediately.

*) Given the need for callbacks to the clients, it's hard for AFS to operate in a firewalled environment. At least, I believe the callbacks from the server (with information such as "hey, this file has changed, reload it in your cache!") need to connect directly to the client.

Regarding NFSv4... well, let's hope the implementation in Linux gets mature enough to use soon. Personally, I would have liked it if the Linux NFS people had concentrated on fixing bugs and performance trouble in the NFS3 client and server...