Currently, resource sharing and performance management for high-performance distributed file systems pursue incomplete or incorrect goals. Server-centric performance metrics, such as I/O latency and throughput, are insensitive to client state and cannot distinguish the urgency of file system operations by I/O type or among different clients. Storage networks rely on flow control to apportion bandwidth, which is irrelevant for any configuration that is not network bound. Clients act selfishly, each trying to maximize its own throughput, which congests server and network resources.
In response, we are developing holistic resource management algorithms for high-performance distributed file systems that use online auctions to maximize application-perceived performance. The system is holistic in that it manages all resources, including network bandwidth, server I/O (throughput and IOPS), server CPU, and client and server memory utilization. Online auctions unify multiple heterogeneous resources in a single pricing model, which allows the system to adapt to different configurations and time-varying or workload-varying resource constraints. The focus on application-perceived performance ensures that optimization goals benefit the system’s users (not its servers).
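As a rough illustration of how a single pricing model can cover heterogeneous resources, the sketch below maps each resource's utilization to a price and takes the highest price as the cost of using the server. Everything in it is an assumption made for illustration: the exponential price curve, the base `k`, the choice of the maximum as the aggregate, and the names `Resource`, `price`, `bottleneck`, and `admit` are hypothetical and are not taken from CA-NFS.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One managed resource (hypothetical model, not the CA-NFS data structure)."""
    name: str
    utilization: float  # fraction of capacity in use, in [0, 1]
    max_price: float = 1.0


def price(utilization: float, max_price: float = 1.0, k: float = 16.0) -> float:
    """Assumed exponential price curve: 0 when idle, max_price when saturated.

    The curve stays cheap at low utilization and rises steeply near
    saturation, so a nearly exhausted resource dominates the overall price.
    """
    u = min(max(utilization, 0.0), 1.0)  # clamp to [0, 1]
    return max_price * (k ** u - 1.0) / (k - 1.0)


def bottleneck(resources: list[Resource]) -> tuple[str, float]:
    """Return the name and price of the most expensive resource.

    Assumption: the server's price is the price of its scarcest resource.
    """
    best = max(resources, key=lambda r: price(r.utilization, r.max_price))
    return best.name, price(best.utilization, best.max_price)


def admit(client_bid: float, resources: list[Resource]) -> bool:
    """Hypothetical admission rule: run the operation now only if the
    client's bid (its urgency) covers the current server price."""
    _, server_price = bottleneck(resources)
    return client_bid >= server_price


if __name__ == "__main__":
    # A disk-bound configuration: the network is idle, the disk is nearly full.
    disk_bound = [
        Resource("network", 0.10),
        Resource("disk_io", 0.95),
        Resource("server_memory", 0.40),
    ]
    name, p = bottleneck(disk_bound)
    print(f"bottleneck={name} price={p:.3f}")
    print("urgent read admitted:", admit(0.9, disk_bound))
    print("background write admitted:", admit(0.2, disk_bound))
```

Because every resource is priced on the same scale, the same rule applies whether the configuration is network bound, disk bound, or memory bound; only the identity of the bottleneck changes.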
We have also introduced a new dimension in resource management for distributed file systems by adaptively managing the global allocation of memory among clients and servers and the assignment of memory to read caching and write buffering. Emerging high-speed networking technologies bring memory limitations to the forefront, exposing systems to throughput crashes and application stalls.
We are implementing these techniques in the congestion-aware network file system (CA-NFS), which extends the NFSv4 file system to implement auctions and pricing, and extends the Linux memory manager to implement adaptive read/write scheduling and memory management.


