Hopkins Storage Systems Lab

Storage and Database Systems for Science and Engineering

Bypass-Yield Caching

Workloads generated by astronomy queries over the Internet cannot be cached effectively by existing distributed caching techniques. Neither Web caching (in proxies and browsers) nor semantic (query) caching can handle the large data sizes of astronomy queries. Astronomy workloads do not exhibit the query reuse and query containment on which semantic caching relies, and astronomy queries frequently transfer large data items, which flush Web caches. Scientific database federations are geographically distributed and network-bound, so they could benefit from proxy caching. However, existing caching techniques are not suited to their workloads, which compare and join large data sets: they reduce parallelism by evaluating distributed queries in a single cache, and they lose the data-reduction benefit of performing selections at each database.

We have developed the bypass-yield formulation of caching, which reduces network traffic in wide-area database federations while preserving parallelism and data reduction. Bypass-yield caching is altruistic: caches minimize the overall network traffic generated by the federation rather than optimizing local performance. We have developed an adaptive, workload-driven algorithm for managing a bypass-yield cache, as well as on-line algorithms that make no assumptions about the workload: a k-competitive deterministic algorithm and a randomized algorithm with minimal space complexity. We have verified the efficacy of bypass-yield caching by running workload traces collected from the Sloan Digital Sky Survey through a prototype implementation.
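The page does not spell out how a bypass-yield cache decides what to load, so the following is an illustration only: a minimal Python sketch under stated assumptions. It assumes each query touches one cacheable object (for example, a table or column) and returns a result of known size; a cache hit costs no wide-area traffic; a bypassed query ships only its result across the network; and loading an object costs its full size. The class `BypassYieldCache`, its `query` method, the rent-or-buy load threshold, and the yield-per-byte eviction rule are hypothetical choices made for this sketch. They are not the project's adaptive, k-competitive, or randomized algorithms.

```python
from dataclasses import dataclass, field


@dataclass
class BypassYieldCache:
    """Hypothetical bypass-yield cache sketch (not the project's algorithm).

    The goal is altruistic: minimize wide-area (WAN) bytes, not maximize hits.
    """

    capacity: int                    # cache capacity in bytes
    sizes: dict[str, int]            # object name -> object size in bytes
    cached: set[str] = field(default_factory=set)
    yields: dict[str, int] = field(default_factory=dict)  # result bytes seen per object
    wan_bytes: int = 0               # total bytes moved over the WAN

    def density(self, obj: str) -> float:
        # Yield per byte of cache space: WAN traffic the object saves per byte it occupies.
        return self.yields.get(obj, 0) / self.sizes[obj]

    def free_space(self) -> int:
        return self.capacity - sum(self.sizes[o] for o in self.cached)

    def query(self, obj: str, result_bytes: int) -> str:
        self.yields[obj] = self.yields.get(obj, 0) + result_bytes
        if obj in self.cached:
            return "hit"             # answered locally: assumed to cost no WAN bytes
        # Bypass: the query runs at the remote database; only its result crosses the WAN.
        self.wan_bytes += result_bytes
        # Rent-or-buy threshold (assumption): load once accumulated yield >= object size.
        if self.yields[obj] >= self.sizes[obj]:
            self._try_load(obj)
        return "bypass"

    def _try_load(self, obj: str) -> None:
        size = self.sizes[obj]
        if size > self.capacity:
            return
        victims, freed = [], self.free_space()
        # Consider evicting the lowest yield-per-byte objects first.
        for v in sorted(self.cached, key=self.density):
            if freed >= size:
                break
            if self.density(v) >= self.density(obj):
                return               # victim is worth at least as much: keep it, skip the load
            victims.append(v)
            freed += self.sizes[v]
        for v in victims:
            self.cached.remove(v)
            self.yields[v] = 0       # evicted objects must re-earn their load cost
        self.cached.add(obj)
        self.wan_bytes += size       # loading the object crosses the WAN once
```

For example, with a 100-byte cache and objects `photo` (60 bytes) and `spec` (50 bytes), two bypassed queries on `photo` returning 30 and 40 bytes push its yield past its size, so it is loaded and later queries on it are hits. The load threshold follows the rent-or-buy (ski-rental) pattern: the cache keeps paying the bypass cost until that cost matches the cost of loading, so objects whose queries return few bytes in total are never loaded and never consume wide-area bandwidth for loading.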

Bypass-Yield Caching for Large-Scale Scientific Database Workloads in the World-Wide Telescope

PI: Randal Burns, Department of Computer Science, Johns Hopkins University
co-PIs: Ani Thakar, Center for Astrophysical Sciences, Johns Hopkins University
NSF Award IIS-0430848, 10/01/2004-9/30/2007

The World-Wide Telescope (WWT) is a virtual observatory that federates astronomy and astrophysics databases on a global scale, with the ultimate goal of unifying all on-line data and making it available to everyone, everywhere. It greatly improves the ability to perform multi-spectral and temporal studies by allowing researchers to access many databases with a single query. In its current form, however, adding sites and users to the WWT inevitably leads to a network crisis. As data-intensive scientific applications grow in scale, bandwidth constrains the performance of every application sharing a network, so as more scientists and educators adopt and rely on the WWT, its increased bandwidth requirements will degrade the performance of all applications on the network. Federations need to be good "network citizens" that use shared resources conscientiously; otherwise, the workloads generated by their applications will make them unwelcome on public networks.

To avert this network crisis, the project will develop and release an open-source, commodity caching appliance based on two crucial technologies: bypass-yield caching and self-organizing database storage. Bypass-yield caching is an altruistic caching framework for scientific database workloads that balances parallelism in federations against the benefits of caching. Its principal goal is "network citizenship": it caches data in order to minimize network traffic. Database caching also introduces an acute storage management problem for which traditional administration is inappropriate: the dynamic creation and destruction of tables in a cache requires automated, incremental storage management with low space overhead. Self-organizing database storage automates storage management and database organization, turning the cache into an administration-free appliance.

Caching appliances are an enabling technology: they make it possible for the WWT to accept a large number of users without degrading the performance of shared networks. Open-source software and commodity hardware make the appliance inexpensive to acquire, and the self-organization of the cache makes it maintenance-free. The caching appliance will enhance astronomy and astrophysics research, allowing scientists to conduct experiments and find correlations across heterogeneous data sets at unprecedented rates. The WWT and the caching gateways will also bring telescope research and education to communities of users to whom they were previously unavailable, particularly undergraduate and high school students. Planned outreach includes a pilot program that will install and maintain WWT gateways at high schools, colleges, and science museums and libraries, and assist those institutions with curriculum development.

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Last Updated on Sunday, 25 January 2009 20:53