Parallel Data Laboratory Talk
- Remote Access - Zoom
- Virtual Presentation - ET
- QING ZHENG
- Ultra System Research Center
- Los Alamos National Laboratory
From NASD to DeltaFS: CMU and Los Alamos' Effort in Building Large-Scale Distributed Filesystem Metadata
It has been a tradition that, every once in a while, we stop and reassess whether we need to build our next filesystems differently. A key previous effort was the PDL's NASD project, which decoupled filesystem data communication from metadata management and leveraged object storage devices for scalable data access. Now, as we enter the exascale age, we once again need bold ideas for our filesystems if we are to keep up with the rapidly increasing scale of today's parallel computing environments.
In this presentation, we introduce DeltaFS, a research project at CMU and Los Alamos National Lab that rethinks how applications communicate with a distributed parallel filesystem. DeltaFS is based on the premise that at exascale and beyond, synchronization of anything global should be avoided. Conventional parallel filesystems, with fully synchronous and consistent namespaces, mandate synchronization on every file create and every other filesystem metadata operation. This is too expensive.
DeltaFS allows parallel computing jobs to self-commit their namespace changes to logs that are later published to a registry, avoiding the cost of constant global synchronization. Follow-up jobs selectively merge the logs produced by previous jobs as needed. We term this principle No Ground Truth: it allows for scalable sequential data sharing without requiring all jobs to see a single filesystem namespace all the time. By following this principle, DeltaFS leans on the parallelism available at the nodes where job processes run, so metadata operation throughput improves as the number of job processes increases. Synchronization is limited to an as-needed basis determined by follow-up jobs, and is carried out through an efficient, log-structured format that lends itself to deep metadata writeback buffering and to deferred metadata merging and compaction. Our evaluation shows that No Ground Truth enables more efficient inter-job communication, reducing overall workflow runtime by significantly improving client metadata operation throughput and resource usage.
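To make the log-and-registry idea concrete, here is a minimal in-memory sketch. It is not the DeltaFS API: the names `NamespaceLog`, `Registry`, and `merged_view`, the job names, and the file paths are all hypothetical, and the real system uses a log-structured on-storage format rather than Python lists. The sketch only illustrates that each job records its namespace changes privately, publishes them once, and that a follow-up job builds its own view by replaying just the logs it asks for.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceLog:
    """Hypothetical per-job log of namespace changes, appended to locally."""
    job: str
    entries: list = field(default_factory=list)  # list of (op, path) tuples

    def create(self, path):
        self.entries.append(("create", path))

    def unlink(self, path):
        self.entries.append(("unlink", path))


class Registry:
    """Hypothetical registry holding published logs, keyed by job name."""

    def __init__(self):
        self._logs = {}

    def publish(self, log):
        # Store an immutable snapshot so later appends do not leak in.
        self._logs[log.job] = tuple(log.entries)

    def fetch(self, job):
        return self._logs[job]


def merged_view(registry, jobs):
    """Replay only the selected jobs' logs, in order, into a private view."""
    view = set()
    for job in jobs:
        for op, path in registry.fetch(job):
            if op == "create":
                view.add(path)
            elif op == "unlink":
                view.discard(path)
    return view


registry = Registry()

# A simulation job records its changes without any global coordination.
sim = NamespaceLog("sim")
sim.create("/out/step1.dat")
sim.create("/out/step2.dat")
sim.unlink("/out/step1.dat")
registry.publish(sim)

# An unrelated job publishes its own log independently.
other = NamespaceLog("other")
other.create("/scratch/tmp")
registry.publish(other)

# A follow-up analysis job merges only the log it depends on.
analysis_view = merged_view(registry, ["sim"])
print(sorted(analysis_view))
```

In this sketch the analysis job never sees `/scratch/tmp` unless it explicitly asks for the `other` log, which mirrors the idea that no single global namespace has to exist at all times.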
Qing Zheng is a scientist at Los Alamos National Lab's Ultra System Research Center. Before joining Los Alamos, Qing was a PhD student in the Parallel Data Lab at Carnegie Mellon University. During his PhD, Qing did research on distributed filesystem metadata and streaming data indexing. Qing's research has been recognized by R&D 100 and Supercomputing Best Paper awards.
Zoom Participation. See announcement.