Research

Our research is currently divided into three main topics:

Software-Defined Storage

The work on software-defined storage (SDS) aims to build a new generation of adaptable and programmable storage solutions that can automatically and efficiently leverage the storage of heterogeneous clusters and IoT devices, as well as cloud, HPC and AI storage services. These solutions are built around a control plane and a data plane. The data plane must support multiple combinations of well-known storage specializations, such as load balancing, caching, replication, security, and data reduction. The control plane must automatically configure and manage these combinations to meet the performance, energy, security, and dependability requirements of different applications. Namely, we are working on novel:

  • architectures and designs for SDS data and control planes,
  • stackable and programmable storage solutions,
  • user space frameworks for easing the implementation of complex storage solutions.

Selected publications

  • Taming Metadata-intensive HPC Jobs Through Dynamic, Application-agnostic QoS Control. Macedo R, Miranda M, Tanimura Y, Haga J, Ruhela A, Harrell S, Evans T, Pereira J, Paulo J. IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid). 2023
  • PAIO: General, Portable I/O Optimizations With Minor Application Modifications. Macedo R, Tanimura Y, Haga J, Chidambaram V, Pereira J, Paulo J. USENIX Conference on File and Storage Technologies (FAST). 2022
  • A Survey and Classification of Software-Defined Storage Systems. Macedo R, Paulo J, Pereira J, Bessani A. ACM Computing Surveys. 2020

Storage Benchmarking and Diagnosing

As the complexity of current storage solutions grows, it becomes increasingly difficult to find proper benchmarking and diagnosis tools to assess these systems' performance, resiliency and security. Our goals for this topic are to design:

  • benchmarking solutions that can accurately evaluate storage systems by providing features such as realistic content generation, realistic storage access patterns, data integrity validation, and fault injection,
  • benchmarking tools that ease the setup, reproducibility and analysis of experiments,
  • scalable black-box monitoring and diagnosis solutions for complex data-centric applications and systems.

Selected publications

  • Toward a practical and timely diagnosis of applications’ I/O behavior. Esteves T, Macedo R, Oliveira R, Paulo J. IEEE Access. 2023
  • CRIBA: A Tool for Comprehensive Analysis of Cryptographic Ransomware's I/O Behavior. Esteves T, Pereira B, P. Oliveira R, Marco J, Paulo J. IEEE International Symposium on Reliable Distributed Systems (SRDS). 2023
  • CaT: Content-aware Tracing and Analysis for Distributed Systems. Esteves T, Neves F, Oliveira R, Paulo J. ACM/IFIP Middleware Conference (Middleware). 2021

Storage Optimizations

With the exponential growth of digital information, it is critical to find novel designs and storage optimizations that can store and retrieve large amounts of data in an efficient, secure and dependable fashion. Our main research interests include:

  • efficient, dependable and secure data reduction techniques,
  • large-scale storage solutions,
  • storage optimizations tailored for HPC and AI workloads.

Selected publications

  • Accelerating Deep Learning Training Through Transparent Storage Tiering. Dantas M, Leitão D, Cui P, Macedo R, Liu X, Xu W, Paulo J. IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid). 2022
  • S2Dedup: SGX-enabled Secure Deduplication. Miranda M, Esteves T, Portela B, Paulo J. ACM International Systems and Storage Conference (SYSTOR). 2021
  • GenoDedup: Similarity-Based Deduplication and Delta-Encoding for Genome Sequencing Data. Cogo V, Paulo J, Bessani A. IEEE Transactions on Computers. 2021