Speakers: Markus Wittmann and Georg Hager, NHR@FAU
Slides available at https://hpc.fau.de/files/2022/01/2022-01-11-hpc-cafe-file-systems.pdf
We provide guidelines for handling large collections of files in your batch jobs on our systems. We have observed phases of heavy overload on the NFS file servers when certain types of jobs are running. The overload is caused by jobs that handle data inefficiently, especially data scattered over many thousands of files but also data that is accessed very frequently, which slows file operations to a crawl. This affects all users, not only those who actually cause the problem.
In this talk we give a quick overview of the available file systems and show you some strategies to avoid such situations by using local disks within the compute nodes instead of the shared NFS servers.
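As a rough illustration of the local-disk strategy, the following Python sketch bundles a directory of many small files into a single archive and unpacks it onto node-local storage at job start. It is not taken from the slides: the function names `pack_dataset` and `stage_to_local` are hypothetical, and which directory is node-local on a given cluster (often exposed through an environment variable such as `TMPDIR`) is an assumption that has to be checked against the system documentation.

```python
import tarfile
from pathlib import Path


def pack_dataset(src_dir: Path, archive: Path) -> None:
    """Bundle a directory of many small files into one tar archive.

    Run once, outside the job. The shared file server then has to
    serve a single large file instead of thousands of small ones.
    """
    with tarfile.open(archive, "w") as tar:
        # Store entries relative to the directory name, not the full path.
        tar.add(src_dir, arcname=src_dir.name)


def stage_to_local(archive: Path, local_root: Path) -> Path:
    """Unpack the archive below local_root and return the data directory.

    local_root is assumed to be on a node-local disk (hypothetical
    example: the directory named by $TMPDIR inside a batch job).
    """
    with tarfile.open(archive, "r") as tar:
        tar.extractall(local_root)
        # The first archive member is the top-level directory added above.
        top_level = tar.getnames()[0].split("/")[0]
    return local_root / top_level
```

Inside a job, one would call `stage_to_local` once at startup, point the application at the returned directory, and copy any results back to the shared file system as a single archive at the end, so that the many small file operations hit only the local disk.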