Why deleting a huge number of files with rsync is faster than with rm
Today I looked into how to quickly delete a very large number of files on Linux.
Many posts suggest using rsync for this because it is much faster than rm,
but none of them explain why. After digging into it, the reasons boil down
to two things: first, just listing the files becomes slow when a directory
contains a huge number of entries; second, each deletion can cause the
directory's B-tree to rebalance, which adds overhead.
rsync reduces both costs, which is why it beats rm.
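The trick people usually describe is mirroring an empty directory over the full one with `rsync -a --delete`. A minimal sketch (the `mktemp` paths are illustrative stand-ins; in practice the target would be your huge directory):

```shell
#!/bin/sh
# Empty a big directory by rsync-ing an empty directory over it.
target=$(mktemp -d)                # stand-in for the huge directory
touch "$target/a" "$target/b" "$target/c"

empty=$(mktemp -d)                 # empty source directory
rsync -a --delete "$empty"/ "$target"/

ls -A "$target"                    # prints nothing: all files removed
rmdir "$empty"
```

Because the source is empty, `--delete` makes rsync unlink everything in the target while syncing nothing in.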
=== Quoted from an English post ===
rm on a directory with millions of files
When a directory needs to be listed, readdir() is called on the
directory, which yields a list of files. readdir() is a POSIX call,
but the real Linux system call used underneath is
getdents. getdents lists directory entries by filling a buffer with
entries.
The problem mainly comes down to the fact that readdir()
uses a fixed 32 KB buffer to fetch entries. As a directory
gets larger and larger (its size increases as files are added), ext3
gets slower and slower at fetching entries, and readdir()'s
32 KB buffer can only hold a fraction of the
entries in the directory. This forces readdir() to loop again and
again, invoking the expensive system call over and over.
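This can be seen at the syscall level by calling getdents64 directly with a much larger buffer, so each syscall brings back many entries at once. A sketch (the struct layout follows the getdents64(2) man page; the 5 MB buffer size is an arbitrary choice for illustration):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define DENTS_BUF_SIZE (5 * 1024 * 1024)   /* 5 MB instead of 32 KB */

/* Layout of the records getdents64 writes into the buffer
 * (matches the getdents64(2) man page). */
struct linux_dirent64 {
    unsigned long long d_ino;
    long long          d_off;
    unsigned short     d_reclen;
    unsigned char      d_type;
    char               d_name[];
};

/* Count entries in `dir` (excluding "." and "..") using raw
 * getdents64 calls with a large buffer; returns -1 on error. */
long count_entries(const char *dir)
{
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    char *buf = malloc(DENTS_BUF_SIZE);
    long count = 0, nread;

    /* Each syscall now returns megabytes of entries instead of
     * the 32 KB readdir() would fetch. */
    while ((nread = syscall(SYS_getdents64, fd, buf, DENTS_BUF_SIZE)) > 0) {
        for (long pos = 0; pos < nread; ) {
            struct linux_dirent64 *d =
                (struct linux_dirent64 *)(buf + pos);
            if (strcmp(d->d_name, ".") != 0 &&
                strcmp(d->d_name, "..") != 0)
                count++;
            pos += d->d_reclen;   /* advance to the next record */
        }
    }

    free(buf);
    close(fd);
    return nread < 0 ? -1 : count;
}
```

With a buffer this size, a directory of a million files can be walked in a few hundred syscalls rather than tens of thousands.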
[...]
I revisited this today. Because most filesystems store their
directory structures in a btree format, the order in which you
delete files is also important: one needs to avoid rebalancing the
btree when performing the unlinks.
As such, I added a sort before the deletes occur. The program will
now (on my system) delete 1,000,000 files in 43 seconds. The closest
program to this was rsync -a --delete, which took 60 seconds (it
also performs deletions in order, but does not do an efficient
directory lookup).
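The "sort before delete" idea can be sketched as follows. This is not the quoted author's actual program; it just illustrates the technique of collecting names together with their inode numbers, sorting by inode, and then unlinking in that order:

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct entry {
    ino_t ino;
    char  name[256];
};

/* qsort comparator: ascending inode number. */
static int by_inode(const void *a, const void *b)
{
    ino_t x = ((const struct entry *)a)->ino;
    ino_t y = ((const struct entry *)b)->ino;
    return (x > y) - (x < y);
}

/* Delete every entry in `dir`, unlinking in inode order to reduce
 * on-disk tree rebalancing; returns the number deleted, -1 on error. */
long delete_sorted(const char *dir)
{
    DIR *dp = opendir(dir);
    if (!dp)
        return -1;

    size_t cap = 1024, n = 0;
    struct entry *ents = malloc(cap * sizeof *ents);
    struct dirent *de;

    /* Pass 1: collect names and inode numbers. */
    while ((de = readdir(dp)) != NULL) {
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        if (n == cap) {
            cap *= 2;
            ents = realloc(ents, cap * sizeof *ents);
        }
        ents[n].ino = de->d_ino;
        snprintf(ents[n].name, sizeof ents[n].name, "%s", de->d_name);
        n++;
    }
    closedir(dp);

    /* Pass 2: sort by inode, then unlink in that order. */
    qsort(ents, n, sizeof *ents, by_inode);

    long deleted = 0;
    char path[4096];
    for (size_t i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/%s", dir, ents[i].name);
        if (unlink(path) == 0)
            deleted++;
    }
    free(ents);
    return deleted;
}
```

The two-pass structure matters: reading all entries first, then unlinking, avoids mutating the directory while iterating over it.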