[volunteergrid2-l] disk drive failure statistics

erpo41 at gmail.com
Sat Apr 7 02:59:27 UTC 2012


If this is too far off topic for the list, please let me know...

Have you ever noticed that almost everyone seems to have an opinion on
which disk drive manufacturers make the most or least reliable disks?
Have you ever noticed that those people haven't owned a random sample
of more than 1000 drives and haven't kept detailed statistics on which
drives failed and when?

In February of 2007, Google published a paper titled "Failure Trends
in a Large Disk Drive Population"
(http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/disk_failures.pdf)
analyzing the impact of various environmental factors and SMART
readings on disk drive failure rates. This paragraph really caught my
attention:

"Failure rates are known to be highly correlated with drive
models, manufacturers and vintages [18]. Our results do
not contradict this fact. For example, Figure 2 changes
significantly when we normalize failure rates per each
drive model. Most age-related results are impacted by
drive vintages. However, in this paper, we do not show a
breakdown of drives per manufacturer, model, or vintage
due to the proprietary nature of these data."

I don't know about anyone else, but I want that data so I can choose
the most reliable hard drives from the most reliable manufacturers.
Furthermore, I want that data to be made public so hard drive
manufacturers will face real pressure to improve reliability.

I've thought about several schemes for collecting this data from PCs
across the world, but that effort is complicated by the fact that most
desktops are not on and connected to the Internet 24/7. If a PC is off
when its disk fails, or if it's not connected to the Internet, it may
never be able to report the failure.

I think you see where I'm going with this. Tahoe-LAFS/VG2 may be the
ideal way to collect this type of data, since grid nodes are expected
to stay on and connected most of the time. So, two questions:

1. Is there any reason why someone would object to having the tahoe
client/server collect disk failure statistics and report them to a
central server? Should this feature be opt-in or opt-out?

2. Does anyone see any potential sources of error or bias in this
scheme?
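
To make the idea a bit more concrete, here is a rough Python sketch of
how a central server might turn per-drive reports into an annualized
failure rate (failures per drive-year) for each model. Everything in it
is an assumption for illustration only: the DriveReport record, its
field names, and the annualized_failure_rate function are hypothetical
and are not part of Tahoe-LAFS or VG2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DriveReport:
    """Hypothetical per-drive record a node might send to the server."""
    model: str            # drive model string, e.g. as read from SMART
    days_observed: float  # days the drive was seen in service
    failed: bool          # True if the drive failed while observed


def annualized_failure_rate(reports):
    """Return {model: failures per drive-year} for a list of reports."""
    drive_days = defaultdict(float)
    failures = defaultdict(int)
    for r in reports:
        drive_days[r.model] += r.days_observed
        if r.failed:
            failures[r.model] += 1
    # Skip models with no observed time to avoid dividing by zero.
    return {
        model: failures[model] / (days / 365.0)
        for model, days in drive_days.items()
        if days > 0
    }
```

Counting drive-days rather than just drives means a disk that was only
observed for a month carries less weight than one observed for a year,
which matters if nodes join and leave the grid at different times.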

Thanks,
Eric

