What’s Going On With Me At Work

My backup server stopped working after a scheduled power outage. At first I thought we had hit a six-sigma event, the kind of failure that is supposed to be vanishingly rare. Part of my job is to estimate how likely things are to happen so I can decide just how prepared we need to be. The more prepared you need to be, the more your infrastructure costs.

I had designed our backups to withstand three failed hard drives within a single week, but now I was seeing abnormalities in four drives before the server stopped responding. I spent a week trying to revive the 80+ TB of backup data that had accumulated. I was only able to recover 20 TB and spent the next week regenerating the rest. For me it was a nail-biter; I don’t like having redundancy that low.

A month later I realized that the real problem was the UPS battery. It had degraded, so the server was going straight into a protective frozen state to prevent data loss in case the battery failed, and that is why the server seemed dead. Facepalm. I had been solving the wrong problem.

I am ordering some new UPS units to replace our ageing ones.

By Jaime

Happily married and proud father, most often found in Málaga, Spain.

