Update: Thecus N5200 – RAID Repair

Well, I have to admit: I am one lucky guy. I have been running a truly degraded RAID array since day one of my experience with the N5200. To get up to speed you may want to read my other post about the device: Thecus N5200 Review

So, today my replacement hard drive arrived for a failure I noticed recently. I took a look at the RAID and double-checked the reason for the replacement. My #2 drive was showing some bad sectors and was listed as a ‘warning’. Not a failed drive, but a drive on its way there. Better safe than sorry, I thought.

I traveled down the steps into the basement where I keep my servers and pulled out Drive 2. The N5200 freaked out with a bunch of beeps that were immediately followed by the device sending me emails screaming about how the array was dead, data was gone and that I should give up living. My first thought was… ‘Wait… What?’ I mean, isn’t the point of RAID 5 that you can lose a drive and still have all your data? I quickly popped Drive 2 back into the device and checked to see if my data was still there. Thankfully it was.

At this point I was pretty confused. I knew that my N5200 was telling me that my array was degraded but that all of my drives were good; only drive 2 had a couple of errors, and it was still fully functional. I had done as much research as I could into the degraded status, but Thecus support is really, really bad (read: doesn’t exist), so I was basically on my own. The only thing that I had to go on was that good arrays will sometimes show as degraded if the device is not rebooted properly (according to a ‘known issue’ for the firmware I was running), and it seemed reasonable enough to me that the device had gone through a power outage and that this had caused the error. All my drives were fine, so I continued on with life.

What had happened in reality is that the N5200 built my array without drive 4. Yes, it failed to correctly create the array from day one and never said anything about it to me. The worst part is that nowhere in the interface does it show that. There is no place that tells you the actual status of your array in any detail.

The device is Linux based and uses the Linux software RAID tool, mdadm, to create the array. That is fine with me; mdadm is great software and works well. I use it all the time in my other Linux installations. The problem is that the N5200 does not pass along the information mdadm provides about the array.

Finding all of this out is where the story gets complicated. I was searching through the web interface for the device and stumbled on a section called ‘Module Management.’ A quick search on Google turned up a listing of modules known to work on the N5200. I scanned the list and found two that I liked: one that allowed SSH access to the unit (a very common text-based method to control Linux and Unix servers), and one that created a system-level user to give you some power when you logged in.

After installing the modules I logged in as described in the docs for the two modules and ran the command ‘more /proc/mdstat’. This asks the system to report the status of its software RAID arrays. The output told me that drive number 4 was not part of the RAID array. Knowing now what to look for, I again searched the web-based interface to see if I could confirm that in any way, but I was unable to find any mention. The web interface, though saying degraded, made no mention of why or how it was degraded, and the logs for the system were no more helpful. This is where things got messy…
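For anyone who wants to automate the check I did by eye, here is a minimal Python sketch that looks for missing members in /proc/mdstat text. The sample text is made up in the usual mdstat layout (it is not the actual output from my N5200), the function names are my own, and it only handles arrays listed as "active", so treat it as a starting point rather than a finished tool.

```python
import re

# Made-up sample in the usual /proc/mdstat layout (NOT real N5200 output).
# "[5/4]" means 5 members expected, 4 active; each "_" marks an empty slot.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda2[0] sdb2[1] sdc2[2] sde2[4]
      1953503488 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUU_U]

unused devices: <none>
"""

# Header line of an active array: name, RAID level, member list.
ARRAY_LINE = re.compile(r"^(md\S*)\s+:\s+active\s+(\S+)\s+(.*)$")
# Member counts and slot map, e.g. "[5/4] [UUU_U]".
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return one dict per active md array found in mdstat-style text."""
    arrays = []
    current = None
    for line in text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = {
                "name": header.group(1),
                "level": header.group(2),
                "members": header.group(3).split(),
                "expected": None,
                "active": None,
                "missing_slots": [],
            }
            arrays.append(current)
            continue
        status = STATUS.search(line)
        if status and current is not None and current["expected"] is None:
            current["expected"] = int(status.group(1))
            current["active"] = int(status.group(2))
            # Zero-based slot numbers whose member is absent.
            current["missing_slots"] = [
                i for i, flag in enumerate(status.group(3)) if flag == "_"
            ]
    return arrays


def degraded(arrays):
    """Return only the arrays that have at least one empty slot."""
    return [a for a in arrays if a["missing_slots"]]
```

On a real box you would pass in the live file instead of the sample, for example `parse_mdstat(open("/proc/mdstat").read())`, and send yourself an alert whenever `degraded(...)` comes back non-empty.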

In order to replace my failing disk I needed to rebuild the array first using drive 4; otherwise I would be missing two drives from the array and my data would be lost. I tried to rebuild 6 times, and each time the process failed. I was stuck. I couldn’t swap out the bad drive, and I couldn’t rebuild onto the drive that was missing from the array because the bad drive couldn’t handle the load.
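Why one missing drive is survivable and two are not comes down to how RAID 5 parity works. Below is a toy Python sketch of a single stripe, assuming four data blocks plus one XOR parity block. The block contents and function names are invented for illustration, and a real RAID 5 array rotates the parity block across the drives, which this does not model.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 5-drive array: 4 data blocks + 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]


def rebuild(stripe, missing_index):
    """Recover the block at missing_index by XOR-ing all the other blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


# One drive gone: the lost block comes back exactly.
recovered = rebuild(stripe, 1)

# Two drives gone (indexes 1 and 3): XOR-ing the three survivors only gives
# the two lost blocks XOR-ed together, not either block on its own.
leftover = xor_blocks([stripe[0], stripe[2], stripe[4]])
```

Here `recovered` equals the original second block, while `leftover` is just the two lost blocks mixed together with no way to separate them. That is the spot I was in: with drive 4 never in the array, pulling drive 2 left nothing to rebuild from.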

I powered the unit down, brought it over to a computer and copied off all of the data that I could, saving the files I could not easily replace and hoping the drive would last long enough for the process to complete. My desk was covered in cables and drives. I was like a USB Medusa.

Once that was as complete as I could make it, I completely erased the N5200: deleted the array and swapped out the bad drive. Once done, I rebuilt my array and hoped for the best. After the build had completed I made sure to check that the array was doing what I expected. Can’t be too careful, it seems…

Overall this was a pretty bad experience. What good is a device that doesn’t tell you what is happening? I would not recommend this to anyone, and I’m just thrilled that I was not using this device in a work setting, where data loss is a much larger and oftentimes legal issue. If the device can’t be trusted to keep my data safe then what is the point of having it? I was better off using a RAID card in a Linux system… Stay away… stay far, far away…

Topslakr

3 Replies to “Update: Thecus N5200 – RAID Repair”

  1. I experienced very similar problems with my Thecus N4310! It’s been unstable, even with perfectly good, brand new, out-of-the-box HDDs… and it randomly loses one or two of the RAID-6 drives, which take forever to rebuild.

    I’ve not had a good experience with Thecus in any way… even the set-up and integration with some Windows clients was a headache.

    I won’t touch anything from Thecus again.

    Gary.

  2. Incidentally, I see there are other people with issues with the Thecus N5200: https://forums.tweaktown.com/publication-discussion/25552-thecus-n5200-pro-nas-device.html

    Gary

  3. I’ve made peace with my Thecus hardware at this point, and actually I’ve bought a few more units on eBay lately. I find they are great, cheap Linux machines, and since I’ve stopped using the Thecus software, I’ve been very happy. I have several running CentOS 7 and ZFS arrays.
