Wednesday, October 12, 2011

Blackberry dual outage highlights the need for redundancy in enterprise systems

This week Blackberry has been hit by two outages, both of which appear to be caused by single points of failure (SPOF) within the RIM infrastructure.

In the news today Blackberry said "The messaging and browsing delays... in Europe, the Middle East, Africa, India, Brazil, Chile and Argentina were caused by a core switch failure within RIM's infrastructure" (Source: http://www.bbc.co.uk/news/technology-15243892). They also said:

"Although the system is designed to failover to a back-up switch, the failover did not function as previously tested," (Source: http://news.cnet.com/8301-30686_3-20118882-266/international-blackberry-outage-continues/)

This immediately causes me to ask a few questions:
  • Why wasn't the failover triggered manually?
  • What was missed in the testing of the switch failover?
  • Was this an existing issue?
  • When was the DR plan last tested?
  • Had changes been made which invalidated the DR plan?
And this is the major point of DR testing. You can't cover everything, so sometimes you will have to learn from failures that impact you and incorporate those failure modes into future testing. You should also be able to fail over manually so that you can recover quickly from a systems problem.
You also have to be absolutely aware of changes that could affect your DR plans. This means every change has to be screened to ensure that you aren't creating an SPOF or, if you are, that everyone is aware of it and plans are put forward to plug that gap.
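As a rough illustration of what screening a change for an SPOF could look like, here is a minimal Python sketch. It assumes you keep an inventory that maps each role to the devices serving it; the inventory layout, the device names and the find_spofs helper are all made up for the example and have nothing to do with RIM's actual setup.

```python
from typing import Dict, List


def find_spofs(inventory: Dict[str, List[str]]) -> List[str]:
    """Return the roles that are served by fewer than two distinct instances."""
    return sorted(
        role for role, instances in inventory.items()
        if len(set(instances)) < 2
    )


# Hypothetical inventory before and after a proposed change.
before = {
    "core-switch": ["switch-a", "switch-b"],
    "database": ["db-1", "db-2"],
}
after = {
    "core-switch": ["switch-a"],  # the change retires switch-b
    "database": ["db-1", "db-2"],
}

# Only report SPOFs that the change itself would introduce.
new_spofs = sorted(set(find_spofs(after)) - set(find_spofs(before)))
print("SPOFs introduced by this change:", new_spofs)
```

A check like this only counts instances. It says nothing about whether the failover between them actually works, which still has to be tested.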

The key in any major outage is to get the system back up, even if it means failing over manually. However, any steps taken to recover the service should be noted in an emergency change request of some description. Once the systems have been recovered it is vital that the change notice is thoroughly reviewed to find out both what went wrong and what could go wrong. Recovering from an outage is one thing, but it's all for naught if that recovery leads to a potential problem which will bite you later on.
ITIL processes teach a lot of this, and implementing these practices can be a pain, but it's a choice. You either suffer the pain of the paperwork or the pain of the outage.

At least if potential problems are known about they can be more easily dealt with when they appear and bite you, and they will appear.

The mobile industry is very much a cutthroat industry and this dual outage will do Blackberry no good at all, because others will seize upon it as a sign of Blackberry's weak infrastructure, and they will be right.

To recover from this Blackberry need to do a thorough review of their systems and DR processes and ensure that if this happens again they have the ability to recover from it very rapidly. They are, after all, reliant on their userbase for their income and they have failed a major test.

Friday, September 02, 2011

Looking at Home Storage systems - OpenFiler

Like many geeks I've got a considerable amount of storage at home - currently, that's around 7TB split between various storage systems and it doesn't include space provided by Hard Drives in various bits of hardware I have scattered around the study.

After HP introduced their Microserver offer I decided to get one, simply to throw in the 4 x 1TB hard drives I had floating around and provide additional storage. I thought that Openfiler booting from a 16GB USB stick would be perfect for this job.

It wasn't. I've found that Openfiler is a bit cumbersome to get to grips with because the menus don't link from one section to another. So, for example, if you want to create a Windows share it's not obvious where you go - there is nothing for Windows or CIFS on the main menu, and once you find it you'll often find that there is a prerequisite you need to configure first, and that's on a different menu so you have to start again!

None of this would be a major issue except for one thing.

It's slow. Really, really painfully, awfully slow.

I know that the Microserver ships with just 1GB of RAM and that an Openfiler system really needs 2GB minimum, but menus should not take 5 minutes+ to respond to a click, and I notice that other people have been complaining about the same issues.

So, for me, Openfiler works but is clunky and slow. I'm going to replace it this weekend with FreeNAS.

Thursday, June 30, 2011

The issue with antivirus software (2)

News has emerged of a new botnet that is trying to be indestructible by hiding in the MBR. According to the BBC article, 'Code that hijacks a PC hides in places security software rarely looks and the botnet is controlled using custom-made encryption.' It then goes on to say: 'The virus installs itself in a Windows system file known as the master boot record. This file holds the list of instructions to get a computer started and is a good place to hide because it is rarely scanned by standard anti-virus programs.'

Excuse me? MBR viruses are not exactly a new thing. They existed back in Novell days and it was a pain because you'd have to shut down NetWare to get to the DOS area to fix the damn thing. To me, this is yet again pointing out the flaw of AV software. It's being lazy and not doing its job properly.

AV software is basically arse about face. It scans for things that SHOULD NOT be there, whereas it should be scanning for things that SHOULD BE there and considering everything else a threat. It really shouldn't be too difficult to have a database of common Windows files and the most popular applications/games/utilities in use today, along with their MD5 hashes, and scan against those to ensure the integrity of the system.
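To make the "scan for what SHOULD BE there" idea a bit more concrete, here is a minimal Python sketch of an allowlist scan. It uses MD5 because that is what I suggested above; MD5 has known collision weaknesses, so hashlib.sha256 would be a sensible swap. The file names, contents and the tiny one-entry "database" are invented for the demo, and a real system would need a properly maintained database of known-good hashes.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unknown_files(root, known_good):
    """Return every file under root whose hash is NOT in the known-good set."""
    flagged = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if md5_of(path) not in known_good:
                flagged.append(path)
    return sorted(flagged)


def write_file(path, data):
    with open(path, "wb") as handle:
        handle.write(data)


# Demo on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as root:
    good = os.path.join(root, "notepad.exe")
    bad = os.path.join(root, "dropper.exe")
    write_file(good, b"known good contents")
    write_file(bad, b"something unexpected")

    known_good = {md5_of(good)}  # stand-in for the hash database
    flagged_names = [os.path.basename(p) for p in unknown_files(root, known_good)]
    print("Not on the allowlist:", flagged_names)
```

Anything that isn't in the database gets flagged, whether or not anybody has written a definition for it yet.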

Both Vista and Win7 go some way to doing this with things like UAC, but UAC needs to be a little more friendly and more granular to configure. If UAC could be configured to stop things editing start up locations without user consent and from modifying key system attributes then anti-virus software could start its very welcome decline into obscurity.

Tuesday, June 07, 2011

The issue with antivirus software

I hate anti-virus software. I really do hate the stuff. This is not a mere dislike but an actual hatred.

The reason for this is quite simple. In IT security terms any security you deploy needs to do its job with minimal fuss. Too much fuss and the hassle of the security system outweighs its usefulness, and after many tussles with anti-virus software I have come to the conclusion that AV software is a waste of time.

AV software is still far too reactive. It absolutely must have the latest definition files to have any hope of finding anything bad trying to infect the machine, and even with all the heuristics switched on it doesn't seem to have much luck.

As an example, I do all my web browsing in a sandbox thanks to a nice tool called Sandboxie. This tool creates a sandbox which contains any downloads, requested or otherwise. This means that if a virus gets onto the machine it'll be contained, and this exact scenario happened to me not too long ago thanks to a mistyped URL. Examining the contents of the sandbox I saw a very suspicious file which I submitted to VirusTotal. The results from that site are below.


Only four anti-virus programs, all with the latest definitions, actually spotted a harmful file. The others would have quite happily allowed the application to run and wreak havoc. Not good at all.

It is my belief that the best security is no longer in anti-virus software but in applications which prevent suspicious activity, just like the UAC tools Microsoft are now introducing. This technology needs to go further, though: it should be possible to have, as part of the boot process, a system which scans active files to ensure that no changes have happened since the last boot and, if required, reverts or deletes those files.
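The "has anything changed since the last boot" check could be sketched along these lines in Python. This is only an illustration under my own assumptions: the snapshot and compare helpers, the use of SHA-256 and the demo files are all invented here, and the revert/delete step is left out entirely.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    result = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            result[os.path.relpath(path, root)] = digest
    return result


def compare(baseline, current):
    """Return (changed, added, removed) file lists between two snapshots."""
    changed = sorted(p for p in baseline if p in current and baseline[p] != current[p])
    added = sorted(p for p in current if p not in baseline)
    removed = sorted(p for p in baseline if p not in current)
    return changed, added, removed


# Demo: take a baseline, tamper with the directory, then compare.
with tempfile.TemporaryDirectory() as root:
    hosts = os.path.join(root, "hosts")
    with open(hosts, "w") as handle:
        handle.write("127.0.0.1 localhost\n")
    baseline = snapshot(root)  # what the system looked like at "last boot"

    with open(hosts, "a") as handle:  # an existing file gets modified
        handle.write("10.0.0.9 bank.example\n")
    with open(os.path.join(root, "run.bat"), "w") as handle:  # a new file appears
        handle.write("echo hi\n")

    changed, added, removed = compare(baseline, snapshot(root))
    print("changed:", changed, "added:", added, "removed:", removed)
```

For this to be worth anything the baseline would have to be stored somewhere the malware can't rewrite it, which is exactly why it belongs in the boot process rather than in an application.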

Along with these systems I firmly believe that production computers, that is, office computers with email and corporate applications, need to be locked down much tighter. Server hardening and desktop hardening need to move forward and better security is needed for portable devices so that they can only work on specific systems. The whole desktop security culture needs a huge revamp and anti-virus software needs to be consigned to the same bin as the floppy disk.

Friday, April 22, 2011

Cloud Computing - Amazon Outage

Of course, just as I extol the virtues of cloud computing and talk about how much I've moved into the cloud, Amazon suffers an outage. Oops.

Well, yes and no to the oops. A lot of people have been saying that, as it was only one of five Amazon datacenters that suffered this outage, systems and services should have automatically recovered at another site.
Well, that's not true. You have to remember that Amazon operates its datacenters just like virtual copies of a real datacenter.
What I mean by that is that if you host a service in your own datacenter and then lose that datacenter, you'll lose the service. It's up to you as the admin/developer/owner of the service to make sure that you have redundancy set up in another location, be that another Amazon datacenter or a datacenter under your own control.
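The point is that the redundancy has to be something you build. As a very rough Python sketch of the idea, assume the same service is deployed at two sites and the caller tries them in order. The site names and the handler functions are invented for the example and are not Amazon APIs.

```python
def call_with_failover(sites):
    """Try each (name, handler) pair in order and return (name, result)
    from the first site that answers. Raise if every site fails."""
    errors = []
    for name, handler in sites:
        try:
            return name, handler()
        except Exception as exc:  # deliberately broad for the sketch
            errors.append("%s: %s" % (name, exc))
    raise RuntimeError("all sites failed: " + "; ".join(errors))


def primary():
    # Simulates the datacenter that has gone down.
    raise ConnectionError("datacenter unavailable")


def secondary():
    # Simulates a copy of the service you set up somewhere else.
    return "ok"


served_by, result = call_with_failover([("site-a", primary), ("site-b", secondary)])
print(served_by, result)
```

If you never set up site-b, there is nothing to fail over to, and no cloud provider will conjure one up for you.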

As I said in my previous blog posting, cloud computing is not a panacea and you have to be careful how you use it. This outage is a classic case in point of that comment.

Wednesday, April 20, 2011

Renaming local administrator accounts - good or bad?

A lot of the time I hear the following statement 'Renaming the local administrator account makes it secure'.

No, it doesn't. Renaming the local administrator account just leaves you with a renamed local administrator account. It only makes it secure from people who are too dumb to read SIDs but overall adds very little in the scheme of security.

In Windows, the local administrator account, no matter what it is named, will always have a SID ending -500. Guest is -501.

With that information and a couple of tools you can list out the local accounts, find the administrator and attack the account. Of course, if you have physical access to the hard drive and the drive doesn't use any form of encryption there are plenty of password reset tools out there.
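To show how little the rename buys you, here is a small Python sketch that picks the built-in administrator out of a list of accounts purely from the SID. The account names and SID values are made up for the example, and it assumes you have already listed the accounts and their SIDs with whatever tool you prefer.

```python
def is_builtin_admin(sid):
    """True if the SID is a machine/domain account SID whose last part (the RID) is 500."""
    return sid.startswith("S-1-5-21-") and sid.split("-")[-1] == "500"


# Hypothetical account listing: the administrator has been renamed to "Bob".
accounts = {
    "Bob": "S-1-5-21-1111111111-2222222222-3333333333-500",
    "Guest": "S-1-5-21-1111111111-2222222222-3333333333-501",
    "alice": "S-1-5-21-1111111111-2222222222-3333333333-1001",
}

admins = sorted(name for name, sid in accounts.items() if is_builtin_admin(sid))
print("Built-in administrator account(s):", admins)
```

The name never enters into it, which is the whole point.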

Sunday, April 10, 2011

Moving into the cloud

Without realising it I've found myself moving more things into the cloud. I'm not exactly reliant on the stuff that's in the cloud but I'm certainly using more services out there and it would be an inconvenience if I lost those services. I guess you could say that the cloud has been creeping up on me.

It started out just after I got married. The photographer used digital media for the wedding photos and provided them on a DVD. Obviously these needed to be stored somewhere safe, and the idea of keeping the DVD in the house, where I could lose it, throw it away without realising it (would be hard but this is ME I'm talking about) or potentially lose it in an accident/fire/theft, didn't appeal. These are not things you want to think about but you must when you are talking about this sort of data.

So, I started looking around for offsite storage. Initially the thought was to rent a location or something and leave a copy of the DVD there, a bit like a safe deposit box but with easier access. Then I came across Amazon's S3. This is cloud storage. Absolutely massive cloud storage at that, with unlimited space for the user - you pay for what you use, and when you look at the amount of data you hold that you really do need backed up it isn't that much.

I have a rule for backup data: I'll only back up data that can't easily be recreated or downloaded. So documents, Excel work, password databases, etc, etc, and I do this on a monthly or semi-monthly basis with the occasional ad-hoc backup for something specific. So far I'm paying just a few dollars a month for the service and that translates into less than £5 a month. Is your data worth that?

Alongside S3 you've also got Amazon's EC2 (Elastic Compute Cloud), basically virtual servers that you can use. Amazon gives you administrator or root access to the machine and lets you get on with it. Whilst the server is on you are paying for it. Whilst it's off you pay for the storage. This provides a really nice environment for scenario testing or for externally hosting something a web provider won't allow or like. For example, I'm using an Amazon EC2 service to host a Quake server - just for experimental purposes you understand!


Finally, the last cloud enabled application that I've found that I can't live without is Dropbox. For me, this is a killer cloud application.
Dropbox provides an online file storage solution. That's all it does but it does it in such a clever and useful way that it's now invaluable.

All you do is install it and by default you get 2GB for free, which you can see via My Documents/My Dropbox. I've got this installed at home and at work and what it means is that I can drop a document into Dropbox at work and have it available for viewing/editing at home.

This avoids all the complications of having to remember to copy a file to a USB stick and of taking (and possibly losing) the USB stick on the train or in the back of a taxi. It's out there in the cloud.
The way Dropbox works is simply to sync everything in the My Dropbox folder back to the Dropbox servers. This means that it'll even work when offline and simply sync the files up when you have an internet connection available again, which is invaluable and is what Microsoft's Briefcase and offline files and folders were supposed to have provided.

There have been a lot of questions around security though - i.e. how secure is Dropbox? My response to this is simple - it's in the cloud so you need to be careful. Do not put any confidential data on it or, if you do, encrypt it beforehand.

Like anything, cloud computing is a nice idea and can be used for many things, but it's not a panacea and you need to be careful with how you use it.