Reviewing our preconceptions
One of the challenges in the IT industry is breaking the poorly conceived or mistaken preconceptions held by others. But what happens when we're the ones holding on to outdated ideas, or are simply wrong, because technology has taken another huge leap forward and we're left standing clutching something that's now ineffective?
I have been reviewing some documentation I wrote three years ago and at a glance it appeared to be valid, using correct assumptions and only needing minor tweaks to bring it up to date.
John, an ISC reader, emailed in a great comment reinforcing this, drawn from a discussion about best practices he had been involved in. The smart people in that room brought out timeless best-practice statements such as:
'Logs should be stored separate from the application to prevent out of control logs from filling up the system and causing other components to crash.'
All of which makes perfect sense from a best-practice point of view, and I follow this principle on many of the systems I install and manage. Let's see whether this best-practice statement is still valid by asking some simple questions:
- Why are we creating logs in the first place?
- Who looks at them?
- Do the right people have access to the logs?
- Are they of any use?
- Is there any need to archive them or can they be deleted after x amount of time?
- Are we asking the right people about the logs in the first place?
It may turn out that keeping 300 GB of logs on their own fast RAID-ed disks, backed up nightly, is a huge waste of time, money and resources, because no one ever looks at them, uses them or knows what to do with them. Keeping only a week's worth of logs, taking up 10 MB of disk and used solely for occasional troubleshooting, might be the better solution.
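As a rough sketch of the lightweight end of that trade-off, the following Python snippet keeps roughly a week of application logs and deletes anything older. The directory, the filename pattern and the seven-day window are illustrative assumptions, not figures from any particular system.

```python
#!/usr/bin/env python3
"""Prune application logs older than a short retention window.

A minimal sketch only: /var/log/myapp, the *.log* pattern and the
seven-day window are assumptions for illustration.
"""
import time
from pathlib import Path

LOG_DIR = Path("/var/log/myapp")   # hypothetical application log directory
RETENTION_DAYS = 7                 # keep roughly a week for troubleshooting


def prune_old_logs(log_dir: Path, retention_days: int) -> int:
    """Delete log files whose last modification is older than the cutoff."""
    cutoff = time.time() - retention_days * 86400
    removed = 0
    for log_file in log_dir.glob("*.log*"):
        if log_file.is_file() and log_file.stat().st_mtime < cutoff:
            log_file.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    print(f"Removed {prune_old_logs(LOG_DIR, RETENTION_DAYS)} old log files")
```

Something like this, run daily from cron, is one way to implement the "a week's worth of logs, kept only for troubleshooting" option.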
So, going back to my documentation, I took a hard look at what I'd written. Almost immediately I found I'd fallen into the generic best-practice-assumptions pit. They were good at the time, but not now, given how the business, processes and technology had changed. Needless to say, the quick document update stretched into a number of hours of rewrites, and only after talking to various people about a string of questions I needed to address. Once the documents had been peer reviewed, signed off and finally uploaded, I added an entry to my diary to review and, if necessary, amend them again six months from now.
Do you apply a review process to security policies, procedures, documents and best practices to ensure they still meet the measures and metrics that keep them relevant, meaningful and fit for current business needs?
How can you ensure that you're not clinging to best practices or policies that are well past their sell-by date?
Can you share any pearls of wisdom to help others avoid the automatic adoption of reasonable-sounding, yet poorly thought-out, best practices?
Chris Mohan --- ISC Handler on Duty
Comments
Until one day, when you discover you really could use them, and that ONE-time need is so critical that it justifies the cost for the 3651 days a decade you don't need old logs. For example, to backtrack an intrusion discovered 6 months later and try to mine the logs to figure out what happened, or when X really first went wrong.
Mysid
Jan 25th 2011
Xarthila
Jan 25th 2011
If it's the former (a remote loghost) I have no objection. If it's the latter, I've gotten sick of that argument over the past 10 years, and I think it's a perfect example of system administrators enacting a policy on auto-pilot that they never reconsider. It used to be that partitioning served useful purposes: it kept often-written partitions like /var away from read-mostly partitions like /usr, contained filesystem corruption and allowed the system to still boot. These days, when I'm managing literally 1000s of systems, I'm not worried about that; systems are more "throw-away" and I can reimage to fix filesystem corruption.
Meanwhile, more partitioning just means more things to manage, and whatever partition you make small now will never be future-proof; it just leaves less disk space for whatever is filling up and gives you less warning time between your monitoring agent's threshold firing and the disk actually running out.
Also, it is inevitably the application logs that puke, filling up the application's disk space, crashing the application and taking the server down anyway, so the separate partition hasn't actually solved the problem.
I've been using a one-big-slash policy since 2006, despite some pretty viciously emotional arguments with other system admins who claimed I'd wind up causing all kinds of problems. Instead it has just eliminated lots of additional management points I don't have to worry about, and this policy has never once woken me up at night.
Somehow all these fiddly /usr and /var partitions got instilled in the unix culture back in the 80s and 90s and nobody has really reassessed that policy to see if the rationales make sense in an era of dozens or hundreds of throwaway webservers that you can re-image in minutes.
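As a concrete aside on the warning-time point above, here is a minimal sketch of the kind of disk-utilization check a monitoring agent might run. The single "/" mount point and the 90% threshold are assumptions for illustration, not anything specified in this thread.

```python
#!/usr/bin/env python3
"""Alert when a filesystem crosses a usage threshold.

A minimal sketch: the "/" mount point and the 90% threshold are
illustrative assumptions only.
"""
import shutil
import sys

MOUNT_POINT = "/"   # one-big-slash layout: a single filesystem to watch
THRESHOLD = 0.90    # page someone once usage passes 90%


def check_disk(mount_point: str, threshold: float) -> bool:
    """Return True if usage is below the threshold, otherwise print an alert."""
    usage = shutil.disk_usage(mount_point)
    used_fraction = usage.used / usage.total
    if used_fraction >= threshold:
        print(f"ALERT: {mount_point} is {used_fraction:.0%} full", file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    sys.exit(0 if check_disk(MOUNT_POINT, THRESHOLD) else 1)
```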
Lamont
Jan 25th 2011
Now the same incident will NOT repeat itself. Logs are invaluable! Keep them, and most importantly... keep them SAFE!
-Al
Al of Your Data Center
Jan 25th 2011
I have never had a need to look back more than a few days, but we retain those logs for a ridiculous time (I can't remember the retention period offhand) based on an outdated policy.
-Larry
Jan 25th 2011
I don't worry about 'future-proofing' partitions/disk labels; I use LVM. Allocate as little disk space as possible to partitions, with some breathing room, and set alarms to page if disk utilization is ever high.
Most space is reserved as 'free space' that can be assigned to volumes later. If storage requirements change, I use online LVM expansion to increase the size of the volumes and run resize2fs on the filesystems as necessary to add a new chunk of space.
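A minimal sketch of that expansion step, assuming a logical volume at /dev/vg0/var and a 10 GB increment (both hypothetical names and sizes); it must run as root.

```python
#!/usr/bin/env python3
"""Grow an LVM logical volume and its ext filesystem online.

A minimal sketch of the lvextend + resize2fs workflow; the volume
path and the increment are illustrative assumptions. Run as root.
"""
import subprocess

LV_PATH = "/dev/vg0/var"   # hypothetical logical volume backing /var
GROW_BY = "+10G"           # carve another chunk out of the free space


def grow_volume(lv_path: str, grow_by: str) -> None:
    # Extend the logical volume using unallocated space in its volume group.
    subprocess.run(["lvextend", "-L", grow_by, lv_path], check=True)
    # Grow the ext filesystem to fill the enlarged volume (online for ext3/ext4).
    subprocess.run(["resize2fs", lv_path], check=True)


if __name__ == "__main__":
    grow_volume(LV_PATH, GROW_BY)
```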
Mysid
Jan 26th 2011