Nexcess Blog

The Troubleshooter’s Perspective: Advice from a System Administrator

October 2, 2013

It can be fun to solve problems. Luckily for me, working in Nexcess support, I am given the opportunity to do this every day. I get to dig into a problem, find a solution, and then fix the issue and hopefully make you, our customers, happy. Sometimes an issue arises where the fix is not obvious, and that’s when troubleshooting skills come in handy.

Troubleshooting is a mindset. You have to believe that you have the ability to fix the problem and stick with it, even when you feel like you are getting nowhere after hours of frustration. If you don’t believe in yourself enough, you may give up too quickly. As you troubleshoot, tap into your existing knowledge base, rule out possibilities, isolate variables, and track down leads.

Don’t be dismayed if you feel SO HAPPY because you think you’ve found the problem, only to realize that, no, that wasn’t it, and it’s back to square one. Eventually you’ll get there. You may need to read documentation or ask a colleague for help. Maybe you just need to step away for a moment and grab some more coffee. That’s always when the answer hits you, right? The instant you are no longer searching, suddenly BOOM! Or it might be that the customer comes back with new information that inspires a breakthrough.

The best part of this system is that we also learn from each and every problem that we fix, which helps make the troubleshooting process easier the next time. To help with any future problems you may run into, I wrote down a few of our common troubleshooting steps, and because I’m a system administrator and like things organized, it’s a list.

1. Understand the problem

  • This seems simple, but it’s critical. If you go to the doctor, they are going to ask, “What are your symptoms?” Without this information, how can the doctor move forward and help you? To understand the problem is to listen very carefully to those “symptoms.”

2. Check logs

  • The logs provide actual data based on the described problem. To continue the doctor analogy, you may come in saying you feel awful, but the doctor still needs to take your temperature or run tests.
  • Checking the logs yourself can save you the time it would take to contact tech support for answers.
  • In SiteWorx the Error or Access logs can be viewed under Administration > Logs. I would recommend enabling the option that allows you to save logs for up to 7 days.
  • For Magento errors, you can view system.log or exception.log under var/log. I usually start with a quick look at the last 100 or so lines of the file, for example: tail -n 100 system.log
  • Linux command line tools like grep, sort, head/tail, and awk make it possible to search and format results based on our queries.
  • In WordPress or ExpressionEngine – and most other software applications – you can enable debug mode to get more information on errors.
  • We routinely check other logs (when applicable) on our managed servers as well, but I’ve given examples that customers are able to access.
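
As a rough illustration of the command line tools mentioned above, here is a minimal sketch that counts error lines and ranks the most frequent messages. The sample log lines, the LOG variable, and the function names are made up for this example; on a real site you would point LOG at something like var/log/system.log, and your log format may differ.

```shell
#!/bin/sh
# Hypothetical sample data so the sketch runs anywhere. On a real site,
# set LOG to your own log file instead of creating this temporary one.
LOG="$(mktemp)"
trap 'rm -f "$LOG"' EXIT
cat > "$LOG" <<'EOF'
2013-10-02T10:00:01+00:00 ERR (3): Warning: include(missing.php): failed to open stream
2013-10-02T10:00:05+00:00 DEBUG (7): cache refreshed
2013-10-02T10:01:12+00:00 ERR (3): Notice: Undefined index: sku
2013-10-02T10:02:40+00:00 ERR (3): Warning: include(missing.php): failed to open stream
EOF

# Count the lines that were logged at the ERR level.
count_errors() {
    grep -c ' ERR ' "$1"
}

# Rank error messages by how often they occur: blank out the timestamp
# (field 1) so identical messages group together, then count and sort.
top_errors() {
    grep ' ERR ' "$1" | awk '{ $1 = ""; print }' | sort | uniq -c | sort -rn | head -n 5
}

tail -n 100 "$LOG"     # quick glance at the most recent entries
count_errors "$LOG"    # how many errors in total
top_errors "$LOG"      # which errors repeat the most
```

Grouping by message rather than reading line by line tends to make a recurring error stand out quickly, which is often the lead worth chasing first.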

3. Gather specific information

  • Try to reproduce the problem (if possible), note any recent changes (no matter how small!), and identify when those changes occurred. When a customer provides this information in a ticket, it gives us a head start on resolving the problem.
  • If the problem is reproducible and we can catch it while it’s happening, the fix is usually much quicker.
  • Capturing MySQL queries that are running, tailing logs to see if an error is being generated by a specific action, and checking running server processes all help as well.
  • Running strace on a PHP process can also provide clues.
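
To make the steps above a little more concrete, here is a small sketch of how this kind of information might be gathered from a shell. The function names and the demo log are invented for this example. The MySQL and strace helpers are defined but not run here, because they assume the mysql client has working credentials and that you have permission to trace the process.

```shell
#!/bin/sh
# Print only the lines added to a log while a command runs, to see
# whether a specific action generates an error.
# usage: new_lines_during LOGFILE COMMAND [ARGS...]
new_lines_during() {
    log="$1"; shift
    before=$(wc -l < "$log")            # line count before the action
    "$@"                                # run the action being tested
    tail -n +"$((before + 1))" "$log"   # show what was appended
}

# Snapshot the queries MySQL is running right now. Assumes the mysql
# client is installed and can log in (for example via ~/.my.cnf).
running_queries() {
    mysql -e 'SHOW FULL PROCESSLIST'
}

# Attach strace to a running process by PID, follow its children, add
# timestamps, and write the trace to a file for later reading.
# usage: trace_pid PID OUTPUT_FILE
trace_pid() {
    strace -f -tt -p "$1" -o "$2"
}

# Demo with a throwaway log and a stand-in action. In practice the
# action would be loading a page, running a cron job, and so on.
DEMO_LOG="$(mktemp)"
trap 'rm -f "$DEMO_LOG"' EXIT
printf 'old line 1\nold line 2\n' > "$DEMO_LOG"
fake_action() { echo 'ERR (3): something broke' >> "$DEMO_LOG"; }

new_lines_during "$DEMO_LOG" fake_action
```

Comparing the log before and after a single action is a simple way to isolate one variable at a time, which fits the reproduce-then-observe approach described above.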

4. Search the internet

  • Googling for answers can be helpful, with the huge, huge caveat that many “answers” will not apply in our situation and can sometimes be flat-out wrong.
  • Internet searches can be great for very specific application errors as well as just general Linux know-how.
  • Self-study using online documentation (e.g., MySQL, PHP) can also be extremely helpful.

5. Use a third party site

  • If the issue is performance-related (site is up, but slow), sites such as webpagetest.org can help track down specifics and provide a testing site from outside our network. Optimizing databases, cleaning logs, installing various types of caching to speed up sites, and making sure CDNs are properly configured are just a few examples of the additional things we do to improve performance.

6. Carefully try fixes (that can be reverted)

  • If a change is going to be made to fix a specific issue, make sure it applies to the server environment (PHP version, OS, etc.).
  • It’s also advisable to check syntax, only make one change at a time, create backups in advance, and communicate each change made to all of the people who need to know.
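
One way to picture "back up, change one thing, check, revert if needed" is the sketch below. Everything in it is hypothetical: the try_change helper, the app.conf file, and the check_conf test are stand-ins for this example. On a real server the check might be something like php -l for a PHP file, or whatever validation your application provides.

```shell
#!/bin/sh
# Apply one change with a backup, run a check, and revert on failure.
# usage: try_change TARGET CANDIDATE CHECK_COMMAND
try_change() {
    target="$1"; candidate="$2"; shift 2
    backup="$target.bak"
    cp -p "$target" "$backup" || return 1   # back up before touching anything
    cp "$candidate" "$target"               # make exactly one change
    if "$@" "$target"; then
        echo "kept change; backup saved at $backup"
    else
        cp -p "$backup" "$target"           # put the original back
        echo "check failed; reverted $target"
        return 1
    fi
}

# Demo files (made up for this example).
WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
printf 'memory_limit=256M\n' > "$WORK/app.conf"
printf 'memory_limit=512M\n' > "$WORK/good.conf"
printf 'memory_limt=512M\n'  > "$WORK/bad.conf"    # contains a typo

# Stand-in syntax check: the setting must be spelled correctly.
check_conf() { grep -q '^memory_limit=' "$1"; }

try_change "$WORK/app.conf" "$WORK/bad.conf" check_conf || true   # rejected and reverted
try_change "$WORK/app.conf" "$WORK/good.conf" check_conf          # accepted
```

Keeping the backup next to the file and changing one thing per attempt means that if something goes wrong, there is never any doubt about which change caused it or how to undo it.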

This is just a small sampling of the troubleshooting steps that we use every day, but the list is always growing. Do you have any tips that you’ve learned through your experiences that you would like to share?

