Debugging Apache Rewrites and Redirects

  • Created: Dec 19, 2011 11:03 AM

Apache’s mod_rewrite and mod_alias can be very useful, but they can also be a huge pain to debug when a problem arises. Luckily, there are a few things that can help, and since you’re probably not the first person attempting to rewrite or redirect something in a specific way, you’ll likely be able to find the answer by just searching around intelligently once you understand the basics.

The first resource I turn to is the Apache mod_rewrite and mod_alias documentation. It might seem tedious, but Apache HTTP Server actually has some of the best documentation in the industry. If you spend 10 minutes reading it and make an honest attempt to understand it, you’ll probably find most of your questions answered. Make sure you check out the section of the mod_rewrite docs that covers the RewriteLog directive, since having a log turns guesswork into something debuggable. (RewriteLog and RewriteLogLevel belong to Apache 2.2; in Apache 2.4 they were replaced by the LogLevel directive with per-module trace levels such as rewrite:trace3.)
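
To make the logging setup concrete, here is a minimal sketch of a shell helper that prints the relevant directives for a given Apache version. The function name, the log path, and the verbosity level of 3 are assumptions chosen for illustration, not values from the Apache docs or from any particular server:

```shell
#!/bin/sh
# Hypothetical helper: print mod_rewrite logging directives for Apache 2.2 or 2.4.
# The log path and the verbosity level (3) are illustrative choices.
rewrite_log_conf() {
  case "$1" in
    2.2)
      # Apache 2.2: dedicated directives (server config or virtual host only)
      printf '%s\n' 'RewriteLog "/var/log/httpd/rewrite.log"' 'RewriteLogLevel 3'
      ;;
    2.4)
      # Apache 2.4: rewrite tracing is written to the regular error log
      printf '%s\n' 'LogLevel alert rewrite:trace3'
      ;;
    *)
      echo "usage: rewrite_log_conf 2.2|2.4" >&2
      return 1
      ;;
  esac
}

rewrite_log_conf 2.2
```

You would paste the printed lines into your server or virtual host configuration and reload Apache. Rewrite logging at higher levels slows the server down noticeably, so it is best switched off again once the problem is found.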

The next tool I like to use is curl, since it shows you the HTTP headers and lets you see exactly what the client and server are saying to each other. `curl` is a pretty standard utility that comes installed on many flavors of *nix, and it’s available for download for many platforms, including Windows, Cygwin, and Mac OS X. Let’s check out an example of a simple 302 “Found” redirect:

$ curl --verbose --head --location example.com

* About to connect() to example.com port 80 (#0)
*   Trying 192.0.43.10... connected
* Connected to example.com (192.0.43.10) port 80 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.21.7 (i386-redhat-linux-gnu) libcurl/7.21.7 NSS/3.12.10.0 zlib/1.2.5 libidn/1.22 libssh2/1.2.7
> Host: example.com
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 302 Found
HTTP/1.0 302 Found
< Location: http://www.iana.org/domains/example/
Location: http://www.iana.org/domains/example/
< Server: BigIP
Server: BigIP
* HTTP/1.0 connection set to keep alive!
< Connection: Keep-Alive
Connection: Keep-Alive
< Content-Length: 0
Content-Length: 0

< 
* Connection #0 to host example.com left intact
* Issue another request to this URL: 'http://www.iana.org/domains/example/'
* About to connect() to www.iana.org port 80 (#1)
*   Trying 192.0.32.8... connected
* Connected to www.iana.org (192.0.32.8) port 80 (#1)
> HEAD /domains/example/ HTTP/1.0
> User-Agent: curl/7.21.7 (i386-redhat-linux-gnu) libcurl/7.21.7 NSS/3.12.10.0 zlib/1.2.5 libidn/1.22 libssh2/1.2.7
> Host: www.iana.org
> Accept: */*
> 
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Fri, 16 Dec 2011 21:22:21 GMT
Date: Fri, 16 Dec 2011 21:22:21 GMT
< Server: Apache/2.2.3 (CentOS)
Server: Apache/2.2.3 (CentOS)
< Last-Modified: Wed, 09 Feb 2011 17:13:15 GMT
Last-Modified: Wed, 09 Feb 2011 17:13:15 GMT
< Vary: Accept-Encoding
Vary: Accept-Encoding
< Connection: close
Connection: close
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8

< 
* Closing connection #1
* Closing connection #0

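If we look closely, we can get a lot of info about the request from this. You can see the actual HTTP status code of the 302 redirect in the line "< HTTP/1.0 302 Found", with the left angle bracket indicating that this is a response from the server. Right angle brackets mark the request line and headers sent by the client (curl, in this case), and lines starting with an asterisk are curl's own informational messages. Each response header shows up twice because --verbose writes it to stderr with the "<" prefix while --head prints the same header to stdout.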

The above curl command, `curl --verbose --head --location example.com`, tells curl to be verbose, to send a "HEAD" HTTP request instead of GET, and to follow any "Location: [...]" header that it receives from the server (which tells it to go to a new location). In this case, as we've seen above, curl re-runs the request with the new location as the input URL. This behavior mimics that of browsers, but it isn't the default curl behavior.
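
When you only care about where a request ends up, the verbose output can be boiled down to the redirect chain. The following is a rough sketch, not a curl feature: `redirect_chain` is a hypothetical helper name, and the demonstration feeds it a shortened copy of the headers from the example above so that it runs without network access:

```shell
#!/bin/sh
# Hypothetical helper: reduce raw HTTP response headers on stdin to the
# status lines and Location headers, i.e. the redirect chain.
redirect_chain() {
  # Strip the carriage returns HTTP uses, then keep only the interesting lines
  tr -d '\r' | grep -iE '^(HTTP/[0-9.]+ [0-9]{3}|Location:)'
}

# Offline demonstration using headers copied from the example above
printf 'HTTP/1.0 302 Found\r\nLocation: http://www.iana.org/domains/example/\r\nServer: BigIP\r\n\r\nHTTP/1.1 200 OK\r\nServer: Apache/2.2.3 (CentOS)\r\n\r\n' | redirect_chain

# Against a live server (needs network access):
#   curl --silent --head --location example.com | redirect_chain
```

The offline run prints the 302 status line, its Location header, and the final 200 status line, which is usually all you need to confirm that a redirect rule fired and where it pointed.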

There are some cases when we need to simulate a browser more closely. If your application handles GET requests and ignores HEAD, you might need to omit '--head' and instead use '-o /dev/null' to write the downloaded file out to nowhere. You could also leave '-o' out entirely, which dumps the page source to your STDOUT (effectively the same as '-o -'). You might also need to specify a user agent string to trigger site-specific behavior such as a mobile site. Let's try with example.com again. We're going to simulate a request from an iPhone running iOS 5.0 using Safari, with '--trace-ascii -' for full geek mode, which writes a complete trace of everything sent and received to STDOUT and can be useful when optimizing mobile content. We'll also use curl's '--limit-rate' option to slow the transfer to a crawl, in the spirit of a 2G cellular GPRS connection. Note that '--limit-rate' counts bytes per second, so '25k' caps the transfer at about 25 kilobytes/sec (roughly 200 kbit/sec); a true 25 kbit/sec link would be closer to '--limit-rate 3k':

$ curl --trace-ascii - --location -o /dev/null --user-agent 'Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3' --limit-rate 25k example.com
== Info: About to connect() to example.com port 80 (#0)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0== Info:   Trying 192.0.43.10... == Info: connected
== Info: Connected to example.com (192.0.43.10) port 80 (#0)
=> Send header, 198 bytes (0xc6)
0000: GET / HTTP/1.1
0010: User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X
0050: ) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A33
0090: 4 Safari/7534.48.3
00a4: Host: example.com
00b7: Accept: */*
00c4: 
== Info: HTTP 1.0, assume close after body
<= Recv header, 20 bytes (0x14)
0000: HTTP/1.0 302 Found
<= Recv header, 48 bytes (0x30)
0000: Location: http://www.iana.org/domains/example/
<= Recv header, 15 bytes (0xf)
0000: Server: BigIP
== Info: HTTP/1.0 connection set to keep alive!
<= Recv header, 24 bytes (0x18)
0000: Connection: Keep-Alive
<= Recv header, 19 bytes (0x13)
0000: Content-Length: 0
<= Recv header, 2 bytes (0x2)
0000: 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
== Info: Connection #0 to host example.com left intact
== Info: Issue another request to this URL: 'http://www.iana.org/domains/example/'
== Info: About to connect() to www.iana.org port 80 (#1)
== Info:   Trying 192.0.32.8... == Info: connected
== Info: Connected to www.iana.org (192.0.32.8) port 80 (#1)
=> Send header, 215 bytes (0xd7)
0000: GET /domains/example/ HTTP/1.0
0020: User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X
0060: ) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A33
00a0: 4 Safari/7534.48.3
00b4: Host: www.iana.org
00c8: Accept: */*
00d5: 
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 31 bytes (0x1f)
0000: Server: Apache/2.2.3 (CentOS)
<= Recv header, 46 bytes (0x2e)
0000: Last-Modified: Wed, 09 Feb 2011 17:13:15 GMT
<= Recv header, 23 bytes (0x17)
0000: Vary: Accept-Encoding
<= Recv header, 40 bytes (0x28)
0000: Content-Type: text/html; charset=UTF-8
<= Recv header, 22 bytes (0x16)
0000: Accept-Ranges: bytes
<= Recv header, 24 bytes (0x18)
0000: Connection: close     
<= Recv header, 37 bytes (0x25)
0000: Date: Fri, 16 Dec 2011 21:47:36 GMT
<= Recv header, 14 bytes (0xe)
0000: Age: 101    
<= Recv header, 22 bytes (0x16)
0000: Content-Length: 2966
<= Recv header, 2 bytes (0x2)
0000: 
<= Recv data, 1170 bytes (0x492)
0000: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "
0040: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">.<html 
0080: xmlns="http://www.w3.org/1999/xhtml">.<head>..<title>IANA &mdash
00c0: ; Example domains</title>..<!-- start common-head -->..<meta htt
0100: p-equiv="Content-type" content="text/html; charset=utf-8" />..<l
0140: ink rel="stylesheet" type="text/css" href="/_css/2008.1/reset-fo
0180: nts-grids.css" />..<link rel="stylesheet" type="text/css" media=
01c0: "screen" href="/_css/2008.1/screen.css" />..<link rel="styleshee
0200: t" type="text/css" media="print" href="/_css/2008.1/print.css" /
0240: >..<link rel="shortcut icon" type="image/ico" href="/favicon.ico
0280: " />..<script type="text/javascript" src="/_js/prototype.js"></s
02c0: cript>..<script type="text/javascript" src="/_js/corners.js"></s
0300: cript>..<script type="text/javascript" src="/_js/common.js"></sc
0340: ript>..<!-- end common-head -->..</head>.<body>..<!-- start comm
0380: on-bodyhead -->..<div id="header-frame">..<div id="header">..<di
03c0: v id="header-logo"><a href="/"><img src="/_img/iana-logo-pagehea
0400: der.png" alt="Homepage"/></a></div>..<div id="header-nav">..<ul>
0440: ..<li><a href="/domains/">Domains</a></li>..<li><a href="/number
0480: s/">Numbers</a></l
<= Recv data, 1796 bytes (0x704)
0000: i>..<li><a href="/protocols/">Protocols</a></li>..<li><a href="/
0040: about/">About IANA</a></li>..</ul>..</div>..</div>..</div>...<di
0080: v id="body-container">..<div id="body">..<!-- end common-bodyhea
00c0: d -->....<h1>Example Domains</h1>...<p>As described in <a href="
0100: /go/rfc2606">RFC 2606</a>,..we maintain a number of domains such
0140:  as EXAMPLE.COM and EXAMPLE.ORG..for documentation purposes. The
0180: se domains may be used as illustrative..examples in documents wi
01c0: thout prior coordination with us. They are ..not available for r
0200: egistration.</p>... <!-- start common-bodytail -->..</div>..</di
0240: v>...<div id="footer-frame">..<div id="footer">....<table width=
0280: 100%>..<tr>...<td id="iana-footer-first"><b><a href="/about/">Ab
02c0: out</a></b><br/>.                <a href="/about/presentations/"
0300: >Presentations</a><br/>.                <a href="/about/performa
0340: nce/">Performance</a><br/>...<a href="/reports/">Reports</a><br/
0380: >.                </td>....<td><b><a href="/domains/">Domains</a
03c0: ></b><br/>...<a href="/domains/root/">Root Zone</a><br/>...<a hr
0400: ef="/domains/int/">.INT</a><br/>...<a href="/domains/arpa/">.ARP
0440: A</a><br/>...<a href="/domains/idn-tables/">IDN Repository</a></
0480: td>....<td><b><a href="/protocols/">Protocols</a></b><br/>...<br
04c0: />...<b><a href="/numbers/">Number Resources</a></b><br/>...<a h
0500: ref="/abuse/">Abuse Information</a></td>....<td id="iana-footer-
0540: icann"><img src="/_img/icann-logo-micro.png"><br/>IANA is operat
0580: ed by the<br/><a href="http://www.icann.org/">Internet Corporati
05c0: on for Assigned Names and Numbers</a></td>..</tr>..</table>..<di
0600: v id="footer-beta-feedback">.        <p>Please direct general fe
0640: edback regarding IANA to <a href="mailto:iana@iana.org?subject=G
0680: eneral%20website%20feedback">iana@iana.org</a>.</p>.        </di
06c0: v>...</div>..</div>..<!-- end common-bodytail -->...</body>.</ht
0700: ml>.
100  2966  100  2966    0     0  15926      0 --:--:-- --:--:-- --:--:-- 15926
== Info: Closing connection #1
== Info: Closing connection #0
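
If the full trace is more than you need, curl's '--write-out' option can print just the final status code and URL after redirects. Here is a small sketch along those lines. The `fetch_as_mobile` name and the `UA_IPHONE` variable are made up for this example, and the rate is set to 3k on the assumption that you want something close to 25 kbit/sec, since '--limit-rate' is measured in bytes per second:

```shell
#!/bin/sh
# Hypothetical wrapper: fetch a URL the way a mobile browser would, discard
# the body, and print the final status code and URL after following redirects.
UA_IPHONE='Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3'

fetch_as_mobile() {
  # --limit-rate is in bytes/sec: 3k is about 24 kbit/sec
  curl --silent --location --output /dev/null \
       --user-agent "$UA_IPHONE" \
       --limit-rate 3k \
       --write-out '%{http_code} %{url_effective}\n' \
       "$1"
}

# Usage (needs network access):
#   fetch_as_mobile example.com
```

Comparing the result with and without the '--user-agent' line is a quick way to check whether a rewrite rule keyed on the user agent is actually sending mobile clients somewhere different.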

That's a lot of info to digest, but by using curl like this you can see exactly what's going on at the HTTP level, and it's usually easier than a web browser for diagnosing issues with rewrites or things like "Expires: " headers. You might also be interested in a full list of HTTP headers. You can also check out the HTTP/1.1 Header Field Definitions (RFC 2616, section 14) if you need more info about the engineering specs of a specific header.

Finally, I use a little Firefox extension called Live HTTP Headers which lets you pop open a window and watch the HTTP requests and responses in real-time as your browser makes them. This is great for things where curl wouldn't work or would be excessively complicated, such as AJAX requests on a page.

Basically, if you do some homework and take the time to learn a bit more about the HTTP protocol and the Apache modules that deal with URL modification, you'll be well on your way to becoming a real rewrite ninja. Also, when you're learning, it'll be good to have a few cheatsheets around. I have the mod_rewrite and Regular Expressions sheets hanging up in my office for quick reference.

For extra credit, you can also open up Wireshark and use it to dig deeper into HTTP, TCP/IP, and DNS to see what's really going on behind the scenes when you visit a page.

Posted in: Apache