I recently stumbled across traceroute bad.horse. That's pretty cool, I thought, and then a friend and esteemed colleague pointed out that it's been done before with the opening scrolling text for Star Wars: A New Hope. For reasons known only to a good psychologist I felt I had to create something like this too. Also, Jeff Goldblum wasn't around to stop me.
I should point out that every time something like this is done a person with a beard and sandals dies a little. And I kind of agree with their sentiment. RFCs, IETF standards and best-practice documents exist for a reason. But anyway.. I felt I had to do this because around 1998 I was experimenting with internet mapping, and used traceroute-like techniques extensively.
First, a quick look at traceroute.. Skip this bit if you already know how it works.. Every IPv4 packet has a TTL (Time-To-Live) field that's decremented every time the packet passes through a Layer 3 (IP) router. When the TTL reaches 0 the router discards the packet and sends an ICMP time exceeded packet back to the source IP address. The sender of the original packet then knows that the destination host cannot be reached within that many hops, and will usually return an error to the application layer. A typical default initial TTL is 64, which is plenty to reach any host on the internet; if it isn't enough then there's probably a routing loop.
Traceroute, originally written by Van Jacobson, sends a series of packets, the first of which starts with a TTL of 1 and the following packets have a gradually incremented TTL. By seeing which routers return the ICMP time exceeded packet the forward path from the source IP to the target IP can be mapped. When the TTL is high enough for the probing packet to reach the destination IP the response is dependent on the type of the probing packet.
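To make the mechanism concrete, here's a toy simulation in plain Python (no real packets are sent). The PATH addresses and the send_probe and traceroute names are all invented for illustration; treat it as a sketch of the idea, not of how the real traceroute is implemented.

```python
from typing import List, Tuple

# Hypothetical path for illustration: three routers, then the destination.
PATH = ["192.0.2.1", "198.51.100.1", "203.0.113.1", "203.0.113.99"]


def send_probe(path: List[str], ttl: int) -> Tuple[str, str]:
    """Simulate one probe: every router on the path decrements the TTL.

    Returns (responder_ip, reply_kind).
    """
    for hop_ip in path[:-1]:
        ttl -= 1
        if ttl == 0:
            # TTL expired at this router, which reports back to the sender.
            return hop_ip, "time-exceeded"
    # The TTL survived every router, so the destination itself answers.
    return path[-1], "destination-reply"


def traceroute(path: List[str], max_hops: int = 30) -> List[str]:
    """Probe with TTL 1, 2, 3, ... until the destination answers."""
    seen = []
    for ttl in range(1, max_hops + 1):
        responder, kind = send_probe(path, ttl)
        seen.append(responder)
        if kind != "time-exceeded":
            break
    return seen


if __name__ == "__main__":
    for n, ip in enumerate(traceroute(PATH), start=1):
        print(n, ip)
```

Each TTL value reveals exactly one responder, which is why whoever controls the replies also controls what the "route" looks like.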
jsp@ks364689:~$ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  rbx-60-m2.fr.eu (91.121.210.252) 9.360 ms 9.573 ms 9.565 ms
 2  rbx-g1-a9.fr.eu (37.187.231.65) 0.784 ms rbx-g1-a9.fr.eu (37.187.231.101) 0.744 ms rbx-g2-a9.fr.eu (37.187.231.67) 0.747 ms
 3  gsw-1-a9.fr.eu (213.251.130.55) 4.021 ms 4.025 ms th2-1-a9.fr.eu (213.251.130.53) 4.008 ms
 4  * * *
 5  66.249.94.54 (66.249.94.54) 4.931 ms 4.683 ms 209.85.246.131 (209.85.246.131) 4.249 ms
 6  google-public-dns-a.google.com (8.8.8.8) 3.907 ms 4.362 ms 3.970 ms
jsp@ks364689:~$
Traceroute has many options, but by default it should send UDP packets. It'll also send three of those packets for every TTL value, and it'll increment the TTLs from 1 to 30. Have a look at the traceroute man pages for Linux, FreeBSD and Solaris for the options.
In the output above it can be seen that the first hop from the source host is rbx-60-m2.fr.eu and is about 9 ms away. The second hop is actually three different routers for each of the second hop packets - this is probably some kind of round-robin load balancing, or plain and simple route-flapping (when a routing protocol configuration is unstable and the FIBs get changed repeatedly in a short period of time). Hop 4 router(s) did not answer with an ICMP time exceeded packet within the timeout period so there was no way to identify the fourth hop router. This could be because they're administratively configured not to reply with the ICMP message, or they have an address in the private RFC1918 range and an edge router is filtering those replies (since no RFC1918 address packets should be able to get out on to the internet).
Have a look at Appendix A to see the different results you can get from using different traceroute probes.
So how can we fake the hops?
There are three aspects to the spoofing:
Off the top of my head here are a few approaches to creating the responses to the probe packets:
The Star Wars example was inefficient in public IP address usage - the links between VRFs (and also VMs and LNNs from the approaches above) could use at best /30 network addresses, which is why the returned hops weren't consecutively numbered. The bad.horse example was more efficient in that consecutive public IPs were used (albeit out of order), but it missed the chance to re-use a single IP wherever the same hostname (such as bad.horse) is repeated; instead, several different public IPs reverse to the same hostname.
To optimise for public IP usage it would be possible to use RFC1918 private addresses between the VMs/VRFs/LNNs and use OpenFlow to renumber the source address of the reply (ICMP time exceeded, port unreachable) to consecutive public IP addresses. Using Linux IPTables might also work, but Linux often has many shortcuts in the packet path and it'd be difficult to SNAT only the reply packets (they actually circumvent the usual NAT table).
The last step, common to all of these approaches, is to fiddle the reverse DNS entries to add the hostnames that spell out the actual desired text. Depending on who does your DNS you may be limited in what you can set the reverse DNS to - many will first check whether the forward DNS maps to the IP. If that's the case then the best thing you could do, I guess, is first.line.yourdomain.tld, second.line.yourdomain.tld. For local testing you can always make changes directly to the /etc/hosts file.
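For the local-testing route, the /etc/hosts lines can be generated rather than typed. The sketch below is only an illustration: to_hostname and hosts_entries are invented names, and the one-word-per-label convention is an assumption. The only hard rules it relies on are the usual hostname limits (labels of 1-63 letters, digits or hyphens, 253 characters overall).

```python
import re
from typing import Iterable, List, Tuple

# A valid label: 1-63 characters, alphanumeric at both ends, hyphens allowed inside.
LABEL_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def to_hostname(line: str) -> str:
    """Turn a line of text into a dotted hostname, one word per label."""
    # Drop anything that isn't allowed in a label (apostrophes, commas, ...).
    labels = [re.sub(r"[^A-Za-z0-9-]", "", word) for word in line.split()]
    # Labels may not start or end with a hyphen; discard words that end up empty.
    labels = [label.strip("-") for label in labels]
    labels = [label for label in labels if label]
    hostname = ".".join(labels)
    valid = labels and len(hostname) <= 253 and all(LABEL_RE.match(l) for l in labels)
    if not valid:
        raise ValueError("cannot build a valid hostname from %r" % line)
    return hostname


def hosts_entries(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Format (ip, text) pairs as /etc/hosts lines."""
    return ["%s %s" % (ip, to_hostname(text)) for ip, text in pairs]


if __name__ == "__main__":
    for entry in hosts_entries([("10.10.0.220", "on the wall"),
                                ("10.10.0.223", "then there'll be")]):
        print(entry)
```

The same hostnames would go into PTR records if you control the reverse zone; the hosts file is just the quickest way to see the effect on one machine.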
Quoth the Scapy website: "Scapy is a powerful interactive packet manipulation program. It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, match requests and replies, and much more.".
I've used Scapy a few times and I can highly recommend it for all kinds of tasks - so I'm going to try the packet forgery approach from above.
The script below reads a list of whitespace delimited IPv4 addresses or stars (*) from stdin and these will form the dummy route that the script will try to fake. We use IPTables and the NFQUEUE target to take packets from the network stack and pass them to the Python script (which of the NFQUEUEs to listen on can be specified on the command line). The script then parses the packet and tells IPTables to drop the packet (otherwise the OS would generate and send an authentic reply). The TTL is used to pick which fake IP in the list is to be the source of the reply, and then depending on the position in the list either an ICMP time-exceeded, or ICMP port-unreachable packet is constructed and finally sent via Scapy (back in to the Linux network stack).
For the sake of simplicity we're only catering for UDP traceroutes. To support TCP and ICMP probes one just has to change the packet returned in line 29 (the ICMP port-unreachable reply built for the last hop).
#! /usr/bin/env python2.7

from scapy.all import *
from netfilterqueue import NetfilterQueue
import sys     # for stdin.read and sys.argv
import socket  # for inet_ntoa and inet_aton
import string  # for string.split

nfq_num = 1
hops={}

def nfq_cb(packet):
    pkt = IP(packet.get_payload())  # convert the NF packet to a scapy packet
    packet.drop()                   # drop the original NF packet as we have a copy
    # print "src = _%s_ dst = _%s_ ttl = _%s_" % (pkt.src, pkt.dst, pkt.ttl)
    if pkt.ttl in hops:  # find TTL in the hops we're spoofing
        hop=hops[pkt.ttl];
        if hop == "*":  # don't send any replies
            # print "Got * for TTL %s. Dropping packet.." % pkt.ttl
            pass
        else:
            # extract IP header and first 8 bytes of probing packet. Should include options header, hence .ihl
            inhdr=str(pkt)[:(8 + pkt[IP].ihl * 4)]
            if pkt.ttl < len(hops):  # not our last hop so build ICMP time exceeded response
                # print "Sending time exceeded packet from %s for TTL %s.." % (hop,pkt.ttl)
                p=IP(src=hop,dst=pkt.src)/ICMP(type=11,code=0)/inhdr
            else:  # last hop so build ICMP port unreachable response
                # print "Sending port unreachable from %s for TTL %s.." % (hop,pkt.ttl)
                p=IP(src=hop,dst=pkt.src)/ICMP(type=3,code=3)/inhdr
            # The newly formed response packet has an invalid checksum
            # so delete checksums and rebuild the packet with checksums
            del(p[ICMP].chksum)
            del(p[IP].chksum)
            p=p.__class__(str(p))
            send(p)  # send spoofed packet to networking stack
    else:
        # print "TTL not found."
        pass

ttl=1
if len(sys.argv) == 2:
    try:
        nfq_num = int(sys.argv[1])
    except ValueError:
        print "usage: %s [Netfilter Queue number (integer)]" % sys.argv[0]
        exit(1)

for destip in string.split(sys.stdin.read()):
    try:  # check input is valid IPv4 address
        dip=socket.inet_aton(destip)
        hops[ttl] = socket.inet_ntoa(dip)
        print "%s: %s" % (ttl,hops[ttl])
        ttl+=1
    except socket.error:
        if destip == "*":
            hops[ttl] = "*";
            print "%s: %s" % (ttl,hops[ttl])
            ttl+=1
        else:
            print "%s was not a valid IPv4 address. Exiting.." % destip
            exit(1)

nfq = NetfilterQueue()
nfq.bind(nfq_num, nfq_cb)
try:
    print "Waiting for packets on netfilter queue number %d.." % nfq_num
    nfq.run()
except KeyboardInterrupt:
    pass
You can download the source here.
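The decision the script makes for each probe can be pulled out into pure functions, which makes it easier to reason about without Scapy or root. This is a Python 3 sketch of that logic only; choose_reply, quoted_bytes and the two constants are names invented for illustration and don't appear in the script above.

```python
from typing import Dict, Optional, Tuple

ICMP_TIME_EXCEEDED = (11, 0)    # ICMP type 11, code 0
ICMP_PORT_UNREACHABLE = (3, 3)  # ICMP type 3, code 3


def choose_reply(ttl: int, hops: Dict[int, str]) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (source_ip, (icmp_type, icmp_code)), or None for no reply.

    hops maps TTL values 1..N to a fake IPv4 address or "*".
    """
    hop = hops.get(ttl)
    if hop is None or hop == "*":
        # Unknown TTL, or a hop that should show up as "* * *": stay silent.
        return None
    if ttl < len(hops):
        # Intermediate hop: pretend the TTL expired here.
        return hop, ICMP_TIME_EXCEEDED
    # Last hop: pretend the UDP probe reached a closed port.
    return hop, ICMP_PORT_UNREACHABLE


def quoted_bytes(raw_packet: bytes) -> bytes:
    """Bytes of the probe echoed back inside the ICMP error.

    That is the IP header (IHL * 4 bytes, so options are included)
    plus the first 8 bytes that follow it.
    """
    ihl = raw_packet[0] & 0x0F  # low nibble of the first byte is the IHL
    return raw_packet[: ihl * 4 + 8]


if __name__ == "__main__":
    hops = {1: "*", 2: "10.10.0.225", 3: "10.10.0.224"}
    for ttl in range(1, 5):
        print(ttl, choose_reply(ttl, hops))
```

The quoted bytes matter because traceroute uses them (in particular the UDP ports in those first 8 bytes) to match each ICMP error to the probe that triggered it.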
And this is how you run it.. we need one host to do the faking (machina in this case), and make sure you're root on that host.. and another from whence to traceroute (themis in the example below).
Say we own 10.10.0.0/24 and we'll be using 10.10.0.220-10.10.0.229 for the spoofed hosts. I'm using private addresses, but they can equally be publicly addressable IP addresses (and in fact have to be if people from outside are to see the spoofed traceroute with custom FQDNs).
Firstly we need to create an IP alias interface for the target IP:
root@machina:~# ifconfig eth0:0 10.10.0.229 up
root@machina:~#
In practice, however, many routers will try to detect spoofing on directly attached subnets, so we need to answer ARP requests for every address we're spoofing, which means a few more IP alias interfaces (that, or spoof addresses from a subnet which isn't directly attached):
for i in `seq 0 9`; do ifconfig eth0:$i 10.10.0.22$i up; done
That'll create ten alias interfaces for addresses ending 220-229.
Second, iptables has to be told to forward the incoming UDP packets to our script:
root@machina:~# iptables -A INPUT -d 10.10.0.229 -p udp -j NFQUEUE --queue-num 1
root@machina:~#
But this only has to be done for the traceroute target address.
Thirdly, we need to set up the reverse DNS. We could be using public IP addresses, and we'd be setting up reverse DNS at this point, but the ranges I own are all in production use, so instead I've fiddled the /etc/hosts file on themis from whence we'll traceroute:
root@themis:~# grep 10.10.0.22 /etc/hosts
10.10.0.220 on.the.wall
10.10.0.221 you.take.one.down
10.10.0.222 and.you.pass.it.around
10.10.0.223 then.therell.be
10.10.0.224 THE.END
10.10.0.225 100.bottles.of.beer
10.10.0.226 99.bottles.of.beer
10.10.0.227 98.bottles.of.beer
10.10.0.228 97.bottles.of.beer
10.10.0.229 96.bottles.of.beer
Now that we're set up, we need to generate the order of IP addresses which will map to our "song" lines. Since they follow a repetitive pattern it's easiest to just get gawk to generate them:
root@machina:~# gawk 'END{for(i=225;i<229;i++) print "* 10.10.0." i " 10.10.0.220 10.10.0." i " 10.10.0.221 10.10.0.222 10.10.0.223 10.10.0." i+1 " 10.10.0.220 "; print "* * 10.10.0.224"}' /dev/null
* 10.10.0.225 10.10.0.220 10.10.0.225 10.10.0.221 10.10.0.222 10.10.0.223 10.10.0.226 10.10.0.220
* 10.10.0.226 10.10.0.220 10.10.0.226 10.10.0.221 10.10.0.222 10.10.0.223 10.10.0.227 10.10.0.220
* 10.10.0.227 10.10.0.220 10.10.0.227 10.10.0.221 10.10.0.222 10.10.0.223 10.10.0.228 10.10.0.220
* 10.10.0.228 10.10.0.220 10.10.0.228 10.10.0.221 10.10.0.222 10.10.0.223 10.10.0.229 10.10.0.220
* * 10.10.0.224
root@machina:~#
Bear in mind that since the TTL field in the IP header is 8 bits wide there cannot be more than 255 hosts in the list (for the entire route, including the LAN and access network of the probe source).
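If gawk isn't to hand, the same hop list can be produced in Python. This is a sketch meant to be equivalent to the gawk one-liner, with the 255-hop limit checked as well; song_hops and PREFIX are names invented for this example.

```python
from typing import List

PREFIX = "10.10.0."


def song_hops(first: int = 225, last: int = 229) -> List[str]:
    """Build the whitespace-separated hop list for the spoofing script.

    Each verse is: a silent hop, the "N bottles" address, "on the wall",
    "N bottles" again, three fixed lines, the "N-1 bottles" address and
    "on the wall" once more.
    """
    hops: List[str] = []
    for i in range(first, last):
        bottles = PREFIX + str(i)
        hops += ["*", bottles, PREFIX + "220", bottles,
                 PREFIX + "221", PREFIX + "222", PREFIX + "223",
                 PREFIX + str(i + 1), PREFIX + "220"]
    # Two silent hops, then the address that reverses to THE.END.
    hops += ["*", "*", PREFIX + "224"]
    if len(hops) > 255:
        # The TTL field is 8 bits wide, so longer routes can't be probed.
        raise ValueError("route has %d hops, more than a TTL can cover" % len(hops))
    return hops


if __name__ == "__main__":
    # The spoofing script splits stdin on any whitespace, so one line is fine.
    print(" ".join(song_hops()))
```

With the defaults this yields 39 entries, so the whole song fits comfortably, even allowing for the real hops between the prober and the spoofing host.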
Now that everything is ready it's time to run the Python script on machina (still as root):
root@machina:~# gawk 'END{for(i=225;i<229;i++) print "* 10.10.0." i " 10.10.0.220 10.10.0." i " 10.10.0.221 10.10.0.222 10.10.0.223 10.10.0." i+1 " 10.10.0.220 "; print "* * 10.10.0.224"}' /dev/null | ~jsp/tracey4b.py
WARNING: No route found for IPv6 destination :: (no default route?)
1: *
2: 10.10.0.225
3: 10.10.0.220
4: 10.10.0.225
5: 10.10.0.221
6: 10.10.0.222
7: 10.10.0.223
8: 10.10.0.226
9: 10.10.0.220
10: *
11: 10.10.0.226
>>SNIP<<
36: 10.10.0.220
37: *
38: *
39: 10.10.0.224
Waiting for packets on netfilter queue number 1..
And then, on the other host:
jsp@themis:~$ traceroute -m 50 10.10.0.229
traceroute to 10.10.0.229 (10.10.0.229), 50 hops max, 60 byte packets
 1  router (192.168.88.1) 0.377 ms 0.375 ms 0.408 ms
 2  * * *
 3  100.bottles.of.beer (10.10.0.225) 63.802 ms 115.551 ms 159.541 ms
 4  on.the.wall (10.10.0.220) 197.195 ms 243.518 ms 281.155 ms
 5  100.bottles.of.beer (10.10.0.225) 327.391 ms 364.975 ms 416.311 ms
 6  you.take.one.down (10.10.0.221) 459.093 ms 327.300 ms 373.555 ms
 7  and.you.pass.it.around (10.10.0.222) 421.584 ms 469.473 ms 507.332 ms
 8  then.therell.be (10.10.0.223) 557.185 ms 577.907 ms 579.634 ms
 9  99.bottles.of.beer (10.10.0.226) 589.962 ms 591.517 ms 602.968 ms
10  on.the.wall (10.10.0.220) 598.488 ms 604.717 ms 622.100 ms
11  * * *
12  99.bottles.of.beer (10.10.0.226) 492.553 ms 491.755 ms 492.692 ms
13  on.the.wall (10.10.0.220) 491.708 ms 491.792 ms 491.884 ms
14  99.bottles.of.beer (10.10.0.226) 498.978 ms 503.955 ms 502.893 ms
15  you.take.one.down (10.10.0.221) 495.577 ms 504.929 ms 503.805 ms
16  and.you.pass.it.around (10.10.0.222) 510.969 ms 515.927 ms 516.661 ms
17  then.therell.be (10.10.0.223) 515.764 ms 508.539 ms 503.684 ms
18  98.bottles.of.beer (10.10.0.227) 511.820 ms 515.847 ms 515.778 ms
19  on.the.wall (10.10.0.220) 523.653 ms 520.596 ms 522.101 ms
20  * * *
21  98.bottles.of.beer (10.10.0.227) 375.407 ms 371.738 ms 371.582 ms
22  on.the.wall (10.10.0.220) 367.790 ms 363.902 ms 366.902 ms
23  98.bottles.of.beer (10.10.0.227) 361.510 ms 376.237 ms 371.369 ms
24  you.take.one.down (10.10.0.221) 367.942 ms 359.741 ms 355.891 ms
25  and.you.pass.it.around (10.10.0.222) 342.049 ms 340.572 ms 323.802 ms
26  then.therell.be (10.10.0.223) 324.033 ms 323.896 ms 313.198 ms
27  97.bottles.of.beer (10.10.0.228) 311.710 ms 307.160 ms 311.995 ms
28  on.the.wall (10.10.0.220) 322.494 ms 313.424 ms 311.510 ms
29  * * *
30  97.bottles.of.beer (10.10.0.228) 190.681 ms 184.955 ms 190.165 ms
31  on.the.wall (10.10.0.220) 191.803 ms 194.297 ms 194.330 ms
32  97.bottles.of.beer (10.10.0.228) 192.058 ms 181.690 ms 173.318 ms
33  you.take.one.down (10.10.0.221) 163.022 ms 172.563 ms 181.715 ms
34  and.you.pass.it.around (10.10.0.222) 183.688 ms 190.169 ms 186.677 ms
35  then.therell.be (10.10.0.223) 187.940 ms 177.755 ms 179.568 ms
36  96.bottles.of.beer (10.10.0.229) 183.788 ms 179.897 ms 201.816 ms
37  on.the.wall (10.10.0.220) 193.854 ms 187.976 ms 181.089 ms
38  * * *
39  * * *
40  THE.END (10.10.0.224) 34.470 ms 77.813 ms 114.944 ms
jsp@themis:~$
By default traceroute will use UDP probing packets, but it'll also stop after 30 hops, hence the use of -m 50, so our song doesn't get cut off.
The first hop, 192.168.88.1, is a genuine router response.
You'll notice that although we're tracerouting to 10.10.0.229 the traceroute terminates at 10.10.0.224, because that was the last IP on our spoof path, and the one for which our Python script returned an ICMP port unreachable instead of an ICMP time exceeded response.
Don't expect much. This wasn't designed to be performant, just a PoC.
You can download a multi-threaded version of the script HERE (as a producer-consumer queue), but it barely helps performance because of CPython's silly Global Interpreter Lock (I include the script here in case you run a different interpreter), probably compounded by the NetfilterQueue bindings having their own internal locks. Because the packet is being transferred from kernel space to user space it also has to be copied, so the zero-copy optimisation is unfortunately not available (but that overhead is nothing compared to the GIL).
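For readers who haven't met the pattern, a producer-consumer queue looks roughly like the sketch below. It is not the multi-threaded script mentioned above, just the general shape in Python 3 with the standard library; run_pipeline and handle are hypothetical names, and in the real thing the producer would be the netfilter callback and handle would build and send the spoofed reply.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel telling a consumer thread to exit


def run_pipeline(items: Iterable[Any], handle: Callable[[Any], Any],
                 workers: int = 4) -> List[Any]:
    """Feed items through a shared queue to a pool of consumer threads."""
    work: "queue.Queue[Any]" = queue.Queue()
    results: List[Any] = []
    results_lock = threading.Lock()

    def consumer() -> None:
        while True:
            item = work.get()
            if item is _STOP:
                break
            out = handle(item)
            with results_lock:
                results.append(out)

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in items:      # the producer side: enqueue and move on
        work.put(item)
    for _ in threads:       # one sentinel per consumer
        work.put(_STOP)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(run_pipeline(range(10), lambda ttl: ttl * 2)))
```

Under CPython only one of those consumer threads runs Python bytecode at a time, which is the GIL effect described above.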
There is some good news though; the spoofing is entirely stateless, so you can run multiple instances of the single-thread script and use iptables to load balance incoming packets between instances...
Simply run N instances of the single threaded spoofing script, passing the same IPs on stdin, but add a different integer as a command line argument to each instance to specify a different NFQUEUE for each instance. Then tell iptables to send consecutive probes to different queues:
root@machina:~# for i in `seq 4 -1 1`; do iptables -A INPUT -d 10.10.0.229 -p udp -m statistic --mode nth --every $i --packet 0 -j NFQUEUE --queue-num $i; done
root@machina:~# iptables -nvL INPUT
Chain INPUT (policy ACCEPT 14 packets, 972 bytes)
 pkts bytes target   prot opt in  out  source     destination
    0     0 NFQUEUE  udp  --  *   *    0.0.0.0/0  10.10.0.229  statistic mode nth every 4 NFQUEUE num 4
    0     0 NFQUEUE  udp  --  *   *    0.0.0.0/0  10.10.0.229  statistic mode nth every 3 NFQUEUE num 3
    0     0 NFQUEUE  udp  --  *   *    0.0.0.0/0  10.10.0.229  statistic mode nth every 2 NFQUEUE num 2
    0     0 NFQUEUE  udp  --  *   *    0.0.0.0/0  10.10.0.229  statistic mode nth every 1 NFQUEUE num 1
root@machina:~#
root@machina:~#
root@machina:~# iptables -m statistic -h|tail
  --set-counters PKTS BYTES   set the counter during insert/append
[!] --version -V              print package version.
statistic match options:
 --mode mode                  Match mode (random, nth)
 random mode:
[!] --probability p           Probability
 nth mode:
[!] --every n                 Match every nth packet
 --packet p                   Initial counter value (0 <= p <= n-1, default 0)
root@machina:~#
The script isn't CPU bound (just lock bound) so it'll be fine to run two or three times as many instances as CPU cores. The details on the statistic module can be found with iptables -m statistic -h.
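It may not be obvious why rules with --every 4, 3, 2, 1 give an even split, so here's a small simulation. It assumes a simplified model of the nth mode (each rule keeps its own counter, only counts packets that reach it, and with --packet 0 matches the first packet it sees and every nth after that). NthRule, make_rules and assign are names invented for this sketch.

```python
from collections import Counter
from typing import List, Optional


class NthRule:
    """Simplified model of '-m statistic --mode nth --every N --packet 0'."""

    def __init__(self, every: int, queue: int) -> None:
        self.every = every
        self.queue = queue
        self.seen = 0  # packets that have reached this rule so far

    def matches(self) -> bool:
        hit = self.seen % self.every == 0
        self.seen += 1
        return hit


def make_rules(n: int) -> List[NthRule]:
    """Rules in chain order: every n -> queue n, ..., every 1 -> queue 1."""
    return [NthRule(every=i, queue=i) for i in range(n, 0, -1)]


def assign(packets: int, rules: List[NthRule]) -> List[Optional[int]]:
    """Walk each packet down the chain; the first matching rule wins."""
    queues: List[Optional[int]] = []
    for _ in range(packets):
        for rule in rules:
            if rule.matches():
                queues.append(rule.queue)
                break
        else:
            queues.append(None)  # no rule matched (can't happen with every=1 last)
    return queues


if __name__ == "__main__":
    print(assign(8, make_rules(4)))
    print(Counter(assign(1200, make_rules(4))))
```

The first rule takes a quarter of the traffic, the second takes a third of what's left, the third takes half of the remainder and the last takes everything else, so each queue ends up with a quarter and consecutive probes land on different queues.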
The iptables option --queue-balance 0:3 should also work and is simpler, but it will load balance at flow granularity (not packet granularity), and a single source/target pair will be seen as a single flow, so it'll load balance between traceroutes, but not across packets of the same traceroute.
To be or not to be? That is the question..
The answer is Chuck Norris.
For those wondering whether sending ICMP, UDP or TCP probes makes a difference.. yes, it does. Many network admins block traffic they don't expect, so unless they know specifically of a UDP based service hosted on their network they may well block UDP traffic, especially since UDP is commonly used in DDoS attacks. For TCP this is less likely. Have a look at the traceroutes below, the first being a TCP SYN traceroute, the second a UDP traceroute. You'll notice that the UDP route gets blocked after hop 7.
root@ks364689:~# traceroute -T www.bbc.co.uk
traceroute to www.bbc.co.uk (212.58.244.66), 30 hops max, 60 byte packets
 1  rbx-60-m2.fr.eu (91.121.210.252) 0.879 ms 0.863 ms 0.956 ms
 2  rbx-g2-a9.fr.eu (37.187.231.103) 0.630 ms rbx-g2-a9.fr.eu (37.187.231.67) 0.805 ms rbx-g2-a9.fr.eu (37.187.231.103) 0.608 ms
 3  th2-1-a9.fr.eu (213.251.130.53) 4.047 ms gsw-1-a9.fr.eu (213.251.130.55) 4.020 ms 4.015 ms
 4  gsw-1-a9.fr.eu (37.187.36.215) 4.346 ms level3.as3356.fr.eu (178.33.100.222) 3.824 ms gsw-1-a9.fr.eu (37.187.36.215) 4.318 ms
 5  ae-121-3507.edge4.London1.Level3.net (4.69.166.9) 7.255 ms ae-120-3506.edge4.London1.Level3.net (4.69.166.5) 7.191 ms level3.as3356.fr.eu (178.33.100.222) 4.042 ms
 6  ae-120-3506.edge4.London1.Level3.net (4.69.166.5) 7.219 ms ae-122-3508.edge4.London1.Level3.net (4.69.166.13) 7.507 ms ae-119-3505.edge4.London1.Level3.net (4.69.166.1) 7.472 ms
 7  BBC-TECHNOL.edge4.London1.Level3.net (212.113.14.222) 8.638 ms ae-122-3508.edge4.London1.Level3.net (4.69.166.13) 7.502 ms BBC-TECHNOL.edge4.London1.Level3.net (212.113.14.222) 8.585 ms
 8  * * *
 9  ae0.er01.telhc.bbc.co.uk (132.185.254.109) 7.831 ms * 7.787 ms
10  132.185.255.149 (132.185.255.149) 8.115 ms ae0.er01.telhc.bbc.co.uk (132.185.254.109) 7.955 ms 132.185.255.149 (132.185.255.149) 8.513 ms
11  bbc-vip111.telhc.bbc.co.uk (212.58.244.66) 7.548 ms 7.646 ms 7.621 ms
root@ks364689:~#
root@ks364689:~#
root@ks364689:~# traceroute www.bbc.co.uk
traceroute to www.bbc.co.uk (212.58.244.67), 30 hops max, 60 byte packets
 1  rbx-60-m2.fr.eu (91.121.210.252) 0.882 ms 0.869 ms 1.082 ms
 2  rbx-g2-a9.fr.eu (37.187.231.67) 0.825 ms 0.914 ms 0.901 ms
 3  th2-1-a9.fr.eu (213.251.130.53) 4.588 ms 4.560 ms gsw-1-a9.fr.eu (213.251.130.55) 3.897 ms
 4  level3.as3356.fr.eu (178.33.100.222) 3.865 ms gsw-1-a9.fr.eu (37.187.36.215) 4.357 ms level3.as3356.fr.eu (178.33.100.222) 3.835 ms
 5  ae-121-3507.edge4.London1.Level3.net (4.69.166.9) 7.329 ms ae-119-3505.edge4.London1.Level3.net (4.69.166.1) 7.512 ms level3.as3356.fr.eu (178.33.100.222) 4.053 ms
 6  ae-119-3505.edge4.London1.Level3.net (4.69.166.1) 7.481 ms ae-121-3507.edge4.London1.Level3.net (4.69.166.9) 7.324 ms ae-120-3506.edge4.London1.Level3.net (4.69.166.5) 7.300 ms
 7  ae-120-3506.edge4.London1.Level3.net (4.69.166.5) 7.478 ms 7.438 ms *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *^C
root@ks364689:~#
Version history:
Mon 05 October 2015 22:06 UTC | initial release |
Wed 07 October 2015 19:15 UTC | Added multi-threaded and load balanced versions |
Thu 08 October 2015 20:14 UTC | Added TCP vs UDP traceroute efficacy example |