How to Stop a Hacker - Don't Trust User Input
This hilarious comic strip by xkcd illustrates one of the most important rules of system security: never trust user input. All user input to a program or website should be sanitized, that is, checked and processed to make sure that it cannot damage the system.
There are two reasons why this is necessary. First, programmers have to deal with user ignorance that may lead to input accidentally breaking the system. Second, programmers have to stop deliberate attacks designed to break the system.
How Input Can Be Used to Hack a System
In the xkcd comic strip the computer-savvy mother hacked the school database by giving her son a name containing an SQL statement that causes the database to destroy the student records. The syntax for storing a new student in the database is probably something similar to:
INSERT INTO Students (StudentName) values('Sample Name');
MySQL statements end with a semicolon, so when the parser encounters a semicolon it treats that as the end of the statement and, if the connection allows several statements in one query, executes whatever follows as a separate statement. So this is what happened when the school secretary entered little Bobby's evil name into the database:
INSERT INTO Students (StudentName) values('Robert'); DROP TABLE Students;');
Suddenly, instead of having one command that stores the student name in the database, the MySQL parser sees three commands:
INSERT INTO Students (StudentName) values('Robert');
DROP TABLE Students;
');
The first statement stores a new student named Robert. The second drops the Students table, deleting all the student records. The last is just the leftover quote and closing parenthesis, so it generates an error. However, the damage has already been done and the student records are gone.
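The sequence above can be reproduced in a few lines. This is a minimal sketch, not the school's actual code: it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, and it uses executescript, which accepts several statements in one string, to mimic a connection that allows stacked statements. The helper names unsafe_add_student and table_exists are made up for the illustration.

```python
import sqlite3


def unsafe_add_student(conn, name):
    """Build the INSERT by pasting the name into the SQL text (the mistake)."""
    sql = "INSERT INTO Students (StudentName) values('" + name + "');"
    try:
        # executescript runs each statement in the string, one after another.
        conn.executescript(sql)
    except sqlite3.Error:
        # The leftover "');" is a syntax error, but it is only reached
        # after the statements in front of it have already run.
        pass
    return sql


def table_exists(conn, table):
    """Check the SQLite catalog for a table with the given name."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (StudentName TEXT)")

sent = unsafe_add_student(conn, "Robert'); DROP TABLE Students;")
print(sent)
print("Students table still exists:", table_exists(conn, "Students"))
```

The first line printed is the exact text the database receives, and the second should report that the table is gone even though the script ended with an error.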
In this case the input was deliberately designed to break the system. However, consider the case of someone who is registering a new profile on a website. On the server side the programmer has designed the code so that user information is stored in a database. One of the fields is the user's handle.
In this case the user is a pubescent young boy who thinks too much of himself. As his username he enters "Every girl's dream." After completing the rest of the form he hits submit and waits for his new profile. Needless to say, he is mystified when it gives him strange error messages, or perhaps never responds. What happened?
On the server side the code probably did something like this:
INSERT INTO user_database (username) values('Every girl's dream');
When the MySQL parser looks at this command it sees that the code wants it to store a new record in the user database. Then it looks at the values. First it sees a string:
'Every girl'
Then it finds a bunch of garbled text:
s dream'
Then it reaches the final closing parenthesis. The MySQL database is going to return an error message and will fail to store the new record because it sees this as a syntax error.
This is a prime example of how user ignorance can also break a system.
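The accidental case is just as easy to reproduce. The sketch below again assumes SQLite via Python's sqlite3 module in place of MySQL, so the wording of the error message will differ from what MySQL reports; try_unsafe_insert is a hypothetical helper name.

```python
import sqlite3


def try_unsafe_insert(conn, username):
    """Return the database error message, or None if the insert worked."""
    # The username is pasted straight into the SQL text, as in the article.
    sql = "INSERT INTO user_database (username) values('" + username + "');"
    try:
        conn.execute(sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_database (username TEXT)")

print(try_unsafe_insert(conn, "Bobby"))               # an ordinary handle
print(try_unsafe_insert(conn, "Every girl's dream"))  # the apostrophe ends the string early
```

An ordinary handle is stored without complaint, while the handle with an apostrophe comes back with a syntax error and nothing is saved.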
So how does the programmer stop this input? There are several simple ways.
Train Human Employees
In the first example, from the xkcd comic strip, the mother's sneaky hacking ploy could have been thwarted by simple employee training. The mother doubtlessly wrote her son's name on a physical paper form, and the data was entered into the computer later by a secretary. Humans are very good at pattern recognition and spotting unusual trends, making them the perfect filter. If the secretary had been trained to show unusual names to the local IT professional or to someone else who knew a lot about computers this hacking attempt could have been stopped before it even started.
In all cases where humans enter data off of physical paper forms they should be trained to determine what type of data they should enter and what they should not enter. "Robert'); DROP TABLE Students;" is obviously not a normal name, so it should be pretty obvious that there is something wrong with it.
Force Form Input to Meet Rigorous Requirements
In cases where information is entered into a database automatically there will be no human checking each entry. Therefore a really easy way to prevent hacking is to limit input to letters or numbers. For example, the name field should only accept letters, and the telephone field should only accept numbers.
There are two ways to do this. Either remove unwanted characters on the server side and then store the new username, or return an error message saying something such as "Only letters may be used in the name field."
With the first technique, when the secretary entered "Robert'); DROP TABLE Students;" into the database the computer would have preprocessed it, removing the quotes, semicolons, and parentheses. When the computer was done all that would be left was "Robert DROP TABLE Students" which, although an unusual name to say the least, would not destroy the database at all because there is no way for it to be interpreted as a command.
Using the second technique the secretary would have gotten an error message saying "Only letters may be entered for the student's name." The secretary would have called Mom up and said "Sorry but we can't enter your son's name in the database. Does he have a nickname or some other name we can use?"
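Both techniques fit in a few lines of code. The following sketch uses Python and assumes, purely for illustration, that a name may contain only ASCII letters and spaces; the function names strip_unwanted and validate_name are invented here.

```python
import re

# Assumption for this sketch: only ASCII letters and spaces are allowed.
NOT_ALLOWED = re.compile(r"[^A-Za-z ]")


def strip_unwanted(name):
    """First technique: silently drop every character that is not allowed."""
    return NOT_ALLOWED.sub("", name)


def validate_name(name):
    """Second technique: reject the input and say why."""
    if not name or NOT_ALLOWED.search(name):
        raise ValueError("Only letters may be entered for the student's name.")
    return name


evil_name = "Robert'); DROP TABLE Students;"
print(strip_unwanted(evil_name))

try:
    validate_name(evil_name)
except ValueError as err:
    print(err)
```

Stripping leaves "Robert DROP TABLE Students", while validation refuses the name and leaves it to a human to sort out.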
Needless to say, neither situation is optimal when you consider that a multicultural environment means that people may have unusual characters as part of their names: perhaps dashes or even quote marks in a native Hawaiian name. Also, it would be nice to allow people to enter unusually punctuated ASCII art names as forum handles and nicknames.
Another aspect to consider is longer inputs such as the input for entering comments on this post. Users will want to be able to enter quotation marks, semicolons, and other characters that are normally used in text. If you didn't allow these then input would be severely limited.
Clearly forcing input to meet rigorous requirements will not work in these cases. It may work for certain simple fields, but not for longer, more detailed input.
Encode All Special Characters
This simple technique is by far the easiest and most flexible way of stopping users from entering dangerous input. The concept is simple. In HTML each character has a special character code that is used to represent it. For example, the character code for the quotation mark is:
&quot;
If you replace all characters that aren't letters or numbers with the corresponding codes then they will have no effect on the database. There are different ways of doing this. For example, in the xkcd comic the school's system should have escaped all quotes. This is a simple technique that replaces all double quotes with:
\"
and all single quotes with
\'
These two replacements are simple codes that MySQL recognizes as indicating a quote that is part of the text, not the end of the text string. So the code should have turned Bobby's name into:
Robert\'); DROP TABLE Students;
The name would have been stored just fine, without ending the string early and without the second half being interpreted as a command.
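Escaping by hand is easy to get wrong, so in practice the job is usually handed to the database driver. The sketch below swaps in a different technique from the manual escaping described above: a parameterized query, where the name travels separately from the SQL text and is never parsed as SQL. It assumes SQLite through Python's sqlite3 module as a stand-in for MySQL, and add_student is a hypothetical helper name.

```python
import sqlite3


def add_student(conn, name):
    """Store a name using a placeholder instead of string concatenation."""
    # The ? placeholder keeps the value out of the SQL text entirely,
    # so quotes and semicolons in the name are stored as plain characters.
    conn.execute("INSERT INTO Students (StudentName) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (StudentName TEXT)")

for name in ["Robert'); DROP TABLE Students;", "Every girl's dream"]:
    add_student(conn, name)

stored = [
    row[0]
    for row in conn.execute("SELECT StudentName FROM Students ORDER BY rowid")
]
print(stored)
```

Both troublesome names come back exactly as they were typed, and the Students table is still there to be queried.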
Most languages, including my favorite server language PHP, have functions for doing this kind of encoding automatically. It is usually best to use both quotation escaping and HTML entity encoding. There is a simple reason for this.
Consider the following scenario. An evil hacker finds Experiment Garden and decides that he is going to destroy it completely, or at the very least sabotage it. So he posts the following comment on the blog:
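'); DROP TABLE POSTS; DROP TABLE COMMENTS; <script type="text/javascript"> var links = document.getElementsByTagName("a"); for (var i=0; i<links.length; i++) { links[i].href = "http://www.youtube.com/watch?v=oHg5SJYRHA0"; } </script>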
This evil comment has two parts. First it includes an SQL injection statement that attempts to delete the entire blog if the server side code isn't smart enough to escape quotes. The second part is a little more interesting. If the quotes are escaped, but special characters are not encoded, then when this comment is displayed in the browser, the browser will interpret the second part and run it as JavaScript. The evil JavaScript code modifies every link on the page so that it points to... well, you can probably guess what it points to.
That's right, it's an evil rickroll.
However, if HTML entity encoding is turned on, the code sent to the browser will instead look like:
&lt;script type=&quot;text/javascript&quot;&gt; var links = document.getElementsByTagName(&quot;a&quot;); for (var i=0; i&lt;links.length; i++) { links[i].href = &quot;http://www.youtube.com/watch?v=oHg5SJYRHA0&quot;; } &lt;/script&gt;
This would not be recognized by the browser as code, so it would not be executed. Instead it would be displayed as is on the screen in the comments.
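As a rough illustration of entity encoding, here is a sketch using html.escape from Python's standard library (PHP has its own function for the same job). The comment text is a shortened version of the script discussed above, and the variable names are only for the example.

```python
import html

# A shortened version of the malicious comment, for illustration only.
comment = (
    '<script type="text/javascript"> '
    'links[0].href = "http://www.youtube.com/watch?v=oHg5SJYRHA0"; '
    "</script>"
)

# html.escape replaces &, <, > and both kinds of quote with entity codes.
encoded = html.escape(comment)
print(encoded)
```

The encoded text contains no angle brackets, so the browser has nothing to treat as a tag and simply shows the comment as written.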
So to summarize, both types of encoding are needed to prevent malicious or just plain ignorant user input from breaking the system.
Conclusion
This brief discussion covers only a few of the basic ways to stop code injection in user input, and stop special characters in input from breaking the code. However, by combining these techniques depending on the type of input and risk level of your application you can plug one of the major security holes that hackers like to exploit.