Biz & IT —

How security flaws work: SQL injection

This easily avoidable mistake continues to put our finances at risk.

A demonstration of SQL injection in action.

Thirty-one-year-old Laurie Love is currently staring down the possibility of 99 years in prison. Love was recently told he'll face extradition to the US, where he stands accused of attacking systems belonging to the US government. The attack was allegedly part of the #OpLastResort hack in 2013, which targeted the US Army, the US Federal Reserve, the FBI, NASA, and the Missile Defense Agency in retaliation over the tragic suicide of Aaron Swartz as the hacktivist infamously awaited trial.

Love is accused of participating in the #OpLastResort initiative through SQL injection attacks, an increasingly common tactic. SQL injections have recently been detected against state electoral boards, and these attacks are regularly implicated in thefts of financial info. Today, they've become a significant and recurring problem.

SQL injection attacks exist at the opposite end of the complexity spectrum from buffer overflows, the subject of our last in-depth security analysis. Rather than manipulating the low-level details of how processors call functions, SQL injection attacks are generally used against high-level languages like PHP and Java, along with the database libraries that applications in these languages use. Where buffer overflows require all sorts of knowledge about processors and assemblers, SQL injection requires nothing more than fiddling with a URL.

As with buffer overflows, SQL injection flaws have a long history and continue to be widely used in real-world attacks. But unlike buffer overflows, there's really no excuse for the continued prevalence of SQL injection attacks: the tools to robustly protect against them are widely known. The problem is, many developers just don't bother to use them.

One of Microsoft's less valuable innovations

The earliest description of these attacks probably came in 1998, when security researcher Jeff Forristal, writing under the name "rain.forest.puppy," wrote about various features of Microsoft's IIS 3 and 4 Web servers in the hacker publication Phrack.

IIS came with several extensions that provided ways to generate webpages based on data from databases. Then and now, most databases use variants of a language called SQL (Structured Query Language) to manipulate their data. Databases using SQL organize data into tables built up of rows and columns. Each table represents a particular kind of thing. Each column of the table represents a particular fact about that thing, and each row of the table is an instance of that thing. So, for example, a table named "people" might have columns for "age" and "name," with each row in the table representing a specific person. SQL is used for both defining these tables and columns and for manipulating the rows within them.

IIS had several different ways of writing SQL commands—"queries"—to find and retrieve information in an SQL database. The best-known and longest-lived of these is ASP (Active Server Pages), a system for writing webpages with embedded programming (usually using JavaScript or VBScript) that typically include substantial amounts of database access. At the time, IIS also included something called IDC (Internet Database Connector) that was a less flexible way of sending an SQL query to a database and tabulating the results.

Sometimes, the SQL query that these IDC and ASP files used for grabbing information from the database was hardcoded; that is, the same query was used every single time the webpage was loaded. But because the query was often written to take one or more parameters, the data shown on that page could change as those parameters changed.

For example, an online store might have a page named order.asp to display the contents and status of an order. Each order is identified by an order ID, with the order ID passed as a parameter to the page as part of the URL: order.asp?orderID=1234, say, to view the order with ID 1234. The ASP page would take the parameter from the URL and combine it into an SQL query, which was then sent to the database for lookup.

What Forristal noticed was that the way these parameters were combined to build the query meant that an attacker could force the database to execute other queries of the attacker's choosing. This act of subverting the application to run queries chosen by an attacker is called SQL injection.

A popular tool to this day

Since their initial discovery, SQL injection flaws have routinely been discovered in the wild and used to compromise vast quantities of data. While Forristal looked at Microsoft's software first, SQL injection was an industry-wide problem; sites using Java, PHP, ColdFusion, Ruby, and Python have all had SQL injection flaws. Virtually every technology that can be used to build dynamic, database-driven websites is susceptible to SQL injection.

And attacks using SQL injection are abundant. Earlier this year, a Florida man was charged with felony hacking after using SQL injection to read sensitive data from an election site. SQL injection vulnerabilities in the Joomla CMS and popular WordPress plugins have put hundreds of thousands of blogs at risk of attack. Security firm HBGary was devastatingly attacked in 2011 after members of the Anonymous collective discovered SQL injection flaws in a custom-developed content management system. The group responsible for that attack would later go on to call itself Lulzsec.

What makes these attacks particularly valuable to attackers—and devastating to victims—is their dual power to both read sensitive data from a database and to write new or updated data to the database. With this power to inject SQL, attackers could potentially read usernames, passwords, credit card numbers, Social Security numbers, or whatever else happened to be stored in the database.

But they could also add their own data. In a well-designed system, passwords, for example, won't be stored as plain text but instead will be irreversibly transformed into a hashed value. Merely reading the database therefore won't be sufficient to let an attacker log in and snoop around; while the database will contain the necessary username, it won't reveal the password. The ability to write to the database, however, means that an attacker can simply create additional user accounts—ones for which they do know the passwords—and log in with those.

Similarly, if the database is used to store any kind of auditing information, such as tracking logins and other user activity, the ability to write to it means that hackers can clean up after themselves, deleting any record of their attack.

To understand how these attacks happen, we need to understand how Web applications use databases. Basic SQL queries have a fairly simple structure. For example, a call to search for a particular order might look like

SELECT * FROM orders WHERE orderid=1234

The * means "retrieve all the columns," orders is the name of the table of data, and WHERE orderid=1234 restricts the data to only include the rows where the value of the orderid column is 1234. The order ID number being searched for would typically be taken from parameter embedded in the URL. So the 1234 from order.asp?orderID=1234 is combined with SELECT * FROM orders WHERE orderid= to produce the SQL query SELECT * FROM orders WHERE orderid=1234, and that query will be sent to the database.

The fragment of code that does this might look something like:

// first, pull the order ID from the URL query
var orderid = Request.QueryString("orderid");
// then use the order ID to construct a query
var query   = "SELECT * FROM orders WHERE orderid = " + orderid;
// and send the whole string to the database to execute

But two things together make this problematic. First, SQL databases allow multiple queries to be strung together one after the other. To perform two searches, one could write:

SELECT * FROM orders WHERE orderid=1234 SELECT * FROM orders WHERE orderid=5678

Different options and databases might change this a little—for example, by requiring a semicolon between the two queries—but the basic principle is the same.

Second, URLs are not constrained to being numbers; they're just text strings. So while the order ID should be a single number like 1234, it doesn't have to be. An attacker could request, say, order.asp?orderID=1234 SELECT * FROM orders WHERE orderid=5678, putting an SQL fragment into the URL itself. If the application does not take care to protect itself against SQL injection attacks, the query it constructs will include the attacker's code, and the database will run both queries together.

The way the query string is built will often change the exact code the attacker has to write. Integer parameters, as with the order ID, are typically the easiest to handle because they're typically not modified or altered in any way. String parameters can introduce some additional complexity, because strings have to be wrapped in quote marks within the SQL query. The code to do this might look something like:

// first, pull the name from the URL query
var customername = Request.QueryString("name");
// then use the name to construct a query
var query   = "SELECT * FROM customers WHERE customername = '" + customername + "'";
// and send the whole string to the database to execute

In this situation, simply appending an SQL command to the URL won't work because of those single quote marks. Making a request to, say, customer.asp?name=SELECT * FROM orders would produce the following query:

SELECT * FROM customers WHERE customername = 'SELECT * FROM orders'

This won't reveal any data; it will just look for a customer with the rather implausible name of "SELECT * FROM orders". But the attacker can solve this, prepending the attack query with a single quote mark, producing:

SELECT * FROM customers WHERE customername = '' SELECT * FROM orders'

That stray quote mark at the end of the query might cause the database to reject the query, so the usual solution is to comment it out, so that it is ignored, using -⁠-. Accordingly, the URL would look like customer.asp?name=' SELECT * FROM orders -⁠⁠-, and the query would be:

SELECT * FROM customers WHERE customername = '' SELECT * FROM orders --'

Query in, data out

Attackers can do a lot with the ability to simply run queries. Destructive actions in particular, such as deleting data, don't really need much in the way of output. Some databases let you run command-line programs from within queries, which again can be all hackers need to do their dirty work. But generally, an attacker needs to read information as well as write it, both to retrieve valuable data stored in the database and to learn about the names of the database's tables and columns.

Running a query of the attacker's choosing is one thing; seeing the results is another. When a query is run, the usual response from the database is a table of data. Applications vary in the way they react to this. Some applications will simply present the user with every bit of data returned in the table, making it very easy for an attacker to extract information from the system.

Others will expect the result table to have particular columns and ignore any data not included in those columns. This isn't an insurmountable problem for attackers, as they may be able to ensure that the data they want appears in one of those columns by altering the form of the query they write.

Similarly, an application may iterate through every row of data returned by the query and show them all to the attacker. Again, this represents the easiest situation for data exfiltration. Alternatively, if the application expects only a single row to be returned, it may only show the first row that the injected query returns. The application might even issue an error message if it's expecting only a single row to be returned by the database. These situations make extracting data more complex, though again, it can be addressed by structuring the injected query just so.

The other main response to a query is an error message, indicating that something went wrong. Typically when injecting SQL, this will be an error from the database to complain that the query was malformed in some way. Again, applications vary in how helpful they make this error message. The best case for the attacker is when the developer is lazy and simply spits back the full error message to all and sundry. This is often done during development, as it makes the develop/test/debug cycle quicker, but it shouldn't ever happen in real production systems.

To take advantage of this, the attacker can ensure that every injected query results in an error that reveals some information. For example, trying to convert a string into an integer will usually result in an error, and that error message will include the string that couldn't be converted. So, to extract the database name, an attacker could run a query looking like this (in SQL Server):

SELECT * FROM orders WHERE orderid=CONVERT(int, db_name())

db_name() here is an SQL Server function that returns the name of the current database. CONVERT tries to convert that name into an integer. This will fail, and it will produce an error message looking like:

Conversion failed when converting the nvarchar value 'databasenamehere' to data type int.

Thereby revealing the database's name to an attacker.

Channel Ars Technica