At one point or another, any developer worth its salt has had to write a bit of SQL. While ORMs such as Hibernate, NHibernate or Doctrine saves us from having to write long, complicated queries–favouring a more object-oriented approach instead–sometimes this is simply inevitable.
So, if you're not familiar with the syntax, you'll probably fire up a browser and head to the 'documentation' section of your DB vendor of choice, such as MySQL, SQL Server, Oracle or PostgreSQL where you'll see things such as:
// Example from PostgreSQL.org SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM distributors d, films f WHERE f.did = d.did
// Example from Oracle.com SELECT EMPNO, LASTNAME, WORKDEPT, DEPTNAME FROM SAMP.EMPLOYEE JOIN SAMP.DEPARTMENT ON WORKDEPT = DEPTNO AND YEAR(BIRTHDATE) < 1930
// Example from MySQL.com SELECT t1.name, t2.salary FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name;
While this is somewhat readable, what happens when you have a substantially larger query?
// Example from MSDN.com select s.AdvAssignment from study s inner join Assignment_Status cnd on (s.AdvAssignment = cnd.AdvAssignment) where s.AdvReference in(select adv from @userListGroup) and s.AdvInvalidated IS NULL and(cnd.Digital_Attachment_Nbr > 0 and not(cnd.Adv_Nbr = 0 and IS NULL(cnd.Doc_Flag,'N')='Y') or s.Is_Image= 1) UNION select s.AdvAssignment from study s inner join Assignment_Status cnd on (s.AdvAssignment = cnd.AdvAssignment) left outer join UserAssignment us on( s.AdvAssignment = us.AdvAssignment) where (us.AdvUser = @userid and(us.assigndate =getdate())) and s.AdvInvalidated ISNULL and(cnd.Digital_Attachment_Nbr > 0 and not(cnd.Adv_Nbr = 0 and IS NULL(cnd.Doc_Flag,'N')='Y') or s.Is_Image= 1)
For the moment, let's just restrain from being completely honest and say that "there is (serious) room for improvement".
Lately I've had to tackle a project where I had to deal with very large and complex SQL queries that spanned several schemas and tables and had to be run on a Database and a model that I didn't fully know. What I discovered is that the most common method of 'writing' SQL queries, not only is incredibly error-prone, but also hard to follow, impossible to document and therefore, extremely inefficient.
I started organising the queries in the usual manner, following the best-practices and indenting all the different statements, but still found that it was as inefficient and as 'unreadable' as before.
The gist of it is that most programming languages forces us to follow a certain "sequence" of statements, and SQL is not the exception:
- SELECT this and that columns
- FROM this table
- LEFT JOINING this other table
- ON these two columns
- WHERE something is equal to something else
Additionally, I tried to add some elements of the coding style found in other programming languages. This means that instead of doing:
variable1, variable2, variable3 (where it's a bit complex to add a small comment explaining what these variables are for)
I prefered to use:
So, in order to bring a bit of sanity to the writing of SQL queries, here's what I'm suggesting to use instead:
SELECT s.AdvAssignment AS Assignment FROM study s INNER JOIN Assignment_Status cnd ON ( s.AdvAssignment = cnd.AdvAssignment ) WHERE s.AdvReference IN ( SELECT adv FROM @userListGroup ) AND s.AdvInvalidated IS NULL AND ( cnd.Digital_Attachment_Nbr > 0 AND NOT ( cnd.Adv_Nbr = 0 AND IS NULL ( cnd.Doc_Flag, 'N' ) = 'Y' ) OR s.Is_Image = 1 ) UNION SELECT s.AdvAssignment AS Assignment FROM study s INNER JOIN Assignment_Status cnd ON ( s.AdvAssignment = cnd.AdvAssignment ) LEFT OUTER JOIN UserAssignment us ON ( s.AdvAssignment = us.AdvAssignment ) WHERE ( us.AdvUser = @userid AND ( us.assigndate = getdate() ) ) AND s.AdvInvalidated IS NULL AND ( cnd.Digital_Attachment_Nbr > 0 AND NOT ( cnd.Adv_Nbr = 0 AND IS NULL ( cnd.Doc_Flag, 'N' ) = 'Y' ) OR s.Is_Image = 1 )
At this point, you're probably wondering "Okay... but why?"
I'm glad you asked. Besides the obvious advantage of an improved readability and maintainability, Lemme 'splain the main benefits.
- Putting "main" statements (such as SELECTs, JOINs, WHEREs, etc.) at the start, allows us to quickly identify where we're making the JOINs (if we have any), and allows other developers to see at a glampse if we're doing things such as UNIONs, etc.
- By giving each column it's own line, developers of languages such as C or Java will be able to promptly identify which "fields" are going to be returned by the query more easily
- You'll give DBAs an easier time identifying which tables they'll need to grant you access to in the production servers
- You can use /* */ to add comments in any line explaining what's going on
- You'll be able to quickly find out if you forgot a parenthesis
That's all I have... for now. Do you like it? I'd love to hear your opinion!