Sequential: A new coding style for SQL queries

March 25, 2014

At one point or another, any developer worth its salt has had to write a bit of SQL. While ORMs such as Hibernate, NHibernate or Doctrine saves us from having to write long, complicated queries–favouring a more object-oriented approach instead–sometimes this is simply inevitable.

So, if you're not familiar with the syntax, you'll probably fire up a browser and head to the 'documentation' section of your DB vendor of choice, such as MySQL, SQL Server, Oracle or PostgreSQL where you'll see things such as:

// Example from PostgreSQL.org
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d, films f
    WHERE f.did = d.did

Or

// Example from Oracle.com
SELECT EMPNO, LASTNAME, WORKDEPT, DEPTNAME 
     FROM SAMP.EMPLOYEE JOIN SAMP.DEPARTMENT 
     ON WORKDEPT = DEPTNO 
     AND YEAR(BIRTHDATE) < 1930

Or even

// Example from MySQL.com
SELECT t1.name, t2.salary
  FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name;

While this is somewhat readable, what happens when you have a substantially larger query?

// Example from MSDN.com
select s.AdvAssignment from study s
      inner join Assignment_Status cnd on (s.AdvAssignment = cnd.AdvAssignment)
where
      s.AdvReference in(select adv from @userListGroup)
      and s.AdvInvalidated IS NULL
      and(cnd.Digital_Attachment_Nbr > 0
      and not(cnd.Adv_Nbr = 0 and IS NULL(cnd.Doc_Flag,'N')='Y')
      or s.Is_Image= 1)
UNION
select s.AdvAssignment from study s
      inner join Assignment_Status cnd on (s.AdvAssignment = cnd.AdvAssignment)
      left outer join UserAssignment us on( s.AdvAssignment = us.AdvAssignment)
where (us.AdvUser = @userid
      and(us.assigndate =getdate()))
      and s.AdvInvalidated ISNULL
      and(cnd.Digital_Attachment_Nbr > 0
      and not(cnd.Adv_Nbr = 0 and IS NULL(cnd.Doc_Flag,'N')='Y')
      or s.Is_Image= 1)

For the moment, let's just restrain from being completely honest and say that "there is (serious) room for improvement".

Lately I've had to tackle a project where I had to deal with very large and complex SQL queries that spanned several schemas and tables and had to be run on a Database and a model that I didn't fully know. What I discovered is that the most common method of 'writing' SQL queries, not only is incredibly error-prone, but also hard to follow, impossible to document and therefore, extremely inefficient. 

I started organising the queries in the usual manner, following the best-practices and indenting all the different statements, but still found that it was as inefficient and as 'unreadable' as before.

The gist of it is that most programming languages forces us to follow a certain "sequence" of statements, and SQL is not the exception:

  • SELECT this and that columns
  • FROM this table
  • LEFT JOINING this other table
  • ON these two columns
  • WHERE something is equal to something else

Additionally, I tried to add some elements of the coding style found in other programming languages. This means that instead of doing:

variable1, variable2, variable3 (where it's a bit complex to add a small comment explaining what these variables are for)

I prefered to use:

variable1,
variable2,
variable3

So, in order to bring a bit of sanity to the writing of SQL queries, here's what I'm suggesting to use instead:

 

SELECT 
	s.AdvAssignment AS Assignment

FROM 
	study s

INNER JOIN
	Assignment_Status cnd 
	ON ( s.AdvAssignment = cnd.AdvAssignment )
	
WHERE
	s.AdvReference 
	IN 
	(
		SELECT 
			adv
		FROM 
			@userListGroup
	)
	AND s.AdvInvalidated IS NULL
	AND 
	(
	    cnd.Digital_Attachment_Nbr > 0
	    AND NOT 
	    (
		cnd.Adv_Nbr = 0 
		AND IS NULL ( cnd.Doc_Flag, 'N' ) = 'Y'
	    )
	    OR s.Is_Image = 1
	)
	
UNION

SELECT 
	s.AdvAssignment AS Assignment

FROM
	study s
	
INNER JOIN 
	Assignment_Status cnd 
	ON ( s.AdvAssignment = cnd.AdvAssignment )

LEFT OUTER JOIN 
	UserAssignment us 
	ON ( s.AdvAssignment = us.AdvAssignment )
	
WHERE 
	(
		us.AdvUser = @userid
		AND ( us.assigndate = getdate() )
	)
	AND s.AdvInvalidated IS NULL
	AND 
	( 
		cnd.Digital_Attachment_Nbr > 0
  		AND NOT 
                ( 
			cnd.Adv_Nbr = 0 
			AND IS NULL ( cnd.Doc_Flag, 'N' ) = 'Y'
		)
		OR s.Is_Image = 1
	)

At this point, you're probably wondering "Okay... but why?"

I'm glad you asked. Besides the obvious advantage of an improved readability and maintainability, Lemme 'splain the main benefits.

  • Putting "main" statements (such as SELECTs, JOINs, WHEREs, etc.) at the start, allows us to quickly identify where we're making the JOINs (if we have any), and allows other developers to see at a glampse if we're doing things such as UNIONs, etc.
  • By giving each column it's own line, developers of languages such as C or Java will be able to promptly identify which "fields" are going to be returned by the query more easily
  • You'll give DBAs an easier time identifying which tables they'll need to grant you access to in the production servers
  • You can use /* */ to add comments in any line explaining what's going on
  • You'll be able to quickly find out if you forgot a parenthesis

That's all I have... for now. Do you like it? I'd love to hear your opinion!