When developing stored procedures, there seems to be a lot of emphasis
on "get it done fast," which means typing in all lower case, paying little
attention to formatting, and sometimes throwing best practices out the window.
Personally, I would rather front-load my development time; I think that the
costs I pay in initial development are far outweighed by what I might have paid in
maintenance down the road. Making readable and maintainable code that also
performs well and is delivered in a timely manner is something that a lot of us
strive for, but we don't always have the luxury. But I have found that it is
very easy to fall into the good kind of development habits.
A popular adage is, "you can have it
fast, cheap, or good. Pick two." I contend that if you develop habits like
these and use them in all of your database programming, the time difference
between following those methods and doing it the "lazy" way will be
negligible at most; and so, fast and good go hand in hand, rather than trade
off for one another.
Once in a while this "disorder"
slows me down. I come across code that someone else wrote (almost exclusively
it is someone I no longer work with), and I can't even bear to look at it
without first re-writing it. Here is a fake but realistic example of the kinds
of procedures I see:
create proc foo(@i int,@bar int=null,@hr int output,@xd datetime) as
declare @c varchar
declare @s nchar(2)
declare @x int
set @c='Beverly'
set @s='MA'
set @x=5
select customers.customerid,firstname,lastname,orderdate from customers join orders on customers.customerid=orders.customerid where status=@i or status<=@bar and orderdate<=@xd
set @hr = @@rowcount
select customers.customerid,count(*) from customers left join orders on customers.customerid=orders.customerid where customers.city=@c and customers.state=@s group by customers.customerid having count(*)>=@x
return (@@rowcount)
This kind of feels like the 5th grade all
over again, but when I get handed code like this, I start immediately
visualizing one of those "find all of the things wrong with this
picture" exercises, and feel compelled to fix them all. So, what is wrong
with the above sample, you may ask? Well, let me go through my own personal
(and quite subjective) subconscious checklist of best practices when I write my
own stored procedures. I have never tried to list these all at once, so I may
be all over the place, but hopefully I will justify why I choose to have these
items on my checklist in the first place.
======================
Upper casing T-SQL keywords and built-in functions
I always use CREATE PROCEDURE and not create
procedure or Create Procedure. Same goes for all of the code throughout my
objects... you will always see SELECT, FROM, WHERE and not select, from, where.
I just find it much more readable when all of the keywords are capitalized.
It's not that hard for me to hold down the shift key while typing these words,
and there are even IDEs that will do this kind of replacement for you (for
example, Apex SQLEdit has a
handy "mis-spelled keyword replacement" feature that I think could be
used for this purpose also). This is probably one of the few areas where Celko
and I actually agree. :-)
======================
Using a proper and consistent naming scheme
Obviously "foo" is a horribly
ridiculous name for a procedure, but I have come across many that were equally
nondescript. I like to name my objects using {target}_{verb}. So for example,
if I have a Customers table, I would have procedures such as:
dbo.Customer_Create
dbo.Customer_Update
dbo.Customer_Delete
dbo.Customer_GetList
dbo.Customer_GetDetails
This allows them to sort nicely in Object
Explorer / Object Explorer Details, and also narrows down my search quickly in
an IntelliSense (or SQLPrompt) auto-complete list. If I have stored procedures named in the style
dbo.GetCustomerList, they get mixed up in the list with dbo.GetClientList and
dbo.GetCreditList. You could argue that maybe these should be organized by
schema, but in spite of all the buzz, I have not developed a need or desire to
use schemas in this way. For most of the applications I develop,
ownership/schema is pretty simple and doesn't need to be made more complex.
Of course I NEVER name stored procedures
using the sp_ prefix. See Brian Moran's article in SQL Server Magazine back in 2001. Or just ask anybody. :-) I also avoid other identifying
object prefixes (like usp_). I don't know that I've ever been in a situation
where I couldn't tell that some object was a procedure, or a function, or a
table, and where the name really would have helped me all that much. This is
especially true for the silly (but common) "tbl" prefix on tables. I
don't want to get into that here, but I've always scratched my head at that
one. Views may be the only place where I think this is justified, but then it
should be a v or View_ prefix on the views only; no need to also identify
tables... if it doesn't have a v or View_ prefix, it's a table!
More important than coming up with a proper
naming scheme (because that is mostly subjective) is
applying your naming scheme consistently. Nobody wants to see procedures
named dbo.Customer_Create, dbo.Update_Customer and dbo.GetCustomerDetails.
======================
Using the schema prefix
I always specify the schema prefix when
creating stored procedures. This way I know that it will be dbo.procedure_name
no matter who I am logged in as when I create it. Similarly, my code always has
the schema prefix on all object references. This prevents the database engine
from checking for an object under my schema first, and also avoids the issue
where multiple plans are cached for the exact same statement/batch just because
they were executed by users with different default schemas.
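To make that concrete, here is a minimal sketch. The dbo.DemoCustomers table and the dbo.DemoCustomer_GetList procedure are hypothetical names made up for this illustration; the only point is that the schema shows up on the CREATE, on every object reference inside the module, and on the call:

```sql
-- Hypothetical demo table; the name and columns are illustrative only.
CREATE TABLE dbo.DemoCustomers
(
    CustomerID INT         NOT NULL PRIMARY KEY,
    LastName   VARCHAR(32) NOT NULL
);
GO

-- Schema-qualified CREATE: the procedure lands in dbo no matter who creates it.
CREATE PROCEDURE dbo.DemoCustomer_GetList
AS
BEGIN
    SET NOCOUNT ON;

    -- Schema-qualified reference: no lookup under the caller's default schema first.
    SELECT
        c.CustomerID,
        c.LastName
    FROM
        dbo.DemoCustomers AS c;
END
GO

-- Schema-qualified call as well.
EXEC dbo.DemoCustomer_GetList;
```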
======================
Using parentheses around parameter list
I am not a big fan of using parentheses
around the parameter list. I can't really explain it, as I am a proponent of
consistency, and this is the syntax required when creating user-defined
functions. But I wanted to mention it because you will not see any of my stored
procedures using this syntax. I'm open to change if you can suggest a good
enough reason for me to do so.
======================
Lining up parameter names, data types, and default values
I find this much easier to read:
CREATE PROCEDURE dbo.User_Update
    @CustomerID     INT,
    @FirstName      VARCHAR(32)     = NULL,
    @LastName       VARCHAR(32)     = NULL,
    @Password       VARCHAR(16)     = NULL,
    @EmailAddress   VARCHAR(320)    = NULL,
    @Active         BIT             = 1,
    @LastLogin      SMALLDATETIME   = NULL
AS
BEGIN
    ...
...than this:
CREATE PROCEDURE dbo.User_Update
@CustomerID INT,
@FirstName VARCHAR(32) = NULL,
@LastName VARCHAR(32) = NULL,
@Password VARCHAR(16) = NULL,
@EmailAddress VARCHAR(320) = NULL,
@Active BIT = 1,
@LastLogin SMALLDATETIME = NULL
AS
BEGIN
...
======================
Using spaces and line breaks liberally
This is a simple one, but around all comparison and assignment
operators I like to see spaces between the column/variable and the operator. So instead
of @foo int=null or where @foo>1 I would rather see @foo INT = NULL or WHERE
@foo > 1.
I also tend to place at least a carriage
return between individual statements, especially in stored procedures where
many statements spill over multiple lines.
Both of these are just about readability,
nothing more. While in some interpreted languages like JavaScript, size is
king, and compressing / obfuscating code to make it as small as possible does
provide some benefit, in T-SQL you would be hard-pressed to find a case where
this comes into play. So, I lean to the side of readability.
======================
Avoiding data type / function prefixes on column / parameter names
I often see prefixes like @iCustomerID,
@prmInputParameter, @varLocalVariable, @strStringVariable. I realize why people
do it, I just think it muddies things up. It also makes it much harder to
change the data type of a column when not only do you have to change all the
variable/parameter declarations but you also have to change @iVarName to
@bigintVarName, etc. Otherwise the purpose of the prefixed variable name loses
most of its benefit. So, just name the variable for what it is. If you have a
column EmailAddress VARCHAR(320), then make your variable/parameter declaration
@EmailAddress VARCHAR(320). No need to use @strEmailAddress ... if you need to
find out the data type, just go to the declaration line!
======================
Using lengths on parameters, even when optional
I occasionally see people define parameters
and local variables as char or varchar, without specifying a length. This is
very dangerous, as in many situations you will get silent truncation at 30
characters, and in a few obscure ones, you will get silent truncation at 1
character. This can mean data loss, which is not very good at all. I have asked
that this silent truncation at least become consistent throughout the product
(see Connect #267605), but nothing has happened yet. Fellow MVP Erland Sommarskog has gone
so far as to ask for the length declaration to become mandatory (see Connect #244395) and, failing that, feels that this should be something that raises a
warning when using his proposed SET STRICT_CHECKS ON setting (see http://www.sommarskog.se/strict_checks.html#nodefaultlength).
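A quick way to see both behaviors for yourself is the sketch below. It assumes SQL Server 2008 or later (for the inline variable initialization), and the variable names are made up for the illustration:

```sql
-- In a declaration, VARCHAR with no length means VARCHAR(1).
DECLARE @City VARCHAR = 'Beverly';

SELECT
    City       = @City,        -- silently truncated to 'B'
    CityLength = LEN(@City);   -- 1

-- In CAST / CONVERT, VARCHAR with no length means VARCHAR(30).
SELECT
    CastLength = LEN(CAST(REPLICATE('x', 50) AS VARCHAR));   -- 30

-- Declaring the length explicitly avoids both surprises.
DECLARE @CityWithLength VARCHAR(32) = 'Beverly';

SELECT
    City = @CityWithLength;    -- 'Beverly'
```

Neither truncation raises an error or a warning, which is exactly why it tends to go unnoticed.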
======================
Listing output parameters last
My habit is to list OUTPUT parameters last. I
am not sure why that is exactly, except that it is the order that I
conceptually think about the parameters... in then out, not the other way
around.
======================
Using BEGIN / END liberally
I have seen many people write stuff like
this:
CREATE PROCEDURE dbo.ProcedureA
AS
    SELECT * FROM foo;
GO
    SELECT * FROM bar;
GO
They create the procedure, maybe don't notice
the extra resultset from bar (or shrug it off), and then wonder why they only
get results from foo when they run the procedure. If they had done this:
CREATE PROCEDURE dbo.ProcedureA
AS
BEGIN
    SELECT * FROM foo;
GO
    SELECT * FROM bar;
END
GO
Because GO is not a T-SQL keyword but rather
a batch separator for tools like Query Analyzer and SSMS, they would have
received these error messages, one from each batch:
Msg 102, Level 15, State 1, Procedure ProcedureA, Line 4
Incorrect syntax near ';'.
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'END'.
Yes, errors are bad, and all that, but I
would rather have this brought to my face when I try to compile the procedure,
than later on when the first user tries to call it.
======================
Using statement terminators
I have quickly adapted to the habit of ending
all statements with proper statement terminators (;). This was always a habit
in languages like JavaScript (where it is optional) and C# (where it is not).
But as T-SQL gets more and more extensions (e.g. CTEs) that require it, I see
it becoming a requirement eventually. Maybe I won't even be working with SQL
Server by the time that happens, but if I am, I'll be ready. It's one extra
keystroke and guarantees that my code will be forward-compatible.
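As a small illustration of where the terminator already matters: the WITH keyword of a CTE only parses if the statement before it has been terminated. In this sketch (the variable and CTE names are made up, and the inline initialization assumes SQL Server 2008 or later), removing the semicolon from the DECLARE line makes the batch fail to compile:

```sql
-- The semicolon here is what allows the WITH below to parse.
DECLARE @MinValue INT = 2;

WITH Numbers AS
(
    SELECT n = 1
    UNION ALL SELECT 2
    UNION ALL SELECT 3
)
SELECT
    Numbers.n
FROM
    Numbers
WHERE
    Numbers.n >= @MinValue;
```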
======================
Using SET NOCOUNT ON
I always add SET NOCOUNT ON; as the very
first line of the procedure (after BEGIN of course). This prevents DONE_IN_PROC
messages from needlessly being sent back to the client after every row-affecting
statement. Those messages increase network traffic and in many cases can fool
applications into believing there is an additional recordset available for
consumption.
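Here is a rough sketch of the pattern, with the row count handed back through an output parameter rather than through the "n rows affected" message. The dbo.DemoOrders table and the dbo.DemoOrder_UpdateStatus procedure are hypothetical, and the multi-row VALUES clause assumes SQL Server 2008 or later:

```sql
-- Hypothetical demo table and data; names are illustrative only.
CREATE TABLE dbo.DemoOrders
(
    OrderID     INT NOT NULL PRIMARY KEY,
    OrderStatus INT NOT NULL
);

INSERT dbo.DemoOrders(OrderID, OrderStatus)
    VALUES (1, 1), (2, 1), (3, 2);
GO

CREATE PROCEDURE dbo.DemoOrder_UpdateStatus
    @OldStatus    INT,
    @NewStatus    INT,
    @RowsAffected INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.DemoOrders
        SET OrderStatus = @NewStatus
        WHERE OrderStatus = @OldStatus;

    -- @@ROWCOUNT is still populated when NOCOUNT is ON.
    SET @RowsAffected = @@ROWCOUNT;
END
GO

DECLARE @Rows INT;

EXEC dbo.DemoOrder_UpdateStatus
    @OldStatus    = 1,
    @NewStatus    = 2,
    @RowsAffected = @Rows OUTPUT;

SELECT
    RowsAffected = @Rows;   -- 2 (orders 1 and 2 had status 1)
```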
NOTE
I do not advocate blindly throwing SET NOCOUNT ON into all of your existing stored procedures. If you have existing applications they might actually already be working around the "extra recordset" problem, or there may be .NET applications that are using its result. If you code with SET NOCOUNT ON from the start, and keep track of rows affected in output parameters when necessary, this should never be an issue. Roy Ashbrook got beat up about this topic at a Tampa code camp last summer, and wrote about it here.
======================
Using local variables
When possible, I always use a single DECLARE
statement to initialize all of my local variables. Similarly, I try to use a
single SELECT to apply values to those variables that are being used like local
constants. I see code
like this:
declare @foo int
declare @bar int
declare @x int
set @foo = 5
set @bar = 6
set @x = -1
And then some more declare and set statements
later on in the code. I find it much harder to track down variables in longer
and more complex procedures when the declaration and/or assignments can happen
anywhere... I would much rather have as much of this as possible occurring in
the beginning of the code. So for the above I would rather see:
DECLARE
    @foo INT,
    @bar INT,
    @x   INT;

SELECT
    @foo = 5,
    @bar = 6,
    @x   = -1;
As a bonus, in SQL Server 2008, the syntax
now supports changing the above into a single statement:
DECLARE
    @foo INT = 5,
    @bar INT = 6,
    @x   INT = -1;
So much nicer. However, it still leaves a lot
to be desired: I also always use meaningful variable names, rather than @i,
@x, etc.
Also, some people like listing the commas at
the beginning of each new line, e.g.:
DECLARE
     @foo INT = 5
    ,@bar INT = 6
    ,@x   INT = -1;
Not just in variable declarations, but also
in parameter lists, columns lists, etc. While I will agree that this makes it
easier to comment out individual lines in single steps, I find the readability
suffers greatly.
======================
Using table aliases
I use aliases a lot. Nobody wants to read
(never mind type) this, even though I have seen *many* examples of it posted to
the public SQL Server newsgroups:
SELECT
    dbo.table_X_with_long_name.column1,
    dbo.table_X_with_long_name.column2,
    dbo.table_X_with_long_name.column3,
    dbo.table_X_with_long_name.column4,
    dbo.table_X_with_long_name.column5,
    dbo.table_H_with_long_name.column1,
    dbo.table_H_with_long_name.column2,
    dbo.table_H_with_long_name.column3,
    dbo.table_H_with_long_name.column4
FROM
    dbo.table_X_with_long_name
INNER JOIN
    dbo.table_H_with_long_name
    ON dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1
    OR dbo.table_X_with_long_name.column2 = dbo.table_H_with_long_name.column2
    OR dbo.table_X_with_long_name.column3 = dbo.table_H_with_long_name.column3
WHERE
    dbo.table_X_with_long_name.column1 >= 5
    AND dbo.table_X_with_long_name.column1 < 10;
But as long as you alias sensibly, you can
make this a much more readable query:
SELECT
    X.column1,
    X.column2,
    X.column3,
    X.column4,
    X.column5,
    H.column1,
    H.column2,
    H.column3,
    H.column4
FROM
    dbo.table_X_with_long_name AS X
INNER JOIN
    dbo.table_H_with_long_name AS H
    ON X.column1 = H.column1
    OR X.column2 = H.column2
    OR X.column3 = H.column3
WHERE
    X.column1 >= 5
    AND X.column1 < 10;
The "AS" when aliasing tables is
optional; I have been trying very hard to make myself use it (only because the
standard defines it that way). When writing multi-table queries, I don't give
tables meaningless shorthand like a, b, c or t1, t2, t3. This might fly for
simple queries, but if the query becomes more complex, you will regret it when
you have to go back and edit it.
======================
Using column aliases
I buck against the trend here. A lot of
people prefer to alias expressions / columns using this syntax:
SELECT [column expression] AS alias
I much prefer:
SELECT alias = [column expression]
The reason is that all of my column names are
listed down the left hand side of the column list, instead of being at the end.
It is much easier to scan column names when they are vertically aligned.
In addition, I always use column aliases for
expressions, even if right now I don't need to reference the column by an
alias. This prevents me from having to deal with multiple errors should I ever
need to move the query into a subquery, or CTE, or derived table, etc.
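A small sketch of both points, using a made-up @People table variable (the names and data are purely illustrative, and the multi-row VALUES clause assumes SQL Server 2008 or later). The aliases line up down the left side, and because every expression already has a name, wrapping the query in a derived table needs no further changes:

```sql
-- Hypothetical data, just to have something to select from.
DECLARE @People TABLE
(
    FirstName VARCHAR(32) NOT NULL,
    LastName  VARCHAR(32) NOT NULL,
    BirthDate DATETIME    NOT NULL
);

INSERT @People(FirstName, LastName, BirthDate)
    VALUES ('Ann', 'Smith', '19750312'),
           ('Bob', 'Jones', '19881101');

-- Every expression is named, and the names are easy to scan.
SELECT
    FullName  = p.FirstName + ' ' + p.LastName,
    BirthYear = YEAR(p.BirthDate)
FROM
    @People AS p;

-- The same query moved into a derived table; the aliases become
-- the column names the outer query references.
SELECT
    x.FullName,
    x.BirthYear
FROM
(
    SELECT
        FullName  = p.FirstName + ' ' + p.LastName,
        BirthYear = YEAR(p.BirthDate)
    FROM
        @People AS p
) AS x
WHERE
    x.BirthYear < 1980;
```

Without the aliases, the second query would fail with a "no column name was specified" error for each unnamed expression.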
======================
Using consistent formatting
I am very fussy (some co-workers use a
different word) about formatting. I like my queries to be consistently readable
and laid out in a predictable way. So for a join that includes a CTE and a
subquery, this is how it would look:
WITH cte AS
(
    SELECT
        t.col1,
        t.col2,
        t.col3
    FROM
        dbo.sometable AS t
)
SELECT
    cte.col1,
    cte.col2,
    cte.col3,
    c.col4
FROM
    cte
INNER JOIN
    dbo.Customers AS c
    ON c.CustomerID = cte.col1
WHERE
    EXISTS
    (
        SELECT 1
            FROM dbo.Orders o
            WHERE o.CustomerID = c.CustomerID
    )
    AND c.Status = 'LIVE';
This keeps all of the columns in a nice vertical
line, and visually separates each table in the join and each WHERE clause.
Inside a subquery or derived table, I am less strict about the visual
separation, though I still put each fundamental portion on its own line. And I
always use SELECT 1 in this type of EXISTS() clause, instead of SELECT * or
SELECT COUNT(*), to make it immediately clear to others that the query inside
does NOT retrieve data.
======================
Matching case of underlying objects / columns
I always try to match the case of the
underlying object, as I can never be too certain that my application will
always be on a case-insensitive collation. Going back and correcting the case
throughout all of my modules will be a royal pain, at best. This is much easier
if you are using SQL Server 2008 Management Studio against a SQL Server 2008
instance, or have invested in Red-Gate's SQL Prompt, as you will automatically
get the correct case when selecting from the auto-complete list.
======================
Qualifying column names with table/alias prefix
I always qualify column names when there is
more than one table in the query. Heck, sometimes I even use aliases when there
is only one table in the query, to ease my maintenance later should the query
become more complex. I won't harp on this too much, as fellow MVP Alex
Kuznetsov treated this subject a few days ago.
======================
Using RETURN and OUTPUT appropriately
I never use RETURN to provide any data back
to the client (e.g. the SCOPE_IDENTITY() value or @@ROWCOUNT). This should be
used exclusively for returning stored procedure status, such as ERROR_NUMBER()
/ @@ERROR. If you need to return data to the caller, use a resultset or an
OUTPUT parameter.
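As a sketch of that split, the procedure below hands the new identity value back through an OUTPUT parameter and reserves RETURN for status. The dbo.DemoContacts table and dbo.DemoContact_Create procedure are hypothetical names invented for this example, and the TRY / CATCH handling is just one assumed way to produce a status code:

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.DemoContacts
(
    ContactID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LastName  VARCHAR(32)       NOT NULL
);
GO

CREATE PROCEDURE dbo.DemoContact_Create
    @LastName  VARCHAR(32),
    @ContactID INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        INSERT dbo.DemoContacts(LastName)
            VALUES(@LastName);

        -- Data goes back through the OUTPUT parameter...
        SET @ContactID = SCOPE_IDENTITY();

        -- ...and RETURN carries status only (0 = success).
        RETURN 0;
    END TRY
    BEGIN CATCH
        RETURN ERROR_NUMBER();
    END CATCH
END
GO

DECLARE
    @ReturnStatus INT,
    @NewContactID INT;

EXEC @ReturnStatus = dbo.DemoContact_Create
    @LastName  = 'Smith',
    @ContactID = @NewContactID OUTPUT;

SELECT
    ReturnStatus = @ReturnStatus,
    NewContactID = @NewContactID;
```

The caller can then check the status and read the data independently, without one overloading the other.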
======================
Avoiding keyword shorthands
I always use full keywords as opposed to
their shorthand equivalents. "BEGIN TRAN" and "CREATE PROC"
might save me a few keystrokes, and I'm sure the shorthand equivalents are here
to stay, but something just doesn't feel right about it. Same with the
parameters for built-in functions like DATEDIFF(), DATEADD() and DATEPART().
Why use WK or DW when you can use WEEK or WEEKDAY? (I also never understood why
WEEKDAY became DW in shorthand, instead of WD, which is not supported. DW
likely means DAYOFWEEK but that is an ODBC function and not supported directly
in T-SQL at all. That in and of itself convinced me that it is better to take
the expensive hit of typing five extra characters to be explicit and clear.)
Finally, I always explicitly say "INNER JOIN" or "LEFT OUTER
JOIN"... never just "join" or "left join." Again, no
real good reason behind that, just habit.
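For what it's worth, here is the date part comparison side by side. The variable name and date are made up, and the inline initialization assumes SQL Server 2008 or later. Both statements return the same values; the second one just doesn't require the reader to remember what WK and DW stand for:

```sql
DECLARE @OrderDate DATETIME = '20080704';

-- Shorthand date parts.
SELECT
    NextWeek  = DATEADD(WK, 1, @OrderDate),
    DayNumber = DATEPART(DW, @OrderDate);

-- Spelled-out date parts.
SELECT
    NextWeek  = DATEADD(WEEK, 1, @OrderDate),
    DayNumber = DATEPART(WEEKDAY, @OrderDate);
```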
======================
Using parentheses liberally around AND / OR blocks
I always group my clauses when mixing AND and
OR. Leaving it up to operator precedence (AND is evaluated before OR) to
determine what "x = 5 AND y = 4 OR b = 3" really means is not my cup of tea.
I wrote a very short article about this a few years ago.
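To show why the grouping matters, here is a tiny sketch with made-up variable values (the inline initialization assumes SQL Server 2008 or later). The ungrouped predicate quietly behaves like the first grouping, which may or may not be what the author of the query had in mind:

```sql
DECLARE
    @x INT = 1,
    @y INT = 4,
    @b INT = 3;

SELECT
    -- No parentheses: AND binds tighter than OR.
    Ungrouped       = CASE WHEN @x = 5 AND @y = 4 OR @b = 3
                           THEN 'match' ELSE 'no match' END,   -- 'match'

    -- The same logic, stated explicitly.
    GroupedAndFirst = CASE WHEN (@x = 5 AND @y = 4) OR @b = 3
                           THEN 'match' ELSE 'no match' END,   -- 'match'

    -- The other possible intent gives a different answer.
    GroupedOrFirst  = CASE WHEN @x = 5 AND (@y = 4 OR @b = 3)
                           THEN 'match' ELSE 'no match' END;   -- 'no match'
```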
======================
So, after all of that, given the procedure I
listed at the start of the article, what would I end up with? Assuming I am
using SQL Server 2008, and that I can update the calling application to use the
right procedure name, to use sensible input parameter names, and to stop using
return values instead of output parameters:
CREATE PROCEDURE dbo.Customer_GetOlderOrders
    @OrderStatus     INT,
    @MaxOrderStatus  INT = NULL,
    @MaxOrderDate    SMALLDATETIME,
    @RC1             INT OUTPUT,
    @RC2             INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE
        @City           VARCHAR(32) = 'Beverly',
        @State          CHAR(2)     = 'MA',
        @MinOrderCount  INT         = 5;

    SELECT
        c.CustomerID,
        c.FirstName,
        c.LastName,
        o.OrderDate
    FROM
        dbo.Customers c
    INNER JOIN
        dbo.Orders o
        ON c.CustomerID = o.CustomerID
    WHERE
        (
            o.OrderStatus = @OrderStatus
            OR o.OrderStatus <= @MaxOrderStatus
        )
        AND o.OrderDate <= @MaxOrderDate;

    SET @RC1 = @@ROWCOUNT;

    SELECT
        c.CustomerID,
        OrderCount = COUNT(*)
    FROM
        dbo.Customers c
    LEFT OUTER JOIN
        dbo.Orders o
        ON c.CustomerID = o.CustomerID
    WHERE
        c.City = @City
        AND c.State = @State
    GROUP BY
        c.CustomerID
    HAVING
        COUNT(*) >= @MinOrderCount;

    SET @RC2 = @@ROWCOUNT;

    RETURN;
END
GO
Okay, so it LOOKS like a lot more code,
because the layout is more vertical. But you tell me. Copy both procedures to
SSMS or Query Analyzer, and which one is easier to read / understand? And is it
worth the three minutes it took me to convert the original query? It took me a
few hours to convert this list from my subconscious to you, so hopefully I have
helped you pick up at least one good habit. And if you think any of these are
BAD habits, please drop a line and let me know why!