9 Tips for Faster SQL Server Applications

Wanna make your SQL Server applications go faster? Here are 9 simple tips that will give you a better chance for success. They’re not carved in stone, but they’re a good place to start.

Query Design Tips

1. Code for readability first. The easier your query is for humans to understand, the more likely SQL Server will be able to understand what you’re trying to do, and the better execution plan you’re likely to get.

2. Avoid scalar user-defined functions. Yes, functions are great for reusable app code, but SQL Server scalar functions are notoriously slow. They run row-by-row against your result set (possibly much more than that, depending on the query), and queries that call scalar functions are single-threaded, regardless of how much work they’re doing. Your functions might go faster in SQL Server 2019, but until then, don’t gamble.
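
Here’s a rough sketch of the difference (the dbo.Posts, dbo.Users, and dbo.GetUserDisplayName names are made up): the function version runs the lookup once per row and forces a serial plan, while the join version lets SQL Server optimize the whole statement.

-- Hypothetical scalar function: called once per row, and it blocks parallelism.
CREATE FUNCTION dbo.GetUserDisplayName (@UserId INT)
RETURNS NVARCHAR(100)
AS
BEGIN
    RETURN (SELECT DisplayName FROM dbo.Users WHERE Id = @UserId);
END;
GO

-- Slow: the function runs for every row in dbo.Posts.
SELECT p.Title, dbo.GetUserDisplayName(p.OwnerUserId) AS Author
FROM dbo.Posts AS p;

-- Faster: the same lookup expressed as a join.
SELECT p.Title, u.DisplayName AS Author
FROM dbo.Posts AS p
LEFT JOIN dbo.Users AS u ON u.Id = p.OwnerUserId;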

3. Avoid table variables (@tablename). They sound like they’re going to be faster than temp tables, but they get bad row estimates, which drive the rest of your query toward a bad execution plan. If you need to store data somewhere temporarily, embrace temp tables (#mytable). They’re the better of those two choices.
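
A sketch of the swap (the table and column names are made up): both hold the same rows, but the temp table carries real statistics, so joins against it get sane row estimates.

-- Table variable: the optimizer guesses a tiny, fixed row count for it.
DECLARE @RecentOrders TABLE (OrderId INT, CustomerId INT);

-- Temp table: same shape, but it gets statistics the optimizer can use.
CREATE TABLE #RecentOrders (OrderId INT, CustomerId INT);

INSERT INTO #RecentOrders (OrderId, CustomerId)
SELECT OrderId, CustomerId
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -7, GETDATE());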

4. Never take user input to build a query. Crafty users will bypass your security using a technique called SQL injection. I know, you think you’ve built in a lot of security, but your code has to be perfectly bug-free every time. The hacker only has to be right once.
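
If you do need dynamic SQL, pass the user’s value as a parameter instead of concatenating it into the string. A sketch, with made-up object names:

DECLARE @UserInput NVARCHAR(320) = N'victim@example.com''; DROP TABLE dbo.Users; --';

-- Dangerous: concatenation lets the input rewrite the statement.
-- EXEC (N'SELECT * FROM dbo.Users WHERE Email = ''' + @UserInput + N''';');

-- Safer: sp_executesql treats @Email strictly as data, never as code.
EXEC sp_executesql
    N'SELECT * FROM dbo.Users WHERE Email = @Email;',
    N'@Email NVARCHAR(320)',
    @Email = @UserInput;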

When you’re ready to learn more about query design, pick up the book T-SQL Fundamentals by Itzik Ben-Gan. The title sounds deceptively easy, but this book will take you through several years’ worth of learning, and is great for veterans as well as beginners.

Table Design Tips

1. Start every table with a clustered index on an identity. If you’re just getting started with table design, use this tip to avoid problems with empty pages and forwarded fetches. Here’s an example of how to create a table with this ID built for you:
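
A minimal sketch, using a placeholder dbo.Customers table:

CREATE TABLE dbo.Customers (
    Id INT IDENTITY(1,1) NOT NULL,
    CustomerName NVARCHAR(100) NOT NULL,
    CreatedDate DATETIME2 NOT NULL
        CONSTRAINT DF_Customers_CreatedDate DEFAULT SYSDATETIME(),
    CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED (Id)
);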

There are exceptions to the clustered index rule. If you need to generate unique values client-side, you can use GUIDs. If you have a staging table that is loaded from scratch and truncated every day, then a table without a clustered index (a heap) can be fine. However, if you’re not sure about what that means, just stick with a clustered index on an identity as a starting point.

2. Avoid NVARCHAR fields unless you really need ’em. The difference between NVARCHAR and VARCHAR fields is that NVARCHAR can store Unicode, but NVARCHAR requires twice as much storage space (even if you’re not storing Unicode data).
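
A quick way to see the size difference for yourself:

-- Same three characters, double the bytes when stored as NVARCHAR.
SELECT DATALENGTH('abc')  AS VarcharBytes,   -- 3
       DATALENGTH(N'abc') AS NvarcharBytes;  -- 6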

3. Avoid MAX datatypes unless you really need ’em. You can use VARCHAR(MAX) and NVARCHAR(MAX) to store up to 2GB of data, but that comes at a performance price later when it’s time to index that data or compare it to other columns. You’re better off naming your length, say VARCHAR(50) or NVARCHAR(1000), based on the longest data you’ll put in the column. Don’t go overboard, though: SQL Server sizes its memory grants in part on the column definition. Oversize your column, and you’ll oversize your memory grant, which sounds good until you hit concurrency issues.
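
A sketch of the difference, with made-up table names:

-- Oversized: a MAX column can't be a normal index key column.
CREATE TABLE dbo.Products_Oversized (ProductName NVARCHAR(MAX) NOT NULL);

-- Right-sized to the longest value you actually expect to store.
CREATE TABLE dbo.Products (ProductName NVARCHAR(100) NOT NULL);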

4. Recommended indexes are neither smart nor in the right order. When you see index recommendations that would supposedly make your query run faster, remember that they’re generated by the modern equivalent of Clippy, the Office assistant. They’re better than nothing, but they’re rarely accurate. Think of them as nothing more than a hint that it’s time to consider hand-crafting an index, which I teach in Fundamentals of Index Tuning.

5. Avoid storing files in the database. Store files in a filesystem, and in the database, just store a pointer to the file (either its HTTP location or UNC path). This keeps your database size lean and mean, lowers your licensing costs (because performance requirements will be lower), and makes your backups & restores faster.
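
A sketch of the pointer approach, with made-up names:

CREATE TABLE dbo.Attachments (
    Id INT IDENTITY(1,1) NOT NULL,
    OrderId INT NOT NULL,
    FileName NVARCHAR(260) NOT NULL,
    FilePath NVARCHAR(1000) NOT NULL,  -- UNC path or URL to the actual file
    CONSTRAINT PK_Attachments PRIMARY KEY CLUSTERED (Id)
);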

When you’re ready to learn more about table design, pick up the book Pro SQL Server Relational Database Design by Louis Davidson and Jessica Moss.

Follow those tips, and your app has a much better chance of scaling.

71 Comments

  • Pat Paulson
    May 21, 2019 9:43 am

    Waiting for Identity vs. natural PK debate…..

    Reply
    • when in doubt about using a natural key use a surrogate key.

      when absolutely certain about using a natural key, use a surrogate key.

      Reply
    • You’ll notice one important point in that tip. Brent said clustered index, not primary key. Those are two very different things. You can have a clustered index that’s different than your primary key. Now, having said that, I could make an argument that your primary key makes a good clustered index. If you’re looking up rows based on your primary key, and you need most or all of your data in that row, and your clustered index isn’t your primary key, then you either need a wide covering index or you have to do a lot of key lookups to satisfy those queries.

      Reply
  • So if my developers’ app loves functions and table variables, I am in trouble? Yeah, already realize this.

    Reply
    • It could be worse, they could also love nested views.

      Reply
      • Nested views, functions, and table variables are all in use at the place where I am currently. Do I win anything??? 🙂

        Reply
      • My favorite so far is a view from a third-party vendor that has four levels of view nesting, and at each level they may join views, tables, functions, and computed columns, and some of the views contain subqueries against other views or tables. The view is more than 200 columns wide and the execution plan exceeds 128 steps, so the query optimizer just gives up. The biggest table in the database is only about 5 million records, but I have a strong suspicion that the view may be causing multiple table scans against the same tables. Trying to query a date range wider than about 2 months almost always results in a stuck query that requires restarting the instance. Everything I could see was also set up with the nolock hint – I don’t think the query would run at all without it, but it also returns a massive number of ghost records for any data that was recently created, as a record tends to be updated several dozen times in the first 3-4 hours and in general is pretty active for the first day, with some sporadic updates usually happening for about the next 5 days.

        Views are also the only supported way to get any data out of the application, as the vendor fairly regularly updates the underlying * and the views are the only way to make sure the data is correct.

        Reply
        • Dang, you found it! 🙂

          Reply
          • I used to have an “ETL” that used a view that was nested… within itself. Didn’t even know it was possible, but I guess as long as you don’t select that column, which pointed to another column in the view, the query optimizer just ignores it! It works!

  • If I store references to files in my database, then they are part of my DATA and should be backed up & restored along with my database. How do I do that, and will my backups still be faster if you take into account the time it takes to back up the files?

    Reply
    • Nadav – you back up the files separately using file system backups & replication.

      Reply
      • Just curious – does that open up the possibility of inconsistency between the metadata (in the database) and the files in the event of restore (ie, backups taken at slightly different times)? Are there backup mechanisms that can ensure that consistency?

        Reply
        • The metadata in the database points to the data. Unless you do a backup of the file system data and restore it to a different location, everything should stay in sync.

          Reply
        • Conner is right. Backing up content separately from metadata does create the possibility of an inconsistent restore.

          Most people ignore this restore consistency problem.

          The easiest solution to the problem is to take a cold backup, with the content management system shut down until both the content and the metadata are backed up. If you’re sure you want to do hot backups — and nobody really wants downtime for backups — products do exist that claim to make this work. One prominent example is CYA for Documentum. (CYA does mean Cover Your A**.)

          I’ve never used CYA. The storage management folks responsible for backup are not terribly interested in exploring boutique backup and recovery products, or in listening to me carry on about this consistency problem.

          Storing content in the database makes life hard for the DBA, but it does make for consistent backups.

          Reply
  • I worked for a company that used the “EVERY table must have an identity column that is indexed” rule. In most cases it was also the PK.
    CREATE TABLE dbo.Gender (ID BIGINT IDENTITY(1,1), GenderCode CHAR(1) NOT NULL, GenderDescription VARCHAR(20) NOT NULL).
    ID was the PK and a person’s gender was either 1, 2, or 3 (Unknown). ‘M’, ‘F’ and ‘U’ would have been so much easier. Francis Smith was a 2 and lived in the state of 43.

    Reply
    • Reply
        • All of the places I’ve ever worked (and I don’t see it changing) have basically two genders and either ‘U’ or NULL for unknown. Social media is a whole other universe and I don’t plan on traveling there any time soon.

        Reply
        • Doug – OK, cool. I hope you get the chance in the future to work for more open-minded organizations though. It can be a great, eye-opening experience.

          Reply
    • I would prefer that to the mess I have now where employeeIDs (which never change and are never reused) are the primary key, but also get typed in incorrectly from a cloud ERP system that doesn’t check for that, on a pretty regular basis. Or on ticket numbers which are globally unique and never wrong, but sometimes will temporarily ETL over from the dispatch system duplicated as the record is updated, and the awesomely complex views created by the vendor create duplicates until the updates settle down – or when daylight savings or an errant system time change happens and the app has to figure out how to create correct ticket numbers for a few minutes.

      I’d much rather have ridiculous normalization like that than ever have to risk dealing with something you never imagined could be a problem, and being one of the people that have to deal with it. And if you are reporting on it, you really should have a de-normalized table that contains everything you need and doesn’t need to do any lookups or case statements to translate lookup codes into the data.

      Even in your case – I mean it seems ridiculous at first that all that would be set up for a three record table, but look at Brent’s link – Facebook has 58 genders now. Would you set up a gender table in 2004 with single-letter gender codes as the key, if you knew that 15 years later you would have to update 2.5 billion [dbo].[person] records because there are now more genders than there are letters in the alphabet? What about in 5 years when there are 75?

      I’m really not that good or practiced at data design, but every day I fight data design ideas that were valid choices and not unreasonable to believe would be good forever in the early 2000s or late 90s, but now give me a daily headache and present fairly regular data quality challenges.

      Reply
  • Pat Paulson
    May 21, 2019 1:23 pm

    I had to figure out a third party’s database (supposedly a “partner”) with no data dictionary, and they also used identities for everything. A row on one table looked like 4, 15, 499, 444, 19, 2, 54, 98. This was timesheet/payroll data which we were supposed to integrate with. It took forever to backtrack to find the actual values.

    Reply
  • Marlon Ribunal
    May 21, 2019 3:03 pm

    A teammate and I had a short discussion yesterday and agreed with Item #2 above, avoid scalar functions. It didn’t help that the app is highly concurrent because, as Brent said above, “queries that call scalar functions are single-threaded.” 😉

    Thanks, Brent!

    Reply
  • “Avoid NVARCHAR fields unless you really need ’em.”
    It’s a very controversial recommendation in the 21st century:
    – Almost all user-readable strings should be Unicode now. Storing them in single-byte VARCHAR is really bad and dangerous; all non-English users will curse such a DB designer.
    – T-SQL’s sysname is based on NVARCHAR, C# allows some non-ANSI symbols in identifiers, and file names can contain non-ANSI symbols. So “technical” strings should be NVARCHAR too.
    – In some collations, implicit conversion between varchar/nvarchar can cause index scans. One example (but not the only one) is here: https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/
    – Moreover, this recommendation itself contains the symbol ” ’ ” – that symbol, if I remember correctly, broke my import in MS Access back in 2001 (it’s not a standard 7-bit ASCII symbol, so its one-byte encoding depends on the collation, and MS Access depends on the default encoding). Anyway, quotation is not so simple: https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

    Reply
    • Speshuric – if you want to take up twice the space for data that doesn’t need it, that sounds more controversial to me, but best of luck with your work.

      Reply
    • Wikipedia says 60% of all websites use UTF-8 for international characters.
      NVARCHAR supports UTF-16, but Unicode has grown to need more than 65,536 code points.

      Reply
      • I suppose the rule of thumb should be “what is the possibility that this column will need to contain non-ASCII data?” If I am creating a database to contain an inventory of products, customers and sales, and I am strictly U.S. based, chances are, I won’t need anything that is non-ASCII. Even if I expand to French and Spanish names, the various diacritical marks are typically available in ASCII character sets.
        If I am going to be mailing to China or Taiwan, names and addresses are typically going to be ‘Americanized.’ I’m not sure how the U.S. Postal Service would handle a mail-piece that was addressed as:
        ??????????
        43 Hezuo St.
        ??????420
        ???????
        Maybe they can handle it, but as a business I’m certainly not going to store it as such. I use the English equivalent:
        Richard and Mary Williamson
        43 Hezuo St.
        Taichung City, Fengyuan District, 420
        Taiwan, R.O.C.

        I worked for a company where EVERY character column was defined as NVARCHAR(). I was tasked with converting the database to ASCII, not a small feat, but in the end, reduced the database size by 45%. Not to mention that we were a LARGE ETL shop, importing gigabytes of data daily. That data was presented to us as ASCII flat files. The reduction in network traffic alone was phenomenal and ETL processes started running considerably faster, since we’d import ASCII data into NVARCHAR columns in a staging database, then transform and load into NVARCHAR columns in the production databases.
        My take is, use ASCII (CHAR/VARCHAR) unless you have a solid, justifiable reason for using Unicode (NCHAR/NVARCHAR).

        Reply
        • Looks like comments don’t accept Unicode 🙂

          Reply
        • Unicode defines about 1,000,000 code points.
          UTF-8, UTF-16, UTF-32 (BE & LE) are encodings for Unicode that can all represent all code points.
          UTF-8 can store Unicode in a VARCHAR column.

          Reply
  • We just did some benchmarking on SQL insert methods from C++ / ODBC.
    For our data, say inserting 1000 rows using 1000 SP calls in one transaction takes 1 second.
    Then binding an array of 10 and calling the same SP 100 times takes 0.5 seconds.
    And using the Bulk Copy (BCP) C interface takes 0.1 seconds.

    Your mileage may vary.

    Reply
    • Brian – yeah, when you have a database, it’s amazing how much faster things go in sets rather than one row at a time. I see folks learning that lesson all the time. Very good on you for benchmarking it! That’s a good sign. I’ve long thought about having a Comment of the Day Award, and if I had one, I’d give it to you today. More people should try benchmarking.

      Reply
  • ken j ambrose
    May 21, 2019 7:32 pm

    I _always_ repeat _always_ use both a surrogate ID key and a natural (usually compound) unique business key in every table I design. No exceptions.
    You would not believe how much I have learned from the business side of the team when I demand we define a unique business key.
    You would not believe how many times we have prevented nonsense duplicates from being created when there is a business key (which of course a surrogate key will often not be able to do)

    Reply
  • Further to my comment above – here is another issue I wonder about. I may be wrong, but why oh why does somebody design a table with a clustered index that is not defined as the primary key – and the primary key is a non-clustered index on the same columns as the clustered index?
    For the life of me I can’t figure out that design strategy – am I missing something???

    Reply
    • DiscoBob,
      I don’t think you are missing anything but apparently the person who created that table is short a brick or two.

      Reply
    • DiscoBob, bad data design. Other than the caveat of using a clustered index on a table where you may be looking up a certain column more than the identity key column, I cannot think of any reason for it other than bad data design. If it’s for reporting, write an ETL to create a denormalized table for what you need and put your clustered index wherever you like. Otherwise I just can’t think of any good reason why this would ever be necessary if your database is properly normalized.

      Reply
  • Coming from a company that decided to move away from storing files in separate directories and move to FILESTREAM to improve performance, I’m with you. Suddenly, the companies we were hosting for had their database backups blow up overnight with the version upgrade of the product. It’s supposed to be faster because it’s using less resources outside of the SQL Server pool, but I don’t think the developers thought it through regarding us hosting a few thousand DB’s on one server. One client who uploads or downloads lots of documents could potentially affect other clients who don’t do near as much.

    Reply
  • Geert-Jan Thomas
    May 22, 2019 6:35 am

    When using a GUID as the primary key, DON’T use a clustered index on that PK. GUIDs are not generated in order, so inserting into large tables will be slow due to index updates.

    Reply
    • If you can get ahead of it early on in the design phase, sequential GUIDs are a good solution. That’s easier if the GUID is generated by the table with newsequentialid(), but it would be more complicated coming from multiple application servers inserting GUIDs generated on their side.

      Reply
  • Table variables vs. temp tables is a topic where both sides have positives and negatives, so you can’t state that one is the better option. You can only list the cases where one is better and where the other is.

    Reply
    • Andrius – sure, I’m interested. Go for it – have you got a case & demo showing that table variables are faster than temp tables?

      Reply
      • I hope this example gives an idea that with small data variable tables are much faster to work with:

        DECLARE @i int = 0

        DECLARE @start datetime = GETDATE()

        WHILE (@i < 10000)
        BEGIN
        DECLARE @table TABLE (id uniqueidentifier)
        DELETE FROM @table

        INSERT INTO @table SELECT NEWID()

        SET @i = @i + 1
        END

        SELECT DATEDIFF (MILLISECOND, @start, GETDATE()), 'variable table'

        SET @i = 0
        SET @start = GETDATE()

        WHILE (@i < 10000)
        BEGIN
        DROP TABLE IF EXISTS #temp

        SELECT NEWID() id INTO #temp

        SET @i = @i + 1
        END

        SELECT DATEDIFF (MILLISECOND, @start, GETDATE()), 'temporary table'

        On my machine it is varying from 3 to 9 times faster depending on iteration count. I hope you get the idea as this example is pretty trivial.

        Furthermore, you cannot work with temporary tables when it comes to functions (I cannot recall if the same applies to procedures), but you can pass user-defined table variables to them. (It’s a different topic from performance, but I thought I might mention this too; in some cases that’s the only way.)

        Reply
        • Why not use a cte for that?

          Reply
        • Interesting, but those are not equivalent operations. All you’ve proved is that creating and dropping a table is expensive.
          try altering your procedure so that you declare the temp table once (outside of your loop) and then do the delete and insert like you’re doing for your table variable.
          On my system, the temp table is faster than the table variable.
          DECLARE @i int = 0

          DECLARE @start datetime = GETDATE()

          WHILE (@i < 10000)
          BEGIN
          DECLARE @table TABLE (id uniqueidentifier)
          DELETE FROM @table

          INSERT INTO @table SELECT NEWID()

          SET @i = @i + 1
          END

          SELECT DATEDIFF (MILLISECOND, @start, GETDATE()), 'variable table'

          SET @i = 0
          SET @start = GETDATE()
          create table #table (id uniqueidentifier)

          WHILE (@i < 10000)
          BEGIN
          delete from #table
          insert into #table SELECT NEWID()

          SET @i = @i + 1
          END

          SELECT DATEDIFF (MILLISECOND, @start, GETDATE()), 'temporary table'

          Reply
  • That is exactly what I was trying to prove: when you are dealing with small amounts of data, variable tables are superior. Your example proves the opposite argument – one I never disagreed with 🙂

  • The NVARCHAR tip could instead be handled by using Unicode compression.

    Reply
  • I disagree strongly with ‘Start every table with a clustered index on an identity’. My advice is to start with the natural key as the clustered PK. It is usually easy to tell when to move to a surrogate key in a design.
    I am not against identity columns as the PK where needed – the customer table example shown makes sense because there is probably not a natural key anyway.
    But I’ve worked two different places where people used identity columns for most tables and killed performance – even though they had small amounts of data (less than 100,000 records in most tables). These identity columns were never used in any ‘where’ clause. The natural key was used, and most of these tables didn’t have an index on the natural key in any case. Adding an index on the natural key helped, but it was much faster to have the natural key as the primary clustered index. I’ve also wondered why anyone would assign an identity to data with a standard code instead of using the standard code, such as a state or country table.
    I’ve also worked with billion record tables where I am so glad we designed the identity column as the PK.
    This is where thinking about the design is critical.
    When I’ve asked people why they used an identity column they usually say ‘I thought I was supposed to because all the other tables are like this’. They generally don’t know the difference between a clustered and non-clustered index either.
    So of course, my real advice is to understand when natural keys make sense vs. surrogate keys. You might make a mistake but you are way more likely to make a mistake by not thinking about it.

    Reply
    • K – unfortunately, sooner or later, you’ll find yourself in situations where the business initially told you that their natural key would be unique, and then it isn’t.

      Reply
      • I’ve had that happen but only rarely.
        Reviewing the table design really helps avoid that.
        (why do companies let junior people create tables with no oversight?)
        Often it’s an easy fix as well.
        I think an occasional PK issue is better than having poor performance in lots of queries.
        I’m talking from many many years of experience. ( enough that I don’t want to say how many 🙂 )
        Good side topic – what ‘fun’ things have people had to fix when the company changed things no one expected?

        Reply
        • In your country codes natural key, is Zimbabwe ZW? or is it RA for Rhodesia?

          What is the country code for Myanmar? Is it MM, or is it BM for Burma?

          What about Sri Lanka/Ceylon? Or Zaire/Democratic Republic of Congo? Absyinia/Ethiopia? Bangladesh/East Pakistan?

          Yugoslavia?

          • I have a table that uses EmployeeID as a natural key. It gets typed in incorrectly all the time. It’s an absolute mess when a user that left several years ago returns and is issued a new ID. Once, a few years ago, an EmployeeID was typed in wrong by HR – it was an otherwise valid EmployeeID, but one far in the future that shouldn’t have been issued yet, and it didn’t get noticed until it was time to issue that ID to a new employee a few years later. I’ve had social security numbers changed that were used for a natural key, following identity theft. I’ve had ticket numbers which are generated based on time and date get duplicated during daylight savings or incorrect system time changes (third-party app that stores it in local time instead of UTC).

            I’m really skeptical that using a surrogate key is going to be noticeably slower. If you need to make an update, you do an index seek on the surrogate key and update the one column exactly where it is needed. If you make an insert, SQL knows exactly where to insert the record. If you are doing reporting on it, you should be de-normalizing the data first. If the data needs to be live – well then data integrity is going to be important enough that you shouldn’t be risking data integrity by using natural keys.

          Reply
          • I haven’t had enough coffee yet, sorry for my bad typing.

          • ken j ambrose
            May 23, 2019 8:36 am

            Great examples! Any design that presumes that the business data will provide a permanent unique key that the business will never need to change will eventually be proven wrong. Usually sooner rather than later…

          • Well, for one thing, there are standard ISO country codes, standard State codes for the U.S. I’m certain other countries have standard abbreviations for their states/provinces or whatever they call them. Anyone who uses MR as an abbreviation for Missouri, or UN as an abbreviation for United States of America should be taken outside and flogged.
            I think everyone here agrees there are times when you use a surrogate key and your case of an employee table is probably a very good example. I’ve never worked at a company that uses anything other than a system generated ID as an employee ID, although I have worked for companies that pretty much hide it (i.e. the employee doesn’t know their ID because internally communication is done by name). From a programming perspective, that one was a nightmare.

            To answer your question, in the examples you provided above, the country codes are still valid, but you shouldn’t be using RA or MM any more, since they are now ZW/BM. Why not maintain both for historical purposes? If it was me, I’d have an IsActive column in the table and RA/MM would be set to 0. They could still be used for historical purposes, but not available for new entries. You could even update the descriptions to something like “Rhodesia (i.e. Zimbabwe).”
            I guess it depends on how pro-active (or lazy) you want to be.

        • K – great, glad you’ve got many years of experience. Sounds like you’ve got it all under control. Take care!

          Reply
  • Pat Paulson
    May 22, 2019 2:46 pm

    I worked somewhere where they used identity, and the application required much configuration and testing. So that was done on a test system – but the identities did not match between test and prod for pretty much every table, so that was a mess.

    Reply
  • Clustered index on Guid which generates random values ?????

    Also, a clustered index physically sorts the table every time a random GUID is created, and this might lead to performance problems. Correct me if I interpret it wrong.

    Reply
    • Following up on my previous comment, you can test this poor behavior of having a clustered index on a GUID. You see more scan point revalidations per second as a result of having a CI on a GUID. Every time SQL Server is doing a range scan on a page while holding a latch, a new GUID insert comes in and boom, it has to scan the page again.

      Reply
    • You’ve interpreted it incorrectly. First, NOTHING in SQL “physically sorts the table” during INSERTs. The closest it comes is to put the rows on the correct pages according to the order of the clustered index and, if affected, the non-clustered index. The table is not sorted. Just the data being inserted.

      Shifting gears a bit, while I agree that use of ANY GUID (random or sequential) is usually a terrible choice for many reasons, fragmentation and page splits shouldn’t actually be included. The reason why they DO APPEAR to be such a problem is because most people don’t actually know the correct method to maintain indexes that are based on GUIDs. It even tells you in BOL (thank you Paul Randal) under sys.dm_db_index_physical_stats how to maintain them and other “evenly distributed indexes” but almost no one even reads that. You can build random GUID keyed indexes that can go for months of insertions and NEVER see a page split (good or bad) until just before it’s time to rebuild them. And, yes… I have a test that proves the problem of not maintaining them correctly for multiple fill factors and how 2 simple changes can virtually eliminate page splits and fragmentation on GUID keyed indexes. 1. NEVER use REORGANIZE on a GUID index unless you need to recover LOB space, and then don’t count on that working so well; it should only be done as a special one-off case. ALWAYS use REBUILD instead. 2. Stop waiting for damage to occur. You want to do the REBUILD when your logical fragmentation just starts to go over 1%.

      Yeah… I know. It goes against what everyone mistakenly calls a “Best Practice”. Even Paul Randal says that his 10% Reorg/30% Rebuild numbers were due to MS plaguing him for some sort of standard. Do a deep read on sys.dm_db_index_physical_stats and look for the methods “hidden in plain sight” that he talks about.

      Random GUIDs are bad for a lot of reasons but page splits and fragmentation aren’t the reasons if you do the maintenance correctly.

      Reply
  • Are CAST and CONVERT scalar functions? If not, is it the same if I use a scalar function with only a CAST instruction?

    This is the function :

    CREATE FUNCTION [dbo].[FromUtcToItalianDate]
    (
    @d datetime
    )
    RETURNS datetime
    AS
    BEGIN

    RETURN CONVERT(DATETIME, @d AT TIME ZONE 'UTC' AT TIME ZONE 'W. Europe Standard Time')

    END

    thanks for reply

    Reply
    • Raffaele – sadly, all user-defined functions have the huge performance drawback I described in the post, even if they’re doing the minimal amount of work you show here.

      Reply
      • And if I use directly “SELECT CONVERT(DATETIME, @d AT TIME ZONE 'UTC' AT TIME ZONE 'W. Europe Standard Time')” instead of using a scalar function?

        thanks

        Reply
        • Raffaele – it’s not really efficient for me to teach you performance tuning in blog post comments. It sounds like you might be ready for my Fundamentals of Query Tuning class.

          Reply
    • The main issue with using a scalar function in a query is not the example you gave, but when you code something like:
      CREATE FUNCTION dbo.GetJobTitle(@JobID INT) RETURNS VARCHAR(100)
      AS
      BEGIN
      RETURN (SELECT JobName FROM JobTitles WHERE JobID = @JobID);
      END;

      Then write:
      SELECT EmployeeName, dbo.GetJobTitle(JobID) AS JobTitle FROM Employee;

      Reply
  • I’m guessing #2 doesn’t apply to table functions?
    Also, here is an old but good article on user-defined functions.
    https://www.captechconsulting.com/blogs/performance-considerations-of-user-defined-functions-in-sql-server-2012

    Reply
    • N8 – correct, it says “scalar” in there for a reason. Table-valued functions are different, and beyond the scope of this post. Thanks!

      Reply
    • Just a note to remind folks that there are THREE different types of functions in this area: scalar, mTVF, and iTVF. iTVFs are the only ones that don’t inherently suck for performance.

      Reply
  • With reference to “3. Avoid MAX datatypes unless you really need ’em.”. If you absolutely MUST use them for some reason, make sure that you use sp_tableoption to move them out of row or your scans will suffer greatly. So will page splits and fragmentation if they’re “ExpAnsive” after the initial insert. And, if you’ve ever seen a Clustered Index that contains LOBs with near “0” logical fragmentation and a low page density of something like 75%, it’s probably because of the LOBs that were short enough to be in-row.

    Reply
