SQL iceberg

2024-08-14 · 63 min read

Ler em português

Introduction

The SQL language, Structured Query Language, first emerged in 1974 and is one of the leading languages to this day, when it comes to databases.

SQL evolved a lot throughout the years and especially over the last two decades, with modern data organization techniques, that came up as a response to new competitors (NoSQL) and new needs (analytics, data warehouse, cloud computing).

This article is a SQL cheatsheet, covering some of the most common commands to those that are rare and have interesting uses. This list tries to be easy to understand for beginners and for those who already have some familiarity with the language.

The examples here are for Microsoft SQL Server, but most of them apply in a similar or equal way for other databases, such as PostgreSQL, MySQL and Oracle.

SQL iceberg

Before we begin

Reads and queries

Data changes

Table structuring

Special

Final conclusions


Before we begin

Installation

To test the examples of this article, I recommend installing SQL Server (download) and SQL Server Management Studio (SSMS) (download).

After the installations, enter SSMS and login in your instance:

  • Windows Authentication
  • Server name: . (dot)
  • Check Trust server certificate

These options may differ according to the configuration during setup.

Then, choose a database in the left panel; if there are none, click with the mouse right button on Databases and create one.

To write a SQL script, click on New query, in the top menu. To run, press F5 or click on Execute.

Sample tables

Let's start with two tables that will be used in our examples: tables Fruit and Family.

Table [dbo].[Fruit]:

IdNameIdFamilyCalories*
1Coconut1354
2Date1282
3Strawberry332
4Watermelon230
5Passionfruit497
6Cherry364
7Pineapple548
8Apple352

* Calories per 100g. Source: Mundo Boa Forma.

Table [dbo].[Family]:

IdFamily
1Arecaceae
2Cucurbitaceae
3Rosaceae
4Passifloraceae
5Bromeliaceae
6Malvaceae

To create them, run the following SQL script in your database.

CREATE TABLE [dbo].[Family](
  [Id] INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
  [Name] NVARCHAR(64) NOT NULL)
GO

INSERT INTO [dbo].[Family] VALUES
  ('Arecaceae'), ('Cucurbitaceae'), ('Rosaceae'),
  ('Passifloraceae'), ('Bromeliaceae'), ('Malvaceae');
GO

CREATE TABLE [dbo].[Fruit](
  [Id] INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
  [Name] NVARCHAR(64) NOT NULL,
  [IdFamily] INT NOT NULL FOREIGN KEY REFERENCES [dbo].[Family](Id),
  [Calories] INT NOT NULL)
GO

INSERT INTO [dbo].[Fruit] VALUES 
  ('Coconut', 1, 354), ('Date', 1, 282), ('Strawberry', 3, 32),
  ('Watermelon', 2, 30), ('Passionfruit', 4, 97), ('Cherry', 3, 64),
  ('Pineapple', 5, 48), ('Apple', 3, 52);
GO

Reads and queries

SELECT

Gets data in a tabular structure.

SELECT [Name], [Calories]
FROM [dbo].[Fruit]
WHERE [Calories] < 50

Returns:

NameCalories
Strawberry32
Watermelon30
Pineapple48

* means all columns:

SELECT *
FROM [dbo].[Fruit]
WHERE [Calories] < 50
IdNameIdFamilyCalories
3Strawberry332
4Watermelon230
7Pineapple548

LEFT, RIGHT, INNER JOINS

Interlaces two or more tables by reference columns; this interlacing can be used for queries, insertions, updates and deletions.

Here we'll focus only on INNER JOIN and LEFT JOIN.

Different types of JOINs on SQL

INNER JOIN

INNER JOIN is a perfect intersecction, that is, the interlaced values must exist in both tables. If it exists in a line of table X, but with no corresponding line in another table Y, the line of table X is not considered. Example:

Table [dbo].[Fruit]:

IdNameIdFamilyCalories
3Strawberry332
4Watermelon230
7Pineapple548

Table [dbo].[Family]:

IdFamily
2Cucurbitaceae
3Rosaceae
5Bromeliaceae

SQL query:

SELECT
    fr.[Name] AS [Name],
    fa.[Name] AS [Family]
FROM [dbo].[Fruit] fr
INNER JOIN [dbo].[Family] fa ON fr.[IdFamily] = fa.[Id]
WHERE fr.[Calories] < 50

Returns:

NameFamily
StrawberryRosaceae
WatermelonCucurbitaceae
PineappleBromeliaceae

LEFT / RIGHT JOIN

LEFT JOIN and RIGHT JOIN are imperfect intersecctions, such that the correspondence value is allowed to not exist in one of the tables; if it doesn't exist, then values from the table with no correspondence will come null.

In the tables above, the Malvaceae family (id = 6) has no registered fruit.

SELECT
    fa.[Name] AS [Family],
    fr.[Name] AS [Fruit]
FROM [dbo].[Family] fa
LEFT JOIN [dbo].[Fruit] fr ON fr.[IdFamily] = fa.[Id]
FamilyFruit
ArecaceaeCoconut
ArecaceaeDate
CucurbitaceaeWatermelon
RosaceaeStrawberry
RosaceaeCherry
RosaceaeApple
PassifloraceaePassionfruit
BromeliaceaePineapple
MalvaceaeNULL

VIEWS

VIEWS are encapsulated queries: instead of always typing a complex query, you can "alias" this query with a view, which simplifies the call. This technique is good for abstracting logic and to control access permissions - an user can call the view, but be blocked from querying the source tables.

View creation:

CREATE OR ALTER VIEW [dbo].[ViewGetFruits] AS

  SELECT
    fr.[Name] AS [Fruit],
    fa.[Name] AS [Family]
  FROM [dbo].[Fruit] fr
  INNER JOIN [dbo].[Family] fa ON fr.[IdFamily] = fa.[Id]

GO

To run:

-- the view behaves like a table
SELECT * FROM [dbo].[ViewGetFruits]

GROUP BY, HAVING

GROUP BY is a grouping instruction and HAVING is a filter applied over groups.

Columns returned by a SELECT with grouping must be specified on the GROUP BY, or be in aggregation functions, like counts, sums, averages, etc.

Example:

SELECT fr.[IdFamily], COUNT(fr.[Id]) AS NumberOfFruits
FROM [dbo].[Fruit] fr
GROUP BY fr.[IdFamily]
IdFamilyNumberOfFruits
12
21
33
41
51

With inner join and filter over the groups:

SELECT fa.[Name] AS [Family], COUNT(fr.[Id]) AS NumberOfFruits
FROM [dbo].[Fruit] fr
INNER JOIN [dbo].[Family] fa ON fr.[IdFamily] = fa.[Id]
GROUP BY fa.[Name]
HAVING COUNT(fr.[Id]) >= 2 -- family w/ 2 or more fruits
FamilyNumberOfFruits
Arecaceae2
Rosaceae3

UNION, INTERSECT, EXCEPT

These are set operators (see set theory in mathematics). They are different from the interlacing that happens in JOINs - while in JOINs the interlacing is between columns, with sets the interlacing is between lines. The examples below show this visually.

  • UNION: Returns elements that belong to one set or another.
  • INTERSECT: Returns elements that belong to one set and another, at the same time.
  • EXCEPT: Returns elements that belong to one set, but not to another.
DECLARE @a TABLE (X INT);
INSERT INTO @a VALUES (1),(2),(3),(4),(5);

DECLARE @b TABLE (X INT);
INSERT INTO @b VALUES (2),(4),(6);

SELECT * FROM @a

UNION -- INTERSECT, EXCEPT

SELECT * FROM @b
ABUNIONINTERSECTEXCEPT
12121
24243
363-5
4-4--
5-5--
--6--

Why use an UNION when I can just write WHERE criterion1 OR criterion2 ?

Sometimes we want to unite lines from different tables:

SELECT * FROM Table1 WHERE criterion1
UNION
SELECT * FROM Table2 WHERE criterion2

What is UNION ALL?

In set theory, a set cannot have repeated elements. However, in SQL queries, sometimes we want to allow duplications in a union, for that, we use UNION ALL, instead of just UNION.

SUBQUERIES

These are queries that run inside others. They can be useful in more complex operations, however, whenever possible, it is best to pick joins over subqueries, for better performance.

In the example below, the fruit family's name is obtained through a subquery:

SELECT
    fr.[Name] AS [Name],
    (SELECT fa.[Name] FROM [dbo].[Family] fa WHERE fa.[Id] = fr.[IdFamily]) AS [Family]
FROM [dbo].[Fruit] fr
WHERE fr.[Calories] < 50

SELF JOIN

Is a join of table with itself. Useful for tree operations.

Below, we have a Employee table and an employee can be someone else's boss.

DECLARE @employee TABLE (Id INT, [Name] NVARCHAR(32), IdBoss INT);
INSERT INTO @employee VALUES
(1, 'Mark', 4),
(2, 'John', NULL),
(3, 'Michael', 2),
(4, 'Rachel', 2),
(5, 'Jessica', 4);

SELECT
  f1.[Name] AS [Employee],
  f2.[Name] AS [Boss]
FROM @employee f1
LEFT JOIN @employee f2 ON f1.[IdBoss] = f2.[Id]
ORDER BY f2.[Id] ASC
EmployeeBoss
JohnNULL
MichaelJohn
RachelJohn
JessicaRachel
MarkRachel

CROSS JOIN

It is the cartesian product between tables, basically, it makes all possible combinations of elements of one table to another.

Consider two tables, IceCream and Topping, and we want to know what combinations we can have.

DECLARE @icecream TABLE (Id INT, [Name] NVARCHAR(32));
INSERT INTO @icecream VALUES
(1, 'Strawberry'), (2, 'Flake'), (3, 'Chocolate');

DECLARE @topping TABLE (Id INT, [Name] NVARCHAR(32));
INSERT INTO @topping VALUES
(1, 'Chocolate syrup'), (2, 'Nuts');

SELECT i.[Name] AS IceCream, t.[Name] AS Topping
FROM @icecream i CROSS JOIN @topping t
ORDER BY i.[Id] ASC, t.[Id] ASC
IceCreamTopping
StrawberryChocolate syrup
StrawberryNuts
FlakeChocolate syrup
FlakeNuts
ChocolateChocolate syrup
ChocolateNuts

PIVOT

Ross asking to rotate the couch - Friends

The PIVOT instruction is very interesting. It transforms lines into columns, by rotating part of the table in 90º. It is very good for crossing two or more columns of a table.

In the example below, we have a table Sales with each line being the quantity of a product sold on a certain day. We want to visualize this in a better way, with date columns, and for that we will use PIVOT.

DECLARE @sales TABLE (IdProduct INT, Quantity INT, SalesDate DATE);
INSERT INTO @sales VALUES
(5, 52, '2024-08-05'),
(3, 902, '2024-08-05'),
(1, 196, '2024-08-06'),
(4, 212, '2024-08-06'),
(3, 488, '2024-08-07'),
(4, 384, '2024-08-07'),
(1, 1121, '2024-08-08'),
(2, 49, '2024-08-08'),
(3, 88, '2024-08-09'),
(2, 12, '2024-08-09'),
(5, 98, '2024-08-09');

SELECT * FROM
(
  SELECT [SalesDate], [IdProduct], [Quantity]
  FROM @sales
) t
PIVOT(
  SUM([Quantity]) FOR [SalesDate] IN ([2024-08-05],[2024-08-06],[2024-08-07],[2024-08-08],[2024-08-09])
) AS pvt;
IdProduct2024-08-052024-08-062024-08-072024-08-082024-08-09
1NULL196NULL1121NULL
2NULLNULLNULL4912
3902NULL488NULL88
4NULL212384NULLNULL
552NULLNULLNULL98

What if I want to generate columns dynamically, with no fixed values for them?

You can generate the command's text at runtime and then run it. This article shows how to do it inside a procedure:

CREATE PROCEDURE dbo.GenerateStudentsReportCards
  @ColumnToPivot  NVARCHAR(255),
  @ListToPivot   NVARCHAR(255)
AS
BEGIN
 
  DECLARE @SqlStatement NVARCHAR(MAX)
  SET @SqlStatement = N'
    SELECT * FROM (
      SELECT [Student], [Subject], [Marks]
      FROM [Grades]
    ) StudentResults
    PIVOT (
      SUM([Marks])
      FOR ['+@ColumnToPivot+']
      IN ('+@ListToPivot+')
    ) AS PivotTable'; 
  EXEC(@SqlStatement)
 
END

To run:

EXEC dbo.GenerateStudentsReportCards
  N'Subject'
  ,N'[Mathematics],[Science],[Geography]'

ROLLUP, CUBE, GROUPING SETS

These are types of groupings that also analyze subgroups.

DECLARE @sales TABLE (IdProduct INT, Quantity INT, SalesDate DATE);
INSERT INTO @sales VALUES
(5, 52, '2024-08-05'),
(3, 902, '2024-08-05'),
(1, 196, '2024-08-06'),
(4, 212, '2024-08-06'),
(3, 488, '2024-08-07'),
(4, 384, '2024-08-07');

SELECT [SalesDate], [IdProduct], SUM([Quantity]) AS [Quantity]
FROM @sales
GROUP BY ROLLUP([SalesDate],[IdProduct]) -- CUBE, GROUPING SETS

ROLLUP

Analyzes subgroups on a pyramid, from the right to left in the declaration. GROUP BY ROLLUP([ColA], [ColB]) will check:

  • [ColA], [ColB] (subgroup of A and B)
  • [ColA], NULL (subgroup of A for all values of B)
  • NULL, NULL (general group)
SalesDateIdProductQuantity
2024-08-053902
2024-08-05552
2024-08-05NULL954
2024-08-061196
2024-08-064212
2024-08-06NULL408
2024-08-073488
2024-08-074384
2024-08-07NULL872
NULLNULL2234

CUBE

Analyzes all possible subgroups. GROUP BY CUBE([ColA], [ColB]) will check:

  • [ColA], [ColB] (subgroup of A and B)
  • [ColA], NULL (subgroup of A for all values of B)
  • NULL, [ColB] (subgroup of B for all values of A)
  • NULL, NULL (general group)
SalesDateIdProductQuantity
2024-08-061196
NULL1196
2024-08-053902
2024-08-073488
NULL31390
2024-08-064212
2024-08-074384
NULL4596
2024-08-05552
NULL552
NULLNULL2234
2024-08-05NULL954
2024-08-06NULL408
2024-08-07NULL872

GROUPING SETS

Analyzes the subgroups of each column, but without crossing them. GROUP BY GROUPING SETS([ColA], [ColB]) checks:

  • [ColA], NULL (subgroup of A for all values of B)
  • NULL, [ColB] (subgroup of B for all values of A)
SalesDateIdProductQuantity
NULL1196
NULL31390
NULL4596
NULL552
2024-08-05NULL954
2024-08-06NULL408
2024-08-07NULL872

WINDOW FUNCTION

They are functions that generate values looking over slices (or windows) of a table. They are very useful for pagination, like product searches.

This is a lengthy subject and with many possibilities, which is why I will not cover it here, but, I recommend this article that explains very well how window functions work.

NOLOCK

NOLOCK, or READ UNCOMMITTED, is a query hint that, as the name suggests, ignores locks.

A regular query uses the standard isolation level, READ COMMITTED, that reads only committed data from a table. In a table with heavy traffic (many simultaneous operations), a query may face concurrency problems with inserts, updates and deletes.

With the NOLOCK hint, the read does not compete with those operations. However, this reading mode includes uncommitted records, in such a way that inconsistent data may appear in the results. Generally speaking, NOLOCK should be used only if there actually is a concurrency problem, and if eventual inconsistencies are tolerable. An example of usage is the extraction of analytical reports.

SELECT *
FROM [dbo].[Fruit] WITH (NOLOCK)
WHERE [Calories] < 50

Data changes

INSERT, UPDATE, DELETE

Adds, modifies and excludes lines from a table.

DECLARE @person TABLE (Id INT, [Name] NVARCHAR(32));

-- inserting with VALUES
INSERT INTO @person VALUES
(1, 'Mario'), (2, 'Caroline');

-- inserting with SELECT
INSERT INTO @person SELECT 3, 'James';

-- changing name
UPDATE @person
SET [Name] = 'Maria'
WHERE [Id] = 1;

-- deleting person
DELETE FROM @person
WHERE [Name] = 'James';

-- final result
SELECT * FROM @person

STORED PROCEDURES

They are scripts that accept parameters and execute actions in the database; they can return table-valued results and numerical values.

The procedure below can be used to add fruits (considering the example from the beginning).

CREATE OR ALTER PROCEDURE [dbo].[ProcAddFruit]
  @fruit NVARCHAR(32),
  @family NVARCHAR(32),
  @calories INT
AS
BEGIN

  -- checks if the fruit is already registered
  IF EXISTS (SELECT 1 FROM [dbo].[Fruit] WHERE [Name] = @fruit)
  BEGIN
    RAISERROR('Fruit already registered!', 16, 1)
    RETURN 1; -- stop execution
  END

  -- adds fruit family if it doesn't exist
  IF NOT EXISTS (SELECT 1 FROM [dbo].[Family] WHERE [Name] = @family)
  BEGIN
    INSERT INTO [dbo].[Family] VALUES (@family);
  END

  -- get family id
  DECLARE @idFamily INT = (SELECT [Id] FROM [dbo].[Family] WHERE [Name] = @family)

  -- adds fruit
  INSERT INTO [dbo].[Fruit] VALUES (@fruit, @idFamily, @calories);

  RETURN 0; -- everything OK

END
GO

To run:

EXECUTE [dbo].[ProcAddFruit]
  @fruit = 'Orange',
  @family = 'Rutaceae',
  @calories = 37

MERGE

Allows simultaneous execution of insertions, updates and deletions, comparing a source table to a target table. This instruction is used for sincronizing different data sources. If the sincronizations are always of the same type (insertions, updates, deletes), then using these specific operations offers better performance.

I will not enter into details here, but, I recommend this article for those interested.

MERGE target_table USING source_table
ON merge_condition
WHEN MATCHED THEN update_statement
WHEN NOT MATCHED THEN insert_statement
WHEN NOT MATCHED BY SOURCE THEN DELETE;

BULK INSERT

BULK INSERT is a mass insertion of data coming from a file, usually in CSV or XML formats, or a flat file.

This is a strategy to move large volumes of data from one place to another, especially if flat files are chosen, because they are very compact and have very high compression rates (example: a 900MB flat file, when zipped, can be reduced to 30MB, depending on the situation).

Consider the flat file below. Note that the starting positions of each field are always the same on a line - fruit name (0..16), family name (16..32), calories (32..35):

Graviola        Annonaceae      66 
Mamão Papaya    Caricaceae      46 
Cambucá         Myrtaceae       66 
Bacuri          Clusiaceae      105
Cupuaçu         Malvaceae       49 

To import a file using BULK INSERT, I recommend 1) using a format file and 2) first insert into an intermediary table, because many times we need to polish data, by trimming whitespaces or converting dates. The following is a XML format file.

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<RECORD>
		<FIELD ID="F1" xsi:type="CharFixed" LENGTH="16"/><!-- Fruit name -->
		<FIELD ID="F2" xsi:type="CharFixed" LENGTH="16"/><!-- Family name -->
		<FIELD ID="F3" xsi:type="CharFixed" LENGTH="3"/><!-- Calories -->
		<FIELD ID="F4" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="6"/><!-- Line break -->
	</RECORD>
	<ROW>
		<COLUMN SOURCE="F1" NAME="FruitName" xsi:type="SQLVARYCHAR" LENGTH="16" NULLABLE="NO"/>
		<COLUMN SOURCE="F2" NAME="FamilyName" xsi:type="SQLVARYCHAR" LENGTH="16" NULLABLE="NO"/>
		<COLUMN SOURCE="F3" NAME="Calories" xsi:type="SQLINT"/>
	</ROW>
</BCPFORMAT>

To run a BULK INSERT:

CREATE TABLE #temp1
(Fruit NVARCHAR(16), Family NVARCHAR(16), Calories INT);

BULK INSERT #temp1 
FROM 'C:\MyFolder\flat_file.txt' 
WITH 
(
  CODEPAGE = '65001', -- UTF-8 with BOM encoding
  FORMATFILE = 'C:\MyFolder\format_file.xml', 
  BATCHSIZE = 4000
);

-- polishing
UPDATE #temp1
SET [Fruit] = TRIM([Fruit]),
    [Family] = TRIM([Family])

SELECT * FROM #temp1

DROP TABLE #temp1

Read more on the official docs:

Exporting flat files

Unfortunately, there are no easy ways of generating files straight from the database; for SQL Server, there is only the Import / Export Wizard, which is a graphical interface inside SSMS.

However, a simple console application could read lines from the database and output a flat file from them.


Table structuring

PRIMARY KEY, FOREIGN KEY

The primary key uniquely identifies each row of a table and determines what is the main search column, structuring the table in a B-tree; this makes reading it much faster.

The foreign key is a link between the column of one table to the primary key of another. This increases performance of joins between those tables and is also a form of validation, because a value is only valid if it exists in the foreign table.

Taking the initial example:

CREATE TABLE [dbo].[Fruit](
  [Id] INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
  [Name] NVARCHAR(64) NOT NULL,
  [IdFamily] INT NOT NULL FOREIGN KEY REFERENCES [dbo].[Family](Id),
  [Calories] INT NOT NULL)

INDEXES

They are additional search rails on a table that run parallel to the primary key. They are useful for tables with more than one search criteria.

Imagine a Person table, with columns Id, NationalIdentifier and Name: it's common to search a person by its national identifier or name, but, these columns are not the primary key, which makes searching by them slower. In these cases, the search is done individually line by line, over the whole table; moreover, it's a textual comparison, because NationalIdentifier and Name are texts. Textual comparisons are slower than numerical comparisons, which is the case of Id.

To speed the search, we can create indexes. The most common type is the hash index, which attributes a random number to each indexed value and saves this number next to the record. Whenever a search comes to use this criterion, the comparison will be numerical and with better performance than checking the field's value on every line.

Index creation:

-- one column
CREATE INDEX IX_Person_NationalIdentifier ON [dbo].[Person]([NationalIdentifier]);

-- more than one column
CREATE INDEX IX_Address_CityStateCountry
ON [dbo].[Address]([City],[State],[Country]);

UNIQUE

It's a special type of constraint that blocks repeated values in a column. In most databases, creating an unique constraint also creates an index for this column.

CREATE TABLE [dbo].[Person] (  
  Id INT NOT NULL PRIMARY KEY CLUSTERED,
  NationalIdentifier CHAR(11) NOT NULL,
  [Name] NVARCHAR(64) NOT NULL,
  CONSTRAINT CT_Unique_Person_NationalIdentifier UNIQUE([NationalIdentifier])
);

SPARSE

Sparse is a column modifier that optimizes storage when there are many null values.

In traditional storage, a field has a reserved amount of bytes for it in the row even when the value is null, which is a waste.

With sparse columns, a null value occupies zero bytes on the row and a non-null value occupies the original size plus 4 bytes.

Practical example, NVARCHAR(16) column needs 32 bytes on each row. With sparse, if null, 0 bytes will be used; if not null, 36 bytes.

Percentage of nullsColumn size difference
with SPARSE
0%+12.5%
25%-15.6%
75%-71.9%

Formula: 1 - ((T+4)/T) * (1-N)

  • T : original field size
  • N : percentage of nulls

If there are not many nulls, there is no advantage in using sparse columns, but they save space if the percentage of nullability is high.

CREATE TABLE [dbo].[Person] (  
  Id INT NOT NULL PRIMARY KEY CLUSTERED,
  NationalIdentifier CHAR(11) NOT NULL,
  [Name] NVARCHAR(64) NOT NULL,
  FavouriteColour NVARCHAR(32) SPARSE NULL);

COLUMNSTORE

Columnstore is a storage strategy.

The default storage mode of a table is linear, with rows laid inside pages. A, B and C are columns:

flowchart TB subgraph page[Page] direction LR subgraph row1[Row 1] direction TB A1 B1 C1 end subgraph row2[Row 2] direction TB A2 B2 C2 end subgraph row3[Row 3] direction TB A3 B3 C3 end end

With columnar storage, a page is exclusive for a column:

flowchart TB subgraph pageC[Page] subgraph columnC[Column C] direction LR C1 C2 C3 end end subgraph pageB[Page] subgraph columnB[Column B] direction LR B1 B2 B3 end end subgraph pageA[Page] subgraph columnA[Column A] direction LR A1 A2 A3 end end

What is the advantage of this approach? Columnar storage has compression by default and if values frequently repeat in columns, the storage size can be greatly reduced.

See below how columnar compression works:

Uncompressed columnCompressed column
A5 x A
A2 x B
A3 x A
A
A
B
B
A
A
A

According to SQL Server docs, the storage size can be up to 10x smaller and the reading speed up to 10x higher, compared to a linear storage.

When to use columnstore

  • High number of repetitions of values in vertical directions.
  • Low cardinality (possible variations of values).
  • High volume of data (over 500k lines).
  • Analytical processes: sums, averages, counts.

When not to use columnstore

  • Records that have many updates and deletes, because compression is lost when there are changes.
  • Records that are entities, because columnstore tables have no primary keys.
  • If this table is used for many joins.

Clustered vs nonclustered

A clustered columnstore index is when the table uses columnar storage. A nonclustered columnstore index is when the table uses linear storage and columnstore is an auxiliary index; this is an option when you want the benefits of columnstore (analytical performance) without losing the benefits of linear tables (primary keys).

Table creation

CREATE TABLE [dbo].[EntryFruitFile](
	Fruit NVARCHAR(16) NOT NULL,
	Family NVARCHAR(16) NOT NULL,
	Calories INT NOT NULL,
	INDEX CCI_EntryFruitFile CLUSTERED COLUMNSTORE);

Special

IMPORT DLLs

Yes, it is possible to import DLLs to execute on SQL Server.

Custom DLLs are interesting for use cases where there is no native similar, like: read and write files; HTTP calls; generate and validate hashes.

DLLs must be .NET Framework source code and signed with a strong name.

The process is a bit complex and the steps here are based on this article from C-Sharp Corner.:

  1. Create a Class Library project in .NET Framework
  2. C# code example below:
using System;

namespace MyDotnetSqlProj
{
    public static class Class1
    {
        public static double CalculateHypotenuse(double side1, double side2) =>
            Math.Sqrt(Math.Pow(side1, 2) + Math.Pow(side2, 2));
    }
}
  1. On Visual Studio, click with the mouse right button on the project, Properties, Signing, check Sign the assembly. Choose or make a new strong name key file. If you are creating, you can uncheck Protect my key file with a password.
  2. Build the project.
  3. Now, we must import the assymetric signing key, create a login in the master database and concede UNSAFE ASSEMBLY permission to it:
USE [master]
GO

CREATE ASYMMETRIC KEY AssymetricKey1
FROM EXECUTABLE FILE = 'C:\Projects\MyDotnetSqlProj\bin\Debug\MyDotnetSqlProj.dll';
GO

CREATE LOGIN LoginAssymetricKey1
FROM ASYMMETRIC KEY AssymetricKey1;
GO

GRANT UNSAFE ASSEMBLY TO LoginAssymetricKey1;
GO
  1. In the application database, create an user for the recently created login, import the assembly and enable CLR execution:
USE [YourDB];
GO

CREATE USER UserLoginAssymetricKey1
FOR LOGIN LoginAssymetricKey1;
GO

CREATE ASSEMBLY AssemblyMyLib
FROM 'C:\Projects\MyDotnetSqlProj\bin\Debug\MyDotnetSqlProj.dll'
WITH PERMISSION_SET = SAFE;
GO

sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO
  1. Let's create a FUNCTION that calls the DLL's code:
USE [YourDB];
GO

-- for string, use NVARCHAR parameters
CREATE FUNCTION [dbo].[CalculateHypotenuse] (@side1 FLOAT, @side2 FLOAT)
RETURNS FLOAT
AS
  EXTERNAL NAME [AssemblyMyLib].[MyDotnetSqlProj.Class1].[CalculateHypotenuse];
GO
  1. To execute the function:
-- returns 5
SELECT [dbo].[CalculateHypotenuse] (3.0, 4.0) as [Hypotenuse];

-- returns 13
SELECT [dbo].[CalculateHypotenuse] (5.0, 12.0) as [Hypotenuse];

JSON

To handle JSONs inside the database.

DECLARE @json NVARCHAR(100) =
'{"cities": ["Aracaju", "João Pessoa", "Maceió"]}';

-- returns 3 lines with the cities
SELECT * FROM OPENJSON(@json, '$.cities')

-- returns 1 line with the selected city
SELECT JSON_VALUE(@json, '$.cities[1]')

Check more in the docs.

APPLOCKS

They are application locks inside the database. They are different from row locks.

Can be used for concurrency control, such as for reading messages with multiple simultaneous consumers.

The SQL Server functions are sp_getapplock and sp_releaseapplock.

Learn more in the article about asynchronous architecture without queues.

VIEW EXECUTION PLAN

The execution plan describes which steps the database engine needs to perform to execute a command or query.

On SSMS, enabling the Include execution plan on the top menu will include the plan along with query results.

By analyzing the cost of each step, you can understand which performance bottlenecks exist and from there improve your query, by creating indexes, rewriting WHERE and JOIN criteria, and others.

SQL execution plan

Final conclusions

Jack and Rose on the Titanic

SQL is an extremely powerful and complete tool, and acquired new capabilities over the years. It is no wonder that remains being the mainstream for databases up to today; it is the basis for complex systems.

This article covered common and less known SQL features, and still there are many others that could be addressed here with real world applications, therefore, it is worth it to explore the documentation of your database, because for many times it has the answer for a challenge you might be facing.

A

AlexandreHTRB

Campinas/SP,
Brasil