Bringing F# into the SQLCLR

It’s been some-time since my last post, and don’t worry, we’re still going to continue the IMAP server. I’ve been swamped at work, and as a result, haven’t had the time to properly dedicate to writing these posts (especially that series, which is a complex topic).

Excuses aside, today we’re going to talk about something I recently did for work, which is integrating F# into the SQLCLR (part of Microsoft SQL Server).

For those who don’t know, the SQLCLR is a feature of SQL Server that allows one to import .NET assemblies as user-defined functions, or stored procedures. On it’s own it doesn’t sound impressive, but the SQLCLR allows us to significantly improve performance in some cases, and moderately improve it in others.

I won’t go into detail explaining the SQLCLR, a gentleman by the name of Soloman Rutzky does that quite well. I’ll let his “Stairway to SQLCLR” give you the introduction.

No, what I’ll do today is show you how to import F# into the SQLCLR, instead of just C# or VB.NET. The process is about as straightforward as Soloman describes, but there are a few “gotcha’s”, so I’m going to include those in our discussion here today.

Without further ado, let’s get started.

First: create the project and add the `System.Data` reference

The first step is obviously to create a project to hold our SQL code. The project should be an F# Class Library, in .NET Framework (I’m using 4.7.1 and F# Core 4.4.3.0). You’ll want a module for the functions, and in that module you’ll want to open Microsoft.SqlServer.Server, and System.Data.SqlTypes.

Once we’ve done that, we’ll build a function. There are a few rules to creating a function in .NET that can be seen by SQL Server:

The function must have the SqlFunction attribute;
All inputs must be tupled;
All input and output types must be a Sql[Something] type (SqlDouble, SqlInt, etc.);

So, for our example we’re going to use a real-world example from my work: distance calculation from two geo-coded points.

To do this, we’ll build a function that takes 4 double values: two Latitude/Longitude value sets.

let calculateDistance (fromLat : SqlDouble, fromLon : SqlDouble, toLat : SqlDouble, toLon : SqlDouble) : SqlDouble

That’s the signature we’ll use, next, we want to define how SQL should treat the function:

[<SqlFunction(
    IsDeterministic = true,
    IsPrecise = false,
    SystemDataAccess = SystemDataAccessKind.None,
    DataAccess = DataAccessKind.None)>]

This is where life gets special, so let me explain them piece-by-piece:

SqlFunction: this is just the attribute we use, there is also SqlProcedure for stored procedures;
IsDeterministic = true: this value should ONLY be set to true if the function is deterministic, that is, given any input value, it returns one and exactly one output, and that two calls to the function with the same input will result in the same output;
IsPrecise = false: this value should ONLY be set to true if the function uses the DECIMAL or NUMERIC types, and does precise mathematical calculations;
SystemDataAccess = SystemDataAccessKind.None: I’ll be completely honest with you, I don’t know what the difference between this and DataAccess are, but if you do any reading/writing to/from SQL, you should set it to Read, otherwise, probably use None (there’s a small performance cost to setting this to Read, I leave it to you to decide whether or not to do so);
DataAccess = DataAccessKind.None: see above;

So basically, what we did here is define a function and tell SQL what it should expect the function to do. One of the most impotant parts is the IsDeterministic flag: this tells SQL that if it called the function for a set of values, it can reuse that result for any subsequent calls with the same set of values. This means it can memoize the results. If your function has side-effects, do not set this flag to true, or you will get weird results. Basically, if your function is truly “pure” (no side-effects), mark it with IsDeterministic = true.

Next: write the code

Alright, so we’ve covered the hard parts, next, we write the function.

My version of this function used some logic that was specific to my workplace, so I’m going to remove it and we’ll write a vanilla function:

let constMod = 1.852 / 1.61 * 60.
let divPi180 = Math.PI / 180.
let div180Pi = 180. / Math.PI

[<SqlFunction(
    IsDeterministic = true,
    IsPrecise = false,
    SystemDataAccess = SystemDataAccessKind.None,
    DataAccess = DataAccessKind.None)>]
let calculateDistance (fromLat : SqlDouble, fromLon : SqlDouble, toLat : SqlDouble, toLon : SqlDouble) : SqlDouble =
    let fromLat = fromLat.Value
    let fromLon = fromLon.Value
    let toLat = toLat.Value
    let toLon = toLon.Value
    
    let fromLat = fromLat * divPi180
    let toLat = toLat * divPi180
    let fromLon = fromLon * divPi180
    let toLon = toLon * divPi180
        
    constMod *
    (Math.Acos
        ((Math.Sin toLon) * (Math.Sin fromLon) +
         (Math.Cos toLon) * (Math.Cos fromLon) * (Math.Cos (toLat - fromLat))))
    |> SqlDouble

This should be self-explanatory: we basically convert the data and do some simple math on it.

Third: enable SQLCLR

Alright, so that’s that entirety of our .NET code.

Now, we need to enable the SQLCLR, because it’s disabled by default.

The SQLCLR can be enabled through GUI or T-SQL, I prefer to do it through GUI because I typo a lot.

To enable it:

Right click your server in SSMS;
Click “Facets”;
In the “Facet” dropdown select “Surface Area Configuration”;
Change “ClrIntegrationEnabled” to “True”;
Click “OK”;

Easy enough.

Fourth: trust the assembly, and import it

This is one spot where things aren’t completely awesome: the FSharp.Core library isn’t built to natively support a “SAFE” import to SQLCLR, so we have to trust it first.

To trust the assemblies, we’ll want to get a SHA2_512 hash of them, and optionally, a description.

I, personally, don’t care so much about the description at the moment, so I’ll leave that out and let you locate it if you like. Instead, I’m just going to demonstrate how to hash it and trust it.

We need to trust FSharp.Core, and then our assembly:

DECLARE @hash AS BINARY(64) = (SELECT HASHBYTES('SHA2_512', (SELECT * FROM OPENROWSET (BULK 'C:\path\to\bin\dir\FSharp.Core.dll', SINGLE_BLOB) AS [Data])))
EXEC sp_add_trusted_assembly @hash

Then, our assembly:

DECLARE @hash AS BINARY(64) = (SELECT HASHBYTES('SHA2_512', (SELECT * FROM OPENROWSET (BULK 'C:\path\to\bin\dir\MyAssembly.dll', SINGLE_BLOB) AS [Data])))
EXEC sp_add_trusted_assembly @hash

Easy enough.

Because FSharp.Core isn’t built for native SQL Server support (which, if anyone want’s to fix, I’ve included the error at the end of this article), we have to add it with PERMISSION_SET = UNSAFE, which is, well…unsafe.

So, to load our assembly, we need a name, and the path:

CREATE ASSEMBLY [MyAssembly]
AUTHORIZATION dbo
FROM 'C:\path\to\bin\dir\MyAssembly.dll'
WITH PERMISSION_SET = SAFE

Not particularly hard. The name ([MyAssembly]) is not restricted to anything other than the regular NVARCHAR(128) for sysname, it does not need to match anything from the DLL, but probably easier if it does.

Finally: create the function

Alright, so our assembly is imported, we have it available, the last part is creating the function.

To create the function, we start it off like a normal T-SQL UDF:

CREATE FUNCTION CalculateDistance
(
    @fromLat FLOAT,
    @fromLon FLOAT,
    @toLat FLOAT,
    @toLon FLOAT
)
RETURNS FLOAT

If you’ve ever written a T-SQL Scalar-Valued UDF, this should look familiar. We build the signature exactly as we defined it in F#, and that part is super important: the signature cannot vary at all.

Next, we write the UDF:

AS EXTERNAL NAME [MyAssembly].[MyAssembly.Namespace.ModuleName].calculateDistance

The EXTERNAL NAME is a three part name:

The assembly name as specified in CREATE ASSEMBLY;
The assembly namespace and module name, the fully-qualified name of the first outer-container of the function we need;
The function name itself;

Once you’ve created the function, we’re literally all done. You can now call directly into your CLR code:

SELECT dbo.CalculateDistance(@fromLat, @fromLon, @toLat, @toLon)

Demonstrations!

For those who want to see the performance difference, the original T-SQL function is:

CREATE FUNCTION CalculateDistanceUdf
(
    @fromLat FLOAT,
    @fromLon FLOAT,
    @toLat FLOAT,
    @toLon FLOAT
)
RETURNS FLOAT
WITH SCHEMABINDING
AS 
BEGIN
    RETURN (1.852 / 1.61) *
        60 *
        DEGREES(
            ACOS(
                SIN(RADIANS(@toLon)) *
                SIN(RADIANS(@fromLon)) +
                COS(RADIANS(@toLon)) *
                COS(RADIANS(@fromLon)) *
                COS(RADIANS(@toLat) - RADIANS(@fromLat))))
END

The WITH SCHEMABINDING is a hint to try to tell SQL Server to mark the function deterministic, and it is as verified with SELECT OBJECTPROPERTY(OBJECT_ID('[dbo].[CalculateDistanceUdf]'), 'IsDeterministic'), but it still performs significantly slower than the SQLCLR alternative.

I borrowed the test from this article to run mine, and wrote them as follows:

CREATE TABLE Numbers (
    Num INT NOT NULL,
    CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Num)
)
GO

WITH N1(C) AS (SELECT 0 UNION ALL SELECT 0),
N2(C) AS (SELECT 0 FROM N1 AS T1 CROSS JOIN N1 AS T2),
N3(C) AS (SELECT 0 FROM N2 AS T1 CROSS JOIN N2 AS T2),
N4(C) AS (SELECT 0 FROM N3 AS T1 CROSS JOIN N3 AS T2),
N5(C) AS (SELECT 0 FROM N4 AS T1 CROSS JOIN N4 AS T2),
N6(C) AS (SELECT 0 FROM N4 AS T1 CROSS JOIN N4 AS T2 CROSS JOIN N3 AS T3),
Nums(Num) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N6)
INSERT INTO Numbers(Num) SELECT Num FROM Nums
GO

This inserts 1048576 rows to the Numbers table, so it’s a good-sized test.

Then we can run each of the following three tests:

DECLARE @fromLat AS FLOAT = 100
DECLARE @fromLon AS FLOAT = 100
DECLARE @toLat AS FLOAT = 120
DECLARE @toLon AS FLOAT = 120

SELECT MAX(dbo.CalculateDistance(Num / @fromLat, Num / @fromLon, Num / @toLat, Num / @toLon)) FROM Numbers
GO

DECLARE @fromLat AS FLOAT = 100
DECLARE @fromLon AS FLOAT = 100
DECLARE @toLat AS FLOAT = 120
DECLARE @toLon AS FLOAT = 120

SELECT MAX(dbo.CalculateDistanceUdf(Num / @fromLat, Num / @fromLon, Num / @toLat, Num / @toLon)) FROM Numbers
GO

DECLARE @fromLat AS FLOAT = 100
DECLARE @fromLon AS FLOAT = 100
DECLARE @toLat AS FLOAT = 120
DECLARE @toLon AS FLOAT = 120

SELECT MAX
    (
        (1.852 / 1.61) *
        60 *
        DEGREES(
            ACOS(
                SIN(RADIANS(Num / @toLon)) *
                SIN(RADIANS(Num / @fromLon)) +
                COS(RADIANS(Num / @toLon)) *
                COS(RADIANS(Num / @fromLon)) *
                COS(RADIANS(Num / @toLat) - RADIANS(Num / @fromLat)))))
FROM Numbers
GO

You can run these each individually to time them. My times were roughly 645ms for the SQLCLR, 3369ms for the T-SQL UDF, and 703ms for the inline T-SQL. As you can see, the SQLCLR function is faster than the inline T-SQL, and let’s us encapsulate the logic in a single function. (This actually came about as an issue because we have the calculation there copied-and-pasted over several dozen queries, often 3-8x per query.)

So, that said, in this type of situation (raw math) there’s no reason to use T-SQL for the task, and for something reasonably complex like this, no reason not to abstract it. Dump the code in .NET, write your unit tests, and then deploy the assembly to the SQL server.

Now, that said, there are times I wouldn’t use a SQLCLR function, such as when the math is ultra simple: i.e. * 3, and there are times when a table-valued UDF would be far superior, so I don’t want to make the suggestion that this will always help, just that it’s another thing you can try, and it might actually surprise you.

For anyone curious, attempting to create an assembly in F# throws the following warning:

Warning: The Microsoft .NET Framework assembly ‘fsharp.core, version=4.4.3.0, culture=neutral, publickeytoken=b03f5f7f11d50a3a, processorarchitecture=msil.’ you are registering is not fully tested in the SQL Server hosted environment and is not supported. In the future, if you upgrade or service this assembly or the .NET Framework, your CLR integration routine may stop working. Please refer SQL Server Books Online for more details.

And using a PERMISSION_SET of EXTERNAL_ACCESS or SAFE throws the following error:

CREATE ASSEMBLY failed because type ‘Microsoft.FSharp.Collections.FSharpMap`2’ in safe assembly ‘FSharp.Core’ has a static field ’empty’. Attributes of static fields in safe assemblies must be marked readonly in Visual C#, ReadOnly in Visual Basic, or initonly in Visual C++ and intermediate language.

First: create the project and add the System.Data reference