Accelerated insert data from MSSQL to a remote PostgreSQL 800 times

In the development process of the project there is a need to periodically exchange data between database servers. Suppose we have a data source as SQL Server and the remote PostgreSQL server on which the data should be. After adding the remote server as a linked server can be queried:


the
INSERT INTO RemotePG...RemoteTable (RecordID, RecordName) VALUES (1,'Test string');

the Problem is that these queries run very long. If the challenge before us is to upload tens or hundreds of thousands of records, the execution time tends to infinity. Consider two and a half ways to insert data into a table on linked server and compare the runtime.



create a new linked server:

to create a linked server, you must already be an ODBC data source. In my case the name of the linked server and ODBC source the same.


the
USE [master]
GO

-- put the name of the linked server into a variable
declare @ServerName nvarchar(200)
SET @ServerName=N RemotePG'

-- add the remote server
EXEC master.dbo.sp_addlinkedserver @server = @ServerName, @srvproduct=@ServerName, @provider=N MSDASQL', @datasrc=@ServerName

-- add a user from a remote server
- it is under this account will run queries on a remote machine
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=@ServerName,@useself=N'False',@locallogin=NULL,@rmtuser=N'remote_user',@rmtpassword='password'

-- solve remote procedure call
EXEC master.dbo.sp_serveroption @server=@ServerName, @optname=N rpc', @optvalue=N'true'
EXEC master.dbo.sp_serveroption @server=@ServerName, @optname=N rpc out', @optvalue=N'true'


the

fashion room 1


Let the remote machine is a simple table:


the
CREATE TABLE RemoteTable (INT RecordID, RecordName VARCHAR(200));

Look at the time to execute a simple query:


the
-- note the start time
SELECT getdate();
GO

-- declare and initialize a variable
DECLARE @i INT;
SET @i=0;

-- insert a thousand rows directly into a remote table
WHILE @i<1000
BEGIN
INSERT INTO RemoteTable RemotePG...(RecordID, RecordName) VALUES (1,'Test string');
SET @i=@i+1;
END
GO

-- note the end time
SELECT getdate();
GO

In my case it took 2 minutes 52 seconds. About 6 records per second. Not quick. For reference: the remote machine is on the channel is about 5 Mbit/s and an average ping of 16ms.


the

fashion room 2


In the case of remote SQL server allows you to use the design view:


the
EXECUTE ('an sql query on the remote machine') AT LinkedServerName;

To do this, in the settings of the Linked Server must be enabled for remote procedure calls (RPC = remote procedure call). In this case, the query is executed directly on the remote machine.
Let's see how this will affect execution time:


the
-- note the start time
SELECT getdate();
GO

-- declare and initialize a variable
DECLARE @i INT;
SET @i=0;

-- insert a thousand rows fulfilling the request remotely
WHILE @i<1000
BEGIN
EXECUTE ('INSERT INTO RemoteTable (RecordID, RecordName) VALUES (1,"Test string");') AT RemotePG;
SET @i=@i+1;
END
GO

-- note the end time
SELECT getdate();
GO

the execution Time of 17.25 seconds, better, but try to reduce this time.


the

Method number 2.5


To do this before you execute the query, prepare the data to insert into the table.


the
-- note the start time
SELECT getdate();
GO

-- declare and initialize variables
DECLARE @sql VARCHAR(max);
DECLARE @i INT;

SET @i=0;

-- write the beginning of the script
SET @sql='INSERT INTO RemoteTable (RecordID, RecordName) VALUES ';

-- add script to insert data
WHILE @i<1000
BEGIN
SELECT @sql=@sql+'(1,"Test string"),';
SET @i=@i+1;
END

-- replace the last  character  ',' with ';'
SELECT @sql=SUBSTRING(@sql,1,LEN(@sql)-1)+';'

-- execute the query remotely
EXECUTE (@sql) AT RemotePG;
GO

-- note the end time
SELECT getdate();
GO

the Query will be a long string of the form:


the
INSERT INTO RemoteTable (RecordID, RecordName) VALUES (1,'Test string'),(1,'Test string') ... (1,'Test string');

Such a request, under the same conditions was carried out in 217 milliseconds. Which is about 800 times faster than the original version.


PS the Data to insert into a table specially simplified.

Article based on information from habrahabr.ru

Comments

Popular posts from this blog

Powershell and Cyrillic in the console (updated)

Active/Passive PostgreSQL Cluster, using Pacemaker, Corosync

Automatic deployment ElasticBeanstalk using Bitbucket Pipelines