Friday, March 23, 2012

replace missing data

I am working with a database containing time series data. In many, cases there is missing data. For example, while there might be a value for 2001-01-01T23:00:00, there is none for 2001-01-01T23:0100 (one minute later). I would like to replace the missing data with data from the previous record (if the previous record is the same date). Is that possible with T-SQL?

Sure,

UPDATE SomeTable
FROM SomeTable ST
SET SomeColumnWhereDataisMissing =
(
Select TOP (1) SomeColumn From SomeTable ST2
WHERE ST2.TimeColumn < ST.TimeColumn
)
WHERE SomeColumnWhereDataisMissing IS NULL --Or whatever means that there is no data.

HTH, Jens Suessmeyer.


http://www.sqlserver2005.de

|||

Missing means there is no record for a specific datetime value. There is no null because the record does not exist.

|||

It sounds like you're looking for a list of missing dates. You can use an auxiliary numbers table to generate a list of dates and perform an OUTER JOIN against the table in question.

The following sample uses a table with one entry for each minute in a day. To me, it seems best to loop through this table on a daily basis rather than to create a date-only table that would be specific to your problem.

-- Create numbers table. You want this as a permanent table but I'm generating on-the-fly for this example.

DECLARE @.Nums TABLE (Val INT)

;WITH Numbers(n)

AS

(

SELECT 1 AS n

UNION ALL

SELECT (n + 1) AS n

FROM Numbers

WHERE

n < 1440 -- Minutes in a day

)

INSERT INTO @.Nums (Val)

SELECT n from Numbers

OPTION(MAXRECURSION 1440)

-- Create sample data

DECLARE @.Dates TABLE (PKey INT IDENTITY PRIMARY KEY, MyDate DATETIME)

INSERT INTO @.Dates (MyDate)

SELECT '2006-08-21 13:21:00'

UNION

SELECT '2006-08-21 13:22:00'

UNION

SELECT '2006-08-21 13:23:00'

UNION

SELECT '2006-08-21 13:25:00'

UNION

SELECT '2006-08-21 13:26:00'

UNION

SELECT '2006-08-21 13:30:00'

-

-- Find missing ranges.

-

DECLARE @.MinMinute INT

DECLARE @.MaxMinute INT

DECLARE @.CheckDate DATETIME -- This is the day that we're checking

SET @.CheckDate = '2006-08-21'

-- First and last time for the day of @.CheckDate

SELECT @.MinMinute = DATEDIFF(minute, @.CheckDate, MIN(MyDate)),

@.MaxMinute = DATEDIFF(minute, @.CheckDate, MAX(MyDate))

FROM @.Dates

WHERE MyDate BETWEEN @.CheckDate AND DATEADD(Day, 1, @.CheckDate)

-- Find all missing minutes in the sequence

SELECT DATEADD(minute, n.Val, @.CheckDate) AS MissingMinute, 'Missing' AS Status

FROM @.Nums n

LEFT JOIN @.Dates d ON DATEADD(minute, n.Val, @.CheckDate) = d.MyDate

WHERE n.Val BETWEEN @.MinMinute AND @.MaxMinute

and d.MyDate IS NULL

UNION ALL

SELECT MyDate, 'Exists' AS Status

FROM @.Dates

ORDER BY MissingMinute

Thanks to the following sources for information regarding auxiliary numbers tables

http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html

http://codeinet.blogspot.com/2006/06/sql-numbers-table-using-common-table.html

|||

Slight edit - the previous code to generate an auxiliary numbers table will only work on SQL Server 2005. Use the following for SQL 2000:

CREATE TABLE dbo.Numbers
(
Number INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
)
WHILE COALESCE(SCOPE_IDENTITY(), 0) <= 1440
BEGIN
INSERT dbo.Numbers DEFAULT VALUES
END

No comments:

Post a Comment