Subscribe: Joe Celko The SQL Apprentice
http://joecelkothesqlapprentice.blogspot.com/atom.xml
Added By: Feedage Forager Feedage Grade B rated
Language: English
Tags:
col date  col  column  data  date  datecol datecol  datecol  dbo  employeeid employeeid  employeeid  join dbo  join  select  sql  update 
Rate this Feed
Rate this feedRate this feedRate this feedRate this feedRate this feed
Rate this feed 1 starRate this feed 2 starRate this feed 3 starRate this feed 4 starRate this feed 5 star

Comments (0)

Feed Details and Statistics Feed Statistics
Preview: Joe Celko The SQL Apprentice

Joe Celko The SQL Apprentice





Updated: 2017-12-10T22:12:56.392-05:00

 



A Strange Use of UDFs?

2007-12-28T13:55:40.411-05:00

SQL Apprentice Question started a new project recently and the .Net/SQL Server 2000 application I was hired to help with was developed by a large consulting firm. All the data access for the application is done via stored procedures, but most stored procedure data access is done via functions. So, you might have a SELECT inside a stored procedure that looks similar to this: SELECT e.LastName, et.TerritoryDescription, en.NADescription FROM Employees e LEFT JOIN fxEmpNat() en ON e.EmployeeID = en.EmployeeID LEFT JOIN fxEmpTer() et ON e.EmployeeID = et.EmployeeID WHERE e.LastName = 'Fuller' And the function definitions are *similar* to those created in the script listed at the end of this post (using the Northwind database). I would solve the same problem using this approach: SELECT e.LastName, t.TerritoryDescription, n.NADescription FROM dbo.Employees e LEFT JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID LEFT JOIN dbo.EmployeeNationality en ON e.EmployeeID = en.EmployeeID JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID WHERE e.LastName = 'Fuller' The original developers are long gone and no one currently working on the project knows why the function-centric approach was used. There are no security restrictions that would merit such an approach, and even if there were I would think a VIEW solution would be the more traditional approach. And there are only a couple of complicated relationships that one might want to "hide" from a less experienced developer that did not know the intricacies of the data. Anybody have any ideas when this approach would be justified? -- Addl. Table and Function Definitions CREATE TABLE Nationality ( NationalityID int PRIMARY KEY, NADescription varchar(20) NOT NULL ) go CREATE INDEX IX_Nationality_NADescription ON Nationality(NADescription) go INSERT Nationality VALUES (1,'America') INSERT Nationality VALUES (2,'Canada') INSERT Nationality VALUES (3,'Angola') go CREATE TABLE EmployeeNationality ( EmployeeID int, NationalityID int, CreateDate datetime CONSTRAINT PK_EmployeeNationality PRIMARY KEY NONCLUSTERED (EmployeeID,NationalityID,CreateDate) ) go INSERT EmployeeNationality values(1,1,'01/01/80') INSERT EmployeeNationality values(1,1,'01/01/90') INSERT EmployeeNationality values(2,1,'01/01/90') INSERT EmployeeNationality values(3,2,'01/01/90') INSERT EmployeeNationality values(4,3,'01/01/90') go CREATE FUNCTION fxEmpNat() RETURNS TABLE AS RETURN ( SELECT e.EmployeeID, n.NADescription FROM dbo.Employees e JOIN dbo.EmployeeNationality en ON e.EmployeeID = en.EmployeeID AND en.CreateDate = (SELECT MAX(en2.CreateDate) FROM EmployeeNationality en2 WHERE en.EmployeeID = en2.EmployeeID AND en.NationalityID = en2.NationalityID ) JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID ) go CREATE FUNCTION fxEmpTer() RETURNS TABLE AS RETURN ( SELECT e.EmployeeID, t.TerritoryDescription FROM dbo.Employees e JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID ) go Celko AnswersOne reason I can think of is job security. This code will never port and can be read only by dialect speakers. A function call cannot be optimized like a VIEW, so you are at risk for poor performance, as well as the maintenance problems, as time goes on. Another reason is that they are not SQL programmers and do not think in terms of declarations. They want to see the familiar function call they know from procedural languages. Original source[...]



Using WHERE clause parameter

2007-12-28T13:24:44.331-05:00

SQL Apprentice Question
When I try to use an SP with parameter that is the WHERE clause it generates
an error.

E.g,

@myWhere = varchar(200)

AS

SELECT x, y FROM skwi WHERE @myWhere

myWhere = status = 7 AND LastName = 'Smith'


The problem is that the where clause is built conditionally in the program.
Any advise and examples on how to accomplish would be appreciated.



Celko Answers

>> When I try to use an SP with parameter that is the WHERE clause it generates an error. <<


The short, dangerous kludge is to use Dynamic SQL.

The right answer is to get out that old text book on Software
Engineering and the chapters on coupling and cohesion of code
modules. Those rules still apply in SQL.


You are writing a "Britney Spears, Automobiles and Squids" module --
you have no idea what it will do at run time, so it has absolutely no
cohesion. Instead of depending on every random future user to write
proper SQL, you need to earn your salary and proper them with a well-
defined module with a meaningful name and a known parameter list.


If you want a general query tool, then use QA. Application users
should be kept away form it.





Original source



How do I group in a union?

2007-12-27T10:26:27.034-05:00

SQL Apprentice Question
I have a select statement that uses union to pull data from multiple
databases and return them in a single recordset. I want to group these
results using group by. How do I do that?


Here's what I have:


SELECT EmployeeID, ProjectID
FROM DB1.table1
UNION
SELECT EmployeeID, ProjectID
FROM DB2.table1


This works fine but I want to group all projects by EmployeeID. I tried the
following but it didn't work


SELECT EmployeeID, ProjectID
FROM DB1.table1
UNION
SELECT EmployeeID, ProjectID
FROM DB2.table1
GROUP BY EmployeeID


I'd appreciate some help here.


Celko Answers
The results of a UNION do not have column names

SELECT X.emp_id, COUNT(project_id) AS project_tot
FROM (SELECT emp_id, project_id
FROM DB1.Table1
UNION
SELECT emp_id, project_id
FROM DB2.Table1)
AS X(emp_id, project_id)
GROUP BY X.emp_id;


UNION ALL will be faster, if it is possible.




Original source



when calling UPDATE from the result set of a SELECT statement, is the order in which rows from the SELECT statement 100% geronteed?

2007-12-27T10:21:08.912-05:00

SQL Apprentice Question
when calling UPDATE from the result set of a SELECT statement, is the order
in which rows from the SELECT statement 100% geronteed?

tableA


myid
-----------
0
1
2
3
4
5


UPDATE tableB u
SET myid = i.myid
FROM tableA i
ORDER BY myid


Will this always update tableB with 5 since it is the last one? is this 100%
garonteed to follow the order of the source result set?


Celko Answers
No. There is no ordering in a table by defintion. Since this strictly
proprietary syntax it an do anything MS feels like next week. You are
just looking for the comfort of a 1960's sequential file system
instead of learning to think in RDBMS.


Original source



Which values do NOT appear within a tolerance value Options

2007-12-27T10:09:22.526-05:00

SQL Apprentice QuestionI have a table of servers that receive updates on a regular basis CREATE TABLE [dbo].[tbl_ServerUpdate]( [ServerUpdateID] [int] NOT NULL, [Server] [nvarchar](256) NOT NULL, [UpdateStamp] [datetime] NOT NULL, ) I then have tolerance values in another table which I can get like this (they are in terms of minutes): DECLARE @Interval INT SELECT @Interval = IntervalValue FROM tbl_Interval (NOLOCK) WHERE IntervalID = 1 I want to know which servers have NOT received an update within the tolerance value. For example, give me the set of Server that have not received an update in the last 5 (@Interval) minutes. Celko AnswersWhat is an update_id? What would it mean in a logical data model? Surely, you did NOT just physically number the rows in a table! This table also had no key, so I made an assumption that you want to use the (originally very long AND possibly in Chinese!) server names. But isn't the update interval logically an attribute of each server? Shouldn't it be in the Servers table? I cannot see an interval floating around as an entity in itself. CREATE TABLE Servers (server_name VARCHAR(25) NOT NULL PRIMARY KEY, update_stamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL, update_interval INTEGER NOT NULL CHECK(update_interval > 0), etc. ); If you need this a lot, put it in a VIEW. CREATE VIEW ExpiredServers (server_name) AS SELECT server_name FROM Servers WHERE update_stamp < DATEADD(MI, -update_interval, CURRENT_TIMESTAMP); SQL Apprentice QuestionIn the real world, everything is not a beautifully static picture of relational data. I didn't include the key because I don't think its relevant to the solution. But since you brought it up: is it better to have a natural key on a varchar or an identity key as an integer that doesn't really have any intrinsic business meaning ? In theory, it should be the varchar, but in practice its more performant to index an int rather than a varchar(25). Is that not true? The interval is not an attribute of the server, but rather an attribute of the update type. In my case, there are seven different minute intervals. So, in the end, I need to know - for each update type - which server did not receive it within the last [interval] (in terms of minutes). The interval is configurable by the operations group - sometimes they may want to be alerted when a server didn't receive an update in the last 3 minutes, sometimes in the last 10 minutes. They can change this real-time in the database. I am not a TSQL guru - thats why I use this newsgroup from time to time. I also have every one of your books on my shelf. However, the corporate world is NOT academic and there is a tradeoff between having an understandable data model that is easy to work with and having a data model that can serve as a model in CS405 (or whatever). Sometimes we have to denormalize or flatten data to be able to run reports in an acceptable time; sometimes we have to add bit columns that could easily be derived from another to improve query performance. The balance you give is valuable, so I hope you continue to berate us when our designs violate, but I'll bet - despite all the companies you have worked with - you can't name one that had a data model that gave you a boner. Celko Answers>> is it better to have a natural key on a VARCHAR(n) or an IDENTITY key [SIC] as an integer that doesn't really have any intrinsic business meaning? > In theory, it should be the VARCHAR, but in practice its more performant to index an INTEGER rather than a VARCHAR(25). Is that not true? > The interval is not an attribute of the server, but rather an attribute of the update type. In my case, there are seven different minute intervals. So, in the end, I need to know - for each update type - which server did not receive it within the last [interval] (in terms of minutes). The interval is configurable by the operations group - sometimes they may want to be alerted when a server didn't rec[...]



QUOTED_IDENTIFIER & ANSI_NULLS Options

2007-12-27T10:04:38.305-05:00

SQL Apprentice Question
does anyone know how to keep QA from adding the lines setting these
two options on and off along with blank lines at the beginning and end
of every object you edit? i have searched quite a bit on this but
haven't been able to come up with anything.


Celko Answers
>> is there a reason I wouldn't want to do this? <<


Conformance to ANSI/ISO Standards should be a goal in any shop, so you
would not turn off options that bring you to that goal. Why would you
want to write your own database language?



Original source



update table dateCol3 to the later of dateCol1 or dateCol2 Options

2007-12-27T10:02:02.080-05:00

SQL Apprentice Question
CREATE TABLE #tmp1(rowID int identity(1,1), dateCol1 datetime, dateCol2
datetime, dateCol3 datetime)

INSERT INTO #tmp1(dateCol1, dateCol2)
SELECT '1/1/05', '2/1/05'
UNION ALL SELECT '3/1/05', '3/7/05'
UNION ALL SELECT '4/1/05', '3/20/05'
UNION ALL SELECT '5/1/05', '5/13/05'


UPDATE #tmp1 SET dateCol3 = ?


If dateCol1 > dateCol2 then update dateCol3 to dateCol1
else
update #tmp1 Set dateCol3 to dateCol2


I am sure this is not the most normalized example, but what would be the
tSql to update my table with the latest date in a row without having to do it
in 2 queries?


I realize I could say

update #tmp1 set datecol3 = datecol1 where datecol1 > datecol2

and then


update #tmp1 set datecol3 = datecol2 where datecol2 > datecol1


Is there a way to do this in one query statement? What would that look like?


Thanks,




Celko Answers
If you followed ISO-11179 data element naming rules, avoided
needlessly proprietary code, and put your date into the proper format
for Standard SQL, would your posting look like this?

CREATE TABLE Foobar
(foobar_id INTEGER NOT NULL PRIMARY KEY,
col1_date DATETIME NOT NULL,
col2_date DATETIME NOT NULL,
col3_date DATETIME);


INSERT INTO Foobar(foobar_id, col1_date, col2_date)
VALUES (42, '2005-01-01', '2005-02-01');
Etc.


IF col1_date > col2_date
THEN update col3_date to col1_date
ELSE update col3_date to col2_date
<< col3_date =" CASE"> col2_date
THEN col1_date
ELSE col2_date END;


You have a few choices here.
1) You can make col3_date into a computed column in proprietary syntax
-- look it up.
2) You can use the CASE expression to update col3_date; "UPDATE Foobar
SET col3_date = CASE.. END;" Of course this still means that you have
materialized computed data in violate of good design.
3) You can put it in a VIEW, and drop col3_date from the base table.
The code will always be right and portable.


Original source



How to retrieve all records from 30th record to 50th record of a table?

2007-10-04T13:20:23.690-04:00

SQL Apprentice Question
I have a table named myTable. It is sorted by some column and there is
no primery key or unique column.I want to retrieve all records from
30th record to 50th record. How can I do this by a sql statement?



Celko Answers
>> I have a table named myTable. It is sorted by some column and there is no primary key or unique column. <<


BY DEFINITION:
1) Tables have at least one key
2) Tables have no ordering


>> I want to retrieve all records from 30th record to 50th record. <<


1) Rows are not anything like records
2) Tables have no ordering

Please read a book --ANY book -- are RDBMS. You have gotten every
basic concept wrong.




Original Source



No idea where to start with Query

2007-10-04T13:07:53.321-04:00

SQL Apprentice Question
I need some help with a query using SQL 2005 and I really don't know
where to start.
I have 3 columns: Date(dateTime), Name(varchar), Value(decimal)
I need to retireive 3 values, one query or all, or one query for
each. It makes no difference to me.

1)For each Name I need to get the difference of current day's value
and previous day's value
(TodayValue- PreviousDayValue)


2)For each Name I need to get the sum of difference of yesterday's
value and today's value for the current month
So something like (Day1Value - Day2Value) + (Day2Value - Day3Value)
+...+ (Day29Value - Day30Value)


3)Same as #2 but just for the current year.


Any help would be greatly appreciated.



Celko Answers
The best place to start is with DDL, so that people do not have to
guess what the keys, constraints, Declarative Referential Integrity,
data types, etc. in your schema are. If you know how, follow ISO-11179
data element naming conventions and formatting rules. Sample data is
also a good idea, along with clear specifications. It is very hard to
debug code when you do not let us see it. What you did post was a pile
of vague names and/or reserved words. Let's make it real and sensible:

CREATE TABLE DogSchedule
(walk_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
dog_name CHAR(15) NOT NULL,
walk_kilometers DECIMAL (5,2) NOT NULL,
PRIMARY KEY (walk_date, dog_name));


1)For each dog_name I need to get the difference of current day's
value and previous day's value <<


You did not tell us if you are sure that all days are represented in
the table? Only one walk per day? What constraint enforces that
business rule? My point is that SQL is an integrated whole -- you
cannot separate DDL and DML; they must work together!


Next, you are going to be using the delta from day to day, so let's
put that in a VIEW.


CREATE VIEW DeltaDogWalks (walk_date, dog_name, delta_kilometers)
AS
SELECT D2.walk_date, D2.dog_name,
(D2.walk_kilometers - D1.walk_kilometers)
FROM DogSchedule AS D1, DogSchedule AS D2
WHERE D1.dog_name = D2.dog_name
AND D2.walk_date = DATEADD(DD, 1, D1.walk_date);


To sum the deltas, set up a report periods table that you can adjust:


CREATE TABLE ReportPeriods
(period_name CHAR(15) NOT NULL PRIMARY KEY,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
CHECK(start_date < end_date));


Now use a "walk_date BETWEEN start_date AND end_date" to group on the
names of the reporting periods.


Original source



A Strange Query Options

2007-08-09T09:09:47.475-04:00

SQL Apprentice Question
Hi guys,

I'm trying to work this query out in my head see if you can help me with it.
I'll give you a little background information so you have the big picture.
This is a real estate database, every home has an APN number it's a unique
number that always means the same property. When a company wants to list a
home on the MLS to sell the home it gets an MLS number which is only unique
to the instance the entity wants to sell that property, so in a span of a
few years the same APN could go for sale several times and have several MLS
Numbers, but maintain the same APN. This database is setup so each row
is an MLS Number.


I need to make a list of all entries that qualify:


1. Find APN's with multiple entries


2. Narrow it down to only solds


3. Only solds that have been modified in the last 3 years


and it needs to generate a list of just DISTINCT APN's that qualify.


Thanks,




Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. We don't even know your data types, column or table
names much less your codes! Sample data is also a good idea, along
with clear specifications. It is very hard to debug code when you do
not let us see it.


>> This is a real estate database, every home has an APN number it's a unique number that always means the same property. <<


CREATE TABLE Properties
(apn INTEGER NOT NULL PRIMARY KEY,
etc);


>> When a company wants to list a home on the MLS to sell the home it gets an MLS number which is only unique to the instance the entity wants to sell that property, so in a span of a few years the same APN could go for sale several times and have several MLS Numbers, but maintain the same APN. <<


CREATE TABLE Listings
(apn INTEGER NOT NULL
REFERENCES Properties(apn)
ON DELETE CASCADE
ON UPDATE CASCADE,
start_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
end_date DATETIME, -- null means current
CHECK (start_date < end_date),
PRIMARY KEY (apn, start_date),
mls INTEGER NOT NULL,
listing_status CHAR(10) DEFAULT 'listed' NOT NULL
CHECK (listing_status IN ('listed', 'sold', 'reduced', etc)),
asking_price DECIMAL(12,2) NOT NULL,
etc);

You also need a Calendar table, since this is a history schema. Google
that.



>> I need to make a list of all entries that qualify:


1. Find APN's with multiple entries
2. Narrow it down to only solds
3. Only solds that have been modified in the last 3 years and it
needs to generate a list of just DISTINCT APN's that qualify. <<

Did you notice that you gave a step by step **procedure**, rather than
a **declarative statement**? You do not thinking SQL yet. This is
steps 1 and 2 as a query:


SELECT apn, COUNT(*) AS listed_cnt
FROM Listings
WHERE listing_status = 'sold'
GROUP BY apn
HAVING COUNT(*) > 1;


Step 3 is impossible with what you posted. What does "modified" mean
and where is it in the tables? I assume with the Properties, which
means I need a history schema on it, too. My mental image is that
"modified" is going to involve the bedroom counts, kitchen appliances,
roofing, etc. and NOT one column with a simple code in it.



Boolean computed column

2007-07-13T12:46:17.967-04:00

SQL Apprentice Question
I have a column in my database I want to be a computed flag based on an
active date.. the Flag column is called active..

basicall I want if the ActiveDate <= getdate() (as in today is after the
item was active) then the active flag is a bit 1, else its a bit 0... I
tried this as a flat out <= statement, got an error by SQL Managment
studio... so how would this be done? thanks!


Celko Answers
>> I have a column in my database I want to be a computed flag based on an active date.. the Flag column is called active.. basically I want if the ActiveDate <= getdate() (as in today is after the item was active) then the active flag is a bit 1, else its a bit 0...
I tried this as a flat out <= statement, got an error by SQL
Management studio... so how would this be done? <<

This would be done by forgetting all the basic rules for programming
in SQL :)


1) We do not use the proprietary BIT data type that does not exist in
Standard SQL


2) Flags are for assembly language programming and have no place in
RDBMS; that is why SQL has no BOOLEAN data types


3) We do not store computed columns in a table. Write a VIEW and
learn to use CURRENT_TIMESTAMP instead of the old proprietary
getdate() function call to get today's active data.


Original Source



SQL Views - embedded view work-a-rounds

2007-07-13T12:40:48.146-04:00

SQL Apprentice Question
I've been asked to re-write a sql view. The view itself contains
several calls to other views (embedded). Is there a way to get around
using embedded views. I've written the same query up using temp.
tables but obviously temp. tables can't be used in views?


Is there any special things I should be looking for?


Celko Answers
>> I've been asked to re-write a sql view. The view itself contains several calls[sic: invocations?] to other views (embedded). Is there a way to get around using embedded views. I've written the same query up using temp tables but obviously temp. tables can't be used in views? <<


Nesting VIEWs is a good progrmming practice when it is done right. It
can assure that nobody invents their own definition of something, like
how we compute a tricky formula that can send us all to prison.

Temp tables are a baaaaad idea. The SQL Server model is in violation
of ANSI/ISO and most everyone else's model of them. They are usually
a way to fake a "scratch tape" in a procedural solution, where each
step passes the tape to the next step in a process; SQL is declarative
and we want to write that way.


But the real point is that you never said *why* you want to re-write
this unnamed VIEW. Damn to give advise about anything without any
kind of spec at all ..



real world advise on temp tables please

2007-07-13T12:39:23.675-04:00

SQL Apprentice Question
a system with around 1500 users. a "bad" stored procedure that writes 3
#temp_tables. I notice system slow downs when more than 5 - 10 people run the
SP at the same time. Is this to be expected. what are the real world
expectancies of the temdb?

Celko Answers
>> a "bad" stored procedure that writes 3 #temp_tables. I notice system slow downs when more than 5 - 10 people run the SP at the same time. Is this to be expected. what are the real world expectancies of the temdb? <<


Yes, it is expected. The best solution is to re-write the procedure
to use derived tables and subqueries.

Besides being proprietary in both syntax and implementation the # temp
tables are usually a sign of bad programming. They are used as
"scratch tapes" in a routine structured as if it were a 1950's mag
tape batch; each step of a sequential process is written to a scratch
tape (aka # temp table) to be passed to the following step.


Remember coupling and cohesion from that freshman S.E. course?
Temporal coupling?



I have problem with union

2007-06-23T10:50:49.861-04:00

SQL Apprentice Question
use Sql Server 2000. I've structure like it:

Create table T1(
RID int primary key,
A varchar(8),
...
...
)


Create table T2(
RID int primary key,
A varchar(8),
...
...
)


To make easy to view, so I make a view like it


Create view VUni as
Select RID, A From T1
union


Select RID, A From T2


Now, when I execute


Select * From VUNI Where RID between 15 and 25


So the process needs a lot of times alias Very Slow. How to make it best
performance?


Oh yeach, it's other question, I want to select data from table which its
name is name dimanically (variable)


IF @YEAR=2000 Select * FROM T2000
ELSE
IF @YEAR=2001 Select * FROM T2001
ELSE
IF @YEAR=2002 Select * FROM T2002
ELSE
...
...


To make it compact, So how to the code?


Next question, How to avoid error? I execute like it


Select a/b from T2


seldom the statement is error, because value of filed "a" is 0. I want if
a/b is error calculation, so it give value 0 automatically. Can I do it?
Without I ve to make a function? Or any function (built in) to handle it?


Celko Answers
>> I want to select data from table whose name is dynamic (variable) <<


IF @year=2000 SELECT * FROM T2000
ELSE
IF @year=2001 SELECT * FROM T2001
ELSE
IF @year=2002 SELECT * FROM T2002
ELSE
...
<<

This design flaw is so bad it has a name, like a disease: "Attribute
Splitting". Instead of separate tables, you need one table with a
"_year" column in it. You then use VIEWs or queries.


You probably also split an attribute in the first part of this
posting, and are trying to fix it with a UNION [ALL] construct.



Original Source



Working days calendar in T-SQL

2007-06-23T10:45:45.219-04:00

SQL Apprentice Question
My company working days is from Monday to Friday.
I would like to generate a result set which show all
consecutive working days (Monday to Friday excluding the
weekends) that is, for June 2007, it show me 1st, 4th, 5th,
6th, 8th, 11th etc... until 29th (last working day of June). Is
it possible to do this with T-SQL?My company working days is from Monday to Friday.
I would like to generate a result set which show all
consecutive working days (Monday to Friday excluding the
weekends) that is, for June 2007, it show me 1st, 4th, 5th,
6th, 8th, 11th etc... until 29th (last working day of June). Is
it possible to do this with T-SQL?

Celko Answers
Build a calendar table with one column for the calendar data and other
columns to show whatever your business needs in the way of temporal
information. Do not try to calculate holidays in SQL -- Easter alone
requires too much math.

CREATE TABLE Calendar
(cal_date DATE NOT NULL PRIMARY KEY,
fiscal_year INTEGER NOT NULL,
fiscal_month INTEGER NOT NULL,
week_in_year INTEGER NOT NULL, -- SQL server is not ISO standard
holiday_type INTEGER NOT NULL
CHECK(holiday_type IN ( ..), --
day_in_year INTEGER NOT NULL,
julian_business_day INTEGER NOT NULL,
...);


The Julian business day is a good trick. Number the days from
whenever your calendar starts and repeat a number for a weekend or
company holiday.


A calendar table for US Secular holidays can be built from the data at
this website, so you will get the three-day weekends:

http://www.smart.net/~mmontes/ushols.html

Time zones with fractional hour displacements http://www.timeanddate.com/worldclock/city.html?n=246 http://www.timeanddate.com/worldclock/city.html?n=176 http://www.timeanddate.com/worldclock/city.html?n=5 http://www.timeanddate.com/worldclock/city.html?n=54

But the STRANGE ones are:
http://www.timeanddate.com/worldclock/city.html?n=63 http://www.timeanddate.com/worldclock/city.html?n=5


Original Source



group by datetime

2007-06-16T09:10:47.753-04:00

SQL Apprentice Question
I am baffled by this query and can use a little help pls!


query count for each Monday of the week in the last few months between 9 pm
and 1 am. I know I can use datepart() but can't figure out how to query
between 9pm and 1am. Thanks.



Celko Answers
>> I am baffled by this query and can use a little help pls! <<


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it. Why are you so rude?


>> query count for each Monday of the week in the last few month between 9 pm [sic: 11:00:00 Hrs] and 1 am [sic: 01:00:00 the next day??]. I know I can use DATEPART() but can't figure out how to query between 9 pm [sic] and 1 am [sic]. <<


You really have no idea how time works!! What the hell is that AM and
PM crap? You never heard of ISO-8601 Standards and UTC???

What you do is set up a table of temporal ranges with upper and lower
limits with TIMESTAMP limits and join to it. Hey, you spit on us by
not posting DDL, why should we post DDL and data for you?


Your unit of measurement is wrong and you are getting screwed up. But
your invisible DDL tells us nothing!!

Original Source



Updating based on values of two records

2007-06-16T09:07:24.311-04:00

SQL Apprentice Question
I'm sure this is obvious, but for some reason, I can't get my hands
around the solution. I have a table like this:

Account CalYear CalMonth Amount


Comm 2006 12 80
Comm 2007 01 100
Comm 2007 02 125
Incr 2007 01 21
Incr 2007 02 28


I want to update the incr account so it shows the correct difference
between the previous month, the change basically. Corrected it would
look like this:


Account CalYear CalMonth Amount


Comm 2006 12 80
Comm 2007 01 100
Comm 2007 02 125
Incr 2007 01 20
Incr 2007 02 25



Celko Answers

>> I have a table like this: <<


Why do you spit on us and want us to do your homework/job for you?

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.


Why did you invent your own two column data types that avoid the
temporal data types? Have you ever read the ISO-8601 rules for
temporal data?


CREATE TABLE StinkingFoobar -- until we get a real name
(acct_type CHAR(4) NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
CHECK (start_date < end_date),
squid_amt INTEGER NOT NULL, -- read ISO-11179 before you program
again!!
PRIMARY KEY (acct_type, start_date)
);


Did I guess right?? Gee, wish you had helped!!



>> I want to update the incr account so it shows the correct difference between the previous month <<


No. That is soooooo non-RDBMS!! This is a computed value and needs
to be done in a VIEW using the OLAP functions. RTFM.


Original Source



newbie question on update

2007-06-16T09:01:59.277-04:00

SQL Apprentice Question
I have a clients table which consists of, among other things, a column
named source.

I have a second table, sources, with two columns - client_id and source.


The are fewer rows in sources than in clients. I am trying to update
clients.source with sources.source without affecting the clients that
do not have a corresponding row in sources.


My first try was update clients set clients.source = sources.source
where clients.client_id = sources.client_id


But I ended up with nulls in the clients.source column for rows in
which there was no row in sources.


I am guessing that I need to involve a join somehow, but I am having a
very hard time not thinking procedurally, and just don't "get" it.


I've been looking at Join examples, but anything non-trivial just
baffles me. Would someone be so kind as to explain how to do what I'm
trying to do?


Thanks.



Celko Answers
>> I am finding it difficult to think in terms of sets and 'simultaneous' actions. I keep wanting to solve these things procedurally. <<


It takes about a year to have the revelation :) My next book will
deal with thinking in sets. Hey, remember how weird recursion was?


Original Source



Question regarding multiple data sources and coalesce

2007-06-14T08:56:37.339-04:00

SQL Apprentice Question
I inherited a db that is pulling data from 5 different sources and
storing them in 5 different tables. The tables all have the same
basic data, minor differences based on source. The company currently
creates a "summary" table by joining the 5 source tables and using
coalesce( ) to display data from preferred data sources if it is
available from there.

I've been asked to present an alternate schema. My first thought was
to normalize the 5 tables into 1 with a source andI could just include
the source in the ORDER BY of queries to get the preferred source.
But then I thought since each source could have 30Million rows maybe I
would loose performance over the existing schema. So, could somebody
point me towards a reference source that may cover this type of
topic? Of course, any opinions (on this issue) would be appreciated
as well.


Celko Answers

>> I've been asked to present an alternate schema. My first thought was to normalize the 5 tables into 1 with a source and I could just include the source in the ORDER BY of queries to get the preferred source. <<


Something ike this? Cram all the data that you have into staging
table

CREATE TABLE Foobar
(foo_key INTEGER NOT NULL,
source_nbr INTEGER NOT NULL
CHECK (source_nbr BETWEEN 1 AND 5),
PRIMARY KEY (foo_key, source_nbr),
etc.);


Assuming sources are ranks from 1 to 5, pull out the most trusted for
each key


SELECT foo_key, etc.
FROM Foobar AS F1
WHERE source_nbr =
(SELECT MIN(source_nbr)
FROM Foobar AS F2
WHERE F1.foo_key = F2.foo_key);


This is the best we can do without more specs.




Original Source



Need urgent help on a QUERY

2007-06-14T08:54:47.307-04:00

SQL Apprentice Question
Given:

SELECT STATE_ID
FROM GROUP_STATE


STATE_ID
----------------
10
15
16
17
18
19
20


SELECT GROUP_ID
FROM GROUP
WHERE GROUP_ID NOT IN (SELECT DISTINCT GROUP_ID FROM GROUP_STATE)


GROUP_ID
---------------
1
2
16
5


I need a query that would look like:
GROUP_ID STATE_ID
--------------- ----------------
1 10
1 15
1 16
1 17
1 18
1 19
1 20


2 10
2 15
2 16
2 17
2 18
2 19
2 20


ETC....



Celko Answers
SELECT GS.state_id, G.group_id
FROM ( SELECT state_id FROM GroupStates AS GS
CROSS JOIN
SELECT group_id FROM Groups
WHERE group_id
NOT IN (SELECT DISTINCT group_id FROM GroupStates)
AS G);

Do not use reserved words for data element names.


Original Source



changing order of records

2007-06-14T08:52:28.303-04:00

SQL Apprentice Question
hi, i've got a problem. i need to change order of records in my table . here
is the table

id forumname
----------------
1 test1
2 test2
3 test3
4 test4


i need to write code that can move nay record up or down without changing
id of forumname. i thought about another column called order


id forumname order
---------------------
1 test1 1
2 test2 2
3 test3 3
4 test4 4


but couldnt figureout how to change that order column. any one got an idea?



Celko Answers
Here is an old "cut & paste" for this problem: Given a motorpool with
numbered parking spaces, you want to move the automobiles around.
CREATE TABLE Motorpool
(parking_space INTEGER NOT NULL PRIMARY KEY
CHECK (parking_space > 0),
vin CHAR(17) NOT NULL);

Re-arrange the display order based on the parking_space column:


CREATE PROCEDURE SwapVehicles (@old_parking_space INTEGER,
@new_parking_space INTEGER)
AS
UPDATE Motorpool
SET parking_space
= CASE parking_space
WHEN @old_parking_space
THEN @new_parking_space
ELSE parking_space + SIGN(@old_parking_space - @new_pos)
END
WHERE parking_space BETWEEN @old_parking_space AND
@new_parking_space
OR parking_space BETWEEN @new_parking_space AND
@old_parking_space;


When you want to drop a few rows, remember to close the gaps with
this:


CREATE PROCEDURE CloseMotorpoolGaps()
AS
UPDATE Motorpool
SET parking_space
= (SELECT COUNT (M1.parking_space)
FROM Motorpool AS M1
WHERE M1.parking_space <= Motorpool.parking_space);




Original Source



self-referencing constraint with identity column?

2007-06-13T07:03:15.234-04:00

SQL Apprentice Question
have a table that requires to have a self-referencing constraint to enforce
a parent-child type of relationship:

ID (identity column)
ParentID (INT column that references the ID column).


To determine if I'm at the top-level of of the relationship, I was going to
leave the ParentID null, otherwise it must be a value of another ID column in
the table which indicates its child of another.


However, can you have it so you don't use NULL to indicate this and instead
set ID and ParentID equal to the same value and still use the IDENTITY column?


So on a INSERT (i.e the IDENTITY would generate 25, so I'd like to set the
ParentID to 25 as well). This would was causing FK violation and I was
wondering if there is a way around it using an insert trigger?



Celko Answers

>> I have a table that requires to have a self-referencing constraint to enforce a parent-child type of relationship: <<


No it is not required at all. Look up the Nested Sets model for trees
and hierarchies. You will find the constraitns are much easier and the
code will run1-2 orders of magnitude faster than recursive path
traversals.

Also, never use IDENTITY in an RDBMS; find a valid relational key
instead.



Original Source



Database Design Question

2007-06-12T06:15:44.396-04:00

SQL Apprentice Question
I'm fairly new to database design, having only really worked with
existing tables etc in the past.


Simple question this really...........


In a users table, is it wise to have a ManagerID column that is
effectively pointing to another user in the same table, with the
theory being that if that person is top dog, they will just have a
null entry in ManagerID.


I will check for circular references before entering the data.


Is there a better way of storing the data, or is this the only way?




Celko Answers
>> Is there a better way of storing the data, or is this the only way? <<


Get a copy of TREES & HIERARCHIES IN SQL for several different ways to
do an organizational chart. What you have is thd adjacency list
model; look up the nested sets model.



Original Source



Using Soundex to identify possible duplicates

2007-06-12T06:11:03.273-04:00

SQL Apprentice Question
In a table that has person forename and surname, is it possible to use
soundex to identify for every row in the table what similar matches
there are in the same table?



Celko Answers

>> is it possible to use Soundex to identify for every row in the table what similar matches there are in the same table? <<


Don't do it. Name handling is an ugly problem and if you have to do
this on a regular basis get a package designed for this kind of work.
Some companies are Group 1 Software, SSA (used to have a great booklet
on this topic), Melissa Data Corporation and Promark Software Inc.

Their software handles mailing lists and you can get a review copy
from Melissa Data. They do not choke on names like "John Paul van der
Poon" and worse.


Original Source



giving one union preference

2007-06-12T06:08:50.327-04:00

SQL Apprentice Question
Let say I have a query with the following structure:

Select name, mdate, kdate
from table1 a
inner join (
select name, mdate
from table2 b
where a.id = b.id
and mdate >= kdate
union
select name, mdate
from table3 c
where a.id = c.id
and mdate >= kdate
) mindate


When the select mdate from the inner join union (mindate) returns the
same mdate for table b and c, I want to give preference to the c.mdate
and so c.name.


Is there a way to do this?




Celko Answers
>> When the select mdate from the inner join union (mindate) returns the same mdate for table b and c, I want to give preference to the c.mdate and so c.name. <<


This question makes no sense. The poorly named derived table,
Mindate, acts as if it is materialized and you no longer have access
to tables B and C. It is like asking to see the eggs after the cake
has been baked.

Original Source