Discussion:
Database Eye for the Application Guy
Jason Bennett
2007-05-20 22:15:43 UTC
Permalink
So I guess a little background is required first. While I've worked with
databases for years now, it's mostly been in the context of applications
and O/R mapping - very normalized design, not much query crafting, etc.
I have a decent grasp of databases and SQL, but I don't know much of the
arcana.

Enter my new job: very transactionally intense, with a mix of OLTP and
reporting requirements. This means lots of PL/SQL, denormalization, and
query crafting. What I need is some books that can take me from where I
am to where I want to be. What queries are safe (few joins), which will
kill you (date ranges, lots of joins), and how to strike a good balance
when denormalizing without it coming back to hose you.

Aside from Scott's (all hail) Agile database books, what others would
people recommend?

jason
--
Jason Bennett, ***@acm.org
E pur si muove!
Get Firefox! - http://getfirefox.com
Clifford Heath
2007-05-20 22:56:17 UTC
Permalink
Post by Jason Bennett
Aside from Scott's (all hail) Agile database books, what others would
people recommend?
Without a question, "Transaction Processing Concepts and Techniques"
by Jim Gray and Andreas Reuter. It was the book that spilt the beans
on how relational DBs work internally, and how they achieve
transactional
behaviour in the face of faults. That covers all the nuts-and-bolts
under-
standing you'll need for dealing with any real database.

Next, get "Information Modeling and Relational Databases" by Terry
Halpin, It covers three forms of entity-relational modeling, data
modeling
using UML, and analyses the strengths and weaknesses of both by
comparison with Object Role Modeling, a conceptual modeling technique
that's more powerful than either. This one book will teach you all
you'll
ever need to know about how to *think* about data. Every DBA should
read it.

Both books are well-written and easy to read with plenty of concrete
realistic examples, but beware - to properly absorb the material you'll
want to spend up to 2 months slowly absorbing the contents of each.

The only thing you might need that neither covers is *how* a query
processor chooses a query plan. You'll know all the options they can
choose from, and you'll know how to evaluate the cost of each, but
you won't know how the engine evaluates the cost of each. I don't
think that matters, as long as you understand both the domain and
range of the optimiser's operations.

Clifford Heath.
Curt Sampson
2007-05-21 01:35:48 UTC
Permalink
Post by Jason Bennett
Enter my new job: very transactionally intense, with a mix of OLTP and
reporting requirements. This means lots of PL/SQL, denormalization, and
query crafting. What I need is some books that can take me from where I
am to where I want to be. What queries are safe (few joins), which will
kill you (date ranges, lots of joins), and how to strike a good balance
when denormalizing without it coming back to hose you.
"Fewer joins is better" is a rule of thumb that can bite you as often as
it can help you. The big performance killer on large databases (where
the size is more than a couple of times that of the machine's RAM) are
random reads from disk. Lots of these are about the worst thing that can
ever happen to you.

Large sequential reads can hurt, too, but in most situations, I'll take
a couple of 100-block sequential reads over 20 1-block random reads,
even though the former is reading ten times as much data.

The joins only hurt when they're hitting disk; getting rid of relatively
small lookup tables to avoid joins can actually slow down queries
overall (due to reducing the amount of effective cache) just to save you
a little bit of CPU.

Anyway, a great book to learn all about this is Dan Tow's _SQL Tuning_
(O'Reilly). You should come out of this one with a pretty good idea of
how database optimizers work and the tools to do real optimization,
rather than the phony "just denormalize and avoid joins" approach that
that so many alleged DBAs take. (This is an area where programmers have
a big advantage; they're generally a lot better at thinking about how
a DBMS might do query optimization.)

Also, a general familiarity with the low-level details of how disk IO
works is fairly essential; i.e., you want a fairly good idea of where
the disk heads will be moving and why in various configurations and
with various queries. The book above will help with that, if you don't
already know it.

Beyond that, a good pile of relational theory is a big help. I'm a
big fan of C.J. Date's books. _Introduction to Database Systems_ is
great if you can find the time to get through it; failing that, _The
Database Relational Model_ is a slim volume that will give you the
essentials in a slightly more beginner-friendly form. Moving on from
there, _Foundation for Future Database Systems: The Third Manifesto_
is fantastic (you may want to skip straight to this if you're already
reasonably familiar with relational theory), and Date, Darwen and
Lorentzos' _Temporal Data and the Relational Model_ is quite useful.

A warning: if you actually learn and understand any substantial portion
of the books above, your life will start to feel rather frustrating, as
you'll be in a situation akin to that of an experienced and productive
Lisp or Smalltalk programer forced to program in COBOL or BASIC. Yes,
the products out there really are that bad. I have been for several
years now sorely tempted to write a good RDBMS. I'm not sure such a
thing would ever be a commercial success (at least not in the next
decade or two), so it's going to have to wait until I'm rich through
other means....

cjs
--
Curt Sampson +81 90 7737 2974
<***@cynic.net> http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Garris, Nicole
2007-05-21 15:15:43 UTC
Permalink
Curt Sampson wrote:

Anyway, a great book to learn all about this is Dan Tow's _SQL Tuning_
(O'Reilly). You should come out of this one with a pretty good idea of
how database optimizers work and the tools to do real optimization,
rather than the phony "just denormalize and avoid joins" approach that
that so many alleged DBAs take.

Thank you Curt! To understand database performance you must have at
least a basic understanding of how the optimizer works. You must also
understand indexing-I'm told that often the proper indexes are the key
to good performance.

Does a general understanding suffice in most cases? Or must one learn
the ins and outs of the optimizer for the particular DBMS that one is
using?



[Non-text portions of this message have been removed]
Pramod Sadalage
2007-05-22 03:03:23 UTC
Permalink
Hi All,

This is a great thread. I would love to capture all the book recommendations
and list them in the Links section or on my webpage.
So when you make a book recommendation a link to amazon would be great. or
at least the ISBN or Full Name of the book and author.

I have also found "The Art of SQL" by Stephane Faroult to be good. can be
found here on amazon<http://www.amazon.com/gp/product/0596008945?ie=UTF8&tag=agiledba-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0596008945>

Regards
Pramod
<http://www.amazon.com/exec/obidos/search-handle-url/105-8142951-5782815?%5Fencoding=UTF8&search-type=ss&index=books&field-author=Stephane%20Faroult>
Post by Curt Sampson
Anyway, a great book to learn all about this is Dan Tow's _SQL Tuning_
(O'Reilly). You should come out of this one with a pretty good idea of
how database optimizers work and the tools to do real optimization,
rather than the phony "just denormalize and avoid joins" approach that
that so many alleged DBAs take.
Thank you Curt! To understand database performance you must have at
least a basic understanding of how the optimizer works. You must also
understand indexing-I'm told that often the proper indexes are the key
to good performance.
Does a general understanding suffice in most cases? Or must one learn
the ins and outs of the optimizer for the particular DBMS that one is
using?
[Non-text portions of this message have been removed]
[Non-text portions of this message have been removed]
Curt Sampson
2007-05-21 16:18:51 UTC
Permalink
Post by Garris, Nicole
Thank you Curt! To understand database performance you must have at
least a basic understanding of how the optimizer works. You must also
understand indexing-I'm told that often the proper indexes are the key
to good performance.
Sometimes. As often it's just better queries.
Post by Garris, Nicole
Does a general understanding suffice in most cases? Or must one learn
the ins and outs of the optimizer for the particular DBMS that one is
using?
A general understanding is just fine for most cases; many
performance-related aspects of RDBMSes are pretty much the same across
all the common implementations.

cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Adrian Walker
2007-05-21 18:08:27 UTC
Permalink
Curt --

You wrote...

I have been for several years now sorely tempted to write a good RDBMS. I'm
not sure such a
thing would ever be a commercial success (at least not in the next
decade or two), so it's going to have to wait until I'm rich through
other means....

There's anecdotal evidence pointing to the notion that the language for a
next generation RDBMS should not be SQL.

If this is of interest, we could discuss off list.

Cheers,
-- Adrian

Internet Business Logic (R)
A Wiki for Executable Open Vocabulary English
Online at www.reengineeringllc.com Shared use is free

Adrian Walker
Reengineering
Post by Curt Sampson
Post by Jason Bennett
Enter my new job: very transactionally intense, with a mix of OLTP and
reporting requirements. This means lots of PL/SQL, denormalization, and
query crafting. What I need is some books that can take me from where I
am to where I want to be. What queries are safe (few joins), which will
kill you (date ranges, lots of joins), and how to strike a good balance
when denormalizing without it coming back to hose you.
"Fewer joins is better" is a rule of thumb that can bite you as often as
it can help you. The big performance killer on large databases (where
the size is more than a couple of times that of the machine's RAM) are
random reads from disk. Lots of these are about the worst thing that can
ever happen to you.
Large sequential reads can hurt, too, but in most situations, I'll take
a couple of 100-block sequential reads over 20 1-block random reads,
even though the former is reading ten times as much data.
The joins only hurt when they're hitting disk; getting rid of relatively
small lookup tables to avoid joins can actually slow down queries
overall (due to reducing the amount of effective cache) just to save you
a little bit of CPU.
Anyway, a great book to learn all about this is Dan Tow's _SQL Tuning_
(O'Reilly). You should come out of this one with a pretty good idea of
how database optimizers work and the tools to do real optimization,
rather than the phony "just denormalize and avoid joins" approach that
that so many alleged DBAs take. (This is an area where programmers have
a big advantage; they're generally a lot better at thinking about how
a DBMS might do query optimization.)
Also, a general familiarity with the low-level details of how disk IO
works is fairly essential; i.e., you want a fairly good idea of where
the disk heads will be moving and why in various configurations and
with various queries. The book above will help with that, if you don't
already know it.
Beyond that, a good pile of relational theory is a big help. I'm a
big fan of C.J. Date's books. _Introduction to Database Systems_ is
great if you can find the time to get through it; failing that, _The
Database Relational Model_ is a slim volume that will give you the
essentials in a slightly more beginner-friendly form. Moving on from
there, _Foundation for Future Database Systems: The Third Manifesto_
is fantastic (you may want to skip straight to this if you're already
reasonably familiar with relational theory), and Date, Darwen and
Lorentzos' _Temporal Data and the Relational Model_ is quite useful.
A warning: if you actually learn and understand any substantial portion
of the books above, your life will start to feel rather frustrating, as
you'll be in a situation akin to that of an experienced and productive
Lisp or Smalltalk programer forced to program in COBOL or BASIC. Yes,
the products out there really are that bad. I have been for several
years now sorely tempted to write a good RDBMS. I'm not sure such a
thing would ever be a commercial success (at least not in the next
decade or two), so it's going to have to wait until I'm rich through
other means....
cjs
--
Curt Sampson +81 90 7737 2974
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
[Non-text portions of this message have been removed]
Willem Bogaerts
2007-05-22 07:07:48 UTC
Permalink
Post by Adrian Walker
There's anecdotal evidence pointing to the notion that the language for a
next generation RDBMS should not be SQL.
Quite probably true. However, SQL is so standard that it is here to
stay. There are even object databases that feature SQL. Not primarily
intended for everyday use, but immensely handy for browsing and querying.
Most non-SQL database communications libraries also feature SQL for
compatibility.

SQL is not that bad for complex reports, but a compact, low-level
ISAM-style communication possibility would take a great deal of parsing
load from the server.
--
Willem Bogaerts

Applicatiesmid
Kratz B.V.
http://www.kratz.nl/
Curt Sampson
2007-05-23 05:38:47 UTC
Permalink
Post by Willem Bogaerts
Post by Adrian Walker
There's anecdotal evidence pointing to the notion that the language for a
next generation RDBMS should not be SQL.
Quite probably true. However, SQL is so standard that it is here to
stay.
Sure. COBOL is here to stay, too. That doesn't mean that anybody who
needs to work fast and lean is going to use it.

You know something's wrong with it when you regularly find yourself
writing complex queries in a more relational pseudocode and translating
those to SQL.
Post by Willem Bogaerts
SQL is not that bad for complex reports, but a compact, low-level
ISAM-style communication....
Ew. I did say *R*DBMS.

cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Andrew Gregovich
2007-05-21 14:55:25 UTC
Permalink
Hi Jason

What you need is a mixture of theoretical and practical material. For theoretical, I'd recommend "Database Systems: A Practical Approach to Design, Implementation and Management" or something like http://books.google.com/books?id=Vaf8t82flL8C for some nice examples of I/O + data access path calculations. A lot of the vendor-specific books (especially if they're focused on certification) don't even explain how the basic b-tree index works, and you'll be surprised how many DBAs misunderstand this because of this gap.

On the other hand, you'll need a vendor-specific book to tell you about the guts of an RDBMS... How it manages concurrency (locking vs. MVCC), how its memory optimizer works, its 3GL syntax, special performance enhancement extensions and features and such.

Good luck

Andrew

----- Original Message ----
From: Jason Bennett <***@acm.org>
To: ***@yahoogroups.com
Sent: Monday, May 21, 2007 6:15:43 AM
Subject: [agileDatabases] Database Eye for the Application Guy















So I guess a little background is required first. While I've worked with

databases for years now, it's mostly been in the context of applications

and O/R mapping - very normalized design, not much query crafting, etc.

I have a decent grasp of databases and SQL, but I don't know much of the

arcana.



Enter my new job: very transactionally intense, with a mix of OLTP and

reporting requirements. This means lots of PL/SQL, denormalization, and

query crafting. What I need is some books that can take me from where I

am to where I want to be. What queries are safe (few joins), which will

kill you (date ranges, lots of joins), and how to strike a good balance

when denormalizing without it coming back to hose you.



Aside from Scott's (all hail) Agile database books, what others would

people recommend?



jason
--
Jason Bennett, ***@acm. org

E pur si muove!

Get Firefox! - http://getfirefox. com












<!--

#ygrp-mlmsg {font-size:13px;font-family:arial, helvetica, clean, sans-serif;}
#ygrp-mlmsg table {font-size:inherit;font:100%;}
#ygrp-mlmsg select, input, textarea {font:99% arial, helvetica, clean, sans-serif;}
#ygrp-mlmsg pre, code {font:115% monospace;}
#ygrp-mlmsg * {line-height:1.22em;}
#ygrp-text{
font-family:Georgia;
}
#ygrp-text p{
margin:0 0 1em 0;}
#ygrp-tpmsgs{
font-family:Arial;
clear:both;}
#ygrp-vitnav{
padding-top:10px;font-family:Verdana;font-size:77%;margin:0;}
#ygrp-vitnav a{
padding:0 1px;}
#ygrp-actbar{
clear:both;margin:25px 0;white-space:nowrap;color:#666;text-align:right;}
#ygrp-actbar .left{
float:left;white-space:nowrap;}
.bld{font-weight:bold;}
#ygrp-grft{
font-family:Verdana;font-size:77%;padding:15px 0;}
#ygrp-ft{
font-family:verdana;font-size:77%;border-top:1px solid #666;
padding:5px 0;
}
#ygrp-mlmsg #logo{
padding-bottom:10px;}

#ygrp-vital{
background-color:#e0ecee;margin-bottom:20px;padding:2px 0 8px 8px;}
#ygrp-vital #vithd{
font-size:77%;font-family:Verdana;font-weight:bold;color:#333;text-transform:uppercase;}
#ygrp-vital ul{
padding:0;margin:2px 0;}
#ygrp-vital ul li{
list-style-type:none;clear:both;border:1px solid #e0ecee;
}
#ygrp-vital ul li .ct{
font-weight:bold;color:#ff7900;float:right;width:2em;text-align:right;padding-right:.5em;}
#ygrp-vital ul li .cat{
font-weight:bold;}
#ygrp-vital a {
text-decoration:none;}

#ygrp-vital a:hover{
text-decoration:underline;}

#ygrp-sponsor #hd{
color:#999;font-size:77%;}
#ygrp-sponsor #ov{
padding:6px 13px;background-color:#e0ecee;margin-bottom:20px;}
#ygrp-sponsor #ov ul{
padding:0 0 0 8px;margin:0;}
#ygrp-sponsor #ov li{
list-style-type:square;padding:6px 0;font-size:77%;}
#ygrp-sponsor #ov li a{
text-decoration:none;font-size:130%;}
#ygrp-sponsor #nc {
background-color:#eee;margin-bottom:20px;padding:0 8px;}
#ygrp-sponsor .ad{
padding:8px 0;}
#ygrp-sponsor .ad #hd1{
font-family:Arial;font-weight:bold;color:#628c2a;font-size:100%;line-height:122%;}
#ygrp-sponsor .ad a{
text-decoration:none;}
#ygrp-sponsor .ad a:hover{
text-decoration:underline;}
#ygrp-sponsor .ad p{
margin:0;}
o {font-size:0;}
.MsoNormal {
margin:0 0 0 0;}
#ygrp-text tt{
font-size:120%;}
blockquote{margin:0 0 0 4px;}
.replbq {margin:4;}
-->








____________________________________________________________________________________Fussy? Opinionated? Impossible to please? Perfect. Join Yahoo!'s user panel and lay it on us. http://surveylink.yahoo.com/gmrs/yahoo_panel_invite.asp?a=7


[Non-text portions of this message have been removed]
cathyfarrell_ct
2007-05-21 17:07:23 UTC
Permalink
Regarding Curt's question "Does a general understanding suffice in
most cases? Or must one learn the ins and outs of the optimizer for
the particular DBMS that one is using?", even within a particular
DBMS, the factors affecting performance can vary from version to
version. Check out Tom Kyte's Things You "Know" presentation in the
archives of the New York Oracle Users Group (www.nyoug.org).

A couple more book recommendations for Jason:
Oracle PL/SQL Best Practices by Steven Feuerstein
Jason Bennett
2007-05-22 07:11:02 UTC
Permalink
Post by cathyfarrell_ct
Oracle PL/SQL Best Practices by Steven Feuerstein
Appreciate everyone's suggestions. Given my finite time, I'll probably
have to pick and choose, but I hope it's been useful to others as well.

jason
--
Jason Bennett, ***@acm.org
E pur si muove!
Get Firefox! - http://getfirefox.com
Curt Sampson
2007-05-25 00:17:07 UTC
Permalink
Post by cathyfarrell_ct
Regarding Curt's question "Does a general understanding suffice in
most cases? Or must one learn the ins and outs of the optimizer for
the particular DBMS that one is using?", even within a particular
DBMS, the factors affecting performance can vary from version to
version. Check out Tom Kyte's Things You "Know" presentation in the
archives of the New York Oracle Users Group (www.nyoug.org).
The presentation is available at

http://www.nyoug.org/Presentations/2005/kyte_you_know.pdf

to save those of you who are interested the hassle of digging through
the site to find it.

And what did I learn from this presentation? The general knowledge is
critical, and specific "knowledge" can hurt more than help if it's
specific knowledge of something that was true only for another version
of the DB, if it was ever true at all.

Applying small unconnected bits of specific knowledge can hurt as much
as it can help. You need to build a model of how your DBMS and the
database its running work, test that your model matches what's really
going on, and use the model to predict specific changes that will help
fix your performance problems.

cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Andrew Gregovich
2007-05-21 14:25:32 UTC
Permalink
Curt

Rather that throwing blank statements (RDBMSs suck), can you come up with a concrete example of where they fall short, how this could be addressed and yet maintain all of the standard features and performance that they offer nowadays.

And please note that you could be perceived as arrogant or ignorant (or both) in implying that the features such as advanced cost-based optimizers or various indexing/hashing techniques can be created with a 3 man-year effort.

Andrew

----- Original Message ----
From: Curt Sampson <***@cynic.net>
To: Jason Bennett <***@acm.org>
Cc: Agile Databases List <***@yahoogroups.com>
Sent: Monday, May 21, 2007 9:35:48 AM
Subject: Re: [agileDatabases] Database Eye for the Application Guy



A warning: if you actually learn and understand any substantial portion

of the books above, your life will start to feel rather frustrating, as

you'll be in a situation akin to that of an experienced and productive

Lisp or Smalltalk programer forced to program in COBOL or BASIC. Yes,

the products out there really are that bad. I have been for several

years now sorely tempted to write a good RDBMS. I'm not sure such a

thing would ever be a commercial success (at least not in the next

decade or two), so it's going to have to wait until I'm rich through

other means....



cjs
--
Curt Sampson +81 90 7737 2974

<***@cynic.net> http://www.starling -software. com

The power of accurate observation is commonly called cynicism

by those who have not got it. --George Bernard Shaw












<!--

#ygrp-mlmsg {font-size:13px;font-family:arial, helvetica, clean, sans-serif;}
#ygrp-mlmsg table {font-size:inherit;font:100%;}
#ygrp-mlmsg select, input, textarea {font:99% arial, helvetica, clean, sans-serif;}
#ygrp-mlmsg pre, code {font:115% monospace;}
#ygrp-mlmsg * {line-height:1.22em;}
#ygrp-text{
font-family:Georgia;
}
#ygrp-text p{
margin:0 0 1em 0;}
#ygrp-tpmsgs{
font-family:Arial;
clear:both;}
#ygrp-vitnav{
padding-top:10px;font-family:Verdana;font-size:77%;margin:0;}
#ygrp-vitnav a{
padding:0 1px;}
#ygrp-actbar{
clear:both;margin:25px 0;white-space:nowrap;color:#666;text-align:right;}
#ygrp-actbar .left{
float:left;white-space:nowrap;}
.bld{font-weight:bold;}
#ygrp-grft{
font-family:Verdana;font-size:77%;padding:15px 0;}
#ygrp-ft{
font-family:verdana;font-size:77%;border-top:1px solid #666;
padding:5px 0;
}
#ygrp-mlmsg #logo{
padding-bottom:10px;}

#ygrp-vital{
background-color:#e0ecee;margin-bottom:20px;padding:2px 0 8px 8px;}
#ygrp-vital #vithd{
font-size:77%;font-family:Verdana;font-weight:bold;color:#333;text-transform:uppercase;}
#ygrp-vital ul{
padding:0;margin:2px 0;}
#ygrp-vital ul li{
list-style-type:none;clear:both;border:1px solid #e0ecee;
}
#ygrp-vital ul li .ct{
font-weight:bold;color:#ff7900;float:right;width:2em;text-align:right;padding-right:.5em;}
#ygrp-vital ul li .cat{
font-weight:bold;}
#ygrp-vital a {
text-decoration:none;}

#ygrp-vital a:hover{
text-decoration:underline;}

#ygrp-sponsor #hd{
color:#999;font-size:77%;}
#ygrp-sponsor #ov{
padding:6px 13px;background-color:#e0ecee;margin-bottom:20px;}
#ygrp-sponsor #ov ul{
padding:0 0 0 8px;margin:0;}
#ygrp-sponsor #ov li{
list-style-type:square;padding:6px 0;font-size:77%;}
#ygrp-sponsor #ov li a{
text-decoration:none;font-size:130%;}
#ygrp-sponsor #nc {
background-color:#eee;margin-bottom:20px;padding:0 8px;}
#ygrp-sponsor .ad{
padding:8px 0;}
#ygrp-sponsor .ad #hd1{
font-family:Arial;font-weight:bold;color:#628c2a;font-size:100%;line-height:122%;}
#ygrp-sponsor .ad a{
text-decoration:none;}
#ygrp-sponsor .ad a:hover{
text-decoration:underline;}
#ygrp-sponsor .ad p{
margin:0;}
o {font-size:0;}
.MsoNormal {
margin:0 0 0 0;}
#ygrp-text tt{
font-size:120%;}
blockquote{margin:0 0 0 4px;}
.replbq {margin:4;}
-->









____________________________________________________________________________________Looking for a deal? Find great prices on flights and hotels with Yahoo! FareChase.
http://farechase.yahoo.com/

[Non-text portions of this message have been removed]
Curt Sampson
2007-06-06 00:36:45 UTC
Permalink
Post by Andrew Gregovich
And please note that you could be perceived as arrogant or ignorant
(or both) in implying that the features such as advanced cost-based
optimizers or various indexing/hashing techniques can be created with a
3 man-year effort.
I wouldn't call the indexing and hashing techniques used by modern
RDBMSes "advanced." (At least the ones I'm aware of, though I'd be
interested in pointers to literature showing otherwise.) A B-tree, in
this day and age, is hardly rocket science.

At any rate, and being a programmer I admit that I may well be typical
of the breed and overly optimistic here, I would think that if you had
a good team of 4-8 people, were starting from scratch, and were using a
good language for most of your work (Haskell, LISP, Scheme, that sort
of thing), you could write a new RDBMS as sophisticated and performant
as any already out there in three years. (I'm assuming that you don't
support SQL or any of the non-relational crap that clutters up a lot
of the modern systems and makes them much more difficult to write,
optimize, and use.)
Post by Andrew Gregovich
Rather that throwing blank statements (RDBMSs suck), can you come up
with a concrete example of where they fall short, how this could be
addressed and yet maintain all of the standard features and performance
that they offer nowadays.
Well, my dream product would include the following features:

1. A decent query langage to replace SQL. In particular, the code should
much more clearly reflect the relational operations being performed, and
it should be easy to modularize parts of queries and re-use those parts
elsewhere.

2. A separate language that can be used to annotate queries to give the
DBMS hints or instructions on how to increase performance for queries.
This must be guaranteed not to change the logic of a query, only the
performance.

3. A similar logical versus physical split for the database schema. For
example, I ought to be able to have two relations stored physically as
their join while not seeing that reflected at all in the logical model,
especially in terms of integrity, but just in the performance of the
system. On the logical side, I ought to be able to create a derived
relation (a view, in SQL terms), drop the relations from which it was
derived, and not lose the derived relation; the storage management
system ought to know what data are needed by the logical model, and
just deal with it, except when giving specific performance and storage
instructions separate from the logical model.

cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Dawn Wolthuis
2007-06-13 20:28:54 UTC
Permalink
I'll join you in wishing for a better language than SQL, Curt.

I'll add that I would like good query language support not only for
nested relations (Non-First Normal Form), but for data ordered in
lists. The industry would be well-served if we would switch from
three-valued logic back to two-valued logic too.

Then toss in strong support for variable length data and for duck
typing where desired rather than strong typing. Also permit
navigational operators and not strictly relational ones so that joins
are not necessary when you wish to navigate through a web of data
using foreign keys.

Then I'm in on the new query language, although XQuery seems to have
most of this but is sadly "dog ugly", it seems (although I suppose if
I used it regularly it would grow on me).

Cheers! --dawn

Dawn M. Wolthuis
Tincat Group, Inc.
www.tincat-group.com
Post by Curt Sampson
Post by Andrew Gregovich
And please note that you could be perceived as arrogant or ignorant
(or both) in implying that the features such as advanced cost-based
optimizers or various indexing/hashing techniques can be created with a
3 man-year effort.
I wouldn't call the indexing and hashing techniques used by modern
RDBMSes "advanced." (At least the ones I'm aware of, though I'd be
interested in pointers to literature showing otherwise.) A B-tree, in
this day and age, is hardly rocket science.
At any rate, and being a programmer I admit that I may well be typical
of the breed and overly optimistic here, I would think that if you had
a good team of 4-8 people, were starting from scratch, and were using a
good language for most of your work (Haskell, LISP, Scheme, that sort
of thing), you could write a new RDBMS as sophisticated and performant
as any already out there in three years. (I'm assuming that you don't
support SQL or any of the non-relational crap that clutters up a lot
of the modern systems and makes them much more difficult to write,
optimize, and use.)
Post by Andrew Gregovich
Rather that throwing blank statements (RDBMSs suck), can you come up
with a concrete example of where they fall short, how this could be
addressed and yet maintain all of the standard features and performance
that they offer nowadays.
1. A decent query langage to replace SQL. In particular, the code should
much more clearly reflect the relational operations being performed, and
it should be easy to modularize parts of queries and re-use those parts
elsewhere.
2. A separate language that can be used to annotate queries to give the
DBMS hints or instructions on how to increase performance for queries.
This must be guaranteed not to change the logic of a query, only the
performance.
3. A similar logical versus physical split for the database schema. For
example, I ought to be able to have two relations stored physically as
their join while not seeing that reflected at all in the logical model,
especially in terms of integrity, but just in the performance of the
system. On the logical side, I ought to be able to create a derived
relation (a view, in SQL terms), drop the relations from which it was
derived, and not lose the derived relation; the storage management
system ought to know what data are needed by the logical model, and
just deal with it, except when giving specific performance and storage
instructions separate from the logical model.
cjs
--
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Scott Ambler
2007-06-15 11:32:37 UTC
Permalink
I definitely agree that SQL isn't perfect, but at the
same time I have to be practical about the issue. SQL
is clearly the dominant query language and has been
for several decades. Furthermore, we've seen a
multitude of alternative query languages fail in the
past.

Having said that, I await the day when a new, superior
query language has become the defacto standard. Until
then I won't be holding my breath.

- Scott
Post by Dawn Wolthuis
I'll join you in wishing for a better language than
SQL, Curt.
I'll add that I would like good query language
support not only for
nested relations (Non-First Normal Form), but for
data ordered in
lists. The industry would be well-served if we
would switch from
three-valued logic back to two-valued logic too.
Then toss in strong support for variable length data
and for duck
typing where desired rather than strong typing.
Also permit
navigational operators and not strictly relational
ones so that joins
are not necessary when you wish to navigate through
a web of data
using foreign keys.
Then I'm in on the new query language, although
XQuery seems to have
most of this but is sadly "dog ugly", it seems
(although I suppose if
I used it regularly it would grow on me).
Cheers! --dawn
Dawn M. Wolthuis
Tincat Group, Inc.
www.tincat-group.com
Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html


Be smarter than spam. See how smart SpamGuard is at giving junk email the boot with the All-new Yahoo! Mail at http://mrd.mail.yahoo.com/try_beta?.intl=ca
Curt Sampson
2007-07-23 08:28:20 UTC
Permalink
Post by Dawn Wolthuis
I'll add that I would like good query language support not only for
nested relations (Non-First Normal Form), but for data ordered in
lists.
Just two more data types. Nested relations in fact can be first-normal
form as well, if you accept that a relation is a data type like any
other.
Post by Dawn Wolthuis
The industry would be well-served if we would switch from
three-valued logic back to two-valued logic too.
Actually, something along the lines of Haskell's "Maybe" is really what
we need.
Post by Dawn Wolthuis
Then toss in strong support for variable length data....
Absolutely. Though this seems to be pretty good in most DBMSes right
now. Performance, too: even PostgreSQL these days will automatically
store large objects in a separate storage space to speed table scans
that aren't touching that particular value.
Post by Dawn Wolthuis
...and for duck typing where desired rather than strong typing.
Actually, strong typing works just as flexibly and much more safely than
duck typing if you have a decent type system that includes things such
as type inference. Duck typing and dynamic detection of type problems
are mostly a reaction to bad type systems in statically typed languages.
Post by Dawn Wolthuis
Also permit navigational operators and not strictly relational ones so
that joins are not necessary when you wish to navigate through a web
of data using foreign keys.
It seems to me that joins, if well implemented, are the easier way to
deal with this. Possibly the two are even directly equivalant. Do you
have a particular example of something where joins, in the best syntax
you can imagine, are harder than navigation (also in the best syntax you
can think of)?

cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Dawn Wolthuis
2007-07-23 14:05:34 UTC
Permalink
Post by Curt Sampson
Post by Dawn Wolthuis
I'll add that I would like good query language support not only for
nested relations (Non-First Normal Form), but for data ordered in
lists.
Just two more data types. Nested relations in fact can be first-normal
form as well, if you accept that a relation is a data type like any
other.
Yes, this is a redefining of 1NF that renders "1NF" meaningless (or
almost meaningless). That is the result of some better analysis than
that which has been somewhat burned into our profession by way of
RDBMS products, SQL-92 still being the flavor most standard across
tools, and higher ed teaching normalization with the form formerly
known as 1NF.

Unfortunately, redefining 1NF to no longer mean what products and
people thought it meant has made it so it is difficult to talk about
the form formerly known as 1NF, that which has contributed, in my
opinion, to some unnecessary software bloat and complexity. It has
also made it difficult to get the word out that 1NF, as we knew it, is
dead in theory, while still very much alive in practice. Most RDBMS
tools are not set up to work well with nested relations. Lists are an
even bigger issue for most RDBMS tools.
Post by Curt Sampson
Post by Dawn Wolthuis
The industry would be well-served if we would switch from
three-valued logic back to two-valued logic too.
Actually, something along the lines of Haskell's "Maybe" is really what
we need.
There are pros and cons to each approach, and I'll confess that I hae
not done much with Haskell to see the advantages. I have worked with
databases with both 2 and 3VL and there are significant quality
advantages, it seems, when working with a 2VL. Additionally, the
logic used with non-DBMS data is then the same as that used with the
DBMS. 2VL enhanced ease of software development and maintenance.
Even though the world is muddy, having a model (the database) where we
can ask yes or no questions and get yes or no answers has a
significant advantage.
Post by Curt Sampson
Post by Dawn Wolthuis
Then toss in strong support for variable length data....
Absolutely. Though this seems to be pretty good in most DBMSes right
now.
Much better than it once was, definitely.
Post by Curt Sampson
Performance, too: even PostgreSQL these days will automatically
store large objects in a separate storage space to speed table scans
that aren't touching that particular value.
Post by Dawn Wolthuis
...and for duck typing where desired rather than strong typing.
Actually, strong typing works just as flexibly and much more safely than
duck typing if you have a decent type system that includes things such
as type inference. Duck typing and dynamic detection of type problems
are mostly a reaction to bad type systems in statically typed languages.
Since I know that I have conflicting opinions about typing depending
on the weather or the time of day, you might be right on this. I will
add that those DBMS's that serve up all data as strings, where you can
handily cast to numbers, dates, names, or any other type (or even
multiple types depending on the situation -- this Person is-an
Employee so cast this field to a number, else leave as a string), have
provided the biggest bang for the buck in productivity in my
experience. So, I tend to think there is something to having the type
checking somewhere prior to the persistence logic in our applications,
letting persistence logic "simply" persist data or objects.

By tightly coupling type checking and persistence, we make our
software brittle, perhaps unnecessarily. Although this promotes some
kinds of data quality (no matter what an application developer does,
no non-numeric data will ever make it into this column), it encourages
other poor quality decisions, particularly by those who have not read
"Refactoring Databases" (Amber & Sadalage)

Even the need to refactor in test environments while working on new
development, sometimes while doing R&D prior to nailing down a new
design, contributes to a slow-down in software development or, too
frequently, to using an "only add, never change" approach to working
with databases.
Post by Curt Sampson
Post by Dawn Wolthuis
Also permit navigational operators and not strictly relational ones so
that joins are not necessary when you wish to navigate through a web
of data using foreign keys.
It seems to me that joins, if well implemented, are the easier way to
deal with this.
I am definitely not advocating elimination of JOINs, simply adding
tools so that developers have more flexibility. Regardless of how
this is implemented by the SQL engine, there can be advantages in
writing and maintaining something like the following, especially once
you are working with many tables contributing to a result set:

select partyId, Person->name, ssn from Employee;
Post by Curt Sampson
Possibly the two are even directly equivalant.
Yes, the DBMS may do whatever under the covers, I am talking about
modeling in the software application for speed in development and ease
of maintenance.
Post by Curt Sampson
Do you
have a particular example of something where joins, in the best syntax
you can imagine, are harder
It is very subtle. I have been trying to glean these subtlties as I
work because I have seen (my budget has seen) a significant advantages
in doing modern software development with tools not thought to be
modern RDBMS's.
Post by Curt Sampson
than navigation (also in the best syntax you
can think of)?
The example above is perhaps too simple to give enough hints on how
this can improve productivity. Toss a few more attributes in there
coming from a bunch of tables and then add in there that you might be
looking for the current status, that might have been implemented with
history as a list (or stack) where the top entry is the current
status, so you might want Person->status(0) or Person->status[0].
Compare that to the corresponding SQL statement to choose a bunch of
fields from different tables, including the most recent status from
date-stamped statuses. Advantages include being able to think from
the one to the many and not just from the many to the one. It might
sound like I am conflating ideas here (and maybe I am), as I realize I
cannot yet pinpoint what, precisely, saves the dollars that are saved
when people use products employing such techniques. I do know that I
don't want to go back to SQL-only-DBMS's, so I'm hoping they catch up.

Instead of only considering the question "How can we work in an agile
way with today's databases?", the team can work with an "agile
database" too. These agile databases permit both SQL and other means
(e.g. 3GL approaches) for implementing CRUD services. I resist the
argument that some make that we are trading in quality for agility,but
the downside, as one of my colleagues has said, is that such tools
give developers enough rope to hang themselves, so there needs to be a
different approach to some quality issues.

Have a good day. --dawn
Post by Curt Sampson
cjs
--
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
--
Dawn M. Wolthuis
Tincat Group, Inc. tincat-group.com

Take and give some delight today
Curt Sampson
2007-07-27 08:18:35 UTC
Permalink
Post by Dawn Wolthuis
Post by Curt Sampson
Just two more data types. Nested relations in fact can be first-normal
form as well, if you accept that a relation is a data type like any
other.
Yes, this is a redefining of 1NF that renders "1NF" meaningless (or
almost meaningless).
Not at all. Here is Date's definition of 1NF (from Introduction to Database
Systems, 7th ed., p. 357)

First normal form: A relvar is in 1NF if and only if, in every legal
value of that relvar, every tuple contains exactly one value for
each attribute.

So long as the attribute contains only one, never more than one, relation,
it's in 1NF.

Don't be confused by it being a composite data type. In some sense,
text strings could be considered composite data types as well, since
it's essentially a list of zero more characters, yet when we update an
attribute from "a" to "abc", we wouldn't consider that attribute to have
gone from one to three values.
Post by Dawn Wolthuis
Most RDBMS tools are not set up to work well with nested relations.
I am not entirely sure what you mean by the term "nested relations," but
it makes me quite uncomfortable. After all, we don't refer to a string
or an integer as being "nested" inside a relation; why should we refer
to any other data type as being "nested"?
Post by Dawn Wolthuis
Lists are an even bigger issue for most RDBMS tools.
Sure, but they're mostly misused, in my experience, anyway. Have a look
at the PostgreSQL catalogue for some classic examples. There are plenty
of queries I'd like to be able to do on those that I can't do because
they put stuff in arrays instead of normalizing the data.
Post by Dawn Wolthuis
Post by Curt Sampson
Post by Dawn Wolthuis
The industry would be well-served if we would switch from
three-valued logic back to two-valued logic too.
Actually, something along the lines of Haskell's "Maybe" is really what
we need.
There are pros and cons to each approach, and I'll confess that I hae
not done much with Haskell to see the advantages.
Haskell's type system is well worth study.
Post by Dawn Wolthuis
I will add that those DBMS's that serve up all data as strings, where
you can handily cast to numbers, dates, names, or any other type (or
even multiple types depending on the situation -- this Person is-an
Employee so cast this field to a number, else leave as a string),
have provided the biggest bang for the buck in productivity in my
experience.
Ouch. That seems destined to lead to run-time type errors.
Post by Dawn Wolthuis
So, I tend to think there is something to having the type checking
somewhere prior to the persistence logic in our applications, letting
persistence logic "simply" persist data or objects.
If all you need is persistence, why bother with an RDBMS? Keep in mind,
the primary advantage of a good RDBMS is that it serves as an inference
engine; you present it with a set of propositions and queries, and it
infers the answers for you.
Post by Dawn Wolthuis
select partyId, Person->name, ssn from Employee;
That seems to me syntatic sugar for a join. In fact, I find something like
this much more clear:

Employee JOIN Person PROJECT partyId, name, ssn
Post by Dawn Wolthuis
...and then add in there that you might be looking for the current
status, that might have been implemented with history as a list (or
stack) where the top entry is the current status, so you might want
Person->status(0) or Person->status[0].
Why not just model status as a relation. Then it works like anything
else in the database, refactoring is easier (e.g., you can trivially add
other dependent attributes quite easily) and you can do queries such as,
"give me the information based on status as it was last July 15th."
Post by Dawn Wolthuis
Compare that to the corresponding SQL statement....
No, don't. My whole point is that, given better syntax, many of these
difficulties that send you to non-relational solutions go away.

cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Dawn Wolthuis
2007-08-01 02:07:02 UTC
Permalink
Post by Curt Sampson
Post by Dawn Wolthuis
Post by Curt Sampson
Just two more data types. Nested relations in fact can be first-normal
form as well, if you accept that a relation is a data type like any
other.
Yes, this is a redefining of 1NF that renders "1NF" meaningless (or
almost meaningless).
Not at all. Here is Date's definition of 1NF (from Introduction to Database
Systems, 7th ed., p. 357)
First normal form: A relvar is in 1NF if and only if, in every legal
value of that relvar, every tuple contains exactly one value for
each attribute.
So long as the attribute contains only one, never more than one, relation,
it's in 1NF.
Can you give an example of a (database) relation that is not in 1NF?
Post by Curt Sampson
Don't be confused by it being a composite data type. In some sense,
text strings could be considered composite data types as well,
Yes, agreed. I have no problem with the domain of an attribute
including relations or even lists. My issue is that First Normal Form
(1NF) had a meaning such that "Non-first Normal Form" databases were
NOT in 1NF (rather by definition, one would think). Now those
databases that would be classified as NF2, including those that
encourage lists or relations as attribute values (by including
operations and queries on such, for example) are NOT in Non-First
Normal Form by the newer defs of 1NF. When "Non-First Normal Form"
does not mean that a relation is not in first normal form, you can see
how communication might be a bit confusing.

This has also made it difficult to get the word out that 1NF as we
knew is, is dead. Relational theorists simply redefined it so they did
not have to say that it no longer applies. Unfortunately, of all the
normal forms, the form formerly known as 1NF is the only one that DBMS
tools, such as SQL-92 and products that employ such, enforce within
the DBMS -- it is the only one you cannot violate in many environments
and would be well-advised not to violate in those that use SQL as THE
sole langauge of the DBMS.
Post by Curt Sampson
since
it's essentially a list of zero more characters, yet when we update an
attribute from "a" to "abc", we wouldn't consider that attribute to have
gone from one to three values.
Post by Dawn Wolthuis
Most RDBMS tools are not set up to work well with nested relations.
I am not entirely sure what you mean by the term "nested relations," but
it makes me quite uncomfortable.
A relation with an attribute whose values are relations.
Post by Curt Sampson
After all, we don't refer to a string
or an integer as being "nested" inside a relation; why should we refer
to any other data type as being "nested"?
It not only seems reasonable terminology to say that a relation within
a relation is "nested" it is also the terminology used by some NF2
products. In IBM UniData, for example, one might issue an SQL
statement with an UNNEST in it if you wish to have each of a number of
multivalues in a separeate row in the result set. It is OK if you
prefer GROUP and UNGROUP to NEST and UNNEST (although that could also
be confusing).
Post by Curt Sampson
Post by Dawn Wolthuis
Lists are an even bigger issue for most RDBMS tools.
Sure, but they're mostly misused, in my experience, anyway.
Sometimes they are, but most often the issue is that NF2 developers
might decide not to refactor when they should, just as happens with
any DBMS.
Post by Curt Sampson
Have a look
at the PostgreSQL catalogue for some classic examples. There are plenty
of queries I'd like to be able to do on those that I can't do because
they put stuff in arrays instead of normalizing the data.
That could be the query tool, however. SQL was born as a query tool
that only had to handle a single "simple" value for an attribute.
Post by Curt Sampson
Post by Dawn Wolthuis
Post by Curt Sampson
Post by Dawn Wolthuis
The industry would be well-served if we would switch from
three-valued logic back to two-valued logic too.
Actually, something along the lines of Haskell's "Maybe" is really what
we need.
There are pros and cons to each approach, and I'll confess that I hae
not done much with Haskell to see the advantages.
Haskell's type system is well worth study.
I'll put it on the list, but I'm out of "heads-down student mode" for
a while and into practitioner mode.
Post by Curt Sampson
Post by Dawn Wolthuis
I will add that those DBMS's that serve up all data as strings, where
you can handily cast to numbers, dates, names, or any other type (or
even multiple types depending on the situation -- this Person is-an
Employee so cast this field to a number, else leave as a string),
have provided the biggest bang for the buck in productivity in my
experience.
Ouch. That seems destined to lead to run-time type errors.
There are trade-offs with any such design decisions. You might notice
that there are tons of non-fatal javascript errors when surfing around
the internet, but you can often still get your job done. Add a bit
more QA for some sites to elimiate more such errors.
Post by Curt Sampson
Post by Dawn Wolthuis
So, I tend to think there is something to having the type checking
somewhere prior to the persistence logic in our applications, letting
persistence logic "simply" persist data or objects.
If all you need is persistence, why bother with an RDBMS?
Agreed, but there are some reasons that a big honkin' RDBMS that might
be both over- (constraint-handling) and under- (lack of support for
lists) featured might be incorporated into a software project during
even the very first sprint/iteration of software construction.

Let's say that a software developer or development team is starting
from scratch and has the full range of options available to them.
When they design something that they think must stick around even when
their software is not running, they either look at file system options
or at DBMS solutions. They often choose an RDBMS so that they and
others can execute queries against it using SQL and/or query tools
that employ SQL.

Persistence as a requirement implies the ability to retrieve persisted
information, while the use of standard tools for such retrieval likely
needs to be stated as a separate requirement. But I would contend
that a requirement of "CRUD services" (aka "persistence") plus the
requirement of being able to access data using industry-standard
(today SQL-based) tools are the two primary requirements under
consideration when a team decides to employ a DBMS, even though a
(R)DBMS has considerably more features than these two (security,
constraint-handling, RI). Deciding to use these additional features up
front might be a good example of how NOT to do agile software
development, perhaps?

Also, rather than looking at the conceptual design and determining
whether an NF2 or 1NF DBMS might be the best option, developers tend
to go with a standard, often even portraying a conceptual or at least
logical data model in the form formerly known as 1NF, and then design
to SQL, for example.
Post by Curt Sampson
Keep in mind,
the primary advantage of a good RDBMS is that it serves as an inference
engine; you present it with a set of propositions and queries, and it
infers the answers for you.
Yes, and which software development teams have a need for an inference
engine in the first few iterations of their software development
efforts? An RDBMS might just be one of the biggest detractors from
doing agile software development as it injects a big solution even
when the requirements are much smaller (such as the two I listed that
typically prompt use of an RDBMS).
Post by Curt Sampson
Post by Dawn Wolthuis
select partyId, Person->name, ssn from Employee;
That seems to me syntatic sugar for a join.
Fine, let's call it "chocolate" and then I'm in--sweet is good.
Post by Curt Sampson
In fact, I find something like
Employee JOIN Person PROJECT partyId, name, ssn
That is fine for those who think exclusively in sets. There is not a
good reason, in my opinion, to reduce the developers choices in this
regard. I understand that the coupling of constraint-handling with
persistence is what drives this, so that first order logic can be
employed. I am not willing to give up what we once had and can still
have in software development agility when using NF2 tools. Again,
there are trade-offs. Not every project is the same, of course, but
for a typical complex database application, requiring good
performance, ease of maintenance, data accuracy, etc, I have seen
better bang for the buck and also happier end-users (again, for the
buck, as one can accomplish pretty much the same with each toolset)
with tools like IBM UniData than with the average SQL-DBMS.
Post by Curt Sampson
Post by Dawn Wolthuis
...and then add in there that you might be looking for the current
status, that might have been implemented with history as a list (or
stack) where the top entry is the current status, so you might want
Person->status(0) or Person->status[0].
Why not just model status as a relation. Then it works like anything
else in the database, refactoring is easier (e.g., you can trivially add
other dependent attributes quite easily)
I disagree that refactoring is easier when developing in tools that
traditionally have required 1NF than with NF2 databases. This is
where there can be truly significant developer productivity gains from
a tool that permits lists (e.g.). I have detailed some of the gains
in the articles that I wrote up as "mewsings" last year, but will give
one example here.

Let's say that you start with 'status' as a single-valued attribute of
Product and later the requirements are to keep a history (without
dates to start with). Let's say that the precise requirements prompt
you to think of 'status' now as a logical stack with the most recent
status on top.

Some NF2 DBMS tools permit everything that has worked, screens and
reports, to keep working after you change the definition of 'status'
to be multi-valued. The reports might even run successfuly, now
showing multiple statuses instead of just one. The user interface is
then enhanced by changing the UI widget to a window, or widget that
collects and shows the multivalues in a prettier way, but the product
was able to be enhanced to permit the new functionality with a change
of a description from S (singlevalued) to M (multivalued).

There are many such examples where a change to requirements when using
an NF2 database yields much higher developer productivity than with a
traditional SQL-DBMS, which is why I think of NF2 tools as "agile
databases". I came to this hypothesis as a manager based on what
seemed to be significant differences in the bottom line budgets for
development in an NF2 DBMS compared to an SQL-DBMS. I later started
investigating why this might be. NF2 and 2-valued-logic are two of
the reasons, even though not the entirety of the reasons, I now think.
Post by Curt Sampson
and you can do queries such as,
"give me the information based on status as it was last July 15th."
Post by Dawn Wolthuis
Compare that to the corresponding SQL statement....
No, don't. My whole point is that, given better syntax, many of these
difficulties that send you to non-relational solutions go away.
It will be interesting to see if the industry continues heading in the
NF2 direction, modeling more data with lists and relations within
relations. XML is only one reason to head toward NF2 and 2VL. Agile
database application software development is another. Given the
enhancements that are being made to tools from IBM (DB2 Viper),
Microsoft (such as linq), Oracle (with purchase of Berkeley DB), and
elsewhere, I suspect the s/w development industry will rediscover this
soon, even if only by adding new tricks rather than with a revolution
as we saw with the introduction of relational theory and subsequent
(unfortunate IMO) introduction of 1NF. --dawn
Post by Curt Sampson
cjs
Curt Sampson
2007-08-05 12:37:22 UTC
Permalink
Post by Dawn Wolthuis
Can you give an example of a (database) relation that is not in 1NF?
No, because by definition it's no longer a relation if it's not in 1NF.

As for the "nested" thing, I just picked up a copy of Date's _The
Relational Database Dictionary_, and he seems to think that it's an
acceptable alternative term to the preferred "relation-valued attribute",
so I guess I'm fine with that.
Post by Dawn Wolthuis
Have a look at the PostgreSQL catalogue for some classic examples.
There are plenty of queries I'd like to be able to do on those that I
can't do because they put stuff in arrays instead of normalizing the
data.
That could be the query tool, however. SQL was born as a query tool
that only had to handle a single "simple" value for an attribute.
It's not the query tool. Single, "simple" values are what I want,
instead of arrays or lists.
Post by Dawn Wolthuis
Deciding to use these additional features up front might be a good
example of how NOT to do agile software development, perhaps?
To my mind, that would be akin to deciding not to use RDBMs features
is akin to deciding not to use features of your programming language
or libraries. If the feature's there, and it's helpful to what you're
doing, use it.
Post by Dawn Wolthuis
Keep in mind, the primary advantage of a good RDBMS is that it serves
as an inference engine; you present it with a set of propositions and
queries, and it infers the answers for you.
Yes, and which software development teams have a need for an inference
engine in the first few iterations of their software development efforts?
Many of them. A typical small web site might need to answer the
following sorts of questions, for which the answer is not directly in
the database: How many users are there? How many postings were made
today? How many logins were there? What's the most popular hour of the
day to log in?
Post by Dawn Wolthuis
An RDBMS might just be one of the biggest detractors from doing agile
software development as it injects a big solution even when the
requirements are much smaller...
I'd disagree. You wouldn't suggest that someone code something in C
because Ruby is more power than you need at the moment, would you? All
else being equal, you use the most powerful tools available, because
that will let you produce the minimal amount of code.
Post by Dawn Wolthuis
Employee JOIN Person PROJECT partyId, name, ssn
That is fine for those who think exclusively in sets.
And why would you not think in sets unless there were a compelling
reason not to do so? Sets are less work, because they're easier to
reason about than, say, lists or trees.
Post by Dawn Wolthuis
I disagree that refactoring is easier when developing in tools that
traditionally have required 1NF than with NF2 databases.
I may well agree with that, but why use those tools? My ideal solution
to the problem would be to fix the tools, not use something more
difficult. (In practice, of course, the cost of fixing the tools might
be high enough that you just live with what you've got.)
Post by Dawn Wolthuis
Some NF2 DBMS tools permit everything that has worked, screens and
reports, to keep working after you change the definition of 'status'
to be multi-valued.
Sure. And others will do the same if you make something "multi-valued"
by adding a new relation.

cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Cameron Laird
2007-08-05 15:43:29 UTC
Permalink
.
.
.
Post by Curt Sampson
Post by Dawn Wolthuis
An RDBMS might just be one of the biggest detractors from doing agile
software development as it injects a big solution even when the
requirements are much smaller...
I'd disagree. You wouldn't suggest that someone code something in C
because Ruby is more power than you need at the moment, would you? All
else being equal, you use the most powerful tools available, because
that will let you produce the minimal amount of code.
.
.
.
Me, too.

There's another distinction afoot here that I shan't now make
time to articulate as fully as it deserves: while Oracle,
let's say, sure doesn't feel agile, that has to do with factors
beyond its RDBMSness. SQLite is a delightful persistence
engine that makes lots of common operations delightfully light
in weight--and you get SQL for free!

So, yes, I agee with both Dawn and Curt: burdening an agile
project with a corporate-standard RDBMS and its associated
methodology can be a major distraction; but it's equally
pointless to fall into the mistake of believing all RDBMS are
heavyweights.
Scott Ambler
2007-08-06 20:49:44 UTC
Permalink
Post by Cameron Laird
There's another distinction afoot here that I shan't
now make
time to articulate as fully as it deserves: while
Oracle,
let's say, sure doesn't feel agile, that has to do
with factors
beyond its RDBMSness.
As with most things, the technical issues are often
straightforward to overcome once you choose to do so.
The difficult issues are the ones involving people.
In particular, I've found that the cultural impedance
mismatch between the traditional data community and
pretty much everyone else in IT is a serious problem
that needs to be addressed within most organizations.
Sadly, a lot of the traditional data community is
still struggling with the idea that the rest of the
world has pretty much moved on from what they prefer
to do.

And BTW, as Pramod and I show in Refactoring Databases
you can in fact take a relatively Agile approach with
Oracle if you choose to.

- Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html


Be smarter than spam. See how smart SpamGuard is at giving junk email the boot with the All-new Yahoo! Mail at http://mrd.mail.yahoo.com/try_beta?.intl=ca
Mustafa Ekim
2007-08-07 10:49:53 UTC
Permalink
Hey guys,

I want to know how u guys resolve the kind of problem
I have:

I have a model, say 'person'. He has a name. But he
does not tell his real name to everyone everytime,
instead he tells sometime different names to some
different people. But, bear in mind that he 99% tells
the his correct name.

Since not ALWAYS he is telling the correct name, I
cannot store name as a column in my person table.
instead, I have to store it in the relation table
between 2 people.

may be an approach could be like this,
I will store the default name in person table, and in
the relation table I will have a boolean value telling
that he told his real name or if not another name
value to store the name he told.

may be my english is not sufficient to explain my
problem, but, if any of u understand its ok :)

thanks in advance






____________________________________________________________________________________
Need a vacation? Get great deals
to amazing places on Yahoo! Travel.
http://travel.yahoo.com/
Curt Sampson
2007-08-06 21:43:33 UTC
Permalink
There's another distinction afoot here that I shan't now make time
to articulate as fully as it deserves: while Oracle, let's say, sure
doesn't feel agile, that has to do with factors beyond its RDBMSness.
This is a good point that I'd forgotten: Oracle, as one example, is a
particuarly nasty piece of work to administer. I'm used to a world where
once per machine I "pkg_add postgresql", set up a few IDs, and then for
a new project just "qam-file -u src/psql", add a few files to src/db,
add some IDs (once per machine that's using this project), and start
typing and testing SQL code. (And I've got fixes for the ID stuff in
the works--that's the most annoying part a the moment because it's a 5
minute or so manual process, mostly due to password management issues.)
...and you get SQL for free!
I'm not entirely sure that that's a good thing! :-)

cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Dawn Wolthuis
2007-08-09 20:09:02 UTC
Permalink
Post by Curt Sampson
Post by Dawn Wolthuis
Can you give an example of a (database) relation that is not in 1NF?
No, because by definition it's no longer a relation if it's not in 1NF.
Yes, that was my point about rendering this def of 1NF meaningless.
With the prior meaning, people would put a relation into 1NF, but
there is no longer a need to do so, right? Now we are left with NF2 =
Non-First Normal Form ACTUALLY BEING in first normal form, just not in
the form formerly known as first.

Obviously, this is going to lead to misunderstandings, starting with
folks (developers and data modelers, for example) thinking that they
still "must put their data in first normal form" from some purist
standpoint, not just due to the failings of any particular tool, while
still thinking that first normal forms means that it is not in
non-first normal form. How can we get the word out on complete change
in the meaning of 1NF so that it has been rendered pretty much
meaningless?
Post by Curt Sampson
As for the "nested" thing, I just picked up a copy of Date's _The
Relational Database Dictionary_, and he seems to think that it's an
acceptable alternative term to the preferred "relation-valued attribute",
so I guess I'm fine with that.
Good deal, I agree with Date ;-)
Post by Curt Sampson
Post by Dawn Wolthuis
Have a look at the PostgreSQL catalogue for some classic examples.
There are plenty of queries I'd like to be able to do on those that I
can't do because they put stuff in arrays instead of normalizing the
data.
That could be the query tool, however. SQL was born as a query tool
that only had to handle a single "simple" value for an attribute.
It's not the query tool. Single, "simple" values are what I want,
instead of arrays or lists.
Sometimes I want single, simple values, but often I want lists, arrays
(or sets or at some times perhaps even bags)
Post by Curt Sampson
Post by Dawn Wolthuis
Deciding to use these additional features up front might be a good
example of how NOT to do agile software development, perhaps?
To my mind, that would be akin to deciding not to use RDBMs features
is akin to deciding not to use features of your programming language
or libraries. If the feature's there, and it's helpful to what you're
doing, use it.
Yes, and if the feature gets in your way, don't use it.
Post by Curt Sampson
Post by Dawn Wolthuis
Keep in mind, the primary advantage of a good RDBMS is that it serves
as an inference engine; you present it with a set of propositions and
queries, and it infers the answers for you.
Yes, and which software development teams have a need for an inference
engine in the first few iterations of their software development efforts?
Many of them. A typical small web site might need to answer the
following sorts of questions, for which the answer is not directly in
the database: How many users are there? How many postings were made
today? How many logins were there? What's the most popular hour of the
day to log in?
Ah, yes, definitely there is a need for a query tool. I had thought
you were including constraint handling. I use DBMS's that include
SQL, but did not arise out of the RDBMS craze of the 80's. They still
have inference engines, especially if by that you simply mean that
queries such as those you listed are possible (and typically easier
than with SQL). So, would you revise your statement to say that the
feature you reference is a primary feature of a DBMS, or do you think
it really is specific to an RDBMS that it can answer questions such as
the above?
Post by Curt Sampson
Post by Dawn Wolthuis
An RDBMS might just be one of the biggest detractors from doing agile
software development as it injects a big solution even when the
requirements are much smaller...
I'd disagree. You wouldn't suggest that someone code something in C
because Ruby is more power than you need at the moment, would you?
I like that analogy (you get a point for that one), but there is
nothing that handily converts from C to Ruby, where one could design
for a DBMS with little constraint handling and move to one with more
features quite handily if one were to decide that there were a lot of
constraints that should not be handled within a business rules
framework (where the user can make changes) and that should not be
handled as data stored in the DBMS for the purpose of generating forms
or other code, for example. I have found few "data processing
applications" that really are well-served by employing a ton of SQL
constraints, for example.
Post by Curt Sampson
All
else being equal, you use the most powerful tools available, because
that will let you produce the minimal amount of code.
Use a sledge hammer to pound in a nail?

If the tool is available at run-time, so it becomes part of the
"product" (the software), then it does not make sense to always select
one with a lot more features than you need, does it?
Post by Curt Sampson
Post by Dawn Wolthuis
Employee JOIN Person PROJECT partyId, name, ssn
That is fine for those who think exclusively in sets.
And why would you not think in sets unless there were a compelling
reason not to do so?
I think there often is a compelling reason not to and I have a
wonderful example from a meeting I was in on Tuesday of this week. My
hunch for a few years has been that I get better software, more
accurate results, and much higher productivity from developers who do
not employ exclusively-SQL-based DBMS's. I really need to write up
the experience I just had, which is yet another anecdote along these
lines, but will give you the very short version here.

We were coming up with multiple ways to do something with simulated
data where we knew (should have known) to make no assumptions about
that data. In this case our little example was retrieving a
two-column list of phone types & phone numbers from a database and
presenting them on a web page in a specific format. So, the
presentation was consistent, but the process of getting from persisted
data to the presentation varied. With the first few attempts, we
employed SQL, and all of the results were the same, validating our
efforts to some extent. With the next one we did "what a typical MV
(an NF2 model) developer would do" with a loop and two-valued logic
and we ended up with two more rows. Why? Because we did an inner and
not an outer join and there were some empty set (null in 2VL) values
for the phone type.

So, yes, our SQL was in error, and with the tens, if not hundreds of
man years (and even some woman years) in the room, including some very
brilliant people, we made a mistake. We got the accurate data when we
dispensed with our set-based, howbeit flawed, approach.

It has been my experience that this was definitely not an exception.
It might even be par for the course. Because I want quality software,
with flexible/agile software development over the life of the software
(minimizing brittle components), with the software development
providing a big bang for the buck, I am not sold on the exclusive use
of set-based processing of data. There are pros and cons to any such
decisions.
Post by Curt Sampson
Sets are less work, because they're easier to
reason about than, say, lists or trees.
I respectfully disagree. If I am trying to drive from here to there
and you provide me with a set that contains whatever data you think
would get me there, do you think htat set will be easy to use than a
map (a graph)? If I need a list (ordered) e-mail addresses for each
person, is it easier to model and use that as a list attribute within
a Person relation, or with a new relation, adding in a number
attribute pasted to each e-mail, along with an identifier to tie this
e-mail entry to a specific Person? If you think the latter to be
easier, then perhaps you have never had to mess around with a ripple
delete in an SQL-only-DBMS?

http://www.tincat-group.com/mewsings/2006/01/who-ordered-ripple-delete.html
Post by Curt Sampson
Post by Dawn Wolthuis
I disagree that refactoring is easier when developing in tools that
traditionally have required 1NF than with NF2 databases.
I may well agree with that, but why use those tools? My ideal solution
to the problem would be to fix the tools, not use something more
difficult. (In practice, of course, the cost of fixing the tools might
be high enough that you just live with what you've got.)
I'm happy to work with tools that are simply easier, and less costly, to use.
Post by Curt Sampson
Post by Dawn Wolthuis
Some NF2 DBMS tools permit everything that has worked, screens and
reports, to keep working after you change the definition of 'status'
to be multi-valued.
Sure. And others will do the same if you make something "multi-valued"
by adding a new relation.
This is the reason I would love to see some bake-offs in our industry,
pitting various languages, DBMS tools, and IDEs, against each other,
with assessments for each of the (somewhat standard) quality
attributes, along with timing & cost considerations. Maybe there are
such and I am missing them (if so, please send me to such URLs). I do
not have the resources to pull off such a bake-off, but I'm sure that
there are groups or companies that do, and I would be happy to donate
some time to such an effort. I don't like saying "my hunch is..."
when we could be getting some emperical data.

I realize it is not small thing to set up tests that could be agreed
upon, choose judges, etc. I could be wrong, but I do not see anything
from industry analyst groups or independent sources comparing tools
such as Oracle with those such as Cache' (I use the MV style of
coding with it). They are considered to be two different animals, but
they can be employed by projects writing software to address the same
"data processing" needs. I would like to see such comparisons.
Cheers! --dawn

Dawn M. Wolthuis
Tincat Group, Inc.
Post by Curt Sampson
cjs
--
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Andrew Gregovich
2007-05-22 09:49:22 UTC
Permalink
That's correct, if you're building a system with high volume requirements you will definitely need to know certain details. SQL itself is a mathematical construct, so in an ideal world the DB would be able to optimize the detailed path with whichever magic tricks it can come up with. Naturally, optimizers are not omniscient and therefore some vendors will say "use EXISTS here or IN there" (although both are logically equivalent). This is becoming less of an issue as they get smarter with every new product release, but there are some RDBMS with certain performance limitations which could really ruin your day if you were not aware of them; I'll give you an example - no hash joins in MySQL (or have they implemented them yet?).

Looking beyond individual SQLs, there is another critical aspect of an RDBMS - the concurrency enforcement mechanism, i.e. is it lock-based (AKA pessimistic) or multi-version (AKA optimistic). For example, many people have big issues migrating from Oracle or MySQL/InnoDB to MS SQL Server (often unforeseen because MSoft doesn't tend to bring this issue up).

Lastly, for very large or high-performance systems, vendors have some specialized specific features can really be of huge help. Again, the difference between vendors tends to be more emphasized as the sizing/performance requirements increase - for trivial systems it's very easy to write non-vendor specific code, but for complex systems you're a lot more tied in.

Andrew

----- Original Message ----
From: cathyfarrell_ct <***@rdg.boehringer-ingelheim.com>
To: ***@yahoogroups.com
Sent: Tuesday, May 22, 2007 1:07:23 AM
Subject: [agileDatabases] Re: Database Eye for the Application Guy













Regarding Curt's question "Does a general understanding suffice in

most cases? Or must one learn the ins and outs of the optimizer for

the particular DBMS that one is using?", even within a particular

DBMS, the factors affecting performance can vary from version to

version. Check out Tom Kyte's Things You "Know" presentation in the

archives of the New York Oracle Users Group (www.nyoug.org) .



A couple more book recommendations for Jason:

Oracle PL/SQL Best Practices by Steven Feuerstein














<!--

#ygrp-mlmsg {font-size:13px;font-family:arial, helvetica, clean, sans-serif;}
#ygrp-mlmsg table {font-size:inherit;font:100%;}
#ygrp-mlmsg select, input, textarea {font:99% arial, helvetica, clean, sans-serif;}
#ygrp-mlmsg pre, code {font:115% monospace;}
#ygrp-mlmsg * {line-height:1.22em;}
#ygrp-text{
font-family:Georgia;
}
#ygrp-text p{
margin:0 0 1em 0;}
#ygrp-tpmsgs{
font-family:Arial;
clear:both;}
#ygrp-vitnav{
padding-top:10px;font-family:Verdana;font-size:77%;margin:0;}
#ygrp-vitnav a{
padding:0 1px;}
#ygrp-actbar{
clear:both;margin:25px 0;white-space:nowrap;color:#666;text-align:right;}
#ygrp-actbar .left{
float:left;white-space:nowrap;}
.bld{font-weight:bold;}
#ygrp-grft{
font-family:Verdana;font-size:77%;padding:15px 0;}
#ygrp-ft{
font-family:verdana;font-size:77%;border-top:1px solid #666;
padding:5px 0;
}
#ygrp-mlmsg #logo{
padding-bottom:10px;}

#ygrp-vital{
background-color:#e0ecee;margin-bottom:20px;padding:2px 0 8px 8px;}
#ygrp-vital #vithd{
font-size:77%;font-family:Verdana;font-weight:bold;color:#333;text-transform:uppercase;}
#ygrp-vital ul{
padding:0;margin:2px 0;}
#ygrp-vital ul li{
list-style-type:none;clear:both;border:1px solid #e0ecee;
}
#ygrp-vital ul li .ct{
font-weight:bold;color:#ff7900;float:right;width:2em;text-align:right;padding-right:.5em;}
#ygrp-vital ul li .cat{
font-weight:bold;}
#ygrp-vital a {
text-decoration:none;}

#ygrp-vital a:hover{
text-decoration:underline;}

#ygrp-sponsor #hd{
color:#999;font-size:77%;}
#ygrp-sponsor #ov{
padding:6px 13px;background-color:#e0ecee;margin-bottom:20px;}
#ygrp-sponsor #ov ul{
padding:0 0 0 8px;margin:0;}
#ygrp-sponsor #ov li{
list-style-type:square;padding:6px 0;font-size:77%;}
#ygrp-sponsor #ov li a{
text-decoration:none;font-size:130%;}
#ygrp-sponsor #nc {
background-color:#eee;margin-bottom:20px;padding:0 8px;}
#ygrp-sponsor .ad{
padding:8px 0;}
#ygrp-sponsor .ad #hd1{
font-family:Arial;font-weight:bold;color:#628c2a;font-size:100%;line-height:122%;}
#ygrp-sponsor .ad a{
text-decoration:none;}
#ygrp-sponsor .ad a:hover{
text-decoration:underline;}
#ygrp-sponsor .ad p{
margin:0;}
o {font-size:0;}
.MsoNormal {
margin:0 0 0 0;}
#ygrp-text tt{
font-size:120%;}
blockquote{margin:0 0 0 4px;}
.replbq {margin:4;}
-->









____________________________________________________________________________________Get the Yahoo! toolbar and be alerted to new email wherever you're surfing.
http://new.toolbar.yahoo.com/toolbar/features/mail/index.php

[Non-text portions of this message have been removed]
purplepangolin
2007-05-22 13:36:31 UTC
Permalink
Post by Jason Bennett
So I guess a little background is required first. While I've worked with
databases for years now, it's mostly been in the context of
applications
Post by Jason Bennett
and O/R mapping - very normalized design, not much query crafting, etc.
I have a decent grasp of databases and SQL, but I don't know much of the
arcana.
Enter my new job: very transactionally intense, with a mix of OLTP and
reporting requirements. This means lots of PL/SQL, denormalization, and
query crafting. What I need is some books that can take me from where I
am to where I want to be. What queries are safe (few joins), which will
kill you (date ranges, lots of joins), and how to strike a good balance
when denormalizing without it coming back to hose you.
Aside from Scott's (all hail) Agile database books, what others would
people recommend?
jason
--
E pur si muove!
Get Firefox! - http://getfirefox.com
Three authors to look for:
Steven Feuerstein - Oracle PL/SQL Programming
Tom Kyte - Expert Oracle Database Architecture - 9i and 10g
Programming Techniques and Solutions
Jonathon Lewis - Cost Based Oracle: Fundamentals: v. 1

Tom Kyte has a great Q & A site (http://asktom.oracle.tom). Both Tom
and Jonathon will give you chapter and verse on how to tune your queries.
Andrew Gregovich
2007-06-07 01:56:02 UTC
Permalink
Curt

I like your arguments much better now - in fact I agree with most of them. I dislike SQL too for its lack of reusability but most of all for not being a true relational query language. Regarding optimizer hints, they are quite common nowadays, but as you said, it would be good to standardize them (but this may go against the vendors' need to differentiate themselves by offering new performance features).

On the other hand, I think your forecasts are still very unrealistic. Firstly, I'm pretty sure you wouldn't be able to get both the performance and disk/memory management features by using a functional language such as LISP, compared to C. You could use it in certain areas (e.g. optimizer) but you still have a tricky problem domain (e.g. how do you optimize "get all tuples from customers where age > X" and you don't know what X is). I suspect that the major vendors used both heuristic and pure theoretical approaches here, and for that you need time. Secondly, think of the whole DBMS, including concurrency control, backups, management consoles, monitoring/tuning features... It's a hell of a lot of work. For example, have a look at the history of InnoDB, which itself is really just a storage and concurrency management engine rather than a full RDBMS. It still took a couple of years with a
team lead by a pretty clever guy to develop. Now, this kind of limited scope product is
possible to achieve by small teams, but if you want to write something as encompassing as a major RDBMS and really make it worthwhile for someone other than academics to use, then really I wish all the best to you for being a very brave soul.

Cheers

Andrew

----- Original Message ----
From: Curt Sampson <***@cynic.net>
To: ***@yahoogroups.com
Sent: Wednesday, June 6, 2007 8:36:45 AM
Subject: Re: [agileDatabases] Database Eye for the Application Guy
Post by Andrew Gregovich
And please note that you could be perceived as arrogant or ignorant
(or both) in implying that the features such as advanced cost-based
optimizers or various indexing/hashing techniques can be created with a
3 man-year effort.
I wouldn't call the indexing and hashing techniques used by modern

RDBMSes "advanced." (At least the ones I'm aware of, though I'd be

interested in pointers to literature showing otherwise.) A B-tree, in

this day and age, is hardly rocket science.



At any rate, and being a programmer I admit that I may well be typical

of the breed and overly optimistic here, I would think that if you had

a good team of 4-8 people, were starting from scratch, and were using a

good language for most of your work (Haskell, LISP, Scheme, that sort

of thing), you could write a new RDBMS as sophisticated and performant

as any already out there in three years. (I'm assuming that you don't

support SQL or any of the non-relational crap that clutters up a lot

of the modern systems and makes them much more difficult to write,

optimize, and use.)
Post by Andrew Gregovich
Rather that throwing blank statements (RDBMSs suck), can you come up
with a concrete example of where they fall short, how this could be
addressed and yet maintain all of the standard features and performance
that they offer nowadays.
Well, my dream product would include the following features:



1. A decent query langage to replace SQL. In particular, the code should

much more clearly reflect the relational operations being performed, and

it should be easy to modularize parts of queries and re-use those parts

elsewhere.



2. A separate language that can be used to annotate queries to give the

DBMS hints or instructions on how to increase performance for queries.

This must be guaranteed not to change the logic of a query, only the

performance.



3. A similar logical versus physical split for the database schema. For

example, I ought to be able to have two relations stored physically as

their join while not seeing that reflected at all in the logical model,

especially in terms of integrity, but just in the performance of the

system. On the logical side, I ought to be able to create a derived

relation (a view, in SQL terms), drop the relations from which it was

derived, and not lose the derived relation; the storage management

system ought to know what data are needed by the logical model, and

just deal with it, except when giving specific performance and storage

instructions separate from the logical model.



cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974

http://www.starling -software. com

The power of accurate observation is commonly called cynicism

by those who have not got it. --George Bernard Shaw












<!--

#ygrp-mlmsg {font-size:13px;font-family:arial, helvetica, clean, sans-serif;}
#ygrp-mlmsg table {font-size:inherit;font:100%;}
#ygrp-mlmsg select, input, textarea {font:99% arial, helvetica, clean, sans-serif;}
#ygrp-mlmsg pre, code {font:115% monospace;}
#ygrp-mlmsg * {line-height:1.22em;}
#ygrp-text{
font-family:Georgia;
}
#ygrp-text p{
margin:0 0 1em 0;}
#ygrp-tpmsgs{
font-family:Arial;
clear:both;}
#ygrp-vitnav{
padding-top:10px;font-family:Verdana;font-size:77%;margin:0;}
#ygrp-vitnav a{
padding:0 1px;}
#ygrp-actbar{
clear:both;margin:25px 0;white-space:nowrap;color:#666;text-align:right;}
#ygrp-actbar .left{
float:left;white-space:nowrap;}
.bld{font-weight:bold;}
#ygrp-grft{
font-family:Verdana;font-size:77%;padding:15px 0;}
#ygrp-ft{
font-family:verdana;font-size:77%;border-top:1px solid #666;
padding:5px 0;
}
#ygrp-mlmsg #logo{
padding-bottom:10px;}

#ygrp-vital{
background-color:#e0ecee;margin-bottom:20px;padding:2px 0 8px 8px;}
#ygrp-vital #vithd{
font-size:77%;font-family:Verdana;font-weight:bold;color:#333;text-transform:uppercase;}
#ygrp-vital ul{
padding:0;margin:2px 0;}
#ygrp-vital ul li{
list-style-type:none;clear:both;border:1px solid #e0ecee;
}
#ygrp-vital ul li .ct{
font-weight:bold;color:#ff7900;float:right;width:2em;text-align:right;padding-right:.5em;}
#ygrp-vital ul li .cat{
font-weight:bold;}
#ygrp-vital a {
text-decoration:none;}

#ygrp-vital a:hover{
text-decoration:underline;}

#ygrp-sponsor #hd{
color:#999;font-size:77%;}
#ygrp-sponsor #ov{
padding:6px 13px;background-color:#e0ecee;margin-bottom:20px;}
#ygrp-sponsor #ov ul{
padding:0 0 0 8px;margin:0;}
#ygrp-sponsor #ov li{
list-style-type:square;padding:6px 0;font-size:77%;}
#ygrp-sponsor #ov li a{
text-decoration:none;font-size:130%;}
#ygrp-sponsor #nc {
background-color:#eee;margin-bottom:20px;padding:0 8px;}
#ygrp-sponsor .ad{
padding:8px 0;}
#ygrp-sponsor .ad #hd1{
font-family:Arial;font-weight:bold;color:#628c2a;font-size:100%;line-height:122%;}
#ygrp-sponsor .ad a{
text-decoration:none;}
#ygrp-sponsor .ad a:hover{
text-decoration:underline;}
#ygrp-sponsor .ad p{
margin:0;}
o {font-size:0;}
.MsoNormal {
margin:0 0 0 0;}
#ygrp-text tt{
font-size:120%;}
blockquote{margin:0 0 0 4px;}
.replbq {margin:4;}
-->









____________________________________________________________________________________
Get the Yahoo! toolbar and be alerted to new email wherever you're surfing.
http://new.toolbar.yahoo.com/toolbar/features/mail/index.php

[Non-text portions of this message have been removed]
Curt Sampson
2007-07-23 08:20:39 UTC
Permalink
Post by Andrew Gregovich
Regarding optimizer hints, they are quite common nowadays, but as you
said, it would be good to standardize them....
Actually, I don't care that much about standardizing them, but I care
a lot about *separation* of them. When I'm writing my query, I want to
see just the query, and not worry about the optimization of it. I want
to organize it in the way that makes most sense for the logical query,
rather than move things around, or change subselects for joins, in order
to make the query perform better.

If I change the query such that the optimization hints now become
inconsistent with it, the DBMS should probably warn me, but use the
hints as best it can for the parts of the query with which they are
consistent.
Post by Andrew Gregovich
On the other hand, I think your forecasts are still very unrealistic.
Firstly, I'm pretty sure you wouldn't be able to get both the
performance and disk/memory management features by using a functional
language such as LISP, compared to C.
For a DBMS, you absolutely would, for a couple of reasons.

First, even heavily optimized C is usually only two to five times as
fast as the output of decent compilers for other languages. And I
expect, from having read a bunch of the PostgreSQL source code, that
most DBMSes aren't using heavily optimized C anyway; programmers add a
lot of indirection and other performance killers (from the point of view
of executing the minimum number of instructions and memory references)
so that the code is comprehensible and maintainable.

Also, depending on what you're doing, the better typing and other
information available in other languages can actually result in faster
code than a C compiler would ever be capable of producing, since that
can allow the compiler to make assumptions about the data accesses that
a C compiler is not allowed to make. There's a good example related to
some matrix calculations kicking around; if it's really important I'll
dig it up for you.

Second, CPU performance just isn't that important to a DBMS. DBMSes tend
to move a lot of data, relative to the size of things such as cache
lines and even entire caches, to the point where DBMS folks tend to
think of main memory as just another cache. You can get the details
from various books, such as _Transaction Processing_ or any computer
architecture book, but the well-known chart looks something like this:

Type Size Speed
registers 100s of bytes very fast
L1 cache 10s of kilobytes fast
L2 cache 100s of kilobytes medium
RAM 100s of megabytes slow
disk 100s of gigabytes very, very, very slow

Disk is the truly nasty one, of course, being orders of magnitude slower
than RAM, but even the cache/RAM interface is pretty bad: a cache miss
can cost you tens or even hundreds of instructions. There are plenty of
cases these days where it's faster to burn many CPU cycles to calculate
a value than to look it up from memory.

Note that when you're sitting there looking at the performance monitor
on your DBMS server, and all of the CPU stats are pinned, that doesn't
mean you're actually executing instructions with all of those cycles. A
lot of the time a CPU or core is just sitting there waiting for a cache
line to load, and there's nothing else it can do with that time.
Post by Andrew Gregovich
You could use it in certain areas (e.g. optimizer) but you still have
a tricky problem domain (e.g. how do you optimize "get all tuples from
customers where age > X" and you don't know what X is).
These sorts of tricky problems are exactly where more powerful languages
shine. That's one of the big factors that would enable you to write
optimizers for these sorts of problems so quickly.
Post by Andrew Gregovich
Secondly, think of the whole DBMS, including concurrency control,
backups, management consoles, monitoring/tuning features... It's a
hell of a lot of work.
True. I'd be addressing this in three ways:

1. Drop a lot of the stuff that simply provides more shiny ways of
doing things. E.g., make the management information available, but
don't bother writing and shipping a big graphical console app to
let people do this visually, at least not at first.

2. Dropping things that make no sense once your DBMS is truly
relational. There's a lot of non-relational cruft in Oracle, for
example.

3. The lack of need for backward compatability removes a whole class
of problems.
Post by Andrew Gregovich
Now, this kind of limited scope product is possible to achieve by
small teams, but if you want to write something as encompassing as
a major RDBMS and really make it worthwhile for someone other than
academics to use, then really I wish all the best to you for being a
very brave soul.
Well, I'm not quite brave enough to actually attempt it. Yet. But I do
believe I can see how it can be done. Keep in mind, we're talking about
building this on much better technology than that on which Oracle or DB2
was built.

(BTW, if it's not too painful for you, CC me on responses. I sometimes
stay away from the list for a while when I'm busy.)

cjs
--
Curt Sampson <***@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
Tansey, Lisa
2007-07-24 21:49:12 UTC
Permalink
Dawn - love your multi-table inferred join syntax! What are you using
for a database?

-Lisa


[Non-text portions of this message have been removed]
khopsicker
2007-08-10 17:30:25 UTC
Permalink
I've been a data/database professional for over 12 years. I've worked
with a lot of developers on tuning SQL and the over all application.
One book that I just recently discovered and started going through is
"SQL Design Patterns" by Vadim Tropashko. It's definitely not a
beginner's book, but once you get more experienced, there's a lot of
great information in there.

Ken Hopsicker
BoldTech Systems, Inc

Loading...