"We Are Sorry to Inform You..."
As the January 23rd deadline for paper submissions to the Conference of the Association for Software Testing draws closer, I must say I found Simone Santini's Computer article, "We Are Sorry to Inform You...", very entertaining.

My favorite paper rejection was the following:
E.F. CODD

"A Relational Model of Data for Large Shared Data Banks." This paper proposes that all data in a database be represented in the form of relations - sets of tuples - and that all the operations relative to data access be made on this model. Some of the ideas presented in the paper are interesting and may be of some use, but, in general, this very preliminary work fails to make a convincing point as to their implementation, performance, and practical usefulness. The paper's general point is that the tabular form presented should be suitable for general data access, but I see two problems with this statement: expressivity and efficiency.


The paper contains no real-world example to convince us that any model of practical interest can be cast in it. Quite the contrary, at first sight I doubt that anything complex enough to be of practical interest can be modeled using relations. The simplicity of the model prevents one from, for instance, representing hierarchies directly and forces their replacement with complicated systems of "foreign keys." In this situation, any realistic model might end up requiring dozens of interconnected tables - hardly a practical solution given that, probably, we can represent the same model using two or three properly formatted files.

Even worse, the paper contains no efficiency evaluation: There are no experiments with real or synthetic data to show how the proposed approach compares with traditional ones on real-world problems. The main reason for using specialized file formats is efficiency: Data can be laid out in such a way that the common access patterns are efficient. This paper proposes a model in which, to extract any significant answer from any real database, the user will end up with the very inefficient solution of doing a large number of joins. Yet we are given no experimental result or indication of how this solution might scale up.

The formalism is needlessly complex and mathematical, using concepts and notation with which the average data bank practitioner is unfamiliar. The paper doesn't tell us how to translate its arcane operations into executable block access.

Adding together the lack of any real-world example, performance experiment, and implementation indication or detail, we are left with an obscure exercise using unfamiliar mathematics and of little or no practical consequence. It can be safely rejected.

However, the rejections for Turing and Dijkstra were close seconds.

Don't forget, January 23rd's the deadline. Get your papers in soon!