One application I wrote years ago was designed to store and print mill test reports. Mill test reports, also known as certified mill test reports, contain the elemental analysis of metals. Usually these are steel metals. They can also include physical properties, but these won't be discussed here.

The first bit of information is the amount of each element (basic chemical elements) in the mix of steel. This includes elements like carbon, manganese, molybdenum, etc. I created a table with the manufacturer, the identifier from the manufacturer also known as a heat number, and then a column for each chemical. For column names, I used the element symbol.

This was very simple stuff. (This was a long time ago.) But then, I ran into a case where I forgot an element in the table definition and the screen. So a thought came to me: What if there was a 2 character column for the element symbol and then the next column would be the value for that element. This defies database normalization rules. Now the column would mean different things based on the value of another column.??I had recently come from the minicomputer world where this was standard practice. It's messy, but managable. It makes indexing, searching, and some queries more difficult, but it wasn't a challenge for reporting because you just print the element column followed by the value.

So we start with this:

CREATE TABLE chemical_analysis (
  manufacturer VARCHAR(30) NOT NULL,
  heat_number VARCHAR(30) NOT NULL,
  C float,
  Mn float,
  Mo float,
  /* 
     etc.
     and then we add this:
  */
  Element1 VARCHAR(2),
  Element1_Pct float,
  Element2 VARCHAR(2),
  Element2_Pct float
);

So at this point, there are a few problems here. How many user definable elements do we add? We need to add validation. The list of element symbols is finite, so we don't have to worry about what can be an element outside of chemestry. How do we make sense of this?

But if we think about allowing the user to define fields, there is another method which allows for better indexing of the data:

CREATE TABLE chemical_analysis (
  manufacturer VARCHAR(30) NOT NULL,
  heat_number VARCHAR(30) NOT NULL,
  C float,
  Mn float,
  Mo float,
  /* etc.
  and then we add this:
  */
  Element1_Pct float,
  Element2_Pct float
);
CREATE TABLE chemical_analysis_metadata (
  Element1 VARCHAR(2),
  Element2 VARCHAR(2)
);

Here, the definition of the elements in the chemical analysis table is externalized in a metadata table. This way, element1 is always the same and element 2 is always the same. Why you might say???Here's a scenario: You are selling this application to multiple customers. You don't know what elements these customers want to track. You want to give them flexibility to track whatever elements they need.

There are several negatives here: You don't now if you've created enough custom fields for your customers. You are making the schema more complex.??Many applications use this strategy, sometimes called flex fields or custom fields, and give you a finite number of fields you can use to extend your schema. This is a basic user defined field strategy that will allow your customers to add extra fields.

This is not what I would consider the best strategy for an enterprise application that is sold commercially. I'll get into that in future posts. This is something to look out for in applications that you may work with. I've seen it in ERP systems and other systems like Tech Excel's DevTrack. It works, but it's not very elegant. You then have to create different columns for different types of data if you need to store data that way. Either that or you use varchar types and type conversion.

This could be used in a SaaS context if the metadata was customer specific. Each customer could have their own column definitions and the columns would remain static for each customer separately.

A better solution would be to just normalize the data into two tables:

CREATE TABLE chemical_analysis_head (
  manufacturer VARCHAR(30) NOT NULL,
  heat_number VARCHAR(30) NOT NULL
);
CREATE TABLE chemical_analysis_elements (
  manufacturer VARCHAR(30) NOT NULL,
  heat_number VARCHAR(30) NOT NULL,
  element VARCHAR(2),
  percent float
);

I'm not getting into the whole surrogate keys thing here, but this is just a better design. This design avoids the need for user defined fields altogether. Hindsight is 20/20. Printing the elements in the right order and getting the report to look like you want is more of a challenge with this design, though. Also, validation may be more challenging in the code for missing elements.

Also, this is a database design and not an object-oriented design. In the object-oriented approach, the actual storage of the data may differ from the object representation of the data. I will cover that in future posts. An OO design??makes it easier to deal with validation and business rules in the business layer.