Sunday, November 28, 2010

data denormalization?? who cares?

This posting is about getting data (your data, that is) into a place where it is more useful to your organization.

Let's say you have been keeping some data of yours (such as shipments) in an Excel spreadsheet. This is very common.

You have set up some very basic reporting using this data. You want other people in your organization to input their information into this "database".

You have a few options. For instance, you can:

  • give them the spreadsheet on a USB stick
  • send them the spreadsheet through email
  • store the spreadsheet on a fileshare where they can access it
  • upload the spreadsheet to the "cloud", such as Google Docs
  • upload the spreadsheet to your internal web store (maybe a SharePoint document library)


The problem with all of these methods is that the other people will still be inputting and editing their data using Excel. Besides all of the version conflicts associated with the methods above, the likelihood of out-of-date and erroneous data is still very high. Let's look at an example:








First Name   Last Name   St
Joe          Smith       MD
Sally        Jones       NV
Sue          Kwon        Cal
Bill         Lyon        CA


If you saw this data in Excel, would it look clean? How about the St column for Sue Kwon? You might guess St means state, but looking only at the first row, MD could just as easily be an occupation (medical doctor).

Many "home brew" data sets are designed like this. A way to keep it cleaner is to have St be a lookup field, where St is the State and comes from another "location" where there is only one entry for California (not Cal or CA)!

This can be done with Excel using the =VLOOKUP() function (see Ref below).
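For readers who think better in code than in spreadsheet formulas, here is a minimal sketch of the same lookup idea in Python. It is not how Excel or SharePoint does it internally; the STATES table and the find_bad_states helper are made up for illustration, and the rows are the sample data from the example.

```python
# Hypothetical lookup table: exactly one entry per state.
STATES = {
    "CA": "California",
    "MD": "Maryland",
    "NV": "Nevada",
}

# The sample rows from the example, as (first name, last name, St).
ROWS = [
    ("Joe", "Smith", "MD"),
    ("Sally", "Jones", "NV"),
    ("Sue", "Kwon", "Cal"),
    ("Bill", "Lyon", "CA"),
]


def find_bad_states(rows, states):
    """Return the rows whose St value is not a key in the lookup table."""
    return [row for row in rows if row[2] not in states]


bad_rows = find_bad_states(ROWS, STATES)
print(bad_rows)  # [('Sue', 'Kwon', 'Cal')]
```

The only row flagged is Sue Kwon's, because "Cal" has no entry in the lookup table. A lookup field goes one step further than this check: it only lets you pick values that already exist in the table, so "Cal" could never be entered in the first place.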

When data is changed to behave like this, the data is said to be "normalized". There are a couple of other database-techie components to this definition, but we will only work on this one for now.

When the data is in the form shown in the example table, with St typed in by hand, it is said to be "denormalized" and is subject to all manner of data cleansing problems.
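To make one of those cleansing problems concrete, here is a small Python sketch (my own illustration, using only the St values from the sample data) of what happens when you try to count people per state:

```python
from collections import Counter

# St values exactly as typed in the sample spreadsheet.
st_values = ["MD", "NV", "Cal", "CA"]

# Count how many rows carry each distinct St value.
counts = Counter(st_values)

print(counts["CA"])   # 1, although two of the people are in California
print(counts["Cal"])  # 1, the "other" California
```

Any report built on this column undercounts California until someone cleans up the spellings by hand.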

The next posting will go into how data like this can be stored in SharePoint in a normalized fashion, using "no code".

Regards..

Ref: Learning VLOOKUP in Excel
