ROLAP Musings

Freitag, 30. April 2010

SQL Server Backup Compression

In Vincent Rainardis Blog you find a good explanation of Backup Compression for SQL Server 2008.

Montag, 15. März 2010

Change Data Capture

Vincent Rainardi explains in his Blog the Change Data Capture (CDC) mechanism of MS SQL Server and includes a short code example.

Mittwoch, 27. Januar 2010

Using Runtime-Collations To Mimique Different Target Orders

Here is a nice conclusion of oracles language & collation settings/capabilities.

Especially the usage of the

NLSSORT

function in conjunction with an

order by

statement can help you to present result sets in exactly that order that, e.g., a target system with a specific internationalization setting will interpret or need the data:

select * from source order by NLSSORT(sort_column,'NLS_FORMAT=xxx')

Useful format parameters are, e.g., binary or ascii7.

Mittwoch, 6. Januar 2010

Debugging Attribute Constraints in Dimension Tables

A solution that is especially useful wrt. SSAS postprocessing can be found here.

Mittwoch, 9. Dezember 2009

Maintaining Analyto-Operative Data using MS Access

Actually, this idea should rather have startet another "BI Musings" blog. But for the sake of simplicitely ...

Any advanced data warehouse/BI landscape will necessarily develop a bunch of (mostly: dimensional) data residing in the twilight zone between operational input and analytical output.
Examples are - from our experiences - localization resources, seasonal and other analytical taggings, organisational mappings, plausibility rules, technical annotations such as versions, modes, runs, etc., etc., etc. ... See, the hunger comes with the meal.

Most often, it is too expensive and/or inconvenient to put these data and the associated maintenance processes into the (legacy) operational systems where they should have been modelled in the first place.

On the other hand, the relational techie in charge of the DWH platform is seldom willing to care about the business meaning of those things, too.

We have tried various mechanisms to construct and maintain these beasts which range from "Direct-Editing Relational Tables", "Pre-Privoting in Excel and Mounting the Sheet as an External Table" into "Replaying SQL Scripts". Neither of those solutions have been very satisfying so far.

Our customer indeed gave us a hint to look at a tool that we have abandoned long before but that is still installed at most BI users computing devices along with Excel: MS Access!

Using Access to build up and maintain a normalized part of the dimensional model (e.g., we use individual tables for each hierarchy/attribute hosting technical and business keys, labels, descriptions, ranges, localizations, ....) eventually using forms make it a pleasure for business users.

By (graphically!) defining views in Access, it is quite easy to construct, pre-pivot, debug and even pre-chart the flattened dimensional outcome of the partial model. Outer joins (caution, mostly RIGHT JOINs in a typical star-like structure) help you to initially identify integrity problems when doing batch-input. After that, using referential integrity constraints, Access will help you and your users to derive at consistent intermediate versions of the data.

Finally, you can use the more-or-less two-button ODBC export facility to dump the complete result view into the datawarehouse stage/information schema of your choice. Depending on the sophistication of your driver (believe me, SQL-Server is well-supported ;-), type conversion could sometimes aberrate, but SSAS will remap whatever junk there comes into reasonable target types most of the time.

Transformation of parent-child-hierarchies into a multi-column representation

it's not rocket science but here is a query to transform a balanced parent-child-hierarchy into a multi-column hierarchy for a dimension table:

-- subquery factoring for the relevant part of the source table
with
groups as
(select groupid 
      , groupname
      , parentid
   from ext_groups)
-- definition of the levels of the hierarchy
select ig0.groupid groupid_level0
     , ig0.groupname groupname_level0
     , ig1.groupid groupid_level1
     , ig1.groupname groupname_level1
     , ig2.groupid groupid_level2
     , ig2.groupname groupname_level2
     , ig3.groupid groupid_level3
     , ig3.groupname groupname_level3
     , ig4.groupid groupid_level4
     , ig4.groupname groupname_level4
  from 
       -- root-level with the restriction to nodes without a parent
       -- (in this case: partenid = 0)
       (select *
          from groups
         where parentid = 0
        ) ig0
       -- level 1
     , (select *
          from groups
        ) ig1
       -- level 2
     , (select *
          from groups
        ) ig2
       -- level 3
     , (select *
          from groups
        ) ig3
       -- level 4 (= branch-level)
     , (select *
          from groups
        ) ig4
 where ig1.parentid = ig0.groupid
   and ig2.parentid = ig1.groupid
   and ig3.parentid = ig2.groupid
   and ig4.parentid = ig3.groupid
 order by ig0.groupid
        , ig1.groupid
        , ig2.groupid
        , ig3.groupid
        , ig4.groupid;

The first subquery ig0 gets all the nodes from the root level, then these rows are joined to ig1 via ig1.parentid = ig0.groupid and every following level is connected in the same way. Of course you have to know the depth of the hierarchy to choose the fitting number of subqueries.

The approach should also work for unbalanced (=ragged) hierarchies, but then you have to use outer joins (and maybe to replace NULL-values by a known parent-entry).

Somewhat related is the information that Oracle 11 Release 2 has introduced recursive subquery factoring, so you can avoid the ''connect by'' syntax if you like. Rob van Wijk explains some pros and cons of both options in his Blog.

Mittwoch, 2. Dezember 2009

Never Ever!

Never ever forget to add the nonrecoverable keyword when doing a load into a tablespace/database without active backup/recovery!

Especially when talking about information schema tablespaces, this could result in a lot of developers waiting for a several-hours backup job to finish hopefully
before the nightly ETL´s creating views, dropping old tables, running statistics etc ...

No more happy. Lasted for just half an hour. Think I need a beer or two.