This minimizes the influence of ongoing transactions in the operational systems on your subsequent ETL processes. It also allows for "batch performance" treatments on the source data that you normally would not apply to your source systems. And finally, having the data locally gives you more freedom when combining it with other data, even with other data from the same source.
In an ongoing customer project, we were faced with exactly that situation and ended up building a lot of scripts of the form:
-- one such script per source table, all column lists typed by hand
drop table stage.replicated_table;
create table stage.replicated_table(replicated_column1 replicated_data_type1 replicated_constraint1, replicated_column2 ...);
declare load_cursor cursor for select column1, column2, ... from source_link_schema.source_table for read only with ur;
load from load_cursor of cursor method p(1,2,...) insert into stage.replicated_table(replicated_column1, replicated_column2, ...) nonrecoverable;
runstats on table stage.replicated_table ...;
each taking about 20 minutes to write, because you first have to describe the original table and then reformat the output into the create table syntax. Pffff.
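Concretely, those 20 minutes went into a describe-and-retype loop per table, roughly like this (placeholder names as above):
describe table source_link_schema.source_table;
-- ...then hand-translate every output row (column name, type, length,
-- scale, nullability) into one column definition of the create table above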
For god's sake, some people at IBM do more than just look good in their blue suits (still too few, though; next I will have to blog my hate away about the ugly and time-consuming create view restriction when selecting non-unique column names) and invented the create table like syntax:
-- same load, but no hand-typed column lists anywhere
drop table stage.replicated_table;
create table stage.replicated_table like source_link_schema.source_table;
declare load_cursor cursor for select * from source_link_schema.source_table for read only with ur;
load from load_cursor of cursor insert into stage.replicated_table nonrecoverable;
runstats on table stage.replicated_table ...;
1 minute. Runs fine. I'm happy.
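And since every staging load now has the same generic shape, you can even let the catalog write the scripts for you. A minimal sketch, assuming the source tables are visible in syscat.tables under the schema from the example (federated nicknames appear there with type 'N'):
-- emit one create table like statement per source table;
-- the drop/declare/load/runstats lines can be generated the same way
select 'create table stage.' || tabname || ' like source_link_schema.' || tabname || ';'
from syscat.tables
where tabschema = 'SOURCE_LINK_SCHEMA' and type in ('T', 'N');
Pipe the output into a script file and you get the DDL for the whole staging layer in seconds.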