<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" version="2.0"><channel><atom:id>tag:blogger.com,1999:blog-2860188939685086745</atom:id><lastBuildDate>Fri, 13 Mar 2026 07:52:07 +0000</lastBuildDate><title>Dataware Housing</title><description></description><link>http://informaticatutors.blogspot.com/</link><managingEditor>noreply@blogger.com (Naveen)</managingEditor><generator>Blogger</generator><openSearch:totalResults>10</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-5445844139114845496</guid><pubDate>Tue, 07 Jun 2011 12:38:00 +0000</pubDate><atom:updated>2011-06-07T05:38:48.311-07:00</atom:updated><title>Wishing Thanks to My Readers.</title><description>&lt;p&gt;Hi readers,&lt;/p&gt;  &lt;p&gt;I’m very happy to see that a lot of people from all parts of the globe have visited my blog, and the visitor count has crossed 3,000 within 40 days of starting it. I would request my readers to start posting their valuable comments and requests so that we can make the blog much more engaging and interactive. I have also noticed that the number of viewers visiting my SQL and data warehousing posts is pretty low compared to the Informatica ones. I would request my visitors to look into them as well, as I have spent a lot of time preparing that content; it is very useful when you are at work and can also be used as a quick reference guide before taking up any interview. 
&lt;/p&gt;  &lt;p&gt;I’m planning to publish a book on Informatica soon, which will cover all the major data warehousing concepts in detail, the SQL concepts used by ETL consultants, basic Unix commands, an introduction to scheduler tools like Autosys and Tivoli, and of course Informatica, which has brought me all this fame.&lt;/p&gt;  &lt;p&gt;I would highly appreciate it if a few readers came forward and posted the interview questions or the scenarios they have encountered in their day-to-day work for discussion on the discussion board.&lt;/p&gt;  </description><link>http://informaticatutors.blogspot.com/2011/06/wishing-thanks-to-my-readers.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-7691539113095081270</guid><pubDate>Thu, 05 May 2011 13:16:00 +0000</pubDate><atom:updated>2011-05-05T06:16:00.066-07:00</atom:updated><title>Staging Area</title><description>&lt;p&gt;&lt;b&gt;What is a staging area and why do we need it in a DWH?&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;If the source and target databases are different and the target table volume is high (some millions of records), then without a staging table we would need to design the Informatica mapping with a lookup to find out whether each record already exists in the target table. Since the target has a huge volume, building the lookup cache is costly and will hurt performance.&lt;/p&gt;  &lt;p&gt;If we create staging tables in the target database, we can simply do an outer join in the Source Qualifier to determine inserts versus updates; this approach gives good performance.&lt;/p&gt;  &lt;p&gt;It avoids a full table scan on the target to determine inserts/updates. We can also create indexes on the staging tables; since these tables are designed for a specific application, they do not impact any other schemas/users.&lt;/p&gt;  &lt;p&gt;While processing flat 
files into the data warehouse, we can also perform cleansing.&lt;/p&gt;  &lt;p&gt;Data cleansing, also known as &lt;a href=&quot;http://www.wisegeek.com/what-is-data-scrubbing.htm&quot;&gt;data scrubbing&lt;/a&gt;, is the process of ensuring that a set of data is correct and accurate. During data cleansing, records are checked for accuracy and consistency.&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Since the mapping from ODS to staging is one-to-one, we can simply truncate and reload.&lt;/li&gt;    &lt;li&gt;We can create indexes on the staging tables so that the Source Qualifier query performs at its best.&lt;/li&gt;    &lt;li&gt;With a staging area there is no need to rely on an Informatica transformation to know whether a record already exists.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;b&gt;Data cleansing&lt;/b&gt;    &lt;br /&gt;Weeding out unnecessary or unwanted things (characters, spaces, etc.) from incoming data to make it more meaningful and informative.&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Data merging&lt;/b&gt;    &lt;br /&gt;Gathering data from heterogeneous systems and putting it together.&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Data scrubbing&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;Data scrubbing is the process of fixing or eliminating individual pieces of data that are incorrect, incomplete or duplicated before the data is passed to the end user.   &lt;br /&gt;Data scrubbing is aimed at more than eliminating errors and redundancy. 
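As a toy illustration of the cleansing and scrubbing steps described here, consider this short Python sketch; the record layout, field names, and scrubbing rules are all hypothetical, chosen only to show trimming, character removal, and de-duplication:

```python
import re

# Hypothetical incoming customer records with stray spaces, unwanted
# characters, and a duplicate row.
raw_records = [
    {"cust_id": " 101 ", "name": "Alice##", "city": "NY "},
    {"cust_id": "102",   "name": "Bob",     "city": " NY"},
    {"cust_id": " 101 ", "name": "Alice##", "city": "NY "},  # duplicate
]

def cleanse(record):
    cleaned = {}
    for key, value in record.items():
        value = value.strip()                        # weed out stray spaces
        value = re.sub(r"[^A-Za-z0-9 ]", "", value)  # weed out unwanted characters
        cleaned[key] = value
    return cleaned

seen = set()
clean_records = []
for rec in map(cleanse, raw_records):
    fingerprint = tuple(sorted(rec.items()))
    if fingerprint not in seen:                      # eliminate duplicated rows
        seen.add(fingerprint)
        clean_records.append(rec)

print(clean_records)
```

Running this yields two clean, de-duplicated records; in a real warehouse the scrubbing rules would come from the business's own data-quality standards rather than a simple regular expression.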
The goal is also to bring consistency to various data sets that may have been created with different, incompatible business rules.&lt;/p&gt;  </description><link>http://informaticatutors.blogspot.com/2011/05/staging-area.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-3984562033686102568</guid><pubDate>Thu, 05 May 2011 13:09:00 +0000</pubDate><atom:updated>2011-05-05T06:09:10.990-07:00</atom:updated><title>Difference between Datamarts and Data warehouse</title><description>&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;2&quot; width=&quot;467&quot; border=&quot;2&quot;&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;219&quot;&gt;&lt;strong&gt;Datamart&lt;/strong&gt;&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;244&quot;&gt;&lt;strong&gt;Data Warehouse&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;219&quot;&gt;1.Data mart is usually sponsored at the department level and developed with a specific issue or subject in mind, a data mart is a data warehouse with a focused objective.&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;244&quot;&gt;1.Data warehouse is a “Subject-Oriented, Integrated, Time-Variant, Nonvolatile collection of data in support of decision making”.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;219&quot;&gt;2.A data mart is used on a business division/ department level.&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;244&quot;&gt;2.A data warehouse is used on an enterprise level.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;219&quot;&gt;3.A Data Mart is a subset of data from a Data Warehouse. 
Data Marts are built for specific user groups.&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;244&quot;&gt;3.A Data Warehouse is simply an integrated consolidation of data from a variety of sources that is specially designed to support strategic and tactical decision making.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;219&quot;&gt;4.By providing decision makers with only a subset of data from the Data Warehouse, Privacy, Performance and Clarity Objectives can be attained.&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;244&quot;&gt;4.The main objective of Data Warehouse is to provide an integrated environment and coherent picture of the business at a point in time.&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  </description><link>http://informaticatutors.blogspot.com/2011/05/difference-between-datamarts-and-data.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-8867697213949457753</guid><pubDate>Tue, 22 Mar 2011 07:19:00 +0000</pubDate><atom:updated>2011-03-22T00:46:12.156-07:00</atom:updated><title>Difference Between OLTP Vs. 
OLAP</title><description>&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;2&quot; width=&quot;487&quot; border=&quot;0&quot;&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;&amp;#160;&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;&lt;strong&gt;&lt;font color=&quot;#0000ff&quot;&gt;OLTP&lt;/font&gt;&lt;/strong&gt; &lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;204&quot;&gt;&lt;strong&gt;&lt;font color=&quot;#0000ff&quot;&gt;OLAP&lt;/font&gt;&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;Source of data&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;Operational data; OLTPs are the original source of the data&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;204&quot;&gt;Consolidation data; OLAP data comes from the various OLTP Databases&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;Purpose of data&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;To control and run fundamental business tasks&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;204&quot;&gt;To help with planning, problem solving, and decision support&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;What the data&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;Reveals a snapshot of ongoing business processes&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;204&quot;&gt;Multi-dimensional views of various kinds of business activities&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;Inserts and Updates&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;Short and fast inserts and updates initiated by end users&lt;/td&gt;        &lt;td valign=&quot;top&quot; 
width=&quot;204&quot;&gt;Periodic long-running batch jobs refresh the data&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;Processing Speed&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;Typically very fast&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;204&quot;&gt;Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;Queries&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;Relatively standardized and simple queries Returning relatively few records&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;204&quot;&gt;Often complex queries involving aggregations&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;         &lt;p&gt;Space Requirements&lt;/p&gt;       &lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;Can be relatively small if historical data is archived&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;204&quot;&gt;Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;Database Design &lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;Highly normalized with many tables&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;204&quot;&gt;Typically de-normalized with fewer tables; use of star and/or snowflake schemas&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;Backup and Recovery&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;187&quot;&gt;Backup religiously; operational data is critical to run the business, 
data loss is likely to entail significant monetary loss and legal liability&lt;/td&gt;        &lt;td valign=&quot;top&quot; width=&quot;204&quot;&gt;Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;font size=&quot;1&quot;&gt; Source : &lt;/font&gt;&lt;font size=&quot;1&quot;&gt;http://datawarehouse4u.info/OLTP-vs-OLAP.html&lt;/font&gt;&lt;/p&gt;  </description><link>http://informaticatutors.blogspot.com/2011/03/difference-between-oltp-vs-olap.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-4413847628313538629</guid><pubDate>Tue, 22 Mar 2011 07:10:00 +0000</pubDate><atom:updated>2011-03-22T00:11:00.004-07:00</atom:updated><title>Data mining</title><description>&lt;p&gt;&lt;font size=&quot;2&quot;&gt;&lt;strong&gt;Data mining&lt;/strong&gt; is the process of extracting patterns from large &lt;/font&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Data_set&quot;&gt;&lt;font size=&quot;2&quot;&gt;data sets&lt;/font&gt;&lt;/a&gt;&lt;font size=&quot;2&quot;&gt; by combining methods from &lt;/font&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Statistics&quot;&gt;&lt;font size=&quot;2&quot;&gt;statistics&lt;/font&gt;&lt;/a&gt;&lt;font size=&quot;2&quot;&gt; and &lt;/font&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Artificial_intelligence&quot;&gt;&lt;font size=&quot;2&quot;&gt;artificial intelligence&lt;/font&gt;&lt;/a&gt;&lt;font size=&quot;2&quot;&gt; with &lt;/font&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Database_management&quot;&gt;&lt;font size=&quot;2&quot;&gt;database management&lt;/font&gt;&lt;/a&gt;&lt;font size=&quot;2&quot;&gt;. 
&lt;/font&gt;&lt;/p&gt;  &lt;h4&gt;Process:&lt;/h4&gt;  &lt;h5&gt;1.&lt;font color=&quot;#0000ff&quot;&gt;Pre-processing&lt;/font&gt; &lt;/h5&gt;  &lt;p&gt;Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns already present in the data, the target dataset must be large enough to contain these patterns while remaining concise enough to be mined in an acceptable timeframe. A common source for data is a &lt;a href=&quot;http://en.wikipedia.org/wiki/Datamart&quot;&gt;data mart&lt;/a&gt; or &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_warehouse&quot;&gt;data warehouse&lt;/a&gt;. Pre-processing is essential for analyzing multivariate datasets before clustering or data mining.&lt;/p&gt;  &lt;p&gt;The target set is then cleaned. Cleaning removes observations with noise and missing data.&lt;/p&gt;  &lt;p&gt;The clean data are reduced into &lt;a href=&quot;http://en.wikipedia.org/wiki/Feature_vector&quot;&gt;feature vectors&lt;/a&gt;, one vector per observation. A feature vector is a summarized version of the raw data observation. The feature(s) selected will depend on what the objective(s) is/are; obviously, selecting the &amp;quot;right&amp;quot; feature(s) is fundamental to successful data mining.&lt;/p&gt;  &lt;p&gt;The feature vectors are divided into two sets, the &amp;quot;training set&amp;quot; and the &amp;quot;test set&amp;quot;. 
The training set is used to &amp;quot;train&amp;quot; the data mining algorithm(s), while the test set is used to verify the accuracy of any patterns found.&lt;/p&gt;  &lt;h5&gt;2.&lt;font color=&quot;#0000ff&quot;&gt;Data mining&lt;/font&gt;&lt;/h5&gt;  &lt;p&gt;Data mining commonly involves four classes of tasks:&lt;sup&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Data_mining#cite_note-Fayyad-12&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Cluster_analysis&quot;&gt;&lt;strong&gt;Clustering&lt;/strong&gt;&lt;/a&gt; – is the task of discovering groups and structures in the data that are in some way or another &amp;quot;similar&amp;quot;, without using known structures in the data. &lt;/li&gt;    &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Statistical_classification&quot;&gt;&lt;strong&gt;Classification&lt;/strong&gt;&lt;/a&gt; – is the task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include &lt;a href=&quot;http://en.wikipedia.org/wiki/Decision_tree_learning&quot;&gt;decision tree learning&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Nearest_neighbor_(pattern_recognition)&quot;&gt;nearest neighbor&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Naive_Bayes_classifier&quot;&gt;naive Bayesian classification&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Artificial_neural_networks&quot;&gt;neural networks&lt;/a&gt; and &lt;a href=&quot;http://en.wikipedia.org/wiki/Support_vector_machines&quot;&gt;support vector machines&lt;/a&gt;. &lt;/li&gt;    &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Regression_analysis&quot;&gt;&lt;strong&gt;Regression&lt;/strong&gt;&lt;/a&gt; – Attempts to find a function which models the data with the least error. 
&lt;/li&gt;    &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Association_rule_learning&quot;&gt;&lt;strong&gt;Association rule learning&lt;/strong&gt;&lt;/a&gt; – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.&lt;/li&gt; &lt;/ul&gt;  &lt;h5&gt;3.&lt;font color=&quot;#0000ff&quot;&gt;Results validation&lt;/font&gt;&lt;/h5&gt;  &lt;p&gt;The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by the data mining algorithms are necessarily valid. It is common for the data mining algorithms to find patterns in the training set which are not present in the general data set; this is called &lt;a href=&quot;http://en.wikipedia.org/wiki/Overfitting&quot;&gt;overfitting&lt;/a&gt;. To overcome this, the evaluation uses a &lt;a href=&quot;http://en.wikipedia.org/wiki/Test_set&quot;&gt;test set&lt;/a&gt; of data which the data mining algorithm was not trained on. The learnt patterns are applied to this test set and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish spam from legitimate emails would be trained on a &lt;a href=&quot;http://en.wikipedia.org/wiki/Training_set&quot;&gt;training set&lt;/a&gt; of sample emails. Once trained, the learnt patterns would be applied to the test set of emails which it had not been trained on; the accuracy of these patterns can then be measured from how many emails they correctly classify. 
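The train/test validation described here can be sketched in a few lines of Python; the &quot;feature vectors&quot; (a spam-word count per email) and the single-threshold &quot;model&quot; below are invented purely for illustration, not a real data mining algorithm:

```python
# Toy feature vectors: (number_of_spam_words, is_spam_label).
observations = [
    (0, 0), (1, 0), (5, 1), (7, 1), (2, 0), (6, 1),
    (0, 0), (8, 1), (1, 0), (9, 1),
]

training_set = observations[:6]   # used to fit the model
test_set     = observations[6:]   # held out, never seen during training

# "Train": pick the threshold that classifies the training set best.
best_threshold, best_hits = None, -1
for t in range(10):
    hits = sum(1 for x, label in training_set if (x >= t) == (label == 1))
    if hits > best_hits:
        best_threshold, best_hits = t, hits

# Validate on the unseen test set: accuracy is the fraction of
# test emails the learnt threshold classifies correctly.
correct = sum(1 for x, label in test_set
              if (x >= best_threshold) == (label == 1))
accuracy = correct / len(test_set)
print(best_threshold, accuracy)
```

The model is fit only on the training set, and its accuracy is then measured on the held-out test set; a large gap between training and test accuracy would indicate overfitting.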
A number of statistical methods may be used to evaluate the algorithm, such as &lt;a href=&quot;http://en.wikipedia.org/wiki/Receiver_operating_characteristic&quot;&gt;ROC curves&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;If the learnt patterns do not meet the desired standards, then it is necessary to reevaluate and change the preprocessing and data mining. If the learnt patterns do meet the desired standards, then the final step is to interpret the learnt patterns and turn them into knowledge.&lt;/p&gt;  </description><link>http://informaticatutors.blogspot.com/2011/03/data-mining.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-3139836845827897703</guid><pubDate>Tue, 22 Mar 2011 06:49:00 +0000</pubDate><atom:updated>2011-03-21T23:49:09.885-07:00</atom:updated><title>Data modeling</title><description>&lt;p&gt;&lt;b&gt;Data modeling&lt;/b&gt; is the process of creating a &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_model&quot;&gt;data model&lt;/a&gt; by applying formal data model descriptions using data modeling techniques.&lt;/p&gt;  &lt;p&gt;In other words, data modeling can be defined as a &lt;a href=&quot;http://en.wikipedia.org/wiki/Software_development_process&quot;&gt;method&lt;/a&gt; used to define and analyze data &lt;a href=&quot;http://en.wikipedia.org/wiki/Requirement&quot;&gt;requirements&lt;/a&gt; needed to support the &lt;a href=&quot;http://en.wikipedia.org/wiki/Business_process&quot;&gt;business processes&lt;/a&gt; of an organization.&lt;/p&gt;  &lt;h5&gt;&lt;font color=&quot;#0000ff&quot;&gt;Conceptual, logical and physical schemas: &lt;/font&gt;&lt;/h5&gt;  &lt;ul&gt;   &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Conceptual_schema&quot;&gt;&lt;strong&gt;Conceptual schema&lt;/strong&gt;&lt;/a&gt;: This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about 
associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. &lt;/li&gt;    &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Logical_schema&quot;&gt;&lt;strong&gt;Logical schema&lt;/strong&gt;&lt;/a&gt;: This consists of descriptions of tables and columns, object-oriented classes, and XML tags, among other things. &lt;/li&gt;    &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Physical_schema&quot;&gt;&lt;strong&gt;Physical schema&lt;/strong&gt;&lt;/a&gt;: describes the physical means by which data are stored. This is concerned with partitions, CPUs, table spaces, and the like.&lt;/li&gt; &lt;/ul&gt;  &lt;h5&gt;&lt;font color=&quot;#0000ff&quot;&gt;Modeling methodologies: &lt;/font&gt;&lt;/h5&gt;  &lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Data_model&quot;&gt;Data models&lt;/a&gt; represent information areas of interest. While there are many ways to create data models, only two standard methodologies, top-down and bottom-up, are used in real-world environments: &lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;strong&gt;Bottom-up models: &lt;/strong&gt;are often the result of a &lt;a href=&quot;http://en.wikipedia.org/wiki/Reengineering&quot;&gt;reengineering&lt;/a&gt; effort. They usually start with existing data structures: forms, fields on application screens, or reports. These models are usually physical, application-specific, and incomplete from an &lt;a href=&quot;http://en.wikipedia.org/wiki/Enterprise_architecture&quot;&gt;enterprise perspective&lt;/a&gt;. 
They may not promote data sharing, especially if they are built without reference to other parts of the organization.&lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Top-down &lt;/strong&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Logical_data_model&quot;&gt;&lt;strong&gt;logical data models&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;,&lt;/strong&gt; on the other hand, are created in an abstract way by getting information from people who know the subject area. A system may not implement all the entities in a logical model, but the model serves as a reference point or template.&lt;/li&gt; &lt;/ul&gt;  </description><link>http://informaticatutors.blogspot.com/2011/03/data-modeling.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-5851172068506430513</guid><pubDate>Tue, 22 Mar 2011 06:36:00 +0000</pubDate><atom:updated>2011-03-21T23:36:38.248-07:00</atom:updated><title>Dimensional Modeling</title><description>&lt;p&gt;&lt;b&gt;Dimensional modeling&lt;/b&gt; (DM) is the name of a set of techniques and concepts used in &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_warehouse&quot;&gt;data warehouse&lt;/a&gt; design. Dimensional modeling always uses the concepts of facts (measures), and dimensions (context).&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;font color=&quot;#0000ff&quot;&gt;Facts :&lt;/font&gt;&lt;/strong&gt; Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.&lt;/p&gt;  &lt;h5&gt;&lt;font color=&quot;#0000ff&quot; size=&quot;2&quot;&gt;Types of Facts : &lt;/font&gt;&lt;/h5&gt;  &lt;p&gt;There are three types of facts:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;b&gt;Additive&lt;/b&gt;: Additive facts are facts that can be summed up through all of the dimensions in the fact table. 
&lt;/li&gt;    &lt;li&gt;&lt;b&gt;Semi-Additive&lt;/b&gt;: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Non-Additive&lt;/b&gt;: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.&lt;/li&gt; &lt;/ul&gt;  &lt;h5&gt;&lt;font color=&quot;#0000ff&quot; size=&quot;2&quot;&gt;Types of Fact Tables &lt;/font&gt;&lt;/h5&gt;  &lt;p&gt;There are two types of fact tables:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;b&gt;Cumulative&lt;/b&gt;: This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact tables are mostly additive facts. The first example presented here is a cumulative fact table. &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Snapshot&lt;/b&gt;: This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;font color=&quot;#0000ff&quot;&gt;&lt;font size=&quot;2&quot;&gt;&lt;strong&gt;Dimension :&lt;/strong&gt; A&lt;font color=&quot;#000000&quot;&gt; dimension is a &lt;/font&gt;&lt;/font&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Data_element&quot;&gt;&lt;font color=&quot;#000000&quot; size=&quot;2&quot;&gt;data element&lt;/font&gt;&lt;/a&gt;&lt;font color=&quot;#000000&quot; size=&quot;2&quot;&gt; that categorizes each item in a &lt;/font&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Data_set&quot;&gt;&lt;font color=&quot;#000000&quot; size=&quot;2&quot;&gt;data set&lt;/font&gt;&lt;/a&gt;&lt;/font&gt;&lt;font color=&quot;#000000&quot; size=&quot;2&quot;&gt; into non-overlapping regions. A data warehouse dimension provides the means to &amp;quot;slice and dice&amp;quot; data in a data warehouse. 
Dimensions provide structured labeling information to otherwise unordered numeric measures.&lt;/font&gt;&lt;/p&gt;  &lt;h4&gt;&lt;font color=&quot;#0000ff&quot; size=&quot;2&quot;&gt;Types of Dimension :&lt;/font&gt;&lt;/h4&gt;  &lt;h4&gt;&lt;font size=&quot;2&quot;&gt;Conformed dimension :&lt;/font&gt; &lt;/h4&gt;  &lt;p&gt;In data warehousing, a conformed dimension is a &lt;a href=&quot;http://searchdatamanagement.techtarget.com/definition/dimension&quot;&gt;dimension&lt;/a&gt; that has the same meaning to every fact with which it relates. Conformed dimensions allow facts and measures to be categorized and described in the same way across multiple facts and/or &lt;a href=&quot;http://searchsqlserver.techtarget.com/sDefinition/0,290660,sid87_gci211900,00.html&quot;&gt;data marts&lt;/a&gt;, ensuring consistent reporting across the enterprise.&lt;/p&gt;  &lt;p&gt;A conformed dimension can exist as a single dimension table that relates to multiple fact tables within the same &lt;a href=&quot;http://searchsqlserver.techtarget.com/sDefinition/0,290660,sid87_gci211904,00.html&quot;&gt;data warehouse&lt;/a&gt;, or as identical dimension tables in separate data marts. 
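A minimal sketch of a conformed dimension, run here through SQLite with purely hypothetical table and column names: one dim_date table serves two separate fact tables, so a label like fiscal_month means exactly the same thing in both reports:

```python
import sqlite3

# Build a tiny warehouse in memory: one conformed date dimension
# shared by a sales fact table and a shipments fact table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, fiscal_month TEXT);
    CREATE TABLE fact_sales     (date_key INTEGER REFERENCES dim_date, amount REAL);
    CREATE TABLE fact_shipments (date_key INTEGER REFERENCES dim_date, units  INTEGER);
    INSERT INTO dim_date VALUES (20110301, '2011-M03'), (20110401, '2011-M04');
    INSERT INTO fact_sales     VALUES (20110301, 100.0), (20110401, 250.0);
    INSERT INTO fact_shipments VALUES (20110301, 7),     (20110401, 12);
""")

# Both reports join to the SAME dimension table, so the fiscal_month
# labels are guaranteed to be consistent across the two facts.
sales = con.execute("""
    SELECT d.fiscal_month, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d USING (date_key)
    GROUP BY d.fiscal_month ORDER BY d.fiscal_month
""").fetchall()
ships = con.execute("""
    SELECT d.fiscal_month, SUM(f.units)
    FROM fact_shipments f JOIN dim_date d USING (date_key)
    GROUP BY d.fiscal_month ORDER BY d.fiscal_month
""").fetchall()
print(sales)
print(ships)
```

In separate data marts the same effect is achieved by keeping the two copies of the dimension table structurally identical.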
&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;font size=&quot;2&quot;&gt;Junk dimension: &lt;/font&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;A junk dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables.&lt;font size=&quot;2&quot;&gt; The junk dimension should contain a single row representing the blanks, with a surrogate key that will be used in the fact table for every row returned with a blank comment field.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;The designer is faced with the challenge of where to put attributes that do not belong in the other dimensions. One solution would be to create a new dimension for each of the remaining attributes, but that could require a vast number of new dimensions, resulting in a fact table with a very large number of foreign keys; grouping them into a single junk dimension avoids this.&lt;/p&gt;  &lt;h5&gt;Degenerate dimension: &lt;/h5&gt;  &lt;p&gt;A dimension key, such as a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table. Degenerate dimensions are very common when the grain of a fact table represents a single transaction item or line item because the degenerate dimension represents the unique identifier of the parent. Degenerate dimensions often play an integral role in the fact table&#39;s primary key.&lt;/p&gt;  &lt;h4&gt;&lt;font color=&quot;#0000ff&quot;&gt;Dimensional modeling structure:&lt;/font&gt;&lt;/h4&gt;  &lt;p&gt;The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. 
To build the schema, the following design model is used:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Choose the business process &lt;/li&gt;    &lt;li&gt;Declare the grain &lt;/li&gt;    &lt;li&gt;Identify the dimensions &lt;/li&gt;    &lt;li&gt;Identify the facts&lt;/li&gt; &lt;/ol&gt;  &lt;h4&gt;&lt;font color=&quot;#0000ff&quot;&gt;Benefits of dimensional modeling:&lt;/font&gt;&lt;/h4&gt;  &lt;p&gt;The benefits of dimensional modeling are the following:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Understandability&lt;/li&gt;    &lt;li&gt;Query performance&lt;/li&gt;    &lt;li&gt;Extensibility&lt;/li&gt; &lt;/ol&gt;  </description><link>http://informaticatutors.blogspot.com/2011/03/dimensional-modeling.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-8257633033043313712</guid><pubDate>Tue, 22 Mar 2011 06:07:00 +0000</pubDate><atom:updated>2011-03-21T23:07:46.241-07:00</atom:updated><title>Data Warehouse Schemas</title><description>&lt;p&gt;A schema is a collection of database objects, including tables, views, indexes, and synonyms. You can arrange schema objects in the schema models designed for data warehousing in a variety of ways.&lt;/p&gt;  &lt;h5&gt;Star Schemas:&lt;/h5&gt;  &lt;p&gt;The &lt;b&gt;star schema&lt;/b&gt; (also called &lt;i&gt;star-&lt;a href=&quot;http://en.wikipedia.org/wiki/Join_(SQL)&quot;&gt;join&lt;/a&gt; schema&lt;/i&gt;, &lt;i&gt;data cube&lt;/i&gt;, or &lt;i&gt;multi-dimensional schema&lt;/i&gt;) is the simplest style of &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_warehouse&quot;&gt;data warehouse&lt;/a&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/Logical_schema&quot;&gt;schema&lt;/a&gt;. 
The star schema consists of one or more &lt;a href=&quot;http://en.wikipedia.org/wiki/Fact_table&quot;&gt;fact tables&lt;/a&gt; referencing any number of &lt;a href=&quot;http://en.wikipedia.org/wiki/Dimension_(data_warehouse)&quot;&gt;dimension tables&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;img alt=&quot;Text description of dwhsg007.gif follows&quot; src=&quot;http://download.oracle.com/docs/cd/B10500_01/server.920/a96520/dwhsg007.gif&quot; /&gt;&lt;/p&gt;  &lt;p&gt;The facts that the data warehouse helps analyze are classified along different dimensions:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;strong&gt;The &lt;i&gt;fact table&lt;/i&gt;&lt;/strong&gt; holds the main data. It includes a large amount of aggregated data, such as price and units sold. There may be multiple fact tables in a star schema. &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;&lt;i&gt;Dimension tables&lt;/i&gt;,&lt;/strong&gt; which are usually smaller than fact tables, include the attributes that describe the facts. Often this is a separate table for each dimension. Dimension tables can be joined to the fact table(s) as needed.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Dimension tables have a simple &lt;a href=&quot;http://en.wikipedia.org/wiki/Primary_key&quot;&gt;primary key&lt;/a&gt;, while fact tables have a set of foreign keys which make up a &lt;a href=&quot;http://en.wikipedia.org/wiki/Compound_key&quot;&gt;compound&lt;/a&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/Primary_key&quot;&gt;primary key&lt;/a&gt; consisting of a combination of relevant dimension keys.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Advantages :&lt;/strong&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design. &lt;/li&gt;    &lt;li&gt;&lt;a name=&quot;14729&quot;&gt;&lt;/a&gt;Provide highly optimized performance for typical star queries. 
&lt;/li&gt;    &lt;li&gt;&lt;a name=&quot;14730&quot;&gt;&lt;/a&gt;Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data-warehouse schema contain dimension tables.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;strong&gt;Snowflake Schemas:&lt;/strong&gt; The snowflake schema is represented by centralized &lt;a href=&quot;http://en.wikipedia.org/wiki/Fact_table&quot;&gt;fact tables&lt;/a&gt; which are connected to multiple &lt;a href=&quot;http://en.wikipedia.org/wiki/Dimension_(data_warehouse)&quot;&gt;dimensions&lt;/a&gt;. In the snowflake schema, dimensions are &lt;a href=&quot;http://en.wikipedia.org/wiki/Normalization_(database)&quot;&gt;normalized&lt;/a&gt; into multiple related tables, whereas the star schema&#39;s dimensions are denormalized, with each dimension represented by a single table.&lt;/p&gt;  &lt;p&gt;Snowflake schemas are often better suited to sophisticated query tools that isolate users from the raw table structures, and to environments with numerous queries using complex criteria.&lt;/p&gt;  &lt;p&gt;&lt;img alt=&quot;Text description of dwhsg008.gif follows&quot; src=&quot;http://download.oracle.com/docs/cd/B10500_01/server.920/a96520/dwhsg008.gif&quot; /&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Some &lt;a href=&quot;http://en.wikipedia.org/wiki/OLAP&quot;&gt;OLAP&lt;/a&gt; multidimensional database modeling tools that use dimensional data marts as data sources are optimized for snowflake schemas.&lt;/li&gt;    &lt;li&gt;A snowflake schema can sometimes reflect the way in which users think about data. 
Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.&lt;/li&gt;    &lt;li&gt;A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will therefore be easier to implement.&lt;/li&gt;    &lt;li&gt;If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.&lt;/li&gt; &lt;/ul&gt;  </description><link>http://informaticatutors.blogspot.com/2011/03/data-warehouse-schemas.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-7687097892051718691</guid><pubDate>Tue, 22 Mar 2011 05:16:00 +0000</pubDate><atom:updated>2011-03-21T22:16:39.269-07:00</atom:updated><title>Data Warehouse Architecture</title><description>&lt;p&gt;&lt;img src=&quot;http://www.databaseanswers.org/data_models/data_warehouse_loading_alt/images/data_warehouse_architecture_lge.jpg&quot; /&gt;&lt;/p&gt;  &lt;p&gt;The above architecture is taken from &lt;a href=&quot;http://www.databaseanswers.com&quot;&gt;www.databaseanswers.com&lt;/a&gt;. We recommend visiting that site to gain a good understanding of data modeling.&lt;/p&gt;  &lt;p&gt;We will now define each term in the above picture to facilitate a better understanding of the subject.&lt;/p&gt;  &lt;p&gt;1. 
&lt;strong&gt;Operational Data Store:&lt;/strong&gt; A &lt;a href=&quot;http://en.wikipedia.org/wiki/Database&quot;&gt;database&lt;/a&gt; designed to integrate data from multiple sources for additional operations on the data. The data is then passed back to operational systems for further operations and to the data warehouse for reporting.&lt;/p&gt;  &lt;p&gt;2. &lt;strong&gt;ERP:&lt;/strong&gt; Enterprise resource planning integrates internal and external &lt;a href=&quot;http://en.wikipedia.org/wiki/Management_information&quot;&gt;management information&lt;/a&gt; across an entire organization, embracing &lt;a href=&quot;http://en.wikipedia.org/wiki/Finance&quot;&gt;finance&lt;/a&gt;/&lt;a href=&quot;http://en.wikipedia.org/wiki/Accounting&quot;&gt;accounting&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Manufacturing&quot;&gt;manufacturing&lt;/a&gt;, sales and service, etc.&lt;/p&gt;  &lt;p&gt;Its purpose is to facilitate the flow of information between all business functions inside the boundaries of the organization and manage the connections to outside stakeholders.&lt;/p&gt;  &lt;p&gt;3. &lt;strong&gt;CRM:&lt;/strong&gt; Customer relationship management is a widely implemented strategy for managing a company’s interactions with &lt;a href=&quot;http://en.wikipedia.org/wiki/Customers&quot;&gt;customers&lt;/a&gt;, clients and sales prospects. 
It involves using technology to organize, automate, and synchronize business processes—principally &lt;a href=&quot;http://en.wikipedia.org/wiki/Sales&quot;&gt;sales&lt;/a&gt; activities, but also those for &lt;a href=&quot;http://en.wikipedia.org/wiki/Marketing&quot;&gt;marketing&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Customer_service&quot;&gt;customer service&lt;/a&gt;, and &lt;a href=&quot;http://en.wikipedia.org/wiki/Technical_support&quot;&gt;technical support&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Customer relationship management describes a company-wide business strategy including customer-interface departments as well as other departments.&lt;/p&gt;  &lt;p&gt;4. &lt;strong&gt;Flat Files in Data Warehousing:&lt;/strong&gt; Flat files do not maintain referential integrity the way an RDBMS does, and their fields are usually separated by delimiters such as commas or pipes.&lt;/p&gt;  &lt;p&gt;From Informatica 8.6 onwards, unstructured data sources such as MS Word documents, email, and PDF files can be used as sources.&lt;/p&gt;  &lt;p&gt;5. 
&lt;strong&gt;ETL (Extract, Transform, and Load):&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;ETL is a process in &lt;a href=&quot;http://en.wikipedia.org/wiki/Database&quot;&gt;database&lt;/a&gt; usage, and especially in &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_warehouse&quot;&gt;data warehousing&lt;/a&gt;, that involves:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Data_extraction&quot;&gt;Extracting data&lt;/a&gt; from outside sources &lt;/li&gt;    &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Data_transformation&quot;&gt;Transforming&lt;/a&gt; it to fit operational needs (which can include quality levels) &lt;/li&gt;    &lt;li&gt;Loading it into the end target (database or &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_warehouse&quot;&gt;data warehouse&lt;/a&gt;)&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;6. &lt;strong&gt;Data Marts:&lt;/strong&gt; A data mart (DM) is the access layer of the &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_warehouse&quot;&gt;data warehouse&lt;/a&gt; (DW) environment that is used to get data out to the users. The DM is a subset of the DW, usually oriented to a specific business line or team.&lt;/p&gt;  &lt;p&gt;For the definition of a data warehouse, please refer to the Introduction to Data Warehousing post.&lt;/p&gt;  &lt;p&gt;7. 
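&lt;/p&gt;  &lt;p&gt;The three ETL steps listed above can be sketched end to end in a few lines of Python (the pipe-delimited extract and the target table layout below are hypothetical, invented purely for illustration):&lt;/p&gt;

```python
import csv
import io
import sqlite3

# Hypothetical pipe-delimited flat-file extract; the layout is made up.
source = io.StringIO("cust_id|name|amount\n1|Alice|100.50\n2|Bob|75.25\n")

# Extract: read records from the outside source.
records = list(csv.DictReader(source, delimiter="|"))

# Transform: fit the data to operational needs (type casting, trimming).
rows = [(int(r["cust_id"]), r["name"].strip(), float(r["amount"])) for r in records]

# Load: write into the end target (here, an in-memory database table).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (cust_id INTEGER, name TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
total = con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 175.75
```

&lt;p&gt;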
&lt;strong&gt;OLAP:&lt;/strong&gt; OLAP (Online Analytical Processing) is a methodology to provide end users with access to large amounts of data in an intuitive and rapid manner to assist with deductions based on investigative reasoning.&lt;/p&gt;  &lt;p&gt;OLAP systems need to:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Support the complex analysis requirements of decision-makers, &lt;/li&gt;    &lt;li&gt;Analyze the data from a number of different perspectives (business dimensions), and &lt;/li&gt;    &lt;li&gt;Support complex analyses against large input (atomic-level) data sets.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;8. &lt;strong&gt;OLTP:&lt;/strong&gt; &lt;b&gt;Online transaction processing&lt;/b&gt;, or &lt;b&gt;OLTP&lt;/b&gt;, refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval &lt;a href=&quot;http://en.wikipedia.org/wiki/Transaction_processing&quot;&gt;transaction processing&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;9. &lt;strong&gt;Data Mining:&lt;/strong&gt; The process of extracting patterns from large &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_set&quot;&gt;data sets&lt;/a&gt; by combining methods from &lt;a href=&quot;http://en.wikipedia.org/wiki/Statistics&quot;&gt;statistics&lt;/a&gt; and &lt;a href=&quot;http://en.wikipedia.org/wiki/Artificial_intelligence&quot;&gt;artificial intelligence&lt;/a&gt; with &lt;a href=&quot;http://en.wikipedia.org/wiki/Database_management&quot;&gt;database management&lt;/a&gt;. 
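&lt;/p&gt;  &lt;p&gt;As a toy illustration of pattern extraction (the transaction baskets below are made up), one can count which items co-occur across baskets using only the Python standard library:&lt;/p&gt;

```python
from collections import Counter
from itertools import combinations

# Made-up transaction baskets for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"milk"},
]

# Count how often each pair of items appears together: a tiny
# frequent-itemset sketch in the spirit of market-basket analysis.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('bread', 'milk'), 2)]
```

&lt;p&gt;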
Data mining is seen as an increasingly important tool by modern businesses for transforming data into &lt;a href=&quot;http://en.wikipedia.org/wiki/Business_intelligence&quot;&gt;business intelligence&lt;/a&gt;, giving an informational advantage.&lt;/p&gt;  </description><link>http://informaticatutors.blogspot.com/2011/03/data-warehouse-architecture.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>12</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2860188939685086745.post-6953714021489319439</guid><pubDate>Tue, 22 Mar 2011 03:44:00 +0000</pubDate><atom:updated>2011-03-21T20:44:53.186-07:00</atom:updated><title>Introduction to Data warehousing</title><description>&lt;p&gt;A &lt;b&gt;data warehouse&lt;/b&gt; (DW) is a database used for reporting. The data is offloaded from the operational systems for reporting. The data may pass through an &lt;a href=&quot;http://en.wikipedia.org/wiki/Operational_data_store&quot;&gt;operational data store&lt;/a&gt; for additional operations before it is used in the DW for reporting.&lt;/p&gt;  &lt;p&gt;A data warehouse maintains its functions in three layers: staging, integration, and access. &lt;i&gt;Staging&lt;/i&gt; is used to store raw data for use by developers (analysis and support). The &lt;i&gt;integration&lt;/i&gt; layer is used to integrate data and to provide a level of abstraction from users. The &lt;i&gt;access&lt;/i&gt; layer is for getting data out to users.&lt;/p&gt;  &lt;p&gt;1. &lt;b&gt;Ralph Kimball&#39;s paradigm&lt;/b&gt;: The data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model. 
&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Definition as per Ralph Kimball:&lt;/strong&gt; A data warehouse is a copy of transaction data specifically structured for query and analysis.&lt;/p&gt;  &lt;p&gt;His approach to data warehouse design is &lt;strong&gt;bottom-up&lt;/strong&gt;. In the &lt;i&gt;bottom-up&lt;/i&gt; approach, &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_mart&quot;&gt;data marts&lt;/a&gt; are first created to provide reporting and analytical capabilities for specific &lt;a href=&quot;http://en.wikipedia.org/wiki/Business_process&quot;&gt;business processes&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;2. &lt;b&gt;Bill Inmon&#39;s paradigm&lt;/b&gt;: The data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in third normal form.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Definition as per Bill Inmon: &lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management&#39;s decision making process. &lt;/b&gt;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Subject-Oriented&lt;/b&gt;: A data warehouse can be used to analyze a particular subject area. For example, &amp;quot;sales&amp;quot; can be a particular subject.&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Integrated&lt;/b&gt;: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Time-Variant&lt;/b&gt;: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, or 12 months ago, or even older data, from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. 
For example, a transaction system may hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Non-volatile&lt;/b&gt;: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.&lt;/p&gt;  &lt;p&gt;His approach to data warehouse design is &lt;strong&gt;top-down&lt;/strong&gt;: the data warehouse is designed using a normalized enterprise &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_model&quot;&gt;data model&lt;/a&gt;. &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_element&quot;&gt;&amp;quot;Atomic&amp;quot; data&lt;/a&gt;, that is, data at the lowest level of detail, is stored in the data warehouse.&lt;/p&gt;  </description><link>http://informaticatutors.blogspot.com/2011/03/introduction-to-data-warehousing.html</link><author>noreply@blogger.com (Naveen)</author><thr:total>2</thr:total></item></channel></rss>