Answers to review questions: 1. Define each of the following terms: a. Data: Raw information that has not been processed b. Field: A place where data is entered. C. Record: A collection of the related information. d. File: A collection of the related records.
2. What is data redundancy, and which characteristics of the file system can lead to it? Data redundancy in a database occurs when same data is stored in different tables unnecessarily. A modification to a single piece of data requires change for all the copies. Use of flat file database designs and spreadsheets can lead to data redundancy. 3. What is data independence, and why is it lacking in file systems? Data independence, it exists when we were able to change the database structure or characteristics without affecting the accessibility to the data. In case of file systems, each time a user manipulates the data, the metadata changes which makes other programs using the same file inaccessible due to constant changes.
4. What is a DBMS, and what are its functions? i) DBMS, a collection of programs that are stored, managed and simultaneously given controlled access to the end users to create, modify and delete. ii) Functions of a DBMS: (1) Views (2) Indexing (3) Concurrency (4) Security (5) Integrity (6) Backup and recovery 5. What is structural independence, and why is it important? It occurs when we can change the file structure without affecting the accessibility of the data. It is important because any change to a file requires the change to the original file making it inaccessible to other applications/programs which further leads to a bad structure. 6. Explain the differences among data, information, and a database? Data: unprocessed facts collected together for analysis. Information: Processed data from the raw facts. Database: A database is where the information is stored in tables.
7. What is the role of a DBMS, and what are its advantages? What are its disadvantages? The role of DBMS is to manage and store the databases. Advantages: a. Lower data redundancy b. Data independence C. Allows different view of data regardless of which user is accessing it. d. Better security Disadvantages: e. Expensive. f. Design is time consuming. 8. List and describe the different types of databases. a. Single-user Database: A database that allows access to only a single user at a time. b.Multi-user Database: A database that allows access to more than a single user at a time. c. Enterprise Database: A database that supports entire department/ company and supports many users. d. Centralized Database: A database that is maintained at a single site and supports the data. e. Distributed Database: A database that is distributed across different sites.
9. What are the main components of a database system? Two main components are: 1. Data Warehouse(DWH) 2. Online Analytical Processing(OLAP) 10.What are metadata? The data that is Integrated and managed end-user data. 11.Explain why database design is important. a. It is a process of developing a detailed model of a database. b. It is important to design a good database, because one has to maintain consistent data. c. Maintain data redundancy. d. For ensuring smooth performance of application. 12. What are the potential costs of implementing a database system? a. Hardware b. Software C. Cost of Maintenance d. Security e. Frequent upgrades
13.Use examples to compare and contrast unstructured and structured data. Which type is more prevalent in a typical business environment? Unstructured data, are a data that exist in their raw or original state, i.e. in the state that are first collected. Structured data are data that are arranged or formatted to store, manage and modify. Some data might not be formatted(structured) but still can used for processing. So the structured data which is fully functional is suitable for the business environment. 14. What are some basic database functions that a spreadsheet cannot perform? a. A Spreadsheet doesn’t support listing of names based on their last names. b. It allows auto fill-in, which is a big problem.
C. And also you have to modify the whole spreadsheet for small change. 15.What common problems does a collection of spreadsheets created by end users share with the typical file system? a. Data sharing limited b. Creates data inconsistency C. Creates data redundancy d. Time taken to access the file e. Difficulty of getting the information quickly. f. Complexity of the file system. 16.Explain the significance of the loss of direct, hands-on access to business data that end users experienced with the advent of computerized data repositories. This is significant because it is giving access to end-users to collect the data of their choice and manipulate the data to generate new information.
Problem Solutions: 1. How many records does the file contain? How many fields are there per record? The file contains 7 records. And each record has 5 fields. 2.What problem would you encounter if you wanted to produce a listing by city? How would you solve this problem by altering the file structure? There is no separate column for city, and it is in the MANAGER_ADDRESS column. So we have to write more queries to separate the city names which is a time consuming process. Only solution is to make it a separate column.
3.If you wanted to produce a listing of the file contents by last name, area code, city, state, or zip code, how would you alter the file structure? Firstly, to produce a listing of last name, we have to break the PROJECT MANAGER column into MNGR_LNAME, MNGR_FNAME and MNGR_INITIAL which lets the user to query. Second, for city, state, or zip code, we have to break the MAANAGER_ADDRESS column into MNGR CITY. MNGR_STATE, MNGR_ZIP, which lets the user to retrieve or query the required data. Similarly, for area code, we have to break the MANAGER_PHONE column into MNGR_AREACODE and MNGR_PHONE which allows ease of retrieval. The more we decompose the data, the more will be the flexibility of data retrieval.
4. What data redundancies do you detect? How could those redundancies lead to anomalies? If we consider George F. Dorts, the name occurs in 3 records which occurs data redundancy, similarly address and phone number. If George moves from the address, we have to change the address at two places which makes it so difficult to update the records at both the places, because the user has to see that it is updated without any errors even for a single letter, which is same for phone number as well.
5. Identify and discuss the serious data redundancy problems exhibited by the file structure shown in Figure. The serious problem with the given file structure is its poor structure. If we consider the JOB_CODE ‘EE’, the JOB_CHG_HOUR is 85. The job code occurs at 4 PROJ_NAME. So if one of the EMP_NAME is deleted from the project the PROJ_HOURS might get affected.
6. Looking at the EMP_NAME and EMP_PHONE contents in Figure P1.5, what change(s) would you recommend? I would recommend that the data should be atomic. EMP_NAME can be decomposed into EMP_LNAME, EMP_FNAME and EMP_INITIAL which makes it easier to keep track of the employee data. Similarly, EMP_PHONE can be decomposed into EMP_PHONE and EMP_AREA_CODE which makes it easier to search the employee by using their area code. 7. Identify the various data sources in the file you examined in Problem 5. a. Employee Numbers, names and phone numbers. b. Job data such as JOB_CODE. C. Project data such as PROJ_NUM, PROJ_NAME and PROJ_HOURS. d. And the charge paid for the job JOB_CHG_HOUR.
8.Given your answer to Problem 7, what new files should you create to help eliminate the data redundancies found in the file shown in Figure P1.5? The new files that can be created to eliminate data redundancies are PROJECT, EMPLOYEE, JOB and CHARGE. PROJECT containing name, number and hours. EMPLOYEE containing name, number and phone number. JOB containing code. CHARGE containing the charge paid per hour for the specific job category.
9. Identify and discuss the serious data redundancy problems exhibited by the file structure shown in Figure P1.9. The file structure has data duplication if we consider the teacher data (last name, first name and initial). It is better if we create a common thing that is ID or a number that is easy to identify and retrieval there by eliminating data redundancy and larger data file size. 10. Given the file structure shown in Figure P1.9, what problem(s) might you encounter if building KOM were deleted? If we delete the building KOM, we will lose all the timely data about the teachers Hawkins, Cordoza and Williston and also the rooms data of 2048,123 and 34