Placeholder Image

字幕表 動画を再生する

  • What

  • is the first thing that comes to your mind when you hear the worddatabase”?

  • For many people, this question is more challenging than it might seem at first.

  • An answer like “a big file where much information is storedis not satisfactory and would

  • not please potential employers.

  • You should remember there are two main types of databasesrelational and non-relational.

  • The former will be the focus of this course, while the latter regards more complex systems.

  • Although understanding non-relational databases requires a serious mathematical and programming

  • background, some of the logic applied in its coding is the same as SQL.

  • Likewise, relational databases have a few advantages on their own.

  • A small bit of theory will explain why they are still the preferred choice in many companies

  • and institutions.

  • Databasesmain goal is to organize huge amounts of data that can be quickly retrieved

  • upon usersrequest.

  • Therefore, they must be compact, well-structured, and efficient in terms of speed of data extraction.

  • Today, people need such extra efficiency because data occupies memory space andthe bigger

  • its size, the more sluggish the database is and the slower the retrieval process becomes.

  • If we have a database containing one multi-million-row table, with many columns, then every time

  • a request has been received, the server must load all the records, with all fields, and

  • it would take too much time for a task to be completed.

  • Don’t forget every symbol is a container of information and requires bytes of storage

  • space.

  • Hence, loading that much data will not be an easy job for the computer.

  • So, what allows us to contain so much data on the server, but lets us efficiently use

  • only the portions we need for our analysis?

  • The secret lies behind the use of mathematical logic originating from relational algebra.

  • Please, don’t worry - we will not bother you with math.

  • Imagine each table with data is represented by a transparent circle that contains all

  • the data values of the table, categorized by columns or, as we will often call them,

  • fields.

  • Now, if our database consisted of only one table, a giant circle would represent the

  • entire database, something like this huge table from our fictional example with the

  • Salesdatabase.

  • And when we need a piece of information from the database, for example if we wish to see

  • who has bought something on a certain date, we will have to lift this whole big circle

  • and then search for what we need.

  • This challenge seems vague and the process of data extraction will not be efficient.

  • See what can happen if we split the circle into 3 smaller circles, just as we did with

  • theSalesdatabase.

  • One circle will stand for theSalestable, the other forCustomers”, and

  • the last one forItems”.

  • There are various theoretical combinations between 3 or more circles, but in our database,

  • we have the following model.

  • SalesandCustomershave the samecustomer IDcolumn, andSales

  • andItemshave the sameitem codecolumn.

  • This way, we can see the circles overlap as they have common fields.

  • So, if we’d like to extract the same information, the names of the customers who have purchased

  • something on a given date, we will need only thedate of purchasecolumn from the

  • Salestable and thefirstandlast namefrom theCustomerstable.

  • So, to satisfy this request, we will not need to lift the third circle from our database,

  • Items”.

  • This way, we can save energy or, more technically, increase efficiency.

  • Less data, represented as only two of the three circles, will be involved in this operation.

  • The mathematical trick lies in relating the tables to one another.

  • Relationships were formed namely through these common fields.

  • Anyway, I am sure that now you understand why we use the term relational databases.

  • Some professionals may refer to the tables, or the circles in our plot, as relations because,

  • theoretically, they are the smallest units in the entire system that can carry integral

  • logical meaning.

  • Likewise, the three circles are all part of ourSalesdatabase.

  • When we combine the database and its existing relations, we obtain the famous term relational

  • database management system, frequently abbreviated as RDBMS.

  • ? We hope this theoretical illustration makes

  • things clearer.

  • SQL is designed for managing relational database management systems and can do that by creating

  • relations between the different tables in a database.

  • To complete our theoretical preparation, we will need to learn more database terminology.

  • This is a necessary step that will help us when we start coding in SQL later.

  • Let’s go through the entire process of creating a database.

  • Assume our database containing customer sales data has not been set up yet, ok?

  • So, imagine you are the shop owner and you realize you have been selling goods quite

  • well recently, and you have more than a million rows of data.

  • What do you need, then?

  • A database!

  • But you know nothing about databases.

  • Who do you call, an SQL specialist?

  • No.

  • You need a database designer.

  • She will be in charge of deciding how to organize the data into tables and how to set up the

  • relations between these tables.

  • This step is crucial.

  • If the database design is not perfect from the beginning, your system will be difficult

  • to work with and wouldn’t facilitate your business needs; you will have to start over

  • again.

  • Considering the time and data (…and money!) involved in the process, you want to avoid

  • going back to point 0.

  • What do database designers actually do?

  • They plot the entire database system on a canvas using a visualization tool.

  • There are two main ways to do that.

  • One is drawing an Entity-Relationship diagram, an ER diagram for brief.

  • It looks like thisand, as its name suggests, the different figures represent different

  • data entities and the specific relationships we have between entities.

  • The connections between tables are indicated with lines.

  • This way of representing databases is powerful and professional, but it is complicated.

  • We will not focus in-depth on ER diagrams in this course, but you should know they exist

  • and refer to the process of database design.

  • Another form of representation of a database is the relational schema.

  • This is an existing idea of how the database must be organized.

  • It is useful when you are certain of the structure and organization of the database you would

  • like to create.

  • More precisely, a relational schema would look like this.

  • It represents a table in the shape of a rectangle.

  • The name of the table is at the top of the rectangle.

  • The column names are listed below.

  • All relational schemas in a database form the database schema.

  • You can also see lines indicating how tables are related in the database.

  • To this moment, it has been ideas, planning, abstract thinking, and design.

  • At this stage, it would be correct to say SQL can be used to set up the database physically,

  • as opposed to contriving it abstractly.

  • Then, you can enjoy the advantages of data manipulation.

  • It will allow you to use your dataset to extract business insights that aim to improve the

  • performance and efficiency of the business you are working for.

  • This process is interesting, and, practically, the main part of the course will be related

  • to that.

  • Rememberwell thought-out databases that are carefully designed and created are crucial

  • prerequisites for data manipulation.

  • If we have done good work with these steps, we could write effective queries and navigate

  • in a database rather quickly.

  • You will often hear the term database management.

  • It comprises all these steps a business undertakes to design, create, and manipulate its databases

  • successfully.

  • Finally, database administration is the most frequently encountered job amongst all.

  • A database administrator is the person providing daily care and maintenance of a database.

  • Her scope of responsibilities is narrower regarding the ones carried out by a database

  • manager, but she is still indispensable for the database department of a company.

  • All right!

  • So

  • In this lesson, we will use theSalesdatabase to illustrate the concept of relational

  • schemas a little better.

  • The data will be stored in 4 tables – “Sales”, “Customers”, “Items”, andCompanies”.

  • You see how these tables have a tabular shape.

  • Let’s see how a relational schema can be applied to represent them.

  • Let’s build the landscape piece by piece.

  • We care about the sales per customer of our company, and that’s why we have a “Sales

  • table.

  • It tells us the number with which each purchase was registered, the date of the purchase,

  • the respective customer ID, and the item code.

  • So, in this table, the dates of a few purchases may coincide, because it is normal to sell

  • some of the same goods in the same day.

  • Analogically, in the third column, the ID of a customer may appear a few times, given

  • that some customers make more than one purchase.

  • People may have bought many units of the same product; hence, there is a possibility to

  • see the same item code a few times in the last column.

  • This reasoning does not apply to the first field, though.

  • Each purchase is unique!

  • In databases, this means all the numbers in this column will be different, because each

  • purchase will be assigned a specific number that is unique.

  • There is a term for such type of field.

  • A column (or a set of columns) whose value exists and is unique for every record in a

  • table is called a primary key.

  • Each table can have one and only one primary key.

  • In one table, you cannot have 3 or 4 primary keys.

  • For instance, in ourSalestable, “number of purchasecan act as a single-column

  • primary key, and there will be no other primary keys.

  • If the database creators decide, the primary key of a table may be composed of a set of

  • columns, not of just one column.

  • In theSalestable, bothnumber of purchaseanddate of purchasecan

  • be thought of as a unique-identifier pair for the data in this table.

  • For instance, purchase number 1 and purchase date, which is the 3rd of September, will

  • form a unique pair and so will purchase number 2 and the same date, 3rd of September.

  • This means these 2 rows are different.

  • Please, don’t overlook the fact you cannot have a combination of the same purchase number,

  • 1, and the same date, 3rd of September, more than once in your table.

  • Either the number or the date must be different.

  • This is because, as we already said, a primary key must be unique.

  • Good!

  • Both ways of representation make sense.

  • Since having a one-column primary key here would mean all purchases in our database will

  • be recorded under a different number, this means they can go from 1 to, hopefully, a

  • million, or two

  • This logic will preserve the uniqueness of the purchases.

  • Ok.

  • So, we will proceed withpurchase numberas a single-column primary key.

  • Awesome!

  • Therefore, you can think of a primary key as the field or group of fields that identifies

  • the content of a table in a unique way.

  • For this reason, the primary keys are also called the unique identifiers of a table.

  • Another crucial feature of primary keys is they cannot contain null values.

  • This means, in the example with a single-column primary key, there must always be a value

  • inserted in the rows under this column.

  • You cannot leave it blank.

  • Please, be careful and don’t forget this characteristic of the primary key!

  • Technically, to create a relational schema, we draw a table in the form of a rectangle,

  • and we place its name on top.

  • Then, we enlist the fields vertically (as opposed to their horizontal organization in

  • a spreadsheet).

  • The field that is the primary key of the table is usually quoted on top of the other fields.

  • What is more important - it is always underlined.

  • Keeping these simple rules in mind, when you spot this table, you will immediately know:

  • It is calledSales”; its primary key ispurchase number”; and there are three

  • other fields – “date of purchase”, “customer ID”, “item code”.

  • This image corresponds to tabular data in the following form

  • All right?

  • Good.

  • One last remark about primary keys.

  • Please, remember that not all tables you work with will have a primary key, although almost

  • all tables in any database will have a single-column or a multi-column primary key.

  • To conclude, the term for representing a table with the help of such a rectangle and the

  • names of the fields is called a relational schema.

  • When you combine the relational schemas of the tables we have in a database, this gives

  • us a database schema.

  • Informally, it is ok to call the representation of the entire database in plural - relational

  • schemas.

  • All right!

  • Excellent!

  • TheCustomer IDcolumn does not give away much information about the purchaser,

  • does it?

  • We will store her first and last names, email address, and number of complaints submitted

  • in another table, calledCustomers”.

  • Here is a table containing all these fields, as well as a “Customer IDcolumn, which

  • is unique; there is no need to keep the information about the same customer more than once, right?

  • Therefore, in this table, theCustomer IDfield is the primary key.

  • The other columns represent the rest of the information about a customerfirst name,

  • last name, email address, and number of complaints.

  • Interesting!

  • We already have two tables – “SalesandCustomers”!

  • We know they are related through theCustomer IDfield, and as you can see, a “Customer

  • IDcolumn exists in theSalestable.

  • The coincidence of the two column names is chosen purely for convenience.

  • The same relationship could be based on columns with different names – “Column IDand

  • justID”, for example.

  • What is important is that the information in theSalestable can be obtained through

  • theCustomer IDcolumn in theCustomerstable.

  • Great!

  • Let’s stick to using the same name for both columns.

  • All right.

  • Here is the big trick.

  • Customer IDis a primary key for theCustomerstable and a foreign key in

  • theSalestable.