Lecture 01 - Class Overview, Databases

CS5200: Database Management Systems
Lecture 1: Introduction

9/4/2024 CS 5200, Fall 2024 (Cobbe) 1


Agenda
● Introduction and Syllabus
● Break
● Introduction to Databases


Introduction and Syllabus


Introduction and Syllabus
● Course Staff
● Lectures
● Communication
● Topics and Objectives
● Assignments and Grading
● Expectations
○ Generative AI/LLM
○ Academic Integrity
● Other Resources


Course Staff
● Prof. Richard Cobbe
○ 2nd year as full-time faculty at Khoury Seattle
○ PhD in programming language theory, Northeastern (Boston), 2009
○ 13 years of industry experience: Endeca, Oracle, Microsoft
○ Endeca/Oracle work: intersection of PL and databases
● TAs
○ Kaijun Chen
○ Yashvi Garg
○ Wenli Li
○ Zhiyuan Zhang


Lectures
● I'm responsible for 2 sections of CS5200 this fall:
○ Section 12: Wednesdays, 12:30–3:50, 225 Terry room 210
○ Section 13: Thursdays, 12:30–3:50, 225 Terry room 210
● Both lectures have the same content
● Please attend the lecture for the section you're registered for
○ One-time exceptions are probably fine
○ Space is limited; we will prioritize students who are registered for the corresponding section


Lectures
● Each lecture will have a 10-15 minute break
○ This is as much for me as for you; I will not be available for questions during break
○ Likewise, I'm unfortunately not able to stay after class for questions
● Slides will be posted to Canvas after the Thursday lecture
● As a rule, I do not plan to record lectures
● Recordings may be made available if attendance is high or there are extenuating circumstances


Office Hours
● My office hours
○ Complicated by opening of 310 Terry, currently scheduled for Sept 23
○ Before 310 is open: Mondays, 2–4pm via Teams; use my bookings page to sign up
○ After 310 is open: Thursdays, 9:30–11:30am, faculty space on 2nd floor of 310
■ Ideally, you can just show up without scheduling a slot in advance
■ Layout of the space may make this difficult: how do I know that you're there?
■ We may have to try a few things to find something that works for everyone
● TAs will hold office hours
○ Expect a mix of in-person and virtual
○ Still finalizing schedule
○ Expect more information within the next week or so
● Other options available by arrangement


Communications: Canvas
● Syllabus
● Assignments (including submission)
● Grades
● Supplemental Files
○ Lecture slides
○ Recordings, if applicable
○ etc.
● Class membership tied to registration
○ If you're not a member of the Canvas class, please double-check your registration and email
me if there is still a problem


Canvas and Time Zones
● Canvas defaults to Eastern Time
● Affects how assignment deadlines are displayed
● Be sure your Canvas timezone is set correctly!
○ Click the “Edit Settings” button on the right to adjust.


Communications: Piazza
We have a course Piazza page (signup link; also in Canvas and syllabus).

Primary means of communication outside of class.

● Where appropriate, I encourage public questions!
○ Some of your classmates may be wondering the same thing
○ Discussions between students can be valuable!
○ You may post anonymously if you feel more comfortable doing so
● For other matters, please post a private message on Piazza, addressed to
"Instructors"
○ Goes to me and all the TAs
○ With 5 people reading it, you will probably get a response faster than if you address it to one


Communications: Expectations
● I will do my best to respond to Piazza messages within 24 hours
○ Slightly longer over the weekends and holidays
● Email and Teams messages will have longer response times.


Other Resources
● No textbook for this course
● We will use a few open-source software packages (links to be provided):
○ MySQL
○ MySQL Workbench
○ Eclipse/IntelliJ
● We'll talk about installing and configuring these during lecture at the
appropriate times.


Course Summary


Course Description and Goals
Introductory master's-level course in database management systems.
Major topics:
● What is a database? What is a DBMS?
● Where are databases and DBMSs used?
● How do we design a database? What are some common pitfalls?
● How do we retrieve information of interest from a database?
● How do we add, remove, and update information in a database?
● How do we interact with a database from a program?
● How do we manage concurrent updates to a database?
● How do we implement a DBMS?


Prerequisites
● Designed to be accessible to any master's student at Khoury
● Requires familiarity with basic Java programming (5004, 5010)
● We will cover all other languages and related topics in lecture, including
○ SQL
○ Relational algebra
○ JDBC
○ HTML, JSP (used for term project)


Coursework Overview
● Homework Assignments
○ Individual work
○ Roughly weekly, mostly first half of the semester
● Midterm
○ In-class, pen-and-paper
○ Oct 23–24, during normal class hours
● Term Project
○ Group work
○ Divided into 4 milestones
● No Final Exam


Term Project: Overview
● Develop a relational database for an application to be described
● Develop a simple web application that interacts with the database
● Work in teams of 5 students
● Split into 4 milestones:
○ Relational model
○ Physical model and sample data
○ Java interoperability (JDBC)
○ Web front-end (JSP)
● In-class presentations after each milestone
○ Everyone should be ready to present each milestone


Term Project
● I will shortly post all 4 term project assignments on Canvas
○ No need to start yet; this is just to give you an idea of what to expect
○ Details of the data that you will be asked to represent forthcoming
● For now: start thinking about forming your teams
○ 80 currently enrolled, so 16 teams of 5
○ Teams should be in place by the week of Sept 16
○ I'll have a page on Canvas where each team can list their members
● The first milestone will be out on Sep 26 and due on Oct 9.


Term Project
Cross-section teams?

● In principle, this should be fine: both sections are doing the same project
● I'm still considering how to handle the presentations in this case. Be aware
that I might ask your team to present in both sections.


Course Schedule
● Week 1: Introduction, Course Overview
● Week 2: Relational Algebra (first HW assigned)
○ Formal, mathematical basis for querying relational databases
● Week 3: Logical Modeling
○ Defining relations, attributes, and relationships used to represent data
● Week 4: Functional Dependencies and Normal Forms (first project assigned)
○ Problems we can encounter in designing a logical model
○ How to avoid them
● Week 5: Physical Modeling
○ Translating the logical model into table definitions for an RDBMS


Course Schedule
● Week 6: SQL
○ Queries and other operations we can perform on an RDBMS
● Week 7: Midterm Review
● Week 8: Midterm (in-class)
● Week 9: Project Setup
○ Installing & configuring the various open-source toolkits we'll use for the term project
○ JDBC (Java DataBase Connectivity): library for executing SQL queries in Java
○ JSP (Java Server Pages): technology for constructing a web application
● Week 10: Database Transactions; Project Debugging
○ How to manage multiple concurrent users in a single database?
○ Time to address any remaining concerns with project setup


Course Schedule
● Week 11: Data Storage; Query Evaluation
○ What data structures do DBMSs use to store data on disk?
○ What algorithms do DBMSs use to evaluate queries?
● Week 12: JSP setup
● Week 13: Thanksgiving Break (no class)
● Week 14: Final Project Presentations
● Week 15: Finals Week (no class)


Policies and Expectations


Grading Policies
● Homework: 10% penalty per day for late work in most cases
○ I will not be able to accept late work on certain assignments; details to follow
● Project Milestones: late work is not accepted
○ Presentations scheduled in class immediately after due date
● I will grant extensions in appropriate circumstances
○ Please contact the instructors via Piazza as early as possible
● Grade distribution:
○ individual homework: 30%
○ midterm: 30%
○ term project: 40%


Grading Scale
Final grades computed according to the scale shown here.
● The Canvas course uses this same scale
● Grades on the dividing line receive the higher grade: a final average of exactly 90.00% is an A-.

93% or higher   A
90-93%          A-
86-90%          B+
82-86%          B
77-82%          B-
73-77%          C+
69-73%          C
65-69%          C-
below 65%       F


Regrade Policy
If you have questions about your grade, or if you believe you were graded
incorrectly, please submit a message to Instructors on Piazza.

We will respond to all such requests.

However, regrade requests have lower priority than grading current assignments.


Academic Integrity
● All coursework subject to NEU's Academic Integrity Policy (link in syllabus)
● Includes plagiarism: presenting another's code, ideas, designs, words as your
own
○ Cite work you derive from other sources
○ This includes ChatGPT and similar tools!
○ The university library has more information; link in syllabus
● Homework and midterm must be your own work; project must be your team's


Academic Integrity
● I encourage you to discuss assignments and projects with your classmates
● Do not provide solutions to anyone
● Good rule of thumb:
○ Discussions only in a natural language (English, Chinese, Quechua, …) are OK
○ Discussions that involve sharing code or design diagrams are generally not
● If in doubt, please discuss with me!


Generative AI (or, What About ChatGPT?)
● My goal: each of you develops a deep understanding of the topics covered
● Generative AI cannot replace this deep understanding
● Generative AI has the potential to take care of routine, repetitive work, but:
○ At the moment, it has problems with factual correctness
○ I don't yet know how to use generative AI to automate this routine work without putting the
deep understanding at risk, and that's a tradeoff I'm not willing to make yet.
● My position on generative AI is still evolving (but don't expect it to change
much during this semester)


Generative AI
So what does that mean for this class?

● I will not ask you to use generative AI tools on an assignment.
● I will not use AI detection tools: risk of false positives is too high.
● Use of generative AI on coursework carries risks.
● Issues with correctness: you are responsible for any errors.
● Course is structured to test your understanding, not your use of AI.
○ Midterm is pen-and-paper, no devices permitted.
○ Project presentations: "Why did you choose to implement it in this fashion? What tradeoffs did
you consider?"
● You might not have access to GenAI during job interviews.


Final Administrivia


Course Evaluations: TRACE
● Toward the end of the semester, the registrar's office will notify you that
TRACE course evaluations are open
● You can submit these on any laptop or mobile device
● I encourage you to submit feedback:
○ Useful to me as I continue to develop and refine this course
○ Useful to Khoury as we work to develop and refine the curriculum
● Feedback is strictly anonymous
● I read all submitted feedback
● You are welcome to submit feedback directly to me via email or Piazza, though that feedback is obviously not anonymous


Student Accommodations
● If you need accommodations due to a disability, please contact Disability
Access Services (link in syllabus) rather than me directly.
● If you are unable to attend lecture or complete an assignment because of a
religious holiday or other observance, we can provide accommodations;
please contact me via Piazza post.
● In either case: the earlier we know of circumstances that require
accommodations, the better we'll be able to help.


Title IX, Discrimination, and Harassment
● NEU, Khoury, and I want to make sure you have a learning environment free of
discrimination and harassment.
● Submit reports of discrimination or harassment to the Office of University Equity and Compliance (OUEC)
● Faculty members, including me, are Mandatory Reporters:
○ If I become aware of discrimination or harassment, I am obligated to report it to OUEC
○ OUEC will contact the injured party to offer information about rights and resources
○ This report does not automatically cause a formal investigation; that requires consent of the
injured party


Title IX, Discrimination, and Harassment
● Confidential resources are available to you as well:
○ These are not mandatory reporters
○ Find@Northeastern: 24-hour mental health support (877-233-9477)
○ Sexual Violence Resource Center: support for Northeastern community members who have
experienced any sort of sexual violence
○ Confidential Resource Advisor: support for Northeastern students who have been accused of
sexual violence


Questions?


15-Minute Break


What is a Database?


What is a database?
● Organized collection of data
● “Organized” implies some notion of structure
● Data is usually represented as a collection of records
● Each record represents a particular real-world item or concept
○ Item in a product catalog
○ An individual order in an ecommerce system
○ The relationship between an order and an item included in that order
● Records have attributes describing the underlying item or concept
○ Item name, description, product number
○ Order date, customer, shipping address
○ Quantity of item ordered


What is a Database?
● Records are typically grouped into related collections:
○ All records describing orders
○ All records describing customers
○ All records describing catalog items
● Within a collection, records generally have the same structure but different
values:
○ All orders have order date, but each order may have a different date
○ All customers have a billing address, but each customer has a different address
● Close parallel to classes/structs in C, C++, C#, Java
○ All instances of a Java class have the same fields
○ Different instances of that class (can) have different values in their fields
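The parallel between records and class instances can be sketched in a few lines. This is only an illustration in Python: the `Order` class, its field names, and the sample values are invented for the sketch and are not part of the course material.

```python
from dataclasses import dataclass, fields


@dataclass
class Order:
    """One record type: every Order has the same attributes."""
    order_date: str
    customer: str
    shipping_address: str


# A collection of records with identical structure but different values.
orders = [
    Order("2024-09-04", "Ada Lovelace", "12 Example St"),
    Order("2024-09-05", "Adam Smith", "34 Sample Ave"),
]

# The structure (attribute names) is shared by every instance...
attribute_names = [f.name for f in fields(Order)]
# ...while the values differ from record to record.
order_dates = [o.order_date for o in orders]
```

Here `attribute_names` plays the role of the collection's structure and `order_dates` shows the per-record values.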


Where Do We Use Databases?
● Business & transactional systems:
○ appointments/calendars
○ point-of-sale (orders, product catalogs, payments, …)
○ financial systems: payroll, accounts payable, bookkeeping, …
● Personal data management
○ photo library
○ music catalogs (iTunes, etc)
○ address books
● Application/OS support
○ Windows registry
○ Authentication/authorization
○ Browser bookmark/favorites list
○ Email


Database Management System (DBMS)
DBMS: Software system that manages one or more databases.

Requirements include some or all of the following:

● Storage abstraction
● Programmatic interfaces
○ insertion/creation, deletion, modification
○ retrieval: query language, data processing
● Large scale: millions/billions of records
● Long-term durability


Database Management System (DBMS)
Requirements, continued:

● Data Integrity
● Support for multiple concurrent users
○ Authentication
○ Access control
○ Concurrency
● Transactions: ensure that data is always in a consistent state
● Access to metadata
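To make the transaction requirement concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in DBMS (the course itself uses MySQL). The `account` table, the names, and the `transfer` helper are assumptions made up for this example. A transfer consists of two updates; if the second one violates a constraint, the first is rolled back, so the data never ends up half-updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money in one transaction; return True if it committed."""
    try:
        # 'with conn' commits on success and rolls back on an exception.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


overdraft_ok = transfer("bob", "alice", 80)    # bob only has 50
after_overdraft = balances()                   # unchanged: credit was undone
normal_ok = transfer("alice", "bob", 30)
after_normal = balances()
```

The failed transfer credits alice before the debit fails, yet `after_overdraft` shows the original balances because both updates belong to one transaction.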


DBMS Form Factor
DBMSs take several different forms:

● separate program(s) running on the same machine as the application
● program(s) running on one or more dedicated machines
● cloud computing service
● library linked into client application


DBMS Costs
● Storage overhead
○ Redundancy for data integrity
○ Indexes for faster data access
○ Metadata
● Performance overhead
○ Maintaining data integrity
○ Multi-user support: concurrency, synchronization
○ Security: multi-user authentication
○ Communications: marshalling data to/from client application
● Complexity overhead
○ Big DBMSs have a lot of administrative overhead (person-hours)
○ Can have steep learning curves


DBMS Benefits
So, why bother? Why not just create our own file format ("flat files")?

● Using a DBMS lets us leverage many engineer-years of work
○ Solved a lot of really tricky problems so we don't have to
○ Lot of investment into robustness
● Flat files generally tailored to specific access pattern
○ As soon as you need a new one, complexity increases
○ Ex: PoS system originally designed to look up orders by customer.
○ What happens when we need to look up orders by item?
○ DBMSs support ad-hoc queries
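The point about ad-hoc queries can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in DBMS; the `order_line` table and its sample rows are hypothetical. The same stored data answers both the original question (orders by customer) and the new one (orders by item) without any change to the storage format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_line ("
    " order_no INTEGER, customer TEXT, item TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)", [
    (1, "C999", "pencils", 50),
    (1, "C999", "paper", 25),
    (2, "C123", "paper", 10),
])

# Original access pattern: look up orders by customer.
orders_by_customer = [row[0] for row in conn.execute(
    "SELECT DISTINCT order_no FROM order_line"
    " WHERE customer = ? ORDER BY order_no", ("C999",))]

# New access pattern: look up orders by item. Only the query changes.
orders_by_item = [row[0] for row in conn.execute(
    "SELECT DISTINCT order_no FROM order_line"
    " WHERE item = ? ORDER BY order_no", ("paper",))]
```

With a custom flat file organized by customer, the second lookup would typically need new parsing or indexing code.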


A Note on Terminology
● DBMS: program(s) that manage data & provide access to it
● Database: collection of related records for a particular application or group of
applications. Examples:
○ University: students, faculty members, courses, grades, registrations, …
○ Point-of-sale: products, customers, orders, shipments, payments, …
○ Blog application: users, posts, comments, …
● Frequently see multiple databases managed by the same DBMS instance.
● Confusingly, people often refer to a DBMS as a “database.”


Database Organization


Database Organization
For now, we'll concentrate on a logical view of data organization: let the DBMS
worry about file formats, etc.

Storage hierarchy:

● Database
● Table (class, entity, record set)
● Record (object instance, row)
● Attribute (column, field)


Database Records
Record: info about a single object or concept that we want to store

Examples:

● vehicle
● student in a university
● item in a product catalog
● financial transaction


Database Attributes
Attributes: properties of object or concept described by a record

Examples:

● vehicle: make, model, year, …
● student: name, GPA, number of credit hours, …
● item in a product catalog: name, description, quantity in stock, price, …
● financial transaction: date, amount, payee, …


Database Tables
Table: collection of all records describing different instances of an item/concept

Traditionally, all records in the same table have the same schema: set of attribute
names and their types.

Tables & records can be thought of as corresponding to classes and instances, or to a table and its rows in a spreadsheet.


Database Organization
Let's make this concrete with an example:

ID         Last Name  First Name  Degree Program
000627296  Lovelace   Ada         MS Computer Science
000246936  Smith      Adam        Bachelor’s, CSSH
000175892  Liddell    Alice       Law
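As a rough sketch, the example table could be created and queried as follows. Python's built-in `sqlite3` module stands in for the DBMS here; the table name `student` and the column names are assumptions chosen to mirror the column headings above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # the database
conn.execute(                        # the table and its schema
    "CREATE TABLE student ("
    " id TEXT PRIMARY KEY NOT NULL,"
    " last_name TEXT, first_name TEXT, degree_program TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    ("000627296", "Lovelace", "Ada", "MS Computer Science"),
    ("000246936", "Smith", "Adam", "Bachelor's, CSSH"),
    ("000175892", "Liddell", "Alice", "Law"),
])

cursor = conn.execute("SELECT * FROM student ORDER BY last_name")
column_names = [d[0] for d in cursor.description]   # the attributes
records = cursor.fetchall()                          # the records (rows)
```

Each level of the storage hierarchy shows up here: one database, one table, three records, four attributes per record.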


Records and Keys
Records in a database almost always have some unique identifying tag, called a
primary key. Can take several forms:

● One or more attributes containing real-world information about the record:
○ User in web app: email address
○ Car: VIN
● Identifying value only meaningful within the database:
○ Order number
○ NUID


Records and Keys
Essential properties of keys:

● Every record has a key value
● Each record in a table has a different key value
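Both properties can be enforced by the DBMS itself. The sketch below uses Python's built-in `sqlite3` module; the `car` table, the VIN values, and the `try_insert` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, unlike standard SQL, permits
# NULL in a non-INTEGER PRIMARY KEY column unless told otherwise.
conn.execute("CREATE TABLE car (vin TEXT PRIMARY KEY NOT NULL, make TEXT)")
conn.execute("INSERT INTO car VALUES (?, ?)", ("VIN0001", "Toyota"))


def try_insert(vin, make):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO car VALUES (?, ?)", (vin, make))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_key = try_insert("VIN0001", "Honda")  # key value already in use
missing_key = try_insert(None, "Ford")          # record without a key value
distinct_key = try_insert("VIN0002", "Honda")   # new, non-null key value
```

Only the third insert satisfies both properties, so it is the only one the table accepts.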


Database Relationships
Database relationship: logical connection between 2 or more database objects:
databases, tables, records

DBMSs generally have tools for establishing & managing these relationships

Examples:

● University registration: Student-Class
● Point of sale: Order-Customer
● Corporate: Employee-Manager


Kinds of Relationships
Several different kinds of relationships

Two most common: has-a, is-a

● Student has one or more related Courses
● Order has a Customer
● Employee has a Manager
● Individual Contributor is an Employee
● Manager is an Employee


Relationships & Supplementary Information
Relationships, particularly has-a relationships, often require additional information:

● Student registered for Course
○ registered in a particular semester
● Student grades for Course
○ for a particular section
○ in a particular semester
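One common way to carry this supplementary information is to give the relationship its own table. The following is a sketch under assumptions: the table and column names (`student`, `course`, `registration`, `semester`) are invented here, and Python's built-in `sqlite3` module stands in for the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id TEXT PRIMARY KEY NOT NULL, name TEXT);
CREATE TABLE course  (code TEXT PRIMARY KEY NOT NULL, title TEXT);

-- The relationship is a table of its own, so it can hold extra
-- attributes such as the semester of the registration.
CREATE TABLE registration (
    student_id  TEXT NOT NULL REFERENCES student(id),
    course_code TEXT NOT NULL REFERENCES course(code),
    semester    TEXT NOT NULL,
    PRIMARY KEY (student_id, course_code, semester)
);

INSERT INTO student VALUES ('S1', 'Ada');
INSERT INTO course VALUES ('CS5200', 'Database Management Systems');
INSERT INTO registration VALUES ('S1', 'CS5200', 'Fall 2024');
""")

registrations = conn.execute("""
    SELECT s.name, c.code, r.semester
    FROM registration r
    JOIN student s ON s.id = r.student_id
    JOIN course c ON c.code = r.course_code
""").fetchall()
```

Including `semester` in the key lets the same student register for the same course again in a later semester.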


Specialization Relationships
Specialization (is-a) relationships are similar to OO inheritance:

● Student is-a Person
● Instructor is-a Person

Person has set of shared properties: name, ID, home address, …

Student has all of Person's attributes, plus some more: # credit hours, registered
courses, …

Instructor has all of Person's attributes, plus some more: department, courses
taught, committee assignments, …
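The OO side of this analogy looks roughly like the following Python sketch. The class and field names are illustrative assumptions, and only a couple of attributes are shown for each type.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    person_id: str


@dataclass
class Student(Person):      # Student is-a Person
    credit_hours: int


@dataclass
class Instructor(Person):   # Instructor is-a Person
    department: str


student = Student("Ada Lovelace", "000627296", 12)

# Student has all of Person's attributes, plus some more.
student_attributes = [f.name for f in fields(Student)]
instructor_attributes = [f.name for f in fields(Instructor)]
```

The inherited fields come first in each list, followed by the fields specific to the subtype.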


Constraints
● Example database:
○ Student: ID, last name, first name, major
○ Faculty: ID, last name, first name, departmentName
○ Department: name, college, chair
● Within table: Student.ID must be unique
● Between tables: Faculty.departmentName must exist in Department.name
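These two constraints might be declared as shown below. This is a sketch using Python's built-in `sqlite3` module rather than the course's MySQL setup; the sample rows and the `try_insert` helper are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE department (name TEXT PRIMARY KEY NOT NULL, college TEXT, chair TEXT);
CREATE TABLE student (id TEXT PRIMARY KEY NOT NULL,
                      last_name TEXT, first_name TEXT, major TEXT);
CREATE TABLE faculty (id TEXT PRIMARY KEY NOT NULL,
                      last_name TEXT, first_name TEXT,
                      departmentName TEXT REFERENCES department(name));
INSERT INTO department VALUES ('Computer Science', 'Khoury', NULL);
""")


def try_insert(sql, params):
    """Run one insert in its own transaction; report the outcome."""
    try:
        with conn:
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


STUDENT = "INSERT INTO student VALUES (?, ?, ?, ?)"
FACULTY = "INSERT INTO faculty VALUES (?, ?, ?, ?)"

first_student = try_insert(STUDENT, ("000627296", "Lovelace", "Ada", "CS"))
# Within table: Student.ID must be unique.
duplicate_student = try_insert(STUDENT, ("000627296", "Smith", "Adam", "Econ"))
valid_faculty = try_insert(FACULTY, ("F1", "Hopper", "Grace", "Computer Science"))
# Between tables: departmentName must exist in Department.name.
orphan_faculty = try_insert(FACULTY, ("F2", "Turing", "Alan", "Alchemy"))
```

The DBMS rejects the duplicate student ID and the faculty row that points at a nonexistent department.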


Database Models


Database Models
Broadly: different ways of organizing data in a database

Historically common:

● Flat-File
● Hierarchical
● Network
● Relational
● Graph
● Object-oriented
● Semi-Structured


Flat-File
Custom application-defined file format; doesn't use a DBMS

Arguably not really a database, but often discussed in this context

Example: old-style Unix password database:


root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false

Format changes require substantial code changes
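To see why format changes are costly, consider a minimal parser for one line of the file above. This is a sketch: the field labels in `FIELD_NAMES` are conventional names assumed for the seven colon-separated fields and do not appear in the file itself.

```python
# Assumed labels for the seven colon-separated fields.
FIELD_NAMES = ["name", "password", "uid", "gid", "description", "home", "shell"]

PASSWD_LINE = "root:*:0:0:System Administrator:/var/root:/bin/sh"


def parse_passwd_line(line):
    """Split one record into a dict keyed by the assumed field names."""
    values = line.rstrip("\n").split(":")
    if len(values) != len(FIELD_NAMES):
        # The parser hard-codes the layout: any added, removed, or
        # reordered field breaks every program that reads the file.
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    return dict(zip(FIELD_NAMES, values))


record = parse_passwd_line(PASSWD_LINE)
```

Every application that reads the file needs its own copy of this layout knowledge, which is what a DBMS schema centralizes.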


Hierarchical
Records arranged in hierarchical structure

Master records (green); detail records (orange)

Traversals must follow this structure: only way to access detail is by starting with corresponding master

Easy to find all items ordered by a particular customer

Harder to find outstanding orders for particular catalog item

Examples: XML, Windows Registry
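The asymmetry between the two lookups can be sketched with nested Python dictionaries standing in for master and detail records. The customer IDs, order numbers, and function names below are invented for the example.

```python
# Master records (customers) contain their detail records (orders).
customers = {
    "C999": {"orders": [{"order_no": 1, "items": ["pencils", "paper"]}]},
    "C123": {"orders": [{"order_no": 2, "items": ["paper"]}]},
}


def orders_for_customer(customer_id):
    """Easy: start at the master and read its details directly."""
    return [o["order_no"] for o in customers[customer_id]["orders"]]


def orders_containing(item):
    """Harder: no path starts at an item, so visit every master."""
    found = []
    for customer in customers.values():
        for order in customer["orders"]:
            if item in order["items"]:
                found.append(order["order_no"])
    return sorted(found)


by_customer = orders_for_customer("C999")
by_item = orders_containing("paper")
```

The first lookup follows the hierarchy; the second has to scan the whole structure because the hierarchy offers no entry point by item.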


Network
Still have master & detail records

Richer traversals:

● links are 2-way
● details can be contained in multiple masters

Examples: IDMS

Used in '80s & '90s for high-performance, high-transaction systems: airline reservations, credit card transactions


Relational
No master/detail division; all entities top-level

Relationships not stored as pointers but as normal attributes

Traversals therefore require lookup

Traversals more flexible: don't have to be baked into schema

Examples: Oracle, MS SQL Server, MySQL, PostgreSQL, SQLite, etc.

We'll focus on this model in this course.
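Here is a small sketch of what "relationships as normal attributes" means in practice, using Python's built-in `sqlite3` module. The `customer` and `purchase` tables and their rows are hypothetical. The relationship is just the `customer_id` column, and a join performs the lookup in whichever direction the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id TEXT PRIMARY KEY NOT NULL, name TEXT);
-- customer_id is an ordinary attribute, not a pointer.
CREATE TABLE purchase (order_no INTEGER PRIMARY KEY,
                       customer_id TEXT, item TEXT);
INSERT INTO customer VALUES ('C999', 'Ada'), ('C123', 'Adam');
INSERT INTO purchase VALUES (1, 'C999', 'pencils'),
                            (2, 'C123', 'paper'),
                            (3, 'C999', 'paper');
""")

# Customer -> orders: look up purchases whose customer_id matches.
orders_of_ada = [row[0] for row in conn.execute("""
    SELECT p.order_no FROM customer c
    JOIN purchase p ON p.customer_id = c.id
    WHERE c.name = 'Ada' ORDER BY p.order_no""")]

# Item -> customers: the same attribute supports the reverse traversal.
buyers_of_paper = [row[0] for row in conn.execute("""
    SELECT c.name FROM purchase p
    JOIN customer c ON c.id = p.customer_id
    WHERE p.item = 'paper' ORDER BY c.name""")]
```

Neither direction had to be anticipated when the tables were defined.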


Graph Database
Stores nodes and edges as first-class concepts.

Both nodes and edges have attributes.

Support for graph algorithms (shortest path, etc.)
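As an illustration of the kind of algorithm a graph database offers, here is a plain-Python sketch of a fewest-hops shortest path using breadth-first search. The node names, the attributes, and the `shortest_path` function are all made up; a real graph DBMS would provide this through its own query language.

```python
from collections import deque

# Nodes and edges are both first-class and both carry attributes.
nodes = {n: {"label": f"node {n}"} for n in "ABCDE"}
edges = {
    ("A", "B"): {"kind": "road"},
    ("B", "C"): {"kind": "road"},
    ("C", "D"): {"kind": "road"},
    ("A", "E"): {"kind": "rail"},
    ("E", "D"): {"kind": "rail"},
}

# Treat edges as undirected when building the adjacency lists.
neighbors = {name: [] for name in nodes}
for (u, v) in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)


def shortest_path(start, goal):
    """Breadth-first search: returns a path with the fewest edges."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal not reachable from start


path_a_to_d = shortest_path("A", "D")
```

The search ignores edge attributes and counts hops only; a weighted variant would need a different algorithm such as Dijkstra's.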


Object-Oriented
Similar to data model in OO languages

Built-in inheritance

Explicit pointers similar to hierarchical/network, but more free-form

DB layout driven by application's object model

Examples: db4o (defunct), Versant, ObjectStore


Semi-Structured (NoSQL)
Rows vary in schema. Sparse data extremely common.

Example: single-table inheritance

● All subtypes together in a single table
● Each subtype has different set of attributes
● Storage format doesn't consume space for "missing" attributes

Often used in Big Data applications: record counts in billions or trillions

Very common in cloud solutions


Semi-Structured: Document Store
Stores primarily text documents (plain text, HTML, MS Word, etc.), often book-length

Full-text indexes (concordances), both forward and inverted

Fast lookup by position, frequency

Additional features, like phrase search, synonyms, stemming

(Figure from Wikipedia)


Semi-Structured: XML
Rough tree structure

However, siblings may have different structure

May or may not have an associated XML schema

Common query languages: XPath, XSLT, XQuery

<?xml version="1.0" encoding="UTF-8"?>
<BlogApplication>
  <BlogUsers>
    <BlogUser date="2016-04-02">
      <UserName>username1</UserName>
      <FirstName>First1</FirstName>
      <LastName>Last1</LastName>
    </BlogUser>
    ...
  </BlogUsers>
  <BlogPosts>
    <BlogPost>
      <PostId>1</PostId>
      <UserName>username1</UserName>
      <Title>title1</Title>
      <Content>content1</Content>
      <Comments>
        <Comment>
          <CommentId>1</CommentId>
          <UserName>username4</UserName>
          <Content>comment1</Content>
        </Comment>
        <Comment>
          <CommentId>2</CommentId>
          <UserName>username4</UserName>
          <Content>comment2</Content>
        </Comment>
      </Comments>
    </BlogPost>
    ...
  </BlogPosts>
</BlogApplication>
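A quick sketch of querying such a document with path expressions: Python's standard `xml.etree.ElementTree` module supports a limited subset of XPath, which is enough for this illustration. The embedded document is a trimmed copy of the example above with the elided parts left out.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<BlogApplication>
  <BlogUsers>
    <BlogUser date="2016-04-02">
      <UserName>username1</UserName>
      <FirstName>First1</FirstName>
      <LastName>Last1</LastName>
    </BlogUser>
  </BlogUsers>
  <BlogPosts>
    <BlogPost>
      <PostId>1</PostId>
      <UserName>username1</UserName>
      <Title>title1</Title>
      <Comments>
        <Comment>
          <CommentId>1</CommentId>
          <Content>comment1</Content>
        </Comment>
        <Comment>
          <CommentId>2</CommentId>
          <Content>comment2</Content>
        </Comment>
      </Comments>
    </BlogPost>
  </BlogPosts>
</BlogApplication>"""

root = ET.fromstring(DOCUMENT)

# Walk an explicit path down the tree to every comment's content.
comment_texts = [
    c.text
    for c in root.findall("./BlogPosts/BlogPost/Comments/Comment/Content")
]

# './/' matches at any depth; attributes are read with .get().
signup_dates = [u.get("date") for u in root.findall(".//BlogUser")]
```

Full XPath, XSLT, and XQuery go well beyond what this module offers; the sketch only shows the path-based style of querying.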


Semi-Structured: Key-Value Pairs
Records are free-form; indexed by a string "key"

Used in very high-performance systems, particularly when data is extremely sparse

High scalability: can distribute data across networks

Often used in cloud systems

Examples: Amazon EC2, MS Azure, Google AppEngine

{
  orderNumber: "1234",
  orderDate: "2013-01-01",
  customerNumber: "C999",
  details: [
    { itemNumber: "A123",
      description: "pencils",
      pencil-hardness: "No2",
      quantity: "50" },
    { itemNumber: "A456",
      description: "paper",
      paper-weight: "24 lb",
      color: "white",
      quantity: "25" }
  ]
}
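The core interface of a key-value store is small enough to sketch with a Python dictionary. The `put` and `get` helpers and the `order:1234` key format are assumptions for illustration; real systems add persistence and distribution on top of the same idea.

```python
store = {}


def put(key, record):
    """Store a free-form record under a string key."""
    store[key] = record


def get(key):
    """Return the record for a key, or None if the key is absent."""
    return store.get(key)


put("order:1234", {
    "orderNumber": "1234",
    "orderDate": "2013-01-01",
    "customerNumber": "C999",
    "details": [
        {"itemNumber": "A123", "description": "pencils",
         "pencil-hardness": "No2", "quantity": "50"},
        {"itemNumber": "A456", "description": "paper",
         "paper-weight": "24 lb", "color": "white", "quantity": "25"},
    ],
})

order = get("order:1234")
pencils, paper = order["details"]
missing = get("order:0000")
```

The two detail records carry different attribute sets, and nothing is stored for attributes a record does not have.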


Our Approach


Our Approach
● Focus primarily on relational databases in this course
● Cover NoSQL/semi-structured as time permits
● Start with relational algebra: theoretical foundation for operations we can
perform on data in a database: retrieval, data processing
● Move on to designing a database: deciding how to represent data of interest
as tables & relations
○ Might appear backwards: why talk about queries if we don't know what the database looks
like?
○ Hypothesis: rules for designing databases make more sense if we know how we're going to
interact with them


Our Approach, continued
● Next: Structured Query Language (SQL). This is the language that we'll use to
submit queries to an actual DBMS.
● Two streams after that:
○ Project: construct a database for a particular purpose, and write a program that can interact
with the database.
○ Lectures: talk about some of the ideas involved in implementing a DBMS: concurrency,
transactions, performance, optimizations.


For Next Week
● Survey and video introduction!
● No homework yet; first assignment to go out next week.
● Start thinking about your project teams:
○ 5 people per team
○ I will set up a post on Piazza for people to look for teams.
○ Please have these in place by the week of September 16; details to follow.
