Database Homework #2, Spring 2018

For each question below (most anyway), you will write an SQL query. Additionally, for each question with a star (*), provide the SQLite output (the table that is printed). The output can be cut-and-pasted from your SQLite browser.

Each of these questions should be answered with a single SQL query that would hypothetically work for any logical database instance. In other words, you shouldn't "hard-code" anything in a query that will make the query return incorrect information if any of the information in the database changes.

  1. (Easy SQL) Using SQLite, load the Flights database. The schema is as follows:

    1. *Select the IDs, first, and last names of all customers who live in Memphis. [Sanity check: 6 rows]

    2. *Select the airline code, flight numbers, and departure/arrival times of all flights scheduled to fly from MEM to ATL (MEM and ATL are the airport codes for Memphis and Atlanta). [Sanity check: 7 rows]

    3. *Find the names, codes, and cities of all the airports in Tennessee that serve cities of at least 100,000 people. [Sanity check: 6 rows]

    4. Find the total number of airports in each state in the country. Return two columns: state, and number of airports.

    5. *Find the total number of people on Delta flight 45. Hint: Delta's two-character code is DL.

    6. *Find the total number of rows occupied on Delta flight 45. A row is occupied if at least one person is sitting in the row.

  2. (Medium SQL) Consider the following relational design used at a Memphis pizza restaurant:

    The Customer table maintains the personal information for people who order pizzas, such as a unique ID, name, phone number, credit card number (ccn), and neighborhood (e.g., Midtown, East Memphis, Evergreen). It is possible that two different people have the same name.

    In the Pizza table, every pizza has an ID, name (e.g., "the works"), particular size (e.g., 7 inches) and price. Note that different pizzas may have the same name, but different sizes.

    The Order table includes the records about which customer ordered which pizza, quantity of pizzas, orderyear (e.g., 2014), ordermonth (e.g., 12), orderday (e.g., 27), and the order time (e.g., "6:13pm"). Note that an order can contain more than one pizza; there will just be multiple tuples with the same cust_id, date, and time.

    The Supplies table includes the information of the various groceries used by the store: the name, unit price, and the amount left in the store (e.g. the store might have 3 lbs of mozzarella left with a unit price of $5 per lb).

    The Ingredient table keeps the records about the amount of ingredients used by each pizza (so "The works 7 inch" might use only 0.1 lbs of mozzarella, while "Four cheese 12 inch" might use 0.4 lbs).

    Write SQL queries to answer parts b through f. Use only the operators and SQL statements we have learned in class. In addition, avoid the operators IN, ALL, ANY, and EXISTS.

    1. What are logical keys for the five tables?
    2. Find the ids of the customers with name "Bob."
    3. List pizza ids of the pizzas ordered by all the customers. Remove duplicates.
    4. Find pizza ids of all the pizzas with the price more than 15 dollars.
    5. List ids and names of the customers who ordered at least one pizza and live in Midtown.
    6. Find all the pizzas (pizza id, pizza name, and size) where the store makes a profit of at least 10 dollars per whole pizza. (Here, we assume Profit = Pizza Price minus Ingredient Cost)

    Suggestion: It is very tempting to create a database and throw some sample data into it to do this problem on a computer. I encourage you not to do this, or at least only do it to check your work. I purposefully did not give you a database for this question so you could practice writing SQL queries without being able to test them out, because the midterm will have questions like this.

  3. (Advanced SQL) Using the Flights SQLite database again, write SQL queries to answer the following questions. Note that some of these queries will require multiple seconds to run. On my computer, the last query takes a good 15-20 seconds.

    1. Find a listing of all flights and the number of people on each flight. Return a table with columns for airline code, flight number, and number of passengers. [Sanity check: 23983 rows]

    2. *Find the largest number of passengers on any single flight. Return this number of passengers (one row/one column). [Sanity check: the number is between 100 and 200.]

    3. *Find the airline code, flight number, source, and destination airport codes for the flight(s) with the largest number of passengers. Do not use any literal numbers in the query; your query should calculate the size of the largest flight and retrieve the relevant information in one query.

      Note that in general, there is no guarantee that there will be only one flight that has this number of passengers on it! Your query should be able to return them all if there is more than one.

      [Sanity check: For this particular database instance, there is only one flight with the largest number of passengers on it and it involves a New York City airport.]

    4. *Find, for each airline, the most popular one-way route that airline flies, defined as the route (pair of source/destination airports) that carries the most passengers. Return the airline code, the source and destination airport codes, and the total number of passengers that are flying that route.

      Note that there are multiple airlines in the database that fly the same route, and possibly multiple flights per day for each airline on that route.

      Note that "one-way route" means we are not combining flights from Memphis to Atlanta with flights from Atlanta to Memphis. Those are two separate one-way routes.

      Hint: this is hard. You may want to use SQL VIEWs to save parts of your query as you develop them. An SQL view is similar to the relational algebra shorthand notation: a view allows you to give a name to a query that you plan to use later on (for instance, as a subquery). If you use views, you should write down your CREATE VIEW statements as well as your final SELECT statement.

      Hint 2: You may find it useful as well to read the book section on correlated subqueries. This may be helpful as well.

      Hint 3: You should end up with a 12-row table. [Sanity check: You should end up with the most popular Delta (DL) route being from Minneapolis/St. Paul (MSP) to Salt Lake City (SLC), carrying 572 passengers in total.

Other hints, guidelines, and clarifications