• PageId is a class containing the table name, base RID, column index, a base/tail flag, and the page’s sequential ID (within the column)
  • Bufferpool
    • maintain in-memory cache of pages (frames)
      • pages: dict[PageId, Page]
      • dirty_pages: set[PageId]
      • range_refcount: dict[int, int]
      • tail_index: for each page range, store the RID mapping; once range_refcount for a page range reaches zero, that range’s RID index becomes eligible for deletion, either after a delay or when memory use nears the limit.
      • when the frame count (len(pages)) exceeds the configured threshold (e.g. a percentage of memory, or a fixed budget such as 1 GiB), evict frames using an LFU policy
    • handle read, prefetch, and write requests, automatically pinning frames when they are requested
      • Reads can proceed concurrently, but a write requires exclusive access: all earlier reads and writes must finish before it starts.
    • handle unpin requests given a PageId
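
A minimal sketch of the PageId and Bufferpool items above, under several assumptions: capacity is counted in frames rather than bytes, pages are plain bytearrays, and the PageReader/PageWriter are passed in as the callables read_page and write_page. The names pin_count, use_count, and _evict_if_full are hypothetical, and the read/write exclusivity rule is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)  # frozen => hashable, so it can be used as a dict key
class PageId:
    table: str
    base_rid: int
    column: int
    is_tail: bool
    seq: int  # page's sequential ID within the column


class Bufferpool:
    def __init__(self, capacity: int,
                 read_page: Callable[[PageId], bytearray],
                 write_page: Callable[[PageId, bytearray], None]) -> None:
        self.capacity = capacity      # assumed unit: number of frames
        self.read_page = read_page    # stand-in for PageReader
        self.write_page = write_page  # stand-in for PageWriter
        self.pages: dict[PageId, bytearray] = {}
        self.dirty_pages: set[PageId] = set()
        self.pin_count: dict[PageId, int] = {}
        self.use_count: dict[PageId, int] = {}  # access frequency for LFU

    def pin(self, pid: PageId, for_write: bool = False) -> bytearray:
        """Return the frame for pid, loading it if needed, and pin it."""
        if pid not in self.pages:
            self._evict_if_full()
            self.pages[pid] = self.read_page(pid)
            self.pin_count[pid] = 0
            self.use_count[pid] = 0
        self.pin_count[pid] += 1
        self.use_count[pid] += 1
        if for_write:
            self.dirty_pages.add(pid)
        return self.pages[pid]

    def unpin(self, pid: PageId) -> None:
        if self.pin_count.get(pid, 0) <= 0:
            raise ValueError(f"{pid} is not pinned")
        self.pin_count[pid] -= 1

    def _evict_if_full(self) -> None:
        if len(self.pages) < self.capacity:
            return
        # Only unpinned frames may be evicted.
        candidates = [p for p in self.pages if self.pin_count[p] == 0]
        if not candidates:
            raise RuntimeError("all frames are pinned; cannot evict")
        # LFU: pick the least frequently used unpinned frame.
        victim = min(candidates, key=lambda p: self.use_count[p])
        if victim in self.dirty_pages:
            self.write_page(victim, self.pages[victim])  # flush before dropping
            self.dirty_pages.discard(victim)
        del self.pages[victim], self.pin_count[victim], self.use_count[victim]
```

Flushing a dirty victim at eviction time is one possible choice; it could instead be left to the periodic writer, in which case dirty frames would simply be skipped as eviction candidates.
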
  • PageReader
    • reads a page given a PageId and database base path; used by bufferpool only
  • PageWriter
    • writes a page given a PageId and database base path; used by bufferpool management only
  • RecordReader
    • should it be a context manager?
    • can be configured to read all or select columns of a table
    • functionality
      • prefetch for linear scan
      • prefetch for indexed scan
      • allow for predicates
      • unpin pages
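
On the context-manager question above: one possible shape, sketched under the assumption that the bufferpool exposes pin(pid) and unpin(pid). The Pool protocol and the page() method are hypothetical names. The reader remembers every page it pins and releases them all on exit, even if the scan raises.

```python
from typing import Any, Hashable, Optional, Protocol


class Pool(Protocol):
    """Assumed bufferpool interface; only pin/unpin are needed here."""
    def pin(self, pid: Hashable) -> Any: ...
    def unpin(self, pid: Hashable) -> None: ...


class RecordReader:
    def __init__(self, pool: Pool, columns: Optional[list[int]] = None) -> None:
        self.pool = pool
        self.columns = columns  # None means "read all columns"
        self._pinned: list[Hashable] = []

    def __enter__(self) -> "RecordReader":
        return self

    def page(self, pid: Hashable) -> Any:
        """Pin a page through the pool and remember it for later unpinning."""
        data = self.pool.pin(pid)
        self._pinned.append(pid)
        return data

    def __exit__(self, exc_type, exc, tb) -> None:
        # Unpin once per recorded pin; returning None lets exceptions propagate.
        while self._pinned:
            self.pool.unpin(self._pinned.pop())
```

With this shape a scan reads as `with RecordReader(pool, columns=[0, 2]) as reader: ...`, and callers never unpin pages explicitly.
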
  • Query implementation
    • Select (match search key via an index)
    • Select (match search key via linear scan)
    • Insert
    • Update
    • Sum
    • Delete
  • Rename Record to TableRow
  • Write-Ahead Log
    • writes the records changed by each transaction to disk
  • Writer
    • periodically saves dirty pages to disk
  • Create a new QueryResult class that doesn’t have metadata columns
  • Implicit wrapping of all single operations as transactions
  • Page Merge Worker
    • What do we need to store in addition to the data? Sequence number? How do we represent & track merged pages on disk?
    • Base pages are all stored in a single file. When base pages and tail pages are merged, the merged pages are stored at a fixed offset determined by the merge sequence number. Alternatively, to avoid the space wasted when a merge happens without a full page range, we can store the file offset of the beginning of each merge.
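
The two layouts above can be compared with a little arithmetic. This is only a sketch: PAGE_SIZE, PAGES_PER_RANGE, and the names fixed_merge_offset and MergeOffsetTable are assumptions, and offsets are taken relative to the start of the merged region.

```python
PAGE_SIZE = 4096       # assumed page size in bytes
PAGES_PER_RANGE = 16   # assumed number of pages in a full page range


def fixed_merge_offset(merge_seq: int, page_index: int = 0) -> int:
    """Fixed layout: every merge reserves room for a full page range."""
    return (merge_seq * PAGES_PER_RANGE + page_index) * PAGE_SIZE


class MergeOffsetTable:
    """Variable layout: record where each merge starts, so partial merges waste no space."""

    def __init__(self) -> None:
        self.starts: list[int] = []  # starts[i] = offset where merge i begins
        self.end = 0                 # offset just past the last merge

    def append_merge(self, n_pages: int) -> int:
        """Reserve space for a merge of n_pages pages; return its sequence number."""
        seq = len(self.starts)
        self.starts.append(self.end)
        self.end += n_pages * PAGE_SIZE
        return seq

    def offset(self, merge_seq: int, page_index: int = 0) -> int:
        return self.starts[merge_seq] + page_index * PAGE_SIZE
```

The fixed layout needs no extra metadata beyond the sequence number, while the offset table must itself be persisted but packs partial merges tightly.
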