Sunday, February 25th, 2024

An invincible determination can accomplish almost anything and in this lies the great distinction between great men and little men.

— Thomas Fuller

  • new column(s) needed
    • base RID
  • new RID allocation strategy
    • All tail records across page ranges share the same pool of tail RIDs. Tail records have unique RIDs for ease of storage.
    • Page directory is responsible for allocating new RIDs in a thread-safe way.
  • page directory rewrite
    • keep track of tail PageIds that a page range possesses.
    • Page directory stores tail page indices for all page ranges.
    • Page directory (and page range) should not read or write tail records. From now on, they are only responsible for looking up the PageIds of a base/tail record, allocating base/tail record IDs, and updating the tail page index.
    • keep track of unmerged tail pages
  • A PageRange
    • keeps track of tail page IDs for each column
  • also need to insert backup tail records
  • merge
    • how tf does merging work?
  • checkpoint
    • ensure that all committed transactions (up to the latest point in WAL at the time of checkpointing) are written to disk
    • bufferpool needs to write all currently dirty buffers to disk.
  • parse WAL
    • each transaction is:
      • start time: 8 bytes
      • number of queries: 8 bytes
      • each query is:
        • 1 byte query type marker (enum)
        • table name: string length + string
        • insert
          • 2 bytes for number of columns
          • ints for column values
        • update
          • 2 bytes for number of columns
          • (number of columns + 7) // 8 bytes for schema encoding, i.e. ceil(number of columns / 8)
          • ints for column values
        • delete
          • table name: string length + string
          • key
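
A rough sketch of the WAL transaction layout above, to sanity-check the byte counts. Everything the notes leave open is an assumption here: little-endian byte order, a 2-byte string length, 8-byte signed ints for column values and keys, the enum values 0/1/2, an update carrying values only for columns whose schema bit is set, and the table name being read once per query (so delete is just the key). The names QueryType, Query, encode_transaction and parse_transaction are made up for illustration.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple


class QueryType(IntEnum):  # marker values are assumed, not from the notes
    INSERT = 0
    UPDATE = 1
    DELETE = 2


@dataclass
class Query:
    kind: QueryType
    table: str
    values: List[Optional[int]]  # None = column not updated; delete stores [key]


def encode_transaction(start_time: int, queries: List[Query]) -> bytes:
    # start time: 8 bytes, number of queries: 8 bytes
    out = bytearray(struct.pack("<QQ", start_time, len(queries)))
    for q in queries:
        name = q.table.encode("utf-8")
        # 1 byte type marker, then string length (assumed 2 bytes) + string
        out += struct.pack("<BH", int(q.kind), len(name)) + name
        if q.kind == QueryType.DELETE:
            out += struct.pack("<q", q.values[0])
            continue
        n = len(q.values)
        out += struct.pack("<H", n)  # 2 bytes for number of columns
        if q.kind == QueryType.UPDATE:
            bitmap = bytearray((n + 7) // 8)  # schema encoding, 1 bit per column
            for i, v in enumerate(q.values):
                if v is not None:
                    bitmap[i // 8] |= 1 << (i % 8)
            out += bitmap
        present = [v for v in q.values if v is not None]
        out += struct.pack(f"<{len(present)}q", *present)
    return bytes(out)


def parse_transaction(buf: bytes, pos: int = 0) -> Tuple[int, List[Query], int]:
    """Return (start_time, queries, offset just past this transaction)."""
    start_time, n_queries = struct.unpack_from("<QQ", buf, pos)
    pos += 16
    queries = []
    for _ in range(n_queries):
        marker, name_len = struct.unpack_from("<BH", buf, pos)
        pos += 3
        kind = QueryType(marker)
        table = buf[pos:pos + name_len].decode("utf-8")
        pos += name_len
        if kind == QueryType.DELETE:
            (key,) = struct.unpack_from("<q", buf, pos)
            pos += 8
            queries.append(Query(kind, table, [key]))
            continue
        (n,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        mask = [True] * n  # insert: every column has a value
        if kind == QueryType.UPDATE:
            nbytes = (n + 7) // 8
            bitmap = buf[pos:pos + nbytes]
            pos += nbytes
            mask = [bool((bitmap[i // 8] >> (i % 8)) & 1) for i in range(n)]
        count = sum(mask)
        ints = iter(struct.unpack_from(f"<{count}q", buf, pos))
        pos += 8 * count
        queries.append(Query(kind, table, [next(ints) if m else None for m in mask]))
    return start_time, queries, pos


txn = [
    Query(QueryType.INSERT, "grades", [1, 90, 85]),
    Query(QueryType.UPDATE, "grades", [None, 95, None]),
    Query(QueryType.DELETE, "grades", [1]),
]
blob = encode_transaction(1708819200, txn)
start, parsed, end = parse_transaction(blob)
```

Returning the end offset lets the caller loop over a WAL file one transaction at a time. If delete really does store the table name a second time, the delete branch would need one more length + string read before the key.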