Client
Graft Clients support reading from and writing to Volumes.
Local Storage
The Graft client uses Fjall, an embeddable Rust key-value store based on LSM trees, for local storage. Graft splits its data across three Fjall partitions with the following key layouts and value types:
- volumes:
  - {vid}/config -> VolumeConfig
  - {vid}/status -> VolumeStatus
  - {vid}/snapshot -> Snapshot
  - {vid}/watermarks -> Watermarks
- pages:
  - {vid}/{pageidx}/{LSN} -> PageValue
- commits:
  - {vid}/{LSN} -> Graft
Value types:
- VolumeConfig: sync: Disabled | Push | Pull | Both
- VolumeStatus: Ok | RejectedCommit | Conflict
- Snapshot: local: LSN, remote: RemoteMapping, pages: PageCount
- RemoteMapping: Unmapped | Mapped { remote: LSN, local: LSN }
- Watermarks: pending_sync: Option<LSN>, checkpoint: Option<LSN>
- PageValue: Pending | Empty | Available(Page)
- Graft: a Splinter of all PageIdxs changed in the commit
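The key layout and value types above can be sketched as Rust types. This is a minimal illustration, not Graft's actual definitions: it assumes u64 LSNs, u32 page indexes, 16-byte volume IDs, raw byte pages, and a BTreeSet standing in for a Splinter. The page_key helper is hypothetical and shows one way to encode {vid}/{pageidx}/{LSN} so that byte-wise key order (what an LSM tree sorts by) matches numeric order.

```rust
use std::collections::BTreeSet;

// Assumptions for this sketch only: u64 LSNs, u32 page indexes,
// 16-byte volume IDs, byte-vector pages, BTreeSet in place of a Splinter.
type Lsn = u64;
type PageIdx = u32;
type PageCount = u32;
type Vid = [u8; 16];
type Page = Vec<u8>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncDirection { Disabled, Push, Pull, Both }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct VolumeConfig { sync: SyncDirection }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VolumeStatus { Ok, RejectedCommit, Conflict }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RemoteMapping { Unmapped, Mapped { remote: Lsn, local: Lsn } }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Snapshot { local: Lsn, remote: RemoteMapping, pages: PageCount }

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct Watermarks { pending_sync: Option<Lsn>, checkpoint: Option<Lsn> }

#[derive(Debug, Clone, PartialEq, Eq)]
enum PageValue { Pending, Empty, Available(Page) }

/// A Graft: the set of page indexes changed by one commit.
type Graft = BTreeSet<PageIdx>;

/// Hypothetical key encoding for the pages partition: {vid}/{pageidx}/{LSN}.
/// Big-endian integers keep all versions of one page adjacent and sorted by LSN.
fn page_key(vid: &Vid, pageidx: PageIdx, lsn: Lsn) -> Vec<u8> {
    let mut key = Vec::with_capacity(30);
    key.extend_from_slice(vid);
    key.push(b'/');
    key.extend_from_slice(&pageidx.to_be_bytes());
    key.push(b'/');
    key.extend_from_slice(&lsn.to_be_bytes());
    key
}

fn main() {
    let vid: Vid = [7; 16];
    // LSN 9 sorts before LSN 10 byte-wise; as decimal text "9" would sort after "10".
    assert!(page_key(&vid, 1, 9) < page_key(&vid, 1, 10));
    // Every version of page 1 sorts before any version of page 2.
    assert!(page_key(&vid, 1, u64::MAX) < page_key(&vid, 2, 0));

    let snapshot = Snapshot { local: 3, remote: RemoteMapping::Mapped { remote: 10, local: 2 }, pages: 8 };
    let graft: Graft = BTreeSet::from([0, 5]);
    println!("{snapshot:?} {graft:?} {:?}", Watermarks::default());
}
```

With this ordering, finding the latest version of a page at or below a snapshot is a single reverse range scan over that page's keys.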
Reading
To issue a local read against a Volume snapshot:
- Look up the latest page in storage such that page.LSN <= snapshot.local
  - If this page is either Available or Empty, return the page
- If snapshot.remote is empty (Unmapped), return an empty page
- Request the page from the Pagestore
  - This may be batched along with prefetches
- Save the requested page into storage at page.LSN
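The read steps can be sketched in memory as follows. This is a minimal sketch, not the client's code: it assumes a BTreeMap keyed by (pageidx, LSN) in place of the Fjall pages partition, a hypothetical Pagestore trait, a 4096-byte page size, and that a fetched page is cached only when a Pending entry was found (the steps do not say where to save a page that had no local entry). Prefetch batching is omitted.

```rust
use std::collections::BTreeMap;

type Lsn = u64;
type PageIdx = u32;
type Page = Vec<u8>;

const PAGE_SIZE: usize = 4096; // hypothetical page size

#[derive(Debug, Clone, PartialEq)]
enum PageValue { Pending, Empty, Available(Page) }

/// Hypothetical Pagestore client: fetch page `idx` as of remote LSN `lsn`.
trait Pagestore {
    fn fetch(&mut self, idx: PageIdx, lsn: Lsn) -> Page;
}

/// One volume's slice of the pages partition, keyed by (pageidx, LSN).
struct LocalPages {
    pages: BTreeMap<(PageIdx, Lsn), PageValue>,
}

impl LocalPages {
    /// `local` is snapshot.local; `remote` is the remote LSN if the snapshot is Mapped.
    fn read(&mut self, idx: PageIdx, local: Lsn, remote: Option<Lsn>, store: &mut impl Pagestore) -> Page {
        // 1. Latest stored entry for `idx` with LSN <= snapshot.local.
        let found = self.pages.range((idx, 0)..=(idx, local)).next_back().map(|(&(_, lsn), v)| (lsn, v.clone()));
        match &found {
            Some((_, PageValue::Available(page))) => return page.clone(),
            Some((_, PageValue::Empty)) => return vec![0; PAGE_SIZE],
            _ => {} // Pending or no entry: keep going
        }
        // 2. Unmapped snapshot: the server has nothing for this volume yet.
        let Some(remote_lsn) = remote else { return vec![0; PAGE_SIZE] };
        // 3. Fetch from the Pagestore (unbatched in this sketch).
        let page = store.fetch(idx, remote_lsn);
        // 4. Cache the page at the Pending entry's LSN so later reads stay local.
        if let Some((lsn, PageValue::Pending)) = found {
            self.pages.insert((idx, lsn), PageValue::Available(page.clone()));
        }
        page
    }
}

/// Test double that counts fetches and returns pages filled with 7s.
struct CountingStore { fetches: usize }

impl Pagestore for CountingStore {
    fn fetch(&mut self, _idx: PageIdx, _lsn: Lsn) -> Page {
        self.fetches += 1;
        vec![7; PAGE_SIZE]
    }
}

fn main() {
    let mut local = LocalPages { pages: BTreeMap::new() };
    local.pages.insert((0, 1), PageValue::Available(vec![1; PAGE_SIZE]));
    local.pages.insert((0, 3), PageValue::Pending);
    let mut store = CountingStore { fetches: 0 };
    // A snapshot at LSN 2 sees the page written at LSN 1.
    assert_eq!(local.read(0, 2, Some(10), &mut store)[0], 1);
    // A snapshot at LSN 3 sees Pending, fetches once, then hits the cache.
    assert_eq!(local.read(0, 3, Some(10), &mut store)[0], 7);
    assert_eq!(local.read(0, 3, Some(10), &mut store)[0], 7);
    assert_eq!(store.fetches, 1);
}
```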
Writing
Writes are first committed locally and then asynchronously committed to the server. This section only covers the local commit.
Writes go through a VolumeWriter, which buffers newly written pages in a memtable. Reads through the writer check the memtable first, providing read-your-own-writes (RYOW) consistency, before falling back to the regular read algorithm. Each VolumeWriter is pinned to a Snapshot.
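A minimal sketch of the read-your-own-writes behavior, assuming a hypothetical VolumeWriter that stores its pinned snapshot LSN and a boxed closure standing in for the regular read algorithm:

```rust
use std::collections::BTreeMap;

type Lsn = u64;
type PageIdx = u32;
type Page = Vec<u8>;

/// Hypothetical VolumeWriter: pinned to a snapshot LSN, buffering writes in a memtable.
struct VolumeWriter {
    snapshot_local: Lsn,
    memtable: BTreeMap<PageIdx, Page>,
    /// Stand-in for the regular read algorithm: (pageidx, snapshot LSN) -> page.
    read_snapshot: Box<dyn Fn(PageIdx, Lsn) -> Page>,
}

impl VolumeWriter {
    fn write(&mut self, idx: PageIdx, page: Page) {
        // A later write to the same page replaces the earlier one in the memtable.
        self.memtable.insert(idx, page);
    }

    fn read(&self, idx: PageIdx) -> Page {
        match self.memtable.get(&idx) {
            // Read-your-own-writes: buffered, uncommitted pages win.
            Some(page) => page.clone(),
            // Otherwise read from the snapshot this writer is pinned to.
            None => (self.read_snapshot)(idx, self.snapshot_local),
        }
    }
}

fn main() {
    let mut writer = VolumeWriter {
        snapshot_local: 4,
        memtable: BTreeMap::new(),
        read_snapshot: Box::new(|_idx: PageIdx, _lsn: Lsn| -> Page { vec![0; 4] }),
    };
    writer.write(1, vec![9; 4]);
    assert_eq!(writer.read(1), vec![9; 4]); // served from the memtable
    assert_eq!(writer.read(2), vec![0; 4]); // served from the pinned snapshot
}
```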
The commit process happens atomically via a Fjall batch.
- Set commit_lsn = snapshot.local.next()
- Persist the memtable at commit_lsn
- Write out a Graft to the commits partition at commit_lsn
- Take the local commit lock
- Set latest to the latest volume Snapshot
  - Fail if latest.local != snapshot.local
- Write out the new snapshot (without changing the remote mapping)
- Commit the Fjall batch
- Release the commit lock
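The commit steps can be sketched in memory as follows. Assumptions: a std Mutex plays the role of the local commit lock; writes are staged in local variables and applied while holding the lock, in place of a Fjall batch; LSNs are u64 with 0 meaning an empty volume and next() as +1. The Store type and commit function are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::Mutex;

type Lsn = u64;
type PageIdx = u32;
type Page = Vec<u8>;

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct Snapshot {
    local: Lsn,                 // assumption: 0 means "no commits yet"
    remote: Option<(Lsn, Lsn)>, // remote mapping (remote, local); None = Unmapped
}

/// In-memory stand-in for one volume's Fjall partitions.
#[derive(Default)]
struct Store {
    pages: BTreeMap<(PageIdx, Lsn), Page>,
    commits: BTreeMap<Lsn, BTreeSet<PageIdx>>, // LSN -> Graft
    snapshot: Snapshot,
}

#[derive(Debug, PartialEq, Eq)]
enum CommitError { Conflict }

/// Commits `memtable` on top of the `pinned` snapshot.
fn commit(store: &Mutex<Store>, pinned: Snapshot, memtable: BTreeMap<PageIdx, Page>) -> Result<Snapshot, CommitError> {
    let commit_lsn = pinned.local + 1; // snapshot.local.next()
    // Stage the memtable pages and the Graft, both at commit_lsn.
    let graft: BTreeSet<PageIdx> = memtable.keys().copied().collect();
    let staged_pages: Vec<_> = memtable.into_iter().map(|(idx, page)| ((idx, commit_lsn), page)).collect();

    let mut guard = store.lock().unwrap(); // take the local commit lock
    let latest = guard.snapshot;
    if latest.local != pinned.local {
        return Err(CommitError::Conflict); // staged writes are simply dropped
    }
    // The new snapshot keeps the existing remote mapping.
    let new_snapshot = Snapshot { local: commit_lsn, ..latest };
    // "Commit the batch": apply every staged write while holding the lock.
    guard.pages.extend(staged_pages);
    guard.commits.insert(commit_lsn, graft);
    guard.snapshot = new_snapshot;
    Ok(new_snapshot)
} // the lock is released when `guard` is dropped

fn main() {
    let store = Mutex::new(Store::default());
    let pinned = store.lock().unwrap().snapshot;
    let memtable = BTreeMap::from([(3, vec![1u8; 4])]);
    let first = commit(&store, pinned, memtable.clone()).unwrap();
    assert_eq!(first.local, 1);
    // A second writer pinned to the same, now stale, snapshot must fail.
    assert_eq!(commit(&store, pinned, memtable), Err(CommitError::Conflict));
}
```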
The Graft client runtime supports asynchronously pushing to and pulling from the server. Because this process happens out of band, two writers committing to the same Volume will frequently conflict and will need to rebase or reset to continue.
Future work:
- synchronous commit+push to make conflicts easier to detect
- MVCC automatic conflict resolution
- Rebase conflict resolution
Sync: Pull
The Graft runtime polls /metastore/v1/pull_graft for changes. When a change is detected, the runtime attempts to “accept” the change.
The pull process happens atomically via a Fjall batch.
- Take the local commit lock
- Read the latest Volume Snapshot and Watermarks
- If remote_mapping.local < pending_sync: FAIL with VolumeNeedsRecovery
- If remote_mapping.local < snapshot.local: FAIL with RemoteConflict
  - Set the Volume status to VolumeStatus::Conflict
- Set commit_lsn = snapshot.local.next()
- Update the snapshot: local=commit_lsn, remote=(remote_lsn, commit_lsn), pages=remote_pages
- Update the watermarks: pending_sync=commit_lsn
- For each changed pageidx in the remote commit, write out PageValue::Pending into the pages partition using commit_lsn. This ensures that future reads know to fetch the page from the PageStore.
- Commit the Fjall batch
- Release the commit lock
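Accepting a pulled commit can be sketched in memory as follows. Assumptions: a Mutex-guarded Volume stands in for the commit lock and the Fjall partitions, LSNs are u64 with 0 meaning no local commits, next() is +1, and the remote mapping is Option<(remote, local)>. The names accept_pull and PullError are hypothetical. The recovery check treats an Unmapped volume with pending_sync set as needing recovery, an interpretation of the comparison when remote_mapping.local is absent.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::Mutex;

type Lsn = u64;
type PageIdx = u32;

#[derive(Debug, Clone, PartialEq)]
enum PageValue { Pending, Empty, Available(Vec<u8>) }

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
enum VolumeStatus { #[default] Ok, RejectedCommit, Conflict }

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct Snapshot {
    local: Lsn,                 // assumption: 0 = no local commits yet
    remote: Option<(Lsn, Lsn)>, // remote mapping (remote, local); None = Unmapped
    pages: u32,
}

/// In-memory stand-in for one volume's snapshot, watermarks, status and pages.
#[derive(Default)]
struct Volume {
    snapshot: Snapshot,
    pending_sync: Option<Lsn>,
    status: VolumeStatus,
    pages: BTreeMap<(PageIdx, Lsn), PageValue>,
}

#[derive(Debug, PartialEq, Eq)]
enum PullError { VolumeNeedsRecovery, RemoteConflict }

fn accept_pull(vol: &Mutex<Volume>, remote_lsn: Lsn, remote_pages: u32, changed: &BTreeSet<PageIdx>) -> Result<Snapshot, PullError> {
    let mut v = vol.lock().unwrap(); // take the local commit lock
    let mapped_local = v.snapshot.remote.map(|(_, local)| local);
    // remote_mapping.local < pending_sync: an earlier push never finished.
    if v.pending_sync.map_or(false, |p| mapped_local.map_or(true, |l| l < p)) {
        return Err(PullError::VolumeNeedsRecovery);
    }
    // remote_mapping.local < snapshot.local: unsynced local commits exist.
    if mapped_local.unwrap_or(0) < v.snapshot.local {
        v.status = VolumeStatus::Conflict;
        return Err(PullError::RemoteConflict);
    }
    let commit_lsn = v.snapshot.local + 1; // snapshot.local.next()
    // The remaining writes stand in for one Fjall batch, applied under the lock.
    v.snapshot = Snapshot { local: commit_lsn, remote: Some((remote_lsn, commit_lsn)), pages: remote_pages };
    v.pending_sync = Some(commit_lsn);
    for &idx in changed {
        v.pages.insert((idx, commit_lsn), PageValue::Pending);
    }
    Ok(v.snapshot)
} // the lock is released when `v` is dropped

fn main() {
    let vol = Mutex::new(Volume::default());
    let changed = BTreeSet::from([1, 2]);
    println!("{:?}", accept_pull(&vol, 40, 8, &changed));
}
```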
FAIL states:
- VolumeNeedsRecovery: This means that we had previously crashed in the middle of pushing the Volume to the server. The client needs to recover or reset the volume before continuing.
- Conflict: This means that we have made local commits since the last successful sync. The client needs to reconcile with the server before continuing.
Sync: Push
When the Graft runtime detects that a local commit has occurred, it tries to push the commit to the server.
- Take the local commit lock
- Read the latest Volume Snapshot and Watermarks
- If remote_mapping.local < pending_sync: FAIL with VolumeNeedsRecovery
- Update watermarks.pending_sync to snapshot.local
- Calculate the LSN range to push:
  - start_lsn = remote_mapping.local.next()
  - end_lsn = snapshot.local
- Release the local commit lock
- Iterate through the local commit splinters
  - Send the most recent page for each pageidx to the Pagestore
  - Collect new segments
- Commit the segments to the Metastore
- Take the local commit lock
On commit success:
- Open a Fjall batch
- Read the latest Volume Snapshot and Watermarks
- Assert that the new remote LSN is larger than the last remote LSN
- Assert that watermarks.pending_sync == snapshot.local
- Update the snapshot’s remote mapping to (remote_lsn, snapshot.local)
- Remove all successfully synced commit grafts
- Commit the batch
- Release the local commit lock
On commit failure:
- Set watermarks.pending_sync = snapshot.remote_mapping.local
- Set the Volume status to VolumeStatus::RejectedCommit
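The push flow, including both completion paths, can be sketched as below. This is illustrative only: the upload closure is a hypothetical stand-in for sending pages to the Pagestore and committing segments to the Metastore, the Volume struct flattens the snapshot, watermarks and commits partition into one in-memory value behind a Mutex (the commit lock), and assert! implements the two assertions the steps call for. As in the steps, the lock is released around the upload.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::Mutex;

type Lsn = u64;
type PageIdx = u32;
type Grafts = BTreeMap<Lsn, BTreeSet<PageIdx>>;

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
enum VolumeStatus { #[default] Ok, RejectedCommit }

/// In-memory stand-in for one volume's snapshot, watermarks and commits partition.
#[derive(Default)]
struct Volume {
    local: Lsn,                 // snapshot.local (assumption: 0 = no commits yet)
    remote: Option<(Lsn, Lsn)>, // remote mapping (remote, local); None = Unmapped
    pending_sync: Option<Lsn>,
    status: VolumeStatus,
    commits: Grafts,
}

#[derive(Debug, PartialEq, Eq)]
enum PushError { VolumeNeedsRecovery, Rejected }

/// `upload` returns the new remote LSN, or None if the Metastore rejected the commit.
fn push(vol: &Mutex<Volume>, upload: impl FnOnce(&Grafts) -> Option<Lsn>) -> Result<Lsn, PushError> {
    // Under the commit lock: check for an unfinished push, record intent, pick the range.
    let (start_lsn, end_lsn, grafts) = {
        let mut v = vol.lock().unwrap();
        let mapped_local = v.remote.map(|(_, local)| local);
        if v.pending_sync.map_or(false, |p| mapped_local.map_or(true, |l| l < p)) {
            return Err(PushError::VolumeNeedsRecovery);
        }
        v.pending_sync = Some(v.local);
        let (start_lsn, end_lsn) = (mapped_local.unwrap_or(0) + 1, v.local);
        let grafts: Grafts = v.commits.iter()
            .filter(|(lsn, _)| (start_lsn..=end_lsn).contains(*lsn))
            .map(|(lsn, graft)| (*lsn, graft.clone()))
            .collect();
        (start_lsn, end_lsn, grafts)
    }; // lock released here, so local commits can proceed during the upload

    let result = upload(&grafts);

    let mut v = vol.lock().unwrap(); // retake the commit lock
    match result {
        Some(remote_lsn) => {
            assert!(v.remote.map_or(true, |(last_remote, _)| remote_lsn > last_remote));
            assert_eq!(v.pending_sync, Some(v.local));
            v.remote = Some((remote_lsn, v.local));
            for lsn in start_lsn..=end_lsn {
                v.commits.remove(&lsn); // drop the synced commit grafts
            }
            Ok(remote_lsn)
        }
        None => {
            v.pending_sync = v.remote.map(|(_, local)| local);
            v.status = VolumeStatus::RejectedCommit;
            Err(PushError::Rejected)
        }
    }
}

fn main() {
    let vol = Mutex::new(Volume::default());
    {
        let mut v = vol.lock().unwrap();
        v.local = 1;
        v.commits.insert(1, BTreeSet::from([4]));
    }
    // Pretend the Metastore accepted the push as remote LSN 7.
    let result = push(&vol, |grafts| { println!("pushing {} commit(s)", grafts.len()); Some(7) });
    println!("{result:?}");
}
```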
Crash recovery
The Graft client runtime must be able to crash at any point and recover. Fjall already has its own recovery mechanisms built in, so we only need to handle failed Pushes. Failed pushes can be detected when pending_sync is larger than remote_mapping.local and no concurrent Push job is running.
When a volume is in this failed push state, it needs to determine whether or not the Metastore accepted the commit. It does so by retrying the commit process with the same idempotency token.
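Detection and retry can be sketched as follows. The predicate follows the condition stated above. The Metastore type and its u128 idempotency token are hypothetical, assuming only that the server remembers the outcome of each token, so a retried commit reports the original remote LSN instead of committing twice.

```rust
use std::collections::HashMap;

type Lsn = u64;

/// Failed-push state: pending_sync is ahead of the remote mapping's local LSN
/// (or the volume is Unmapped) and no push job is currently running.
fn in_failed_push_state(pending_sync: Option<Lsn>, mapped_local: Option<Lsn>, push_running: bool) -> bool {
    match pending_sync {
        Some(p) => !push_running && mapped_local.map_or(true, |l| p > l),
        None => false,
    }
}

/// Hypothetical Metastore that remembers the outcome of each idempotency token.
#[derive(Default)]
struct Metastore {
    latest: Lsn,
    by_token: HashMap<u128, Lsn>,
}

impl Metastore {
    fn commit(&mut self, token: u128) -> Lsn {
        if let Some(&lsn) = self.by_token.get(&token) {
            return lsn; // already accepted: report the original result
        }
        self.latest += 1;
        self.by_token.insert(token, self.latest);
        self.latest
    }
}

fn main() {
    assert!(in_failed_push_state(Some(5), Some(3), false));
    assert!(!in_failed_push_state(Some(5), Some(3), true));

    let mut metastore = Metastore::default();
    let first = metastore.commit(42);
    // Crash before recording `first` locally, then retry with the same token.
    let retried = metastore.commit(42);
    assert_eq!(first, retried);
    println!("remote LSN {first}");
}
```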
Lite Client
In some cases, a Client may want to boot without any state and quickly read from (and possibly write to) a particular Volume snapshot. In the most minimal case, if the client already knows the LSN of the snapshot it wants to access, it can read from the Page Server immediately. If it wants to issue a write, it needs to read the latest snapshot to get the page count and current remote LSN.
Supporting Lite Clients is desirable to help enable edge serverless workloads which want to optimize for latency and have no cached state.