Class RdeStagingAction
- All Implemented Interfaces:
Runnable
Pending Deposits
This task starts by asking PendingDepositChecker which deposits need to be generated.
If there's nothing to deposit, we return 204 No Content; otherwise, we fire off a job and
redirect to its status GUI. The task can also be run in manual operation, as described below.
Dataflow
The Dataflow job finds the most recent history entry on or before watermark for each resource type and loads the embedded resource from it, which is then projected to watermark time to account for things like pending transfer.
Only Contacts and Hosts that are referenced by an included Domain will be included in the corresponding pending deposit.
Registrar entities, both active and inactive, are included in all deposits. They are not rewound to a point in time.
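The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the actual Dataflow pipeline: the class WatermarkSelectionSketch and the HistoryEntry record are hypothetical stand-ins, and the projection to watermark time and the filtering of Contacts and Hosts by referencing Domain are left out.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative only: these types are hypothetical, not Nomulus classes. */
final class WatermarkSelectionSketch {

  /** Simplified stand-in for a history entry that embeds a resource snapshot. */
  record HistoryEntry(String resourceId, Instant modificationTime, String snapshot) {}

  /** Returns, for each resource, the most recent entry on or before the watermark. */
  static Map<String, HistoryEntry> latestAtOrBefore(
      Collection<HistoryEntry> entries, Instant watermark) {
    return entries.stream()
        // Drop entries written after the watermark.
        .filter(e -> !e.modificationTime().isAfter(watermark))
        // Keep the later of any two entries for the same resource.
        .collect(
            Collectors.toMap(
                HistoryEntry::resourceId,
                e -> e,
                (a, b) -> a.modificationTime().isAfter(b.modificationTime()) ? a : b));
  }
}
```

A resource whose only history entries are later than the watermark is absent from the result, so it would not appear in that deposit.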
Afterward
The XML deposit files generated by this job are humongous. A tiny XML report file is generated for each deposit, telling us how much of what it contains.
Once a deposit is successfully generated, the next step depends on the mode. For RDE, an RdeUploadAction is enqueued, which will upload the deposit via SFTP to the third-party escrow provider. For BRDA, a BrdaCopyAction is enqueued, which will copy the deposit to a GCS bucket, from which it is rsynced to a third-party escrow provider.
To generate escrow deposits manually and locally, use the nomulus tool command GenerateEscrowDepositCommand.
Logging
To identify the reduce worker request for a deposit in App Engine's log viewer, you can use search text like tld=soy, watermark=2015-01-01, and mode=FULL.
Error Handling
Valid model objects might not be valid to the RDE XML schema. A single invalid object will cause the whole deposit to fail. You need to check the logs, find out which entities are broken, and perform database surgery.
If a deposit fails, an error is emitted to the logs for each broken entity. It tells you the key and shows you its representation in lenient XML.
Failed deposits will be retried indefinitely. This is because RDE and BRDA each have a Cursor for each TLD. Even if the cursor lags for days, it'll catch up gradually on its own, once the data becomes valid.
The third-party escrow provider will validate each deposit we send them. They do both schema validation and reference checking.
This job does not perform reference checking. Administrators can do this locally with the ValidateEscrowDepositCommand command in the nomulus tool.
Cursors
Deposits are generated serially for a given (tld, mode) pair. A deposit is never started beyond the cursor. Once a deposit is completed, its cursor is rolled forward transactionally.
The mode determines which cursor is used. Cursor.CursorType.RDE_STAGING is used for thick deposits and Cursor.CursorType.BRDA is used for thin deposits.
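As a small sketch of that mapping, assuming FULL denotes thick (RDE) deposits and THIN denotes thin (BRDA) deposits: the enums below are local stand-ins defined only for this example, not the real Cursor.CursorType or mode types, and cursorFor is a hypothetical helper.

```java
/** Illustrative stand-ins; the real Cursor.CursorType has its own definition. */
final class CursorSelectionSketch {

  /** Deposit modes: FULL for thick (RDE) deposits, THIN for thin (BRDA) deposits. */
  enum Mode { FULL, THIN }

  /** Mirrors only the two cursor types named in the text above. */
  enum CursorType { RDE_STAGING, BRDA }

  /** Picks the cursor type that tracks progress for the given mode. */
  static CursorType cursorFor(Mode mode) {
    switch (mode) {
      case FULL:
        return CursorType.RDE_STAGING;
      case THIN:
        return CursorType.BRDA;
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }
}
```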
Use the ListCursorsCommand and UpdateCursorsCommand commands to administer these cursors.
Security
The deposit and report are encrypted using Ghostryde. Administrators can use the GhostrydeCommand command in the nomulus tool to view them.
Unencrypted XML fragments are stored temporarily between the map and reduce steps and between Dataflow transforms. The ghostryde encryption on the full archived deposits makes life a little more difficult for an attacker. But security ultimately depends on the bucket.
Idempotency
For the Dataflow job we do not employ a lock because it is difficult to span a lock across three subsequent transforms (save to GCS, roll forward cursor, enqueue next action). Instead, we get around the issue by saving the deposit to a unique folder named after the job name so there is no possibility of overwriting.
Deposits are generated serially for a given (tld, mode) pair. A deposit is never started beyond the cursor. Once a deposit is completed, its cursor is rolled forward transactionally.
Duplicate jobs may exist at or before the cursor (<=cursor), so a transaction will not bother changing the cursor if it's already been rolled forward.
Enqueuing RdeUploadAction or BrdaCopyAction is also part of the cursor transaction. This is necessary because the first thing the upload task does is check the staging cursor to verify it's been completed, so we can't enqueue before we roll. We also can't enqueue after the roll, because then if enqueuing fails, the upload might never be enqueued.
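The roll-and-enqueue behavior can be sketched as below. This is an in-memory illustration under stated assumptions, not the real transaction code: CursorRollSketch is a hypothetical class, a synchronized method stands in for the database transaction, a counter stands in for enqueuing the follow-up action, and the exact "already rolled" comparison is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative only: models the cursor roll with an in-memory field. */
final class CursorRollSketch {
  private Instant cursor;
  private int uploadsEnqueued = 0;

  CursorRollSketch(Instant initialCursor) {
    this.cursor = initialCursor;
  }

  /**
   * Rolls the cursor forward and enqueues the follow-up action as one atomic step.
   * Returns false, changing nothing, if a duplicate job already rolled the cursor.
   */
  synchronized boolean completeDeposit(Instant watermark, Duration interval) {
    Instant newPosition = watermark.plus(interval);
    if (!cursor.isBefore(newPosition)) {
      return false; // Already rolled forward; skip both the roll and the enqueue.
    }
    cursor = newPosition;
    uploadsEnqueued++; // Stand-in for enqueuing the upload or copy action.
    return true;
  }

  synchronized Instant cursor() {
    return cursor;
  }

  synchronized int uploadsEnqueued() {
    return uploadsEnqueued;
  }
}
```

Because the roll and the enqueue happen in the same step, a duplicate job for the same watermark neither moves the cursor again nor enqueues a second follow-up action.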
Determinism
The filename of an escrow deposit is deterministic for a given (TLD, watermark, mode) triplet. Its generated content is deterministic in all the ways that we care about. Its view of the database is strongly consistent in Cloud SQL automatically by nature of the initial query for the history entry running at READ_COMMITTED transaction isolation level.
Here's what's not deterministic:
- Ordering of XML fragments. We don't care about this.
- Information about registrars. There's no point-in-time for these objects. So in order to guarantee referential correctness of your deposits, you must never delete a registrar entity.
Manual Operation
The task can be run in manual operation by setting certain parameters. Rather than generating deposits which are currently outstanding, the task will generate specific deposits. The files will be stored in a subdirectory of the "manual" directory, to avoid overwriting regular deposit files. Cursors and revision numbers will not be updated, and the upload task will not be kicked off. The parameters are:
- manual: if present and true, manual operation is indicated
- directory: the subdirectory of "manual" into which the files should be placed
- mode: the mode(s) to generate: FULL for RDE deposits, THIN for BRDA deposits
- tld: the tld(s) for which deposits should be generated
- watermark: the date(s) for which deposits should be generated; dates should be start-of-day
- revision: optional; if not specified, the next available revision number will be used
The manual, directory, mode, tld and watermark parameters must be present for manual operation; they must all be absent for standard operation (except that manual can be present but set to false). The revision parameter is optional in manual operation, and must be absent for standard operation.
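The presence rules above can be expressed as a small check. This is a sketch of the stated rules only, not the action's actual validation code: ManualParamsSketch and its Params record are hypothetical, and each flag simply records whether the corresponding request parameter was supplied.

```java
/** Illustrative only: encodes the parameter presence rules described above. */
final class ManualParamsSketch {

  /**
   * manual is true only when the parameter is present and set to true.
   * The other flags record whether each parameter was supplied at all.
   */
  record Params(
      boolean manual,
      boolean hasDirectory,
      boolean hasMode,
      boolean hasTld,
      boolean hasWatermark,
      boolean hasRevision) {}

  /** Returns true if the combination of parameters is allowed. */
  static boolean isValid(Params p) {
    if (p.manual()) {
      // Manual operation: all four are required; revision is optional.
      return p.hasDirectory() && p.hasMode() && p.hasTld() && p.hasWatermark();
    }
    // Standard operation: none of the manual-only parameters may be present.
    return !p.hasDirectory()
        && !p.hasMode()
        && !p.hasTld()
        && !p.hasWatermark()
        && !p.hasRevision();
  }
}
```

For instance, supplying tld without manual=true is rejected, while manual=true with directory, mode, tld, and watermark is accepted with or without revision.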
Field Summary
Method Summary