Class RdeStagingAction

  • All Implemented Interfaces:
    java.lang.Runnable

    public final class RdeStagingAction
    extends java.lang.Object
    implements java.lang.Runnable
    Action that kicks off a Dataflow job to stage escrow deposit XML files on GCS for RDE/BRDA for all TLDs.

    Pending Deposits

    This task starts by asking PendingDepositChecker which deposits need to be generated. If there's nothing to deposit, we return 204 No Content; otherwise, we fire off a job and redirect to its status GUI. The task can also be run in manual operation, as described below.
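    The top-level decision can be sketched as follows. This is a minimal illustration under stated assumptions, not the real implementation. The map standing in for the PendingDepositChecker result, the class and method names, and the job-name format are all hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the map stands in for PendingDepositChecker's result
// (TLD -> watermarks still to be staged); the real checker's types are not shown here.
final class PendingDepositDecision {

  /** Returns "204 No Content" if nothing is pending; otherwise launches a job and redirects. */
  static String respond(Map<String, List<String>> pendingByTld) {
    boolean nothingPending = pendingByTld.values().stream().allMatch(List::isEmpty);
    if (nothingPending) {
      return "204 No Content";
    }
    String jobName = launchJob(pendingByTld);
    return "redirect: status page for " + jobName;
  }

  // Placeholder for the Dataflow launch; the job-name format is made up for illustration.
  private static String launchJob(Map<String, List<String>> pendingByTld) {
    return "rde-staging-" + pendingByTld.size();
  }
}
```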

    Dataflow

    The Dataflow job finds the most recent history entry on or before the watermark for each resource of each type and loads the embedded resource from it; that resource is then projected to the watermark time to account for things like pending transfers.
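    Selecting "the most recent history entry on or before the watermark" for one resource is a floor lookup on a time-ordered map. The sketch below assumes a resource's history is available as a NavigableMap keyed by modification time, with a String standing in for the embedded resource snapshot; the class and method names are hypothetical, and the projection step is only noted in a comment.

```java
import java.time.Instant;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;

// Hypothetical sketch: one resource's history, keyed by modification time. The String value
// stands in for the resource snapshot embedded in each history entry.
final class HistoryAtWatermark {

  /** Returns the snapshot from the latest history entry on or before the watermark, if any. */
  static Optional<String> snapshotAt(NavigableMap<Instant, String> history, Instant watermark) {
    // floorEntry returns the entry with the greatest key <= watermark, or null if all are later.
    Map.Entry<Instant, String> entry = history.floorEntry(watermark);
    // The real job would then project this snapshot to the watermark (e.g. pending transfers).
    return entry == null ? Optional.empty() : Optional.of(entry.getValue());
  }
}
```

    An entry timestamped exactly at the watermark is included, matching "on or before"; a resource whose first history entry is after the watermark yields nothing.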

    Only ContactResources and HostResources that are referenced by an included DomainBase will be included in the corresponding pending deposit.
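    The inclusion rule amounts to collecting the contact and host references of every included domain and keeping only those candidates. In this sketch a hypothetical Domain type with plain ID sets stands in for DomainBase, and resources are represented by their IDs only.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class ReferencedResources {

  // Hypothetical stand-in for DomainBase: just the IDs of the contacts and hosts it references.
  static final class Domain {
    final Set<String> contactIds;
    final Set<String> hostIds;

    Domain(Set<String> contactIds, Set<String> hostIds) {
      this.contactIds = contactIds;
      this.hostIds = hostIds;
    }
  }

  /** Keeps only the candidate resources referenced by at least one included domain. */
  static Set<String> keepReferenced(
      Collection<Domain> includedDomains,
      Function<Domain, Set<String>> references,
      Set<String> candidates) {
    Set<String> referenced = new TreeSet<>();
    for (Domain domain : includedDomains) {
      referenced.addAll(references.apply(domain));
    }
    referenced.retainAll(candidates); // drop IDs that are not among the candidates
    return referenced;
  }
}
```

    The same method serves both resource types by passing `d -> d.contactIds` or `d -> d.hostIds`.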

    Registrar entities, both active and inactive, are included in all deposits. They are not rewound to a point in time.

    Afterward

    The XML deposit files generated by this job are humongous. A tiny XML report file is also generated for each deposit, summarizing how many objects of each type it contains.

    Once a deposit is successfully generated, for RDE an RdeUploadAction is enqueued, which will upload it via SFTP to the third-party escrow provider; for BRDA a BrdaCopyAction is enqueued, which will copy it to a GCS bucket from which it is rsynced to a third-party escrow provider.

    To generate escrow deposits manually and locally, use the nomulus tool command GenerateEscrowDepositCommand.

    Logging

    To identify the reduce worker request for a deposit in App Engine's log viewer, you can use search text like tld=soy, watermark=2015-01-01, and mode=FULL.

    Error Handling

    Valid model objects might not be valid according to the RDE XML schema. A single invalid object will cause the whole deposit to fail. You need to check the logs, find out which entities are broken, and perform database surgery.

    If a deposit fails, an error is emitted to the logs for each broken entity. It tells you the key and shows you its representation in lenient XML.
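    The "log every broken entity, then fail the whole deposit" behavior can be sketched as below. The predicate is a hypothetical stand-in for RDE XML schema validation, fragments are keyed by a plain String entity key, and java.util.logging is used only for illustration; none of this is the real Nomulus code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.logging.Logger;

// Hypothetical sketch: isSchemaValid stands in for real XML schema validation.
final class DepositValidation {
  private static final Logger logger = Logger.getLogger(DepositValidation.class.getName());

  /** Logs each invalid fragment with its key, then fails the deposit if any were invalid. */
  static void validateAll(Map<String, String> xmlByKey, Predicate<String> isSchemaValid) {
    List<String> brokenKeys = new ArrayList<>();
    for (Map.Entry<String, String> e : xmlByKey.entrySet()) {
      if (!isSchemaValid.test(e.getValue())) {
        // One log line per broken entity: its key plus its (lenient) XML representation.
        logger.severe("Invalid entity " + e.getKey() + ": " + e.getValue());
        brokenKeys.add(e.getKey());
      }
    }
    if (!brokenKeys.isEmpty()) {
      // A single invalid object fails the whole deposit.
      throw new IllegalStateException("Deposit failed; invalid entities: " + brokenKeys);
    }
  }
}
```

    Checking every fragment before failing, rather than stopping at the first error, is what lets one failed run surface all the entities that need fixing.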

    Failed deposits will be retried indefinitely. This is because RDE and BRDA each have a Cursor for each TLD. Even if the cursor lags for days, it'll catch up gradually on its own, once the data becomes valid.

    The third-party escrow provider will validate each deposit we send them. They do both schema validation and reference checking.

    This job does not perform reference checking. Administrators can do this locally with the ValidateEscrowDepositCommand command in the nomulus tool.

    Cursors

    Deposits are generated serially for a given (tld, mode) pair. A deposit is never started beyond the cursor. Once a deposit is completed, its cursor is rolled forward transactionally.
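    A minimal sketch of one (tld, mode) cursor, assuming a daily cadence in which the cursor holds the watermark of the next deposit to stage. The class and method names are hypothetical. It also shows how a lagging cursor catches up one deposit at a time, as described under Error Handling.

```java
import java.time.LocalDate;
import java.util.Optional;

// Hypothetical sketch of one (tld, mode) cursor, assuming a daily deposit cadence in which the
// cursor holds the watermark of the next deposit to stage.
final class SerialCursor {
  private LocalDate nextWatermark;

  SerialCursor(LocalDate nextWatermark) {
    this.nextWatermark = nextWatermark;
  }

  /** The only deposit that may start: the one at the cursor, and only once its day has begun. */
  Optional<LocalDate> pendingDeposit(LocalDate today) {
    return nextWatermark.isAfter(today) ? Optional.empty() : Optional.of(nextWatermark);
  }

  /** Called after the deposit at the cursor completes; rolls the cursor forward one day. */
  void rollForward() {
    nextWatermark = nextWatermark.plusDays(1);
  }
}
```

    With the cursor at 2015-01-01 and today being 2015-01-03, the deposits for the 1st, 2nd, and 3rd are staged one after another, and then nothing is pending until the next day.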

    The mode determines which cursor is used. Cursor.CursorType.RDE_STAGING is used for thick deposits and Cursor.CursorType.BRDA is used for thin deposits.
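    The mode-to-cursor mapping, together with the follow-up action each mode enqueues, can be summarized in a small enum. These enums only borrow names from the text (FULL/THIN, RDE_STAGING/BRDA, RdeUploadAction/BrdaCopyAction); they are hypothetical and are not the real Nomulus types.

```java
// Hypothetical sketch: names borrowed from the documentation, not the real Nomulus enums.
enum StagingCursor { RDE_STAGING, BRDA }

enum EscrowMode {
  FULL(StagingCursor.RDE_STAGING, "RdeUploadAction"), // thick deposit (RDE)
  THIN(StagingCursor.BRDA, "BrdaCopyAction"); // thin deposit (BRDA)

  final StagingCursor cursor;
  final String followUpAction;

  EscrowMode(StagingCursor cursor, String followUpAction) {
    this.cursor = cursor;
    this.followUpAction = followUpAction;
  }
}
```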

    Use the ListCursorsCommand and UpdateCursorsCommand commands to administer these cursors.

    Security

    The deposit and report are encrypted using Ghostryde. Administrators can use the GhostrydeCommand command in the nomulus tool to view them.

    Unencrypted XML fragments are stored temporarily between the map and reduce steps and between Dataflow transforms. The Ghostryde encryption on the full archived deposits makes life a little more difficult for an attacker, but security ultimately depends on the access controls of the bucket.

    Idempotency

    For the Dataflow job we do not employ a lock, because it is difficult to span a lock across three successive transforms (save to GCS, roll forward cursor, enqueue next action). Instead, we sidestep the issue by saving the deposit to a unique folder named after the job name, so there is no possibility of overwriting.
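    The folder scheme can be sketched as a path helper. The "jobName/fileName" layout and the file-name format here are hypothetical and chosen only for illustration; they are not the real Nomulus naming scheme.

```java
// Hypothetical sketch: the folder layout and file-name format are illustrative only.
final class StagingPaths {

  /** Places a deposit under a folder named after the Dataflow job that produced it. */
  static String depositPath(String jobName, String tld, String watermark, String mode) {
    String fileName = tld + "_" + watermark + "_" + mode; // same inputs, same file name
    return jobName + "/" + fileName;
  }
}
```

    Two duplicate jobs staging the same (tld, watermark, mode) deposit produce the same file name but write it into different job folders, so neither can overwrite the other's output.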

    Deposits are generated serially for a given (watermark, mode) pair. A deposit is never started beyond the cursor. Once a deposit is completed, its cursor is rolled forward transactionally. Duplicate jobs may exist for watermarks at or before the cursor, so the transaction does not bother changing the cursor if it has already been rolled forward.

    Enqueuing RdeUploadAction or BrdaCopyAction is also part of the cursor transaction. This is necessary because the first thing the upload task does is check the staging cursor to verify it's been completed, so we can't enqueue before we roll. We also can't enqueue after the roll, because then if enqueuing fails, the upload might never be enqueued.
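    The combined "roll the cursor and enqueue the follow-up action" step, including the no-op for duplicate jobs, can be sketched in memory. In this hypothetical sketch a synchronized method stands in for the database transaction (it only provides atomicity within one JVM), a list stands in for the task queue, and a daily cadence is assumed; none of it is the real Nomulus persistence or queue API.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch: synchronized methods stand in for a database transaction and a
// list stands in for the task queue.
final class CursorTransaction {
  private final Map<String, LocalDate> cursors = new HashMap<>(); // key: "tld/mode"
  private final List<String> queue = new ArrayList<>();

  synchronized void setCursor(String tld, String mode, LocalDate watermark) {
    cursors.put(tld + "/" + mode, watermark);
  }

  synchronized LocalDate cursor(String tld, String mode) {
    return cursors.get(tld + "/" + mode);
  }

  synchronized List<String> queued() {
    return new ArrayList<>(queue);
  }

  /**
   * Rolls the cursor forward and enqueues the follow-up action as one atomic step. Returns false
   * without changing anything if the cursor is no longer at this deposit's watermark, e.g.
   * because a duplicate job already rolled it forward.
   */
  synchronized boolean commitDeposit(String tld, String mode, LocalDate watermark, String action) {
    String key = tld + "/" + mode;
    if (!watermark.equals(cursors.get(key))) {
      return false;
    }
    cursors.put(key, watermark.plusDays(1)); // assumes a daily cadence
    queue.add(action + " " + key + " " + watermark);
    return true;
  }
}
```

    Because the cursor update and the enqueue happen inside the same atomic step, there is no window in which the cursor has moved but the follow-up task was never enqueued, and a duplicate job's commit leaves both untouched.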

    Determinism

    The filename of an escrow deposit is deterministic for a given (TLD, watermark, mode) triplet. Its generated contents are deterministic in all the ways that we care about. In Cloud SQL, its view of the database is automatically strongly consistent because the initial query for the history entry runs at the READ_COMMITTED transaction isolation level.

    This is also true in Datastore because:

    1. EppResource queries are strongly consistent thanks to EppResourceIndex
    2. EppResource entities are rewound to the point in time of the watermark

    Here's what's not deterministic:

    • Ordering of XML fragments. We don't care about this.
    • Information about registrars. There's no point-in-time for these objects. So in order to guarantee referential correctness of your deposits, you must never delete a registrar entity.

    Manual Operation

    The task can be run in manual operation by setting certain parameters. Rather than generating deposits which are currently outstanding, the task will generate specific deposits. The files will be stored in a subdirectory of the "manual" directory, to avoid overwriting regular deposit files. Cursors and revision numbers will not be updated, and the upload task will not be kicked off. The parameters are:

    • manual: if present and true, manual operation is indicated
    • directory: the subdirectory of "manual" into which the files should be placed
    • mode: the mode(s) to generate: FULL for RDE deposits, THIN for BRDA deposits
    • tld: the tld(s) for which deposits should be generated
    • watermark: the date(s) for which deposits should be generated; dates should be start-of-day
    • revision: optional; if not specified, the next available revision number will be used

    The manual, directory, mode, tld and watermark parameters must be present for manual operation; they must all be absent for standard operation (except that manual can be present but set to false). The revision parameter is optional in manual operation, and must be absent for standard operation.
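    The parameter rules above can be expressed as a single check. In this hypothetical sketch, request parameters are modeled as a name-to-raw-value map; the real action receives its parameters through its own mechanism, which is not shown, and per-value checks (such as watermarks being start-of-day) are left out.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: request parameters modeled as a name -> raw value map.
final class ManualParams {
  private static final List<String> MANUAL_ONLY = List.of("directory", "mode", "tld", "watermark");

  /** Returns true for manual operation, false for standard; throws on an invalid combination. */
  static boolean isManual(Map<String, String> params) {
    // "manual" may be absent or set to false for standard operation.
    boolean manual = Boolean.parseBoolean(params.getOrDefault("manual", "false"));
    for (String name : MANUAL_ONLY) {
      if (manual && !params.containsKey(name)) {
        throw new IllegalArgumentException("Manual operation requires parameter: " + name);
      }
      if (!manual && params.containsKey(name)) {
        throw new IllegalArgumentException("Parameter only allowed in manual operation: " + name);
      }
    }
    // "revision" is optional in manual operation but must be absent in standard operation.
    if (!manual && params.containsKey("revision")) {
      throw new IllegalArgumentException("Parameter only allowed in manual operation: revision");
    }
    return manual;
  }
}
```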

    See Also:
    Registry Data Escrow Specification, Domain Name Registration Data Objects Mapping
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String PATH  
    • Method Summary

      Modifier and Type Method Description
      void run()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • run

        public void run()
        Specified by:
        run in interface java.lang.Runnable