15 Sep OXANA Overview
This document describes the data export and access subsystem of Hubert III, built around the MP-Anything format in its XML and JSON variants. The subsystem consists of the export definition layer, the exporting engine and the access layer. The overall structure of the subsystem can be pictured in the following way:
[Figure: overall structure of the exporting subsystem]
The exporting subsystem reads the data from the currently running Hubert/FRED staging database so that no additional load is put on the production database. The exporting engine checks the data for modifications at the configured intervals and exports them into the export folders according to the export definition settings.
Files in the export folders can then be accessed through the access layer: by simple file access, through the REST API, or by having them pushed automatically to external services/databases.
Export definition layer
This layer allows the administrator to create any number of export definitions, which the exporting engine uses to produce output in MP-Anything format. For each export, the administrator can define:
- Type of the export – there are three export types: channel, product and content.
- Target folder – the folder into which the files are exported.
- List of channels to export – for channel exports, the administrator can define the list of exported channels.
- Areas – in multi-customer scenarios, the user can limit the visibility of the content in exports by specifying a list of visibility areas.
- Product, rubrics and preselections – the administrator can limit the visibility of rubric and photo selections in the export.
- Schemas exported – the administrator can define which schemas are exported and which changes trigger the export.
- Format of the export – MP-Anything in its XML or JSON variant.
- Multilanguage handling – the administrator can specify whether the export contains the full multilanguage content or only the content in the language of the channel.
- Export frequency – the administrator can define how often the export is generated. Exports can also be set to run only on demand, using the REST PULL interface.
- Date type – for the product export type only; described below.
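The settings above can be modelled as a simple record. The following Python sketch is purely illustrative: the class, field and enum names (`ExportDefinition`, `frequency_minutes`, `DateType` and so on) are hypothetical and do not come from Hubert, and the validation only encodes what this document states (product exports have no channel list, and the date type applies to product exports).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ExportType(Enum):
    CHANNEL = "channel"
    PRODUCT = "product"
    CONTENT = "content"


class ExportFormat(Enum):
    XML = "xml"
    JSON = "json"


class DateType(Enum):
    EMISSION = "emission"  # emission date
    CALENDAR = "calendar"  # calendar date


@dataclass
class ExportDefinition:
    """Hypothetical model of one export definition."""
    export_type: ExportType
    target_folder: str
    export_format: ExportFormat
    channels: List[str] = field(default_factory=list)       # channel exports only
    areas: List[str] = field(default_factory=list)          # visibility areas
    products: List[str] = field(default_factory=list)
    rubrics: List[str] = field(default_factory=list)
    preselections: List[str] = field(default_factory=list)
    schemas: List[str] = field(default_factory=list)
    trigger_fields: List[str] = field(default_factory=list)
    full_multilanguage: bool = False  # False: only the channel's language
    frequency_minutes: Optional[int] = None  # None: on demand via REST PULL
    date_type: Optional[DateType] = None     # product exports only

    def validate(self) -> None:
        """Check only the constraints stated in the OXANA overview."""
        if self.export_type is ExportType.PRODUCT and self.channels:
            raise ValueError("product exports cannot define a channel list")
        if self.export_type is not ExportType.PRODUCT and self.date_type is not None:
            raise ValueError("the date type applies only to product exports")
```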
Each export can run in full synchronization mode (usually used initially to synchronize the contents of the target folder with the data) or in incremental mode (exporting only the changes). Full synchronization can be forced using a button in the GUI.
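The difference between the two modes can be sketched as a record selection step. This is a hypothetical illustration, not the engine's actual change detection: it assumes each record carries a modification time and that a first run with no previous export behaves like a full synchronization. `Record` and `select_records` are invented names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    schema: str        # root schema name, e.g. "tv-event"
    record_id: int
    modified: datetime  # assumed: last modification time of the record


def select_records(
    records: Iterable[Record],
    last_export: Optional[datetime],
    full_sync: bool,
) -> List[Record]:
    """Full sync returns every record in scope; incremental returns only
    records modified after the previous export run."""
    if full_sync or last_export is None:
        return list(records)
    return [r for r in records if r.modified > last_export]
```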
Export types
File and folder structures created in the target folders vary depending on selected type of the export.
Channel export type
This is the most common export type, used to synchronize external databases. Every export refresh (triggered by the time interval or by REST PULL) creates a pack of XML/JSON files containing every record from the defined record scope that has changed. This export builds the following file/folder structure in the target folder:
target folder
    CONFIG
        timestamp3
        timestamp2
    timestamp1
    timestamp2
    ...
    state.xml
The special file state.xml is also present in the folder. It describes the current state of the exporting process.
Inside the target folder you will find subfolders named after the timestamp at which each export started. The folders are named according to the following pattern:
YYYYMMDDhhmmss
Where:
- YYYY – year in 4 digit format
- MM – month number in 2 digit format
- DD – day number in 2 digit format
- hh – hour in 2 digit format
- mm – minute in 2 digit format
- ss – second in 2 digit format
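A consumer usually needs to convert between export start times and folder names. A minimal sketch using Python's `datetime` format codes, which map directly onto the pattern above; the function names are hypothetical, and because this document does not specify the time zone of the timestamps, naive datetimes are used.

```python
import re
from datetime import datetime

# strftime/strptime equivalent of YYYYMMDDhhmmss
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"
# Exactly 14 ASCII digits; strptime alone would also accept some shorter
# inputs (e.g. single-digit months), so the shape is checked first.
_TIMESTAMP_RE = re.compile(r"[0-9]{14}")


def timestamp_folder_name(export_start: datetime) -> str:
    """Return the folder name for an export run started at export_start."""
    return export_start.strftime(TIMESTAMP_FORMAT)


def parse_timestamp_folder(name: str) -> datetime:
    """Parse a timestamp folder name; raise ValueError for anything else (e.g. CONFIG)."""
    if not _TIMESTAMP_RE.fullmatch(name):
        raise ValueError(f"not a timestamp folder name: {name!r}")
    return datetime.strptime(name, TIMESTAMP_FORMAT)
```

Because every field is fixed width and zero-padded, sorting folder names as plain strings also sorts them chronologically.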
Each timestamp folder contains the set of XML/JSON files in MP-Anything format with the records that have changed. The MP-Anything files are named according to the following pattern:
[schema name]_[unique numeric id].XML | JSON
Where:
- schema name – name of the root schema (e.g. tv-event)
- unique numeric id – numeric id of the record (e.g. 12345678)
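Splitting a file name back into schema name and record id is a common first step when loading the files. A sketch, assuming the id is always the digits after the last underscore and that the extension may appear in upper or lower case (the document writes it as XML | JSON); `parse_record_file` and `RecordFile` are hypothetical names.

```python
import re
from typing import NamedTuple

# Greedy schema part, so an underscore inside the schema name is kept and
# the id is taken from the digits after the last underscore.
_FILE_RE = re.compile(r"(?P<schema>.+)_(?P<id>[0-9]+)\.(?P<ext>xml|json)", re.IGNORECASE)


class RecordFile(NamedTuple):
    schema: str     # root schema name, e.g. "tv-event"
    record_id: int  # unique numeric id of the record
    fmt: str        # "xml" or "json", normalized to lower case


def parse_record_file(name: str) -> RecordFile:
    """Split '[schema name]_[unique numeric id].XML|JSON' into its parts."""
    m = _FILE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not an MP-Anything record file: {name!r}")
    return RecordFile(m["schema"], int(m["id"]), m["ext"].lower())
```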
In the root of the target folder you will also find the CONFIG subfolder, which contains subfolders named with timestamps. These subfolders contain the configuration files that apply to the exported data from that timestamp onwards. For a description of the configuration files, please refer to the MP-Anything specification (chapter named “Configuration files”).
Consuming the timestamped subfolders in order of creation ensures that the state of the external database stays consistent with Hubert/FRED.
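The ordering rule, together with the CONFIG timestamps, can be sketched as follows: data folders are taken oldest first, and each is paired with the newest CONFIG subfolder whose timestamp is not later than its own, since the configuration applies from its timestamp onwards. The helper names are hypothetical, and the sketch only plans the order; it does not read the files.

```python
import bisect
import re
from pathlib import Path
from typing import Iterator, List, Optional, Tuple

_TIMESTAMP_RE = re.compile(r"[0-9]{14}")  # YYYYMMDDhhmmss


def timestamp_subfolders(folder: Path) -> List[str]:
    """Names of timestamp subfolders, oldest first (string order = time order)."""
    return sorted(
        p.name for p in folder.iterdir() if p.is_dir() and _TIMESTAMP_RE.fullmatch(p.name)
    )


def config_for(data_ts: str, config_ts: List[str]) -> Optional[str]:
    """Newest CONFIG timestamp not later than data_ts (config_ts must be sorted)."""
    i = bisect.bisect_right(config_ts, data_ts)
    return config_ts[i - 1] if i > 0 else None


def consumption_plan(target: Path) -> Iterator[Tuple[Path, Optional[Path]]]:
    """Yield (data folder, matching CONFIG folder or None) in order of creation."""
    config_root = target / "CONFIG"
    config_ts = timestamp_subfolders(config_root) if config_root.is_dir() else []
    for ts in timestamp_subfolders(target):
        cfg = config_for(ts, config_ts)
        yield target / ts, (config_root / cfg if cfg is not None else None)
```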
Product export type
This export type should be used to synchronize or access the rubric selections used within Hubert production. For this reason, only the product/rubric/preselection settings are used to configure this export (the channel list is not available).
Target folder is structured in the following way:
target folder
    CONFIG
    product1
        date1
            rubric1
            rubric2
            ....
        date2
            rubric1
            rubric2
            ....
        ...
    ...
    state.xml
The special file state.xml is also present in the folder. It describes the current state of the exporting process.
Inside the target folder you will find subfolders named with the short names of the selected Hubert products. Within each product folder there are subfolders named with the production dates (the start dates of the records in the exported rubrics). The date folders are named according to the following pattern:
YYYYMMDD
Where:
- YYYY – year in 4 digit format
- MM – month number in 2 digit format
- DD – day number in 2 digit format
Depending on the export definition setting, this date is either the emission date or calendar date.
Within the date folders there are rubric folders named with the short names of the rubrics. For example, the structure can look like this:
OUT
    CONFIG
    PRODUCT1
        20150712
            TIPPS
            SPORT
            SERIEN
        20150713
            TIPPS
            SPORT
            SERIEN
    PRODUCT2
        …
    …
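A consumer of a product export can walk the three folder levels (product, date, rubric) as sketched below. This is an illustrative Python sketch with hypothetical names; it assumes that every subfolder of the target folder other than CONFIG is a product folder, and it ignores date-level folders that do not match the YYYYMMDD pattern.

```python
import re
from datetime import date, datetime
from pathlib import Path
from typing import Iterator, List, NamedTuple

_DATE_RE = re.compile(r"[0-9]{8}")  # YYYYMMDD


class RubricFolder(NamedTuple):
    product: str  # short name of the Hubert product
    day: date     # emission or calendar date, depending on the export definition
    rubric: str   # short name of the rubric
    path: Path    # folder holding the MP-Anything files of this rubric


def _subdirs(folder: Path) -> List[Path]:
    return sorted(p for p in folder.iterdir() if p.is_dir())


def iter_rubric_folders(target: Path) -> Iterator[RubricFolder]:
    """Walk target/<product>/<YYYYMMDD>/<rubric>, skipping CONFIG and state.xml."""
    for product_dir in _subdirs(target):
        if product_dir.name == "CONFIG":
            continue
        for date_dir in _subdirs(product_dir):
            if not _DATE_RE.fullmatch(date_dir.name):
                continue
            day = datetime.strptime(date_dir.name, "%Y%m%d").date()
            for rubric_dir in _subdirs(date_dir):
                yield RubricFolder(product_dir.name, day, rubric_dir.name, rubric_dir)
```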
Each rubric folder contains the set of XML/JSON files in MP-Anything format with all records belonging to the rubric. The MP-Anything files are named according to the following pattern:
[schema name]_[unique numeric id].XML | JSON
Where:
- schema name – name of the root schema (e.g. tv-event)
- unique numeric id – numeric id of the record (e.g. 12345678)
The target folder (OUT in the example above) also contains the CONFIG subfolder with the set of configuration files consistent with the exported data. For a description of the configuration files, please refer to the MP-Anything specification (chapter named “Configuration files”).
Content export type
The content export type allows you to track changes to the records of one or more schemas, even if a schema is not the root one (i.e. it is not attached to a channel). In the export definition you specify which content schemas you would like to track, and you can also define which field updates trigger the export. As a result, every change to any record of a tracked schema emits an update in the export.
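The triggering rule can be sketched as a predicate over a single record change. The function below is hypothetical and assumes that an empty set of trigger fields means a change in any field triggers the export; the schema and field names used with it ("person", "name", "photo") are invented for illustration.

```python
from typing import AbstractSet, Collection


def triggers_export(
    record_schema: str,
    changed_fields: Collection[str],
    tracked_schemas: AbstractSet[str],
    trigger_fields: AbstractSet[str],
) -> bool:
    """Decide whether one record change should emit an update in a content export."""
    if record_schema not in tracked_schemas:
        return False  # schema is not tracked by this export definition
    if not trigger_fields:
        return bool(changed_fields)  # assumed: no trigger fields means any change counts
    return any(f in trigger_fields for f in changed_fields)
```

With trigger fields limited to `{"name"}`, a change that only touches a record's photo would not appear in the export.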
Structure of the export folder and its logic is the same as in case of the channel export type.
Access layer
The access layer of the exporting subsystem provides ways of accessing the data exported to the target folders. Three methods of accessing the files will be supported.
Direct file access
In this method, the target folders are placed on a network share or an SFTP account, where the recipient can access them freely. In this scenario, the recipient is responsible for cleaning up the folder after retrieving the data.
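Since cleanup is up to the recipient, a consumer typically processes each timestamp folder and deletes it only after processing has succeeded. A minimal sketch under stated assumptions: the names are hypothetical, CONFIG is left untouched because later data folders may still depend on its contents, and the newest folder is skipped by default because the exporter might still be writing to it (state.xml describes the exporting state, but its format is not covered in this document).

```python
import re
import shutil
from pathlib import Path
from typing import Callable, List

_TIMESTAMP_RE = re.compile(r"[0-9]{14}")  # YYYYMMDDhhmmss


def drain_export_folder(
    target: Path,
    handle: Callable[[Path], None],
    keep_newest: bool = True,
) -> List[str]:
    """Process timestamp folders oldest first and delete each one afterwards.

    If handle() raises, the exception propagates and that folder (and all
    newer ones) stay in place for the next run. CONFIG is never touched.
    """
    folders = sorted(
        p for p in target.iterdir() if p.is_dir() and _TIMESTAMP_RE.fullmatch(p.name)
    )
    if keep_newest:
        folders = folders[:-1]  # the newest run may still be in progress
    done: List[str] = []
    for folder in folders:
        handle(folder)
        shutil.rmtree(folder)
        done.append(folder.name)
    return done
```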
REST PULL access
In this method, the target folder is accessible through a simple REST interface. The interface consists of two parts: export control and data retrieval.
The description refers to two common elements:
- <endpoint> – the HTTP/HTTPS endpoint of the REST interface
- <export id> – the unique, long identifier of the export (a GUID)
Each REST function returns an HTTP status code. A detailed description of this interface can be found in the [[oxana-rest-interface|OXANA REST Interface]] document.
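A client first needs to build request URLs from <endpoint> and <export id>. The sketch below assumes, hypothetically, that the export id is the first path segment after the endpoint; the actual routes are defined in the OXANA REST Interface document, so treat the path layout and any extra segments as placeholders. Python's `uuid` module is used to reject ids that are not GUIDs.

```python
import uuid
from urllib.parse import quote
from urllib.request import urlopen


def export_url(endpoint: str, export_id: str, *parts: str) -> str:
    """Build <endpoint>/<export id>/<parts...> (hypothetical path layout)."""
    guid = str(uuid.UUID(export_id))  # raises ValueError if not a GUID; normalizes case
    path = "/".join(quote(p, safe="") for p in (guid, *parts))
    return endpoint.rstrip("/") + "/" + path


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """GET a URL; urlopen raises urllib.error.HTTPError for 4xx/5xx statuses."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example, `export_url("https://example.com/oxana/", "6F9619FF-8B86-D011-B42D-00C04FC964FF", "files")` yields `https://example.com/oxana/6f9619ff-8b86-d011-b42d-00c04fc964ff/files`, where both the host and the `files` segment are invented.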
PUSH access
In PUSH access scenarios, the data are exported directly to an external document database for further querying. Currently the following databases are supported:
- for the JSON variant of MP-Anything:
    - Azure DocumentDB
    - PostgreSQL (version 9.4+)
    - Elasticsearch
- for the XML variant of MP-Anything:
    - eXist
    - BaseX