First Meeting of the IODE Steering Group for the IODE Ocean Data Portal
Oostende, Belgium, 20-22 September 2010

OPTIONS TO PROVIDE DATA THROUGH ODP

Introduction
The ODP technology is considered to be a distributed data system that comprises local data systems managed by IODE data centres and provides transparent access to the metadata, data and products (resources) generated by these local data systems. The ODP technology includes the following components:
- technical specifications defining the namespace and structures of metadata and data records, the data exchange protocol, etc. for data exchange based on a network of distributed data sources;
- Data Provider (software), which provides access to the local data system of a participating centre. As soon as the Data Provider is installed as a wrapper around the local data system, that system becomes a data source for the ODP distributed data system;
- Integration Server (software), which provides the management and use of the data source network through interaction with the Data Providers.
A data centre that agrees to be an ODP data provider should provide:
- the appropriate middleware for communications: a web server and an application server;
- installation of the ODP Data Provider software to connect the local data system with the ODP Integration Server;
- registration of the data source and its resources, and support of the local data system.

1. Data Provider requirements

1.1. Technical requirements
The software and hardware requirements for installing and using the ODP Data Provider software are listed here.

1.1.1. The hardware.
To install and operate the ODP Data Provider, a computer with at least the following characteristics is recommended: a CPU of 1 GHz or more, 1 GB of RAM and 300 MB of hard disk space. A dedicated computer is recommended for the Data Provider installation.

1.1.2. The middleware.
The following is required to install and operate the ODP Data Provider:
- an operating system, Windows or Unix-based (the software has been tested on Windows XP, Windows 2003 Server, Windows 2007 server, RedHat and openSUSE);
- a web server. It can be any web server that supports PHP: Apache or Microsoft's IIS (this is preferable to Apache 2.2.x). The Apache web server with PHP is needed only if a database is to be used;
- PHP, a cross-platform web scripting language. Centres providing data should have a PHP interpreter (version 4.4.4 or later).
The PHP and Apache software can be obtained from the Ocean Data Portal along with the Data Provider software. The Data Provider installation process is described in the document "Data Provider user guide - installation and setup", which is available at http://www.oceandataportal.org/index.php?option=com_content&task=view&id=24&Itemid=67&catid=9

1.2. Functional requirements
The local data administrator should provide:
- design of resources;
- data source registration;
- resources registration;
- provision of data.

1.2.1 The design of resources
This work can be done before or immediately after the Data Provider installation and includes the following actions:
- analysis of the structure, data ordering, key elements, data storage type and other specifications of the local data system(s), and decisions on:
  - the resource contents (in other words, which part of the data in the local system will be presented as a resource);
  - the resource type (single or serial);
  - the key elements that provide the data granularity if the resource is serial;
  - the resource titles and other reference information according to the sections of E2ESearchMD;
- analysis of the concept XML-schema (accessible from the ETDMP web-site) and its comparison with the parameter list of the local data system, expanding the concept XML-schema with additional parameters (if needed) in contact with the E2E developers;
- preparation of the code lists used in the local data and comparison of these lists with the system codes accessible from the Data Provider user interface, making decisions on code conversions from local codes to system codes.

1.2.2 The Data Provider installation and the data source registration
This work is carried out by the local system administrator according to the "Data Provider user guide - installation and setup". During installation the data source is registered using the Data Provider web-form and the recommendations of the document "Data Provider - Registration and Maintenance of Resources under the Distributed Data Source System". The estimated time for the Data Provider installation is one working day. The ETDMP plans to make the Data Provider installation easier by providing software for a controlled installation process.

1.2.3 The registration of resources
This work should be based on the results of the resource design and uses the Data Provider user interfaces (web-forms). For each resource the following should be provided:
- a description of the resource;
- registration of the resource interface with the local data system and local codes.
All of this work is controlled by the Data Provider web-forms or services. The estimated time for registering one resource is 2-3 hours in the first user session and usually about 1 hour in subsequent ones. For resource instances of the DBMS and data file types, the descriptions are prepared automatically by a special function of the Data Provider that is started from the registration web-form. For resource instances of the object file type, the Data Provider web-form makes it possible to copy an existing description and use it as a template for describing other instances (with appropriate editing).

1.2.4 The provision of the resource life-cycle.
To provide the resource life-cycle the local data administrator should:
- define the schedule for updating the resource description (a time in seconds reflecting the update period: minute, hour, day, month, etc.) in the appropriate field of the resource registration web-form;
- check the local data (connection and current description) from time to time using the report (the appropriate screen table) produced by the Data Provider;
- take the actions needed to keep the resource up to date (check the physical data source address, etc.) according to the recommendations of the Data Provider User Guide.

1.3. The support requirements.
To test and operate the Data Provider it is recommended to designate a person who will be responsible for the use of the ODP technology at the data centre.

2. Using ODP Data Provider software in data centre

2.1. ODP Data Provider installation.
The data centre source is made available to the ODP system by the Data Provider software, which is placed in the data centre telecommunication node and generates the resources of the ODP distributed data system by interacting with the local data system(s). The data source operates on the basis of the following rules:
- the local data system(s) of a data centre can have various storage types: DBMS and/or data files (structured/formatted) and/or object files (non-structured: jpeg, shape-files, png, etc., or software applications);
- the local system administrator at the data centre installs the Data Provider software, which gives the ODP technology services access to all local data irrespective of the number of local data systems and their storage types. The ratio "one data centre - one (or many) local data systems - one ODP data source" is supported;
- the local system administrator registers the data source (identifier, physical addresses, etc.) using the Data Provider web-form;
- the local system administrator provides technical support for the Data Provider operations, in contact with the ODP developers if needed.

2.2. The information resource and resource instances
The information resource is a data (metadata) set (or software application) submitted by the data centre to the ODP distributed data system.

2.2.1. To provide effective search, access and delivery processes, the resource must contain data with homogeneous properties according to the following rules:
- the resource must contain data of one category: observation data, derived (climatic) data, forecast data, analysis data, etc.;
- the resource must contain data with a single space-time resolution (irregular observation data, fixed point data, monthly data, etc.);
- the resource must belong to one type of local system storage (DBMS, data files or object data files).
To reflect all of the above-mentioned data properties there are appropriate elements in the resource description structure (E2ESearchMD).

2.2.2. The resource can be presented as a single unit (called a single resource) or as a set of resource instances (called a serial resource) reflecting the data granularity of the local data system.

2.3. The resources registration
The resources registration is carried out by the local data system administrator and includes two stages:
- resource design;
- resource registration.

2.3.1. Resource design
The resource design is carried out taking into account the specifics of the local data system (contents, data ordering, coding used, etc.) and includes:
- decisions on the resource type (single or serial), the resource content and, for a serial resource, the granularity for splitting the resource into resource instances, made for the local data system being made available to the ODP distributed data system;
- decisions on whether conversions of the local codes to system codes are needed.

2.3.2. Resource registration
This is provided through a special web-form supplied by the Data Provider, which has the following sections:
- Identification and reference information;
- Time, space and vertical data extent;
- Data content and granularity;
- Processing and quality level;
- Platforms, instruments and methods;
- Data distribution information.
The single resource registration is carried out once through the Data Provider web-forms. The serial resource registration is carried out in the following way:
- the local data administrator provides the root resource description using the web-forms;
- for the local system storage types DBMS and data files, the descriptions of the resource instances are created automatically by the Data Provider service, based on the granularity of the data described in the section "Data content and granularity" of the root resource description;
- for the local system storage type object files, the descriptions of the resource instances are provided in the same way as the root resource description, i.e. using the web-forms of the Data Provider service.
The Data Provider implements:
- automatic use of the controlled code-lists and dictionaries through interaction with the Integration Server, which provides the management and use of the metadata;
- quality control of the elements of the resource (or resource instances) as described in the metadata input.
Physically the resource descriptions are stored in XML-files in the format of the E2ESearchMD record. Depending on the chosen granularity scheme the resource description can be presented in one or several E2ESearchMD records:
- a single resource description is stored in one E2ESearchMD record;
- serial resource descriptions are stored in several E2ESearchMD records (N+1, where N is the number of resource instances).
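The record count for single and serial resources can be illustrated with a minimal sketch. It only counts and names the records that a description would need; the function name, the identifier pattern and the example granularity keys are assumptions made for illustration and are not taken from the Data Provider software.

```python
def plan_records(resource_id, instance_keys=None):
    """List the E2ESearchMD records needed to describe a resource.

    instance_keys is None for a single resource. For a serial resource it
    holds one granularity key per resource instance (for example one per
    month). The identifiers built here are hypothetical, not the ODP
    naming scheme.
    """
    if instance_keys is None:
        return [resource_id]  # single resource: one record
    root = [resource_id]  # root description of the serial resource
    instances = [f"{resource_id}_{key}" for key in instance_keys]
    return root + instances  # N + 1 records in total


single = plan_records("RES001")
serial = plan_records("RES002", ["2010-01", "2010-02", "2010-03"])
print(len(single), len(serial))  # 1 4
```

With three resource instances the serial resource needs four records: the root description plus one per instance.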
During resource registration the description of the resource interface with the local data system is also provided:
- DBMS type - assignment of the table and attribute names of the database that will be used for resource generation;
- data file types - description of the data file elements that will be submitted in a resource (in the current version of E2E, only flat (non-hierarchical) structures are supported).
The description of the resource interface also includes filling in a cross-mapping table for conversions of local codes to system codes. All of the above-mentioned operations for setting up the resource interfaces are provided by a special Data Provider web-form and services.

2.3.3 Operational provision of resource descriptions
It is important to keep the on-line resource descriptions up to date (if the local data system has changed) to permit data search and access. This process depends on the system storage type:
- DBMS/data files - resource (single or serial) metadata updating is performed automatically by the data centre according to a schedule set by the local data administrator or on a harvesting request from the Integration Server;
- object files - resource metadata updating (or the addition of new resource instances) is carried out by the local data administrator, who must update the resource (resource instance) descriptions using the Data Provider web-forms.
The update schedule (once per minute, hour, day, month, year, etc.) of the resource description is controlled by the local data administrator by setting the appropriate web-form element during resource registration. The value of this element depends on the update frequency of the local data system and can be changed at any time.

3. Using remote Data Provider software (v. 1.5.5)
A data centre that cannot install the Data Provider can use another Data Provider in remote mode. In this case the owner of the Data Provider must create a new user with a login and password and provide this information to the remote user.
This functionality provides mechanisms for registering data sources placed on HTTP or FTP servers that are available to the remote Data Provider. The data source should be represented as structured files with ";" or "," as the separator and be available for downloading by the remote Data Provider.
A data centre can use the remote Data Provider to provide catalogs of data to the ODP distributed system. There are several requirements for this type of interaction:
- one inventory file corresponds to one dataset description;
- the inventory file should be registered at the Data Provider and contain information about the data files, which are located on the FTP (HTTP) server;
- the inventory file must contain URLs to the data files;
- the data files should be placed on the same FTP (HTTP) server as the inventory file.
The inventory file is used when creating a dataset description with the Data Provider software. You should make a one-to-one mapping between the system data elements and the fields of the inventory file. After this procedure the Data Provider will use the mapping to register metadata descriptions for the data files automatically. When new data files are put on the FTP (HTTP) server, the Data Provider will download and analyze the inventory file by means of the scheduler mechanism and register the new metadata in the ODP distributed data system.
The number of parameters inside an inventory file can differ between dataset descriptions. Parameters may be specified in any order inside the inventory file. The position of each parameter is specified in the mapping section of the dataset description.
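The positional mapping between inventory file fields and system data elements can be sketched as follows. This is only an illustration of the idea under stated assumptions: the element names ("title", "start_date", "url"), the sample rows, the example.org addresses and both helper functions are hypothetical and do not reproduce the Data Provider's own format or code.

```python
import csv
import io

# Hypothetical mapping: system data element -> zero-based column position
# in the inventory file. The element names are illustrative only.
MAPPING = {"url": 2, "title": 0, "start_date": 1}


def read_inventory(text, mapping, delimiter=";"):
    """Return one metadata dict per inventory row, using the positional mapping."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        records.append({name: row[pos].strip() for name, pos in mapping.items()})
    return records


def new_records(records, known_urls):
    """Keep only rows whose data file URL has not been registered yet."""
    return [r for r in records if r["url"] not in known_urls]


# Made-up inventory content: one line per data file, ";" as the separator.
SAMPLE = (
    "Cruise A;2009-05-01;ftp://example.org/data/cruise_a.csv\n"
    "Cruise B;2010-02-11;ftp://example.org/data/cruise_b.csv\n"
)

records = read_inventory(SAMPLE, MAPPING)
fresh = new_records(records, known_urls={"ftp://example.org/data/cruise_a.csv"})
print(fresh)
```

Because the mapping stores a position for each element, the columns of the inventory file can appear in any order and their number can differ from one dataset description to another. A scheduled run would repeat the two calls and register only the rows returned by `new_records`.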