[RFC 004] Operation Review Comment 15
Organization: NCAR
Review of DAP2 operational experience:
NASA's Earth Science Data Systems Standards Process Group (SPG) is considering the Data Access Protocol, Version 2 (DAP2), for adoption as a community standard. This is the second review of DAP2, this one focusing on operational experience. The questions below are provided to guide your feedback. You only need to answer questions applicable to you. Please send comments to ese-rfc-004@spg.gsfc.nasa.gov.
- Describe in a sentence or two your overall operational experience related to DAP2 (e.g., scientific analysis, science users, server operation, database management, or data translation). What kinds of DAP2 systems do you have experience with? (e.g., OPeNDAP netCDF server, OPeNDAP HDF server, OPeNDAP Matlab Toolkit, Ferret, GrADS, etc.)
I develop OPeNDAP servers to standardize (and thus simplify) access to data sources in multiple formats. I would say I am on the database management and server operation side. I have written OPeNDAP servers for Web and Grid environments, as well as OPeNDAP data handlers for the FITS, netCDF and CEDAR data formats.
- How long have you been using DAP2 operationally?
We have been using it for about 7 years.
- What types of applications do you use DAP2 systems for? Are the DAP2 systems applicable to your applications? (e.g., Do they work well with the data types and data manipulations in your application?)
Data distribution on the server side, and data usage and visualization on the client side (using IDL).
- How many of your applications use DAP2?
i. Total number of applications
Three large data distribution systems, including the Earth System Grid (http://www.earthsystemgrid.org) and CEDAR (http://cedarweb.hao.ucar.edu).
ii. Percentage of applications
Not applicable, since my focus is on using OPeNDAP to improve data access at NCAR/HAO.
- Why do you choose to use DAP2 systems over other systems for your applications?
OPeNDAP gives us standard access to data regardless of the format in which it is stored.
- What alternative technologies did you consider?
We considered filters to translate data into a single format, but only OPeNDAP was comprehensive enough to let us represent flat data (for example, netCDF), hierarchical data (CEDAR), and images (FITS).
- Are the DAP2 systems easy to use? (e.g., Is it hard to learn how to use DAP2 systems?)
It took a while to get a grasp of it, but by now we can use it and develop with it with our eyes closed.
It actually increased, but that is because our user base has grown since we started using it. In terms of bugs and failures, it is sometimes hard to rid the system of problems, but its complexity is proportional to its usefulness. It has reduced the amount of human intervention needed to prepare special data requests.
- Does the performance of the DAP2 systems you have experienced meet your requirements? (e.g., Does it take a long time to access data in DAP2 systems?)
It scales well, especially because we tailor our servers ourselves.
- Have your bandwidth issues changed since you have been using DAP2 systems? (e.g., Have you seen increased bandwidth requirements because of increased data access or DAP2 overhead, decreased bandwidth requirements because of reduced data volume from subsetting, no impact on bandwidth because of low usage?)
This is a hard question. People can now issue many queries, since it is very easy to just hit the server with a request and then discard the output (just browsing); however, subsetting makes each query smaller. Because we allow data to be "aggregated" online, we have placed certain controls on the maximum size of OPeNDAP objects in order to avoid overrunning the system.
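As a rough illustration of how subsetting and a size cap can interact, the sketch below builds a DAP2 hyperslab constraint expression (the standard `[start:stride:stop]` form, with an inclusive stop) and refuses requests whose estimated size exceeds a limit. It is not the code we run: the limit value, the function names, the dataset URL and the variable name `temp` are all hypothetical and chosen only for the example.

```python
from math import prod

# Hypothetical cap; the actual limit used on our servers is not stated here.
MAX_RESPONSE_BYTES = 50 * 1024 * 1024


def hyperslab_count(start, stride, stop):
    """Number of indices selected by a DAP2 [start:stride:stop] slice.

    In DAP2 the stop index is inclusive.
    """
    if start < 0 or stride < 1 or stop < start:
        raise ValueError(f"invalid hyperslab [{start}:{stride}:{stop}]")
    return (stop - start) // stride + 1


def constraint(variable, slices):
    """Build a constraint expression such as temp[0:1:9][0:2:10]."""
    return variable + "".join(f"[{a}:{s}:{b}]" for a, s, b in slices)


def estimated_bytes(slices, bytes_per_value):
    """Estimate the payload size of the subset (values only, no headers)."""
    return prod(hyperslab_count(*sl) for sl in slices) * bytes_per_value


def subset_url(dataset_url, variable, slices,
               bytes_per_value=4, limit=MAX_RESPONSE_BYTES):
    """Return a DAP2 data request URL, or raise if the subset is too large."""
    size = estimated_bytes(slices, bytes_per_value)
    if size > limit:
        raise ValueError(f"subset of {size} bytes exceeds limit of {limit}")
    # The .dods suffix requests the binary data response in DAP2.
    return f"{dataset_url}.dods?{constraint(variable, slices)}"


if __name__ == "__main__":
    # Hypothetical dataset URL and variable name.
    print(subset_url("http://example.org/dap/data.nc", "temp",
                     [(0, 1, 9), (0, 2, 10)]))
```

A client that asks for every other point along one dimension halves the payload, which is why subsetting tends to offset the larger number of casual queries.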
- What operational challenges do the DAP2 systems present? (e.g., Does it require advanced processing power, large amounts of memory, complex configuration, etc.? Are the systems easy to deploy and maintain?)
It requires more CPU, but as I mentioned before, that is proportional to the work it is doing. It definitely requires more memory, which is not a big deal on the kind of systems we get today (the laptop I am working on has 1 GB, so imagine the server). The configuration is custom; that is, we use OPeNDAP more as a framework than as an off-the-shelf application.
- How has the use of DAP2 affected your systems administration workload?
It has not. It is an add-on to the web server in the web-based system, and it is a Unix daemon in the Grid computing environment. It runs well, so once it is set up it just works.
- How well do the DAP2 systems scale to large numbers of simultaneous users, or to large datasets?
Good question. The off-the-shelf product did not scale well, which is why we went ahead and used it as a framework within a Grid environment.
- How much data does your DAP2 system handle in a typical month?
I have access to only the CEDAR logs; the others are not available. I will need to compile some statistics from the logs and get back to you on this later. I must mention that usage is not uniform: in CEDAR, access is heavy after data campaigns and atmospheric events, so a heavy week can be followed by one with very infrequent access (maybe just a person writing a paper, or perhaps a new student looking around).
i. Total data volume
ii. Total number of data files
iii. Percentage of data volume
iv. Percentage of data files
- Can you provide information on user statistics of your DAP2 systems?
Just for CEDAR. I will need to look at the logs later, and this will take longer. From experience over the years, there is an average of 10 to 15 different users per month, moving anywhere from just a few bytes to many, many datasets. The community currently numbers about 350 people.
- How many users does your DAP2 system handle in a typical month?
i. Total number of DAP2 users
ii. Percentage of your overall users
- Of the feedback you have received from your users on DAP2, what percentage is positive and what is negative?
People do not really know they are using it. It is a middleware element.
- How have the user statistics changed over time?
They have grown. I would not say OPeNDAP is the whole reason they keep growing, since it is one part of the overall system; however, it is a very important part of the overall deployment.