https://github.com/impetus-opensource/Kundera
Object persistence (ORM) for BigTable databases
SQL-like language for querying data stored in flat files on HDFS; user-defined interpreters serialize/deserialize data to/from files
Presentation on Facebook's use of Hive:
http://www.slideshare.net/zshao/hive-data-warehousing-analytics-on-hadoop-presentation/
http://research.google.com/pubs/pub36632.html
Google system for query/analysis of large-scale datasets
Query language for RDF
http://www.w3.org/TR/rdf-sparql-query/
RDF (Resource Description Framework) is a W3C data model developed for the semantic web; it describes resources as (subject, predicate, object) triples
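A SPARQL query is built from triple patterns that are matched against RDF statements. The sketch below illustrates only that core idea in plain Python; it is not a SPARQL engine. The function name `match`, the `?`-prefixed variable convention applied to plain strings, and the sample data are all assumptions made for illustration.

```python
def match(triples, pattern):
    """Yield variable bindings for each triple that matches the pattern.

    A pattern is a (subject, predicate, object) tuple; terms that start
    with '?' are variables, all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Hypothetical data set of (subject, predicate, object) statements.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]

# Roughly analogous to: SELECT ?x ?y WHERE { ?x knows ?y }
results = list(match(triples, ("?x", "knows", "?y")))
```

A real SPARQL engine additionally joins several patterns, handles IRIs and typed literals, and supports filters and optional clauses.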
Hadoop Distributed Filesystem (HDFS)
http://hadoop.apache.org/docs/r0.17.1/hdfs_design.html
Distributes data across commodity cluster nodes; handles replication and failover; implemented in Java
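To make the replication and failover idea concrete, here is a minimal sketch of splitting a file into blocks and placing several replicas of each block on distinct nodes. The round-robin placement is an assumption chosen for simplicity and is not HDFS's actual placement policy; `place_blocks`, `readable_after_failure`, the node names, and the sizes are hypothetical.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Map each block index to the list of nodes holding a replica.

    Uses simple round-robin placement (an illustrative assumption,
    not the real HDFS policy).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement, failed):
    """Return True if every block still has a replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


nodes = ["n1", "n2", "n3", "n4"]
placement = place_blocks(file_size=300, block_size=128, nodes=nodes)
```

With three replicas per block, the file in this example stays readable when any two nodes fail; losing three nodes that together hold all replicas of one block makes that block unavailable.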
http://wiki.lustre.org/index.php/Main_Page
Parallel distributed filesystem, often used in HPC systems; common on the Top 500 list; used as a central filesystem on dedicated machines (unlike Hadoop clusters, in which compute nodes also serve as storage nodes)
HBase and Accumulo are implementations of Google's BigTable paper (2006):
http://research.google.com/archive/bigtable-osdi06.pdf
All data is stored as triples, i.e. (row_key, column_key, value); BigTable additionally versions each cell with a timestamp
Rows are kept sorted by row key and partitioned across tablet servers, ensuring fast lookup of any given row key; do-it-yourself relationships; no/limited transaction support
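The reason sorting gives fast row lookup can be sketched with a toy in-memory store: keeping (row_key, column_key) pairs in sorted order lets a binary search jump straight to the first cell of a row. This is an illustration only, not how HBase or Accumulo are implemented; the class `SortedTripleStore` and its methods are hypothetical, and timestamps and distribution across servers are left out.

```python
import bisect


class SortedTripleStore:
    """Toy store of (row_key, column_key, value) triples kept in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of (row_key, column_key) tuples
        self._values = []  # values aligned index-by-index with self._keys

    def put(self, row_key, column_key, value):
        """Insert a cell, or overwrite it if the key already exists."""
        key = (row_key, column_key)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get_row(self, row_key):
        """Return {column_key: value} for one row, located by binary search."""
        # (row_key,) sorts before every (row_key, column_key) tuple.
        start = bisect.bisect_left(self._keys, (row_key,))
        row = {}
        for i in range(start, len(self._keys)):
            r, c = self._keys[i]
            if r != row_key:
                break
            row[c] = self._values[i]
        return row


store = SortedTripleStore()
store.put("row2", "cf:a", "x")
store.put("row1", "cf:b", "2")
store.put("row1", "cf:a", "1")
store.put("row1", "cf:a", "9")  # overwrites the earlier value
```

Because all cells of a row are adjacent in the sorted order, reading a row is one binary search followed by a short sequential scan. Anything resembling a join or foreign-key relationship has to be built by the application on top of such lookups.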
Google's new (2012) distributed datastore:
http://research.google.com/archive/spanner.html
Handles replication across geographically distributed data centers (configurable on a per-application basis); presents SQL-like query model; relational data model with transaction support; relies on tightly synchronized clocks (the TrueTime API) to order transactions
http://www.scidb.org/about/publications.php
Stores arrays of tuples for scientific applications; supports basic and user-defined operations/queries distributed across a cluster
The Hadoop Database:
Built on HDFS; implements BigTable
http://thinkaurelius.github.com/titan/
Graph data model and operators built on top of HBase/Cassandra