Data Virtualization
CaosDB shall allow search on data stored in other sources.
- Research how such a data virtualization can be achieved; 2 people (E)
- Decide on the best two options and outline how each can be done
- Discuss options in ST
- Roadmap for implementation
How can rights be preserved?
FOSS Software for Data Virtualization
- Apache Drill: inspired by Google's Dremel, SQL <-> * (active FOSS project, seems well suited)
- Trino (fka PrestoSQL): Ex Facebook, SQL <-> * (active FOSS project, focus on large-scale distributed systems?)
- PrestoDB: Ex Facebook, SQL <-> * (use Trino instead?)
- JBoss Enterprise Data Services Platform: Part of JBoss Enterprise SOA Platform. SQL/XQuery <-> * (use Teiid instead?)
- OpenLink Virtuoso Universal Server: Merger of Kubl and OpenLink, SQL/SPARQL <-> * (looks old-fashioned; documentation in bad shape, few contributions in the FOSS repo)
- Teiid (FOSS without community contribution because Red Hat does it?)
We should check which data sources (*) each of these systems actually supports.
Two Architectural Ideas
Server As Delegator
graph TD
A[CaosDB Server] --- B[Legacy MySQL Backend]
A --- C[Virtualization]
C --- D[SQL]
C --- E[NoSQL]
C --- F[RDF]
This allows easy prototyping and will be used for the next step.
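The delegator idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Backend`, `InMemoryBackend`, and `DelegatingServer` are hypothetical names, not part of CaosDB, and the in-memory backends merely stand in for the legacy MySQL backend and a virtualization layer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class Backend(Protocol):
    """Anything the server can delegate a search to (hypothetical interface)."""

    def search(self, query: str) -> list[dict]: ...


@dataclass
class InMemoryBackend:
    """Toy stand-in for a real backend; holds rows in a list."""

    name: str
    rows: list[dict] = field(default_factory=list)

    def search(self, query: str) -> list[dict]:
        # Toy matching: keep rows where any value contains the query string,
        # and tag each hit with the backend it came from.
        return [
            {**row, "source": self.name}
            for row in self.rows
            if any(query.lower() in str(value).lower() for value in row.values())
        ]


@dataclass
class DelegatingServer:
    """Server that keeps its legacy backend and additionally asks the virtualization layer."""

    legacy: Backend
    virtualization: Backend

    def search(self, query: str) -> list[dict]:
        # Legacy results first, then results from external sources.
        return self.legacy.search(query) + self.virtualization.search(query)


legacy = InMemoryBackend("legacy-mysql", [{"name": "Experiment 1"}])
virtual = InMemoryBackend(
    "virtualization", [{"name": "Experiment 2"}, {"name": "Sample 7"}]
)
server = DelegatingServer(legacy, virtual)
results = server.search("experiment")
```

In this sketch the legacy path stays untouched, so a prototype only has to add the second call and a way to merge the results.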
Virtualization Layer replaces Legacy Backend
graph TD
A[CaosDB Server]
A --- C[Virtualization]
C --- B[Legacy MySQL Backend]
C --- D[SQL]
C --- E[NoSQL]
C --- F[RDF]
Due to differences between CQL and SQL, we would lose significant capabilities here.
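In this layout the server would see all data, legacy and external, only through the layer's SQL interface. As a rough illustration of that single-interface idea, SQLite's `ATTACH DATABASE` can stand in for a virtualization layer: one SQL statement spans two separate databases. SQLite is not one of the candidate systems listed above, and the table names and data are made up for the example.

```python
import sqlite3

# "main" plays the role of the legacy backend, "external" the role of
# another data source. Both are separate in-memory databases.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS external")

conn.execute("CREATE TABLE main.experiments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE external.measurements (experiment_id INTEGER, value REAL)")

conn.executemany(
    "INSERT INTO main.experiments VALUES (?, ?)",
    [(1, "Exp A"), (2, "Exp B")],
)
conn.executemany(
    "INSERT INTO external.measurements VALUES (?, ?)",
    [(1, 0.5), (1, 1.5), (2, 3.0)],
)

# A single SQL query joins data from both databases.
rows = conn.execute(
    """
    SELECT e.name, AVG(m.value)
    FROM main.experiments AS e
    JOIN external.measurements AS m ON m.experiment_id = e.id
    GROUP BY e.id
    ORDER BY e.id
    """
).fetchall()
```

Everything the server asks for has to be phrased as such a SQL statement, so any CQL feature without a SQL counterpart would need to be handled above the layer or given up.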