We are building a portal solution based on GateIn and some eXo components, with eXo JCR as storage for documents etc. We are not quite sure how to use workspaces, though. How many workspaces can JCR handle, and does it make sense to use many of them? Our dilemma is choosing between the following setups:
(1) Many workspaces, where each project's data is kept in its own workspace. From an infrastructure point of view this looks good. We will be able to link documents easily within a workspace (project), but linking documents between projects will not be as easy. Search will be limited to a single project. Searching across multiple projects will be a challenge, because we get a separate search result list per workspace and will have to merge them and present them as one list. Security is easy to deal with, because we can set security at workspace level and know that only people with workspace access will ever be able to access the data. Replication from a remotely deployed portal (e.g. a portal deployed in Panama) can be set up per workspace to replicate to the head office (Denmark), making it easy to back up data at the head office. Is replication two-way? Performance should be easy to deal with, because we will be able to move a number of projects to a separate JCR server and connect to it via a separate datasource. Creating a new workspace dynamically from code is not documented, or at least we cannot find documentation for it. Do we have to create workspaces in XML files and restart the portal?
(2) One workspace, where all project data is stored together. Each project will be defined in its own node, making it easy to search across projects and present the result as one list. Linking documents between projects will be easy. Security can be a bit more challenging if we have documents attached to a project node, and we will most likely have to write more code in our data-access layer. Replication is not possible on a per-project basis, because replication works at workspace level and the result would be a replication of all project data. Performance can become a challenge over the next couple of years as the amount of data grows, because everything is in the same database.
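To make the search difference between the two setups concrete, here is a rough sketch in plain Java. It is not taken from the eXo documentation. The `Hit` class, the `/projects` root node and the project names are made up for illustration, and the hits are assumed to have already been copied out of each workspace's query result. For setup (1) it merges the per-workspace lists by score; note that scores coming from separate workspace indexes may not be directly comparable, so this ordering is only an approximation. For setup (2) it builds a single JCR 1.0 XPath statement that searches below a shared root node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch only: cross-project search in setup (1) versus setup (2). */
public class ProjectSearchSketch {

    /** Hypothetical value object: one search hit copied out of a JCR query result. */
    public static final class Hit {
        public final String workspace; // workspace (project) the hit came from
        public final String path;      // node path inside that workspace
        public final double score;     // relevance score reported for the hit

        public Hit(String workspace, String path, double score) {
            this.workspace = workspace;
            this.path = path;
            this.score = score;
        }
    }

    /** Setup (1): merge one result list per workspace into a single list, best score first. */
    public static List<Hit> merge(List<List<Hit>> perWorkspace) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perWorkspace) {
            all.addAll(hits);
        }
        // Ties are broken by workspace and path so the order is deterministic.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparing(h -> h.workspace)
                .thenComparing(h -> h.path));
        return all;
    }

    /** Setup (2): one XPath statement covering every project node below projectsRoot. */
    public static String crossProjectQuery(String projectsRoot, String term) {
        // A single quote inside an XPath string literal is escaped by doubling it.
        String escaped = term.replace("'", "''");
        return "/jcr:root" + projectsRoot
                + "//element(*, nt:base)[jcr:contains(., '" + escaped + "')]";
    }

    public static void main(String[] args) {
        List<Hit> merged = merge(Arrays.asList(
                Arrays.asList(new Hit("projectA", "/docs/plan.pdf", 0.4)),
                Arrays.asList(new Hit("projectB", "/docs/budget.xls", 0.9))));
        for (Hit h : merged) {
            System.out.println(h.workspace + ":" + h.path + " (" + h.score + ")");
        }
        System.out.println(crossProjectQuery("/projects", "budget"));
    }
}
```

In setup (2) the returned statement would be handed to the standard JCR call `QueryManager.createQuery(statement, Query.XPATH)` on one session, whereas setup (1) needs one session and one query per workspace before `merge` can run.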
This is the documentation we have used, but we cannot seem to find a best practice when it comes to JCR.
Ref:
http://wiki.exoplatform.org/xwiki/bin/view/JCR/Asynchronous+Replication
http://wiki.exoplatform.org/xwiki/bin/view/JCR/Access+Control
Any comments or suggestions are welcome :-)
Thanks in advance.
/Kåre Pedersen