Because XML is becoming central both to published content and to the operation of publishing endeavors (including the metadata wrapped around images and video), a strong XML repository is the keystone of the CCR.
VML prefers to work with MarkLogic Server, which is discussed here and on MarkLogic's website. MarkLogic Server is a native XML engine that provides comprehensive storage, discovery, transformation and presentation of XML.
A well-designed XML repository deals with unstructured content and provides the Enrichment Sequence. Further, MarkLogic indexes content for discovery as it flows into the repository and whenever it is modified. As a result, the way content is organized within the repository (e.g. the folder structure or hierarchy) is, in general, purely a visualization aid for the administrators of the system; the repository itself searches against the entire content set each time it is queried.
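The relationship between index-on-write and location-independent search can be illustrated with a toy model. This is only a sketch: `ToyRepository`, its methods and the sample URIs are hypothetical names invented for illustration, and nothing here reflects MarkLogic's actual API or index design.

```python
import re
from collections import defaultdict


class ToyRepository:
    """Hypothetical in-memory stand-in for an XML repository (not MarkLogic's API)."""

    def __init__(self):
        self.documents = {}            # URI -> raw document text
        self.index = defaultdict(set)  # term -> set of URIs containing it

    @staticmethod
    def _terms(text):
        # Replace anything that looks like a tag with a space, then collect lowercase words.
        return set(re.findall(r"[a-z0-9]+", re.sub(r"<[^>]+>", " ", text).lower()))

    def put(self, uri, text):
        """Insert or update a document, re-indexing it at write time."""
        if uri in self.documents:
            # Drop the index entries of the previous version first.
            for term in self._terms(self.documents[uri]):
                self.index[term].discard(uri)
        self.documents[uri] = text
        for term in self._terms(text):
            self.index[term].add(uri)

    def search(self, term):
        """Search the whole content set; the URI hierarchy plays no part."""
        return sorted(self.index.get(term.lower(), set()))


repo = ToyRepository()
repo.put("/books/2011/ch1.xml", "<p>Whales are mammals</p>")
repo.put("/misc/notes.xml", "<note>Mammals nurse their young</note>")
hits_before = repo.search("mammals")   # both documents, regardless of folder

repo.put("/misc/notes.xml", "<note>Birds lay eggs</note>")
hits_after = repo.search("mammals")    # the modified document no longer matches
```

Because the index is maintained on every write, a query never needs to walk the folder hierarchy, which is why that hierarchy can be treated as an administrative convenience.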
Content that is primarily in the form of PDF should be ingested into the XML repository to make it immediately searchable by page.
Packaged content, primarily ePub, should be ingested into the XML repository to make it immediately searchable. An ePub is a zipped collection consisting of an XML manifest and XHTML documents (usually one per chapter of the book), with the XHTML calling out to associated binary content (images, video, etc.). When an ePub is ingested into the XML repository, it must be unzipped to make the XHTML documents accessible to the repository indexer.
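The unzip step described above can be sketched with Python's standard `zipfile` module. The function names, the sample archive layout and the assumption that the XHTML is UTF-8 encoded are illustrative only; a real ingestion pipeline would read the manifest to locate the documents rather than rely on file extensions.

```python
import io
import zipfile

# Assumption: XHTML members can be recognized by their file extension.
XHTML_SUFFIXES = (".xhtml", ".html", ".htm")


def extract_xhtml(epub_bytes):
    """Return {path inside the archive: XHTML text} for every XHTML member."""
    documents = {}
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(XHTML_SUFFIXES):
                # Assumption: the XHTML documents are UTF-8 encoded.
                documents[name] = archive.read(name).decode("utf-8")
    return documents


def build_sample_epub():
    """Build a tiny ePub-like archive in memory, purely for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/ch1.xhtml", "<html><body><p>Chapter one</p></body></html>")
        archive.writestr("OEBPS/images/cover.png", b"\x89PNG")
    return buffer.getvalue()


docs = extract_xhtml(build_sample_epub())
```

Only the XHTML members are handed to the indexer in this sketch; the manifest and the binary members are left in the archive.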
The original packaged content container remains valuable for delivery to devices in Uncontrolled Environments.
Obviously, Publisher XML is stored directly in the XML repository and is instantly searchable.
Less obvious is a transformation to a flattened packaging XML (e.g. ePub, as per the previous section) that is supported by the system’s reader subsystem, to facilitate delivery through the Controlled Environment reader. This is a straightforward XSLT transformation.
Any associated binary files (images, videos, etc.) are stored in the non-XML repository and pointed to from the Content Framework XML.
Category entries, if enabled, are content-level enhancements that are stored in the XML repository as part of either the Publisher or Framework XML.
User Generated Content
VisualML Websites include the option for user-generated content, either via a WYSIWYG editor built natively into the toolset or via upload of common file types (RTF, TXT, Word docx, etc.). All of these types are natively stored and indexed in the XML repository.
Binary media includes essentially everything media-oriented that is not natively indexable or discoverable: images, video, audio, Flash, etc. It is expected that the vast majority of such items stored in the non-XML repository will be pointed to (called out by) XML content that is stored, indexed and discoverable in the XML repository.
For items that include metadata (such as Adobe XMP-enhanced images) or have metadata available (closed captioning text for videos), there is an associated XML file stored in the XML repository to support discovery of these items.
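As an illustration of such an associated XML file, the sketch below builds a small metadata record that points back to a binary item. The element and attribute names (`asset-metadata`, `field`, `href`, `name`) and the sample URI are hypothetical; the source does not specify the actual schema used by the system.

```python
import xml.etree.ElementTree as ET


def build_metadata_record(binary_uri, metadata):
    """Build a small XML record describing a binary asset stored elsewhere.

    The schema here is invented for illustration only.
    """
    root = ET.Element("asset-metadata", {"href": binary_uri})
    for name, value in sorted(metadata.items()):
        field = ET.SubElement(root, "field", {"name": name})
        field.text = value
    return ET.tostring(root, encoding="unicode")


record = build_metadata_record(
    "/binaries/video/intro.mp4",
    {"title": "Intro", "caption": "Welcome to the course"},
)
```

Once a record of this kind is stored and indexed in the XML repository, a search on the caption text can lead back to the video through the `href` pointer.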
Although rich media, such as audio and video, are becoming increasingly important in the delivery of information, the associated files, particularly video files, can be very large. This size creates challenges, including download time and, above all, storage space on the display device for offline viewing. If the display device is a computer, storage space is not likely to be an issue. However, tablets, and particularly smartphones, are limited in their storage and cannot easily accommodate large files, or may not be able to accommodate them at all. Additionally, there is a standards war with respect to video, such that Apple devices, the iPhone and the iPad, will not display Flash video files. Flash is a popular method used to decrease the size of video files.
Leveraging the Content Framework concept discussed in the XML Repository section, a publisher can choose to solve this issue by storing the rich media included in PDF, Packaged and Publisher XML content in a central location and enabling it to be streamed for online viewing. The download overhead would then be incurred only when provisioning devices that do not support streaming, or for offline viewing. This capability is designed into the system.
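The provisioning rule described above (stream by default, download only when streaming is unsupported or offline viewing is requested) can be sketched as a small decision function. `DeliveryRequest` and `choose_delivery` are hypothetical names used for illustration; the source does not describe how the system actually implements this choice.

```python
from dataclasses import dataclass


@dataclass
class DeliveryRequest:
    supports_streaming: bool   # can the target device stream media?
    offline_viewing: bool      # has offline access been requested?


def choose_delivery(request):
    """Return "download" only when streaming is impossible or offline use is requested."""
    if request.offline_viewing or not request.supports_streaming:
        return "download"
    return "stream"


online_tablet = choose_delivery(DeliveryRequest(supports_streaming=True, offline_viewing=False))
offline_tablet = choose_delivery(DeliveryRequest(supports_streaming=True, offline_viewing=True))
legacy_device = choose_delivery(DeliveryRequest(supports_streaming=False, offline_viewing=False))
```

Keeping the media in one central location means the same stored file serves both paths; only the delivery mode changes per request.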
Encryption is generally not a real-time function. Typically, when a new content object is ingested into the storage system and is expected to be used in a Digital Rights Management (“DRM”) scheme, an encryption step occurs at that point. Subsequent packaging of the encrypted content, which defines the actual parameters of the DRM, is done as a separate, real-time step when the content is being provisioned for delivery. The non-XML repository includes a function for storage of pre-encrypted files to support the Content DRM subsystem.
Adding and Maintaining Non-XML Content
Objects are added and maintained using either EVN’s Digital Asset Collection (“DAC”) tool, FileCabinet, which is explained further later in this section, or a publisher’s existing Digital Asset Management (“DAM”) tools. They may be stored in any storage that is URI-addressable (such as a Content Array Storage system) or in a cloud-based storage system such as Amazon ECS.