Exporting CMS Content

Retrieving all CMS data

The code below is a generic retrieval script that can handle any kind of data stored in a Scrivito CMS. Don't be intimidated by the amount of code: it is a fully fledged exporter.

require 'fileutils'
require 'json'
require 'net/http'

EXPORT_PATH = Rails.root.join('tmp/scrivito_export')
WORKSPACE_TITLE = "Export Workspace (Do not Edit)"

# Run the given block and, if it raises, run it exactly one more time.
def retry_once
  begin
    yield
  rescue
    yield
  end
end

# Export an array of CMS objects to the file 'tmp/scrivito_export/objs.json'.
def export(objs)
  json_objs = objs.map do |obj|
    object_to_json(obj)
  end

  File.write(File.join(EXPORT_PATH, 'objs.json'), JSON.pretty_generate(json_objs))
end

# Convert a given CMS object or widget to a nested hash.
def object_to_json(obj)
  json_hash = extract_custom_attrs(obj)

  # Add the internal attributes to the hash.
  json_hash[:id] = obj.id
  %w(_path _obj_class _permalink).each do |attr|
    json_hash[attr] = obj[attr] if obj[attr]
  end
  if value = obj[:_last_changed]
    json_hash['_last_changed'] = {date: value.iso8601}
  end

  json_hash
end

def extract_custom_attrs(obj)
  # Iterate over the attributes of a given CMS object or widget and convert each one
  # to a hash.
  obj.class.attribute_definitions.inject({}) do |json_hash, attr_definition|
    attr_name = attr_definition.name
    value = obj[attr_name]
    if value.present?
      json_hash[attr_name] = attr_to_json(value, attr_definition)
    end
    json_hash
  end
end

def attr_to_json(value, definition)
  case definition.type
  when "widgetlist"
    # For a widget list, each widget is recursively extracted using 'object_to_json'.
    {widgets: value.map { |widget| object_to_json(widget) }}
  when "binary"
    # Binaries are downloaded and saved locally.
    {file: download_binary(value)}
  when "reference"
    {reference: value.id}
  when "referencelist"
    {references: value.map(&:id)}
  when "link"
    {link: link_to_json(value)}
  when "linklist"
    {links: value.map { |link| link_to_json(link) }}
  when "date"
    {date: value.iso8601}
  else
    value
  end
end

def link_to_json(link)
  target_attr =
    if link.internal?
      {id: link.obj.id}
    else
      {url: link.url}
    end

  target_attr.merge(
    title: link.title,
    query: link.query,
    fragment: link.fragment,
    target: link.target
  )
end

# Stream a binary to 'tmp/scrivito_export/files' and return the local path.
def download_binary(binary)
  file_name = "#{File.dirname(binary.id).parameterize}-#{File.basename(binary.id)}"
  file_path = File.join(EXPORT_PATH, "files", file_name)
  uri = URI(binary.url)

  retry_once do
    Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      request = Net::HTTP::Get.new(uri)
      http.request(request) do |response|
        File.open(file_path, 'wb') do |io|
          response.read_body do |chunk|
            io.write chunk
          end
        end
      end
    end
  end

  file_path
end

FileUtils.rm_rf(EXPORT_PATH)
FileUtils.mkdir_p(File.join(EXPORT_PATH, "files"))

# A working copy is created first to freeze the current content. This way, editors may
# still publish changes without the new content affecting the export.
Scrivito::Workspace.find_by_title(WORKSPACE_TITLE).try(:destroy)
workspace = Scrivito::Workspace.create(title: WORKSPACE_TITLE)
Scrivito::Workspace.current = workspace

# Use Obj.all to fetch the CMS objects in batches.
export(Obj.all)

workspace.destroy

To run the script, save it as export.rb in the root directory of a Scrivito project, then execute rails runner export.rb.

Most of the code above structures the exported data. However, the target system probably won't understand the export format this script produces, so the script needs to be adapted to generate export data that accommodates the new system.

Adapting the output format

The script doesn't make any assumptions with respect to the data contained in a Scrivito CMS. However, if you know how your content is structured, you can optimize the extracted data according to your needs. For example, it is common to have widgets that embed images. Such widgets usually utilize a reference attribute for storing the binary CMS object representing the image. The script, as given above, exports an ImageWidget as:

{ _obj_class: "ImageWidget", _id: "bar", source: "foo" }

The referenced CMS image object whose ID is foo is exported as:

{ _obj_class: "Image", _id: "foo", blob: "/path/to/the/image.png" }

For importing the output later on, it might be useful to include the path of the downloaded image directly in the ImageWidget. To achieve this, you could adapt the object_to_json method so that it processes image widgets separately, differentiating them from CMS objects of other types. For example:

def image_widget_json(widget)
  json_hash = { _obj_class: "ImageWidget", _id: widget.id }
  json_hash[:image] = download_binary(widget.source.blob) if widget.source
  json_hash
end

This method calls download_binary for every widget that references an image. Thus, if an image is referenced more than once, the method is called for each reference, downloading the binary data more often than required. Therefore, it's a good idea to check whether the file already exists and download it only once:

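require 'open-uri'

def download_binary(binary)
  file_path = "tmp/scrivito_export/files/#{binary.id.parameterize}"
  # URI.open is provided by open-uri; plain open no longer accepts URLs as of Ruby 3.0.
  IO.copy_stream(URI.open(binary.url), file_path) unless File.exist?(file_path)
  file_path
end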

After applying these changes, the exported data will look like this:

{ _obj_class: "ImageWidget", _id: "bar", image: "/path/to/the/image.png" }
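The following sketch shows one way object_to_json could route image widgets to the new method. It is an illustration only: the Struct classes stand in for real Scrivito model classes so that the snippet runs outside a project, download_binary is stubbed, and generic_object_to_json is a hypothetical name for the original body of object_to_json.

```ruby
# Stand-ins so the sketch runs outside a Scrivito project. In a real
# project, Image, ImageWidget and TextWidget would be your model classes.
Image       = Struct.new(:id, :blob)
ImageWidget = Struct.new(:id, :source)
TextWidget  = Struct.new(:id, :text)

# Stub: the real download_binary fetches the file and returns its local path.
def download_binary(binary)
  "tmp/scrivito_export/files/#{binary}"
end

def image_widget_json(widget)
  json_hash = { _obj_class: "ImageWidget", _id: widget.id }
  json_hash[:image] = download_binary(widget.source.blob) if widget.source
  json_hash
end

# Hypothetical name for the generic conversion from the full script,
# reduced here to the two internal attributes.
def generic_object_to_json(obj)
  { _obj_class: obj.class.name, _id: obj.id }
end

# Dispatch on the Ruby class: image widgets get special treatment,
# everything else takes the generic path.
def object_to_json(obj)
  case obj
  when ImageWidget then image_widget_json(obj)
  else generic_object_to_json(obj)
  end
end
```

Dispatching on the class keeps the special case in one place, so further widget types can be added as extra when branches.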

Further adjustments to the script might be required to store the retrieved content in a format the target system is able to process.

Scrivito's extensive API lets you make full use of Ruby to extract exactly the data you need and format it according to your requirements. You could even render all widgetlist attributes of a CMS object into an HTML string, if that's what you're aiming for.

Processing large amounts of content

Although the script above has not been optimized for speed, it still exports larger amounts of content in an acceptable period of time. Exporting a medium-sized website with 800 CMS objects of which 500 are binaries takes approximately seven minutes. So, for most CMS instances, it should be a reasonable approach to load all data into memory and store it locally in one go. The error handling of the script is also minimal, as it just retries once when fetching a binary piece of content. It is, for most use cases, good enough to simply restart the script if items weren't available in time.

However, if the amount of content stored in a CMS instance gets very large, this approach is not advisable. The formatted export data may not fit into memory, and just restarting the script on an error is not an option if the export takes several hours. In a special case like this, the script needs to be adapted to include some error handling.
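As a starting point for such error handling, retry_once could be replaced with a helper that retries several times and waits a little longer after each failure. This is a minimal sketch, not part of the original script: the name with_retries and its parameters max_attempts and base_delay are assumptions chosen for illustration.

```ruby
# Retry the given block up to max_attempts times. After each failure the
# helper sleeps base_delay * attempt seconds; once all attempts have
# failed, the last error is re-raised to the caller.
def with_retries(max_attempts: 3, base_delay: 0.5)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue StandardError
    raise if attempt >= max_attempts
    sleep(base_delay * attempt)
    retry
  end
end
```

In download_binary, the block passed to retry_once could be passed to with_retries instead, for example with_retries(max_attempts: 5) { ... }.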

The first step in adapting the script is to ensure that after exporting an individual CMS object, the current state is stored. For this, the script could generate an individual file for each CMS object, meaning that the exported data doesn't have to be kept in memory, which solves the first problem. For very large amounts of content, it may even be suitable to store the data in a database or export the content directly to the target system.
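Writing one file per CMS object could look roughly like the sketch below. The method name export_individually and the block-based interface are assumptions made for illustration; in the full script, the block would call object_to_json.

```ruby
require 'fileutils'
require 'json'

# Convert and write the objects one at a time, so that only a single
# object's export data is held in memory at any moment.
# The block turns an object into a hash that contains an :id key.
def export_individually(objs, export_dir)
  FileUtils.mkdir_p(export_dir)
  objs.each do |obj|
    json_hash = yield(obj)
    file_path = File.join(export_dir, "#{json_hash[:id]}.json")
    File.write(file_path, JSON.pretty_generate(json_hash))
  end
end
```

In the export script this might be invoked as export_individually(Obj.all, File.join(EXPORT_PATH, 'objs')) { |obj| object_to_json(obj) }.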

The next step is to ensure that the data is exported in a stable order. This is easy: just replace export(Obj.all) with export(Obj.all.order('id')). With a stable order, the script can resume after an error, starting right after the most recently exported CMS object. If the ID of that object is saved to last_exported_id, the code to start the export could be adapted like so:

objs_to_export = Obj.all.order('id')
objs_to_export = objs_to_export.drop_while { |obj| obj.id <= last_exported_id }
export(objs_to_export)
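How last_exported_id is persisted is left open above. One simple option, sketched here under the assumption that a small checkpoint file is sufficient, is to write the ID to disk after each successfully exported object. The helper names and the checkpoint path are hypothetical, and plain ID strings stand in for CMS objects so that the snippet runs on its own.

```ruby
require 'fileutils'

# Return the ID stored in the checkpoint file, or nil on a fresh start.
def read_checkpoint(path)
  File.exist?(path) ? File.read(path).strip : nil
end

def write_checkpoint(path, id)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, id)
end

# Yield every ID that comes after the checkpoint, in the given (sorted)
# order, and record progress after each successfully processed ID.
def export_resumable(ids_in_order, checkpoint_path)
  last_exported_id = read_checkpoint(checkpoint_path)
  remaining =
    if last_exported_id
      ids_in_order.drop_while { |id| id <= last_exported_id }
    else
      ids_in_order
    end

  remaining.each do |id|
    yield id
    write_checkpoint(checkpoint_path, id)
  end
end
```

If the block raises for some object, the checkpoint still points at the previous one, so the next run starts with the object that failed.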

That's it! Now the script is able to handle even the largest Scrivito CMS instances.

If questions arise while you adapt the export script or write your own, don't hesitate to contact the Scrivito support team. They're just one click away.