Search API Examples

This article describes how to use the ObjSearchEnumerator API in order to programmatically search the CMS content for particular objects.

General remarks

By default, all CMS content is indexed automatically. This includes the attributes of the CMS objects themselves as well as the widgets and their attributes contained in widgetlist attributes of the objects.

Thus, searching a particular attribute, e.g. title, causes all title attributes to be searched, not only the one that is part of the CMS object, but also the ones that were given to widgets through their object class.

For example, the following query reads: Find all objects whose text field contains "words":

Obj.where(:text, :contains, "words")

If the field name, text, is used in CMS objects as well as in their widgets, the above query reads like this: Find all objects where the text field of the object itself, or the text field of one of its widgets, contains "words". In other words, the conditions of a search query are met if the object or at least one of its widgets meets them.

It is advisable to keep this in mind when adding attributes to object classes or when creating widgets, particularly if you use field-specific searches in your web application.
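
The matching rule can be illustrated with a small in-memory sketch. It does not use the Scrivito API: Widget, Page and text_contains? are hypothetical stand-ins, and a plain substring check is used as a simplification of the :contains operator.

```ruby
# Hypothetical in-memory stand-ins; these are NOT Scrivito classes.
Widget = Struct.new(:text)
Page   = Struct.new(:text, :widgets)

# True if the page's own text, or the text of at least one of its widgets,
# contains the term. A substring check is a simplification of :contains.
def text_contains?(page, term)
  candidates = [page.text, *page.widgets.map(&:text)].compact
  candidates.any? { |value| value.include?(term) }
end

own_hit    = Page.new("Some words here", [])
widget_hit = Page.new("Nothing relevant", [Widget.new("More words")])
no_hit     = Page.new("Nothing relevant", [Widget.new("Still nothing")])

matches = [own_hit, widget_hit, no_hit].select { |page| text_contains?(page, "words") }
puts matches.size
```

Both the page whose own text matches and the page that only matches through a widget end up in the result, which is why a widget attribute sharing its name with an object attribute widens field-specific searches.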

Finding the latest blog posts

The blog posts on scrivito.com have an object class named BlogPost. They are supposed to live in or anywhere below a folder that has the Blog object class, e.g. /en/blog. The object names of the blog posts are irrelevant for this tutorial, even though in our case they contain the date of the post and a short name. For example, this particular post has the path /en/blog/2013/20131025-rc-cache in the CMS. On the blog page, all blog posts are reverse-sorted by their published_at custom attribute.

The BlogController fetches the latest posts for the view (@obj is the blog object, i.e. the blog page):


class BlogController < CmsController
  def index
    @posts = @obj.latest_posts(10, params[:page].to_i)
    respond_to do |format|
      format.html { @posts }
      format.rss { @posts }
    end
  end
end

The following code is part of the Blog model. An instance of this Blog object class is meant to be the folder that contains the blog posts, as described above.


class Blog < Obj
  […]

  def latest_posts(blog_posts_per_page, page_index)
    BlogPost.all.
      and(:_path, :starts_with, path + "/").
      order(published_at: :desc).
      offset(page_index * blog_posts_per_page).
      batch_size(blog_posts_per_page).
      take(blog_posts_per_page)
  end
end

BlogPost.all returns an ObjSearchEnumerator:

#<Scrivito::ObjSearchEnumerator:0x007fa3dbf4fd18
 @initial_offset=0,
 @options={},
 @query=[{:field=>:_obj_class, :operator=>:equal, :value=>"BlogPost"}]>

By method chaining, this enumerator is further refined. Chained together, all the methods up to and including batch_size return:

#<Scrivito::ObjSearchEnumerator:0x007fe5b660d198
 @initial_offset=0,
 @options={:sort_by=>:published_at, :sort_order=>:desc, :size=>10},
 @query=[
    {:field=>:_obj_class, :operator=>:equal, :value=>"BlogPost"},
    {:field=>:_path, :operator=>:prefix, :value=>"/en/blog/"}
  ]>

The last chained method, take, behaves like the standard Enumerable#take. It limits the result set to the first n items, in our case blog_posts_per_page.
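
The paging arithmetic behind offset and take can be tried out with plain Ruby arrays. This is only a sketch: page_of is a hypothetical helper, and Array#drop stands in for the enumerator's offset.

```ruby
# Stand-in for an already sorted result set; the numbers play the role of posts.
sorted_posts = (1..25).to_a

# Hypothetical helper mirroring offset(page_index * per_page) followed by take(per_page).
def page_of(items, per_page, page_index)
  items.drop(page_index * per_page).take(per_page)
end

first_page = page_of(sorted_posts, 10, 0)  # the first ten items
third_page = page_of(sorted_posts, 10, 2)  # a short last page with five items
beyond_end = page_of(sorted_posts, 10, 3)  # empty, no error

# A missing page parameter is nil, and nil.to_i is 0, so the first page is the default.
default_page_index = nil.to_i
```

Since the page index is zero-based, a request without a page parameter shows the newest posts.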

The path prefix constraint and(:_path, :starts_with, path + "/") is there to list only blog posts that live in the subtree of this blog. This way, we could have multiple blogs on our website, each of them having its own blog posts.
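
Why the trailing slash is appended to the path can be seen with plain string prefixes. The sketch below uses String#start_with? instead of the search engine, and the /en/blog-archive folder is a made-up example.

```ruby
paths = [
  "/en/blog",                         # the blog folder itself
  "/en/blog/2013/20131025-rc-cache",  # a post below the blog
  "/en/blog-archive/2012/old-post",   # hypothetical sibling folder
]

blog_path = "/en/blog"

# Without the slash, the sibling folder shares the prefix and slips through.
without_slash = paths.select { |path| path.start_with?(blog_path) }

# With the slash, only paths strictly below the blog remain.
with_slash = paths.select { |path| path.start_with?(blog_path + "/") }
```

In the actual query, the blog folder itself is already excluded by the BlogPost class condition, so the slash mainly guards against sibling paths that merely begin with the same characters.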

The view finally walks over the blog posts in the usual manner:

<% @posts.each do |post| %>
  <%# render the BlogPost %>
<% end %>

Previous/next navigation for blog posts

Based on the previously mentioned blog setup, we are going to add a navigation to every blog post. One link points to the previous blog post, the other one to the next blog post.

class BlogPost < Obj
  […]

  def next_post
    @next_post ||= BlogPost.all.and(:published_at, :is_greater_than, published_at.utc.to_iso).order(:published_at).take(1).first
  end

  def prev_post
    @prev_post ||= BlogPost.all.and(:published_at, :is_less_than, published_at.utc.to_iso).order(published_at: :desc).take(1).first
  end
end

As before, all returns an ObjSearchEnumerator, this time one that filters all objects of the BlogPost class.

In next_post, the search finds all the posts whose published_at date is greater than the date of the current post, sorted by published_at in ascending order. From the resulting list of posts, the first one is taken. prev_post, in turn, searches for all posts whose published_at date is less than the date of the current post, sorts them in descending order, and again takes the first one. This approach relies on the sort order to determine the next and the previous posts.
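
The same idea, reduced to plain Ruby collections, might look like the sketch below. Post, next_of and prev_of are hypothetical stand-ins that filter and sort an in-memory array instead of querying the search engine.

```ruby
Post = Struct.new(:title, :published_at)  # hypothetical stand-in for BlogPost

POSTS = [
  Post.new("c", Time.utc(2013, 3, 15)),
  Post.new("a", Time.utc(2013, 1, 1)),
  Post.new("d", Time.utc(2013, 4, 1)),
  Post.new("b", Time.utc(2013, 2, 26)),
].freeze

# Newer posts only, oldest first: the first element is the direct successor.
def next_of(current, posts)
  posts.select { |post| post.published_at > current.published_at }
       .sort_by(&:published_at)
       .first
end

# Older posts only, newest first: the first element is the direct predecessor.
def prev_of(current, posts)
  posts.select { |post| post.published_at < current.published_at }
       .sort_by(&:published_at)
       .reverse
       .first
end

current = POSTS.find { |post| post.title == "b" }
puts next_of(current, POSTS).title
puts prev_of(current, POSTS).title
```

For the oldest post, prev_of returns nil, and for the newest post, next_of returns nil, which corresponds to the case where the search returns no hits.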

published_at.utc.to_iso converts the published_at date and time stamp (e.g. Tue, 26 Feb 2013 13:21:24 CET +01:00), which has the ActiveSupport::TimeWithZone format, to UTC (2013-02-26 12:21:24 UTC), which is a Time object, and then to the CMS ISO format (20130226122124), which is a String. The search engine compares these ISO date string values.
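
The conversion can be reproduced in plain Ruby. In this sketch, iso_timestamp is a hypothetical stand-in for to_iso, assuming the 14-digit YYYYMMDDhhmmss layout shown above. It also shows why comparing these strings works: they have a fixed width and are zero-padded, so their alphabetical order equals their chronological order.

```ruby
# Hypothetical stand-in for to_iso, assuming the YYYYMMDDhhmmss layout.
def iso_timestamp(time)
  time.getutc.strftime("%Y%m%d%H%M%S")
end

# 13:21:24 at UTC+01:00 is 12:21:24 UTC.
february = Time.new(2013, 2, 26, 13, 21, 24, "+01:00")
# 09:00:00 at UTC+02:00 is 07:00:00 UTC.
october  = Time.new(2013, 10, 25, 9, 0, 0, "+02:00")

february_iso = iso_timestamp(february)
october_iso  = iso_timestamp(october)

puts february_iso
puts february_iso < october_iso  # string comparison follows the time order
```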

Let's refactor the code. As you can see, it's not very DRY.

  def next_post
    @next_post ||= search_post {|enum| enum.and(:published_at, :is_greater_than, published_at_iso)}
  end

  def prev_post
    @prev_post ||= search_post {|enum| enum.and(:published_at, :is_less_than, published_at_iso).order(published_at: :desc)}
  end

  private

  def search_post
    enum = BlogPost.all.order(:published_at)
    enum = yield enum
    enum.take(1).first
  end

  def published_at_iso
    published_at.utc.to_iso
  end

Here, we factored out a method, search_post, that handles all the common search logic, and the published_at_iso method that converts the date to a format suitable for searching.

The view that renders the navigation looks like this, this time in HAML syntax:

%ul.pager
  - if @obj.next_post
    %li.previous
      = link_to(scrivito_path(@obj.next_post)) do
        %strong= @obj.next_post.title

  - if @obj.prev_post
    %li.next
      = link_to(scrivito_path(@obj.prev_post)) do
        %strong= @obj.prev_post.title

The search page

The search page uses a custom search request class, SearchRequest, to perform the search. The SearchPageController calls @hits = SearchRequest.new(@query, offset: 0, limit: 100).fetch_hits. This keeps the controller free from the search logic.

This is our newly created file app/models/search_request.rb:

class SearchRequest
  def initialize(query_string, options = {})
    @query_string = query_string
    @offset = options[:offset] || 0
    @limit = options[:limit] || 10
  end

  def fetch_hits
    search_results.take(@limit)
  end

  private

  def search_results
    now = Time.zone.now.to_iso
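    # Only objects published before now. A further and_not on the very same
    # condition would contradict this one and always yield an empty result.
    search_enum = Obj.where(:published_at, :is_less_than, now)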
        .and_not(:_obj_class, :equals, 'Image')
        .order(published_at: :desc)
        .offset(@offset)
        .batch_size(@limit)

    @query_string.strip.split(/[\s]+/).each do |word|
      search_enum.and(:*, :contains_prefix, word)
    end

    search_enum
  end
end

The actual search is performed by a private method, search_results. Obj.where returns an ObjSearchEnumerator. In the previous example we used BlogPost.all to find all objects of the BlogPost class. MyObjClass.all is in fact a shortcut for Obj.where(:_obj_class, :equals, 'MyObjClass'). So, the first method call for creating an ObjSearchEnumerator is always a call to Obj.where.

Please note that the .all shortcut only works for Obj and not for widget types.

Again, by chaining method calls on the search enumerator we add constraints and modifiers to the search. Finally, the words from @query_string are added to the search enumerator. fetch_hits runs the search by accessing the enumerator via take. Other accessors like size, to_a, each, etc. will also trigger the search.
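
How the query string is turned into individual conditions can be tried out without the CMS. The sketch below is an approximation: query_words and matches_all_prefixes? are hypothetical helpers, the to_s guard against a missing query is an addition that SearchRequest above does not have, and the tokenization only roughly imitates what :contains_prefix does in the search engine.

```ruby
# Splits a raw query into words; to_s makes a nil query yield no words.
def query_words(query_string)
  query_string.to_s.strip.split(/\s+/)
end

# Rough approximation of AND-ing one :contains_prefix condition per word:
# every query word must be the prefix of at least one word in the text.
def matches_all_prefixes?(text, words)
  tokens = text.downcase.split(/\W+/)
  words.all? do |word|
    tokens.any? { |token| token.start_with?(word.downcase) }
  end
end

words = query_words("  rail   cach\t")
puts words.inspect
puts matches_all_prefixes?("Caching in Rails apps", words)
puts matches_all_prefixes?("Caching in Rails apps", query_words("rail python"))
```

An empty query produces no words and therefore no additional conditions, so only the constraints set up before the loop apply.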