Avoid using WordPress and the Apache disk_cache module together … unless you understand what is going on and you know how to fix the problem.

Using both together led one of the sites I have worked on (http://www.gdium.com) to display RSS feeds instead of posts and pages. After a forced refresh of the page (CTRL+R in Firefox) the correct page was displayed. This had been driving me crazy for several days until I remembered that I had enabled Apache disk caching for another website on the same server.

After disabling disk_cache I did a search and found someone who was having the same problem, which confirmed what I thought.

Today I spent a few minutes trying to understand how Drupal renders a Not found page when you access a node that does not exist. After going through the node.module code I couldn’t find anything.

This is one of the most frustrating things about Drupal: you need to know its internals pretty well to understand what is happening and where to look, because there is some magic here and there.

It turns out that the menu system is one of the places where some of this magic is implemented to show a Not found page when an object cannot be retrieved from the database. Basically, the menu system provides a way to automatically load from the database the objects to display or edit. If an object cannot be retrieved, you get the error. Let’s see how this works.

When you write a module you register the mapping between paths and function calls by implementing hook_menu. For instance, for the node module you have

function node_menu() {
  ...
  $items['node/%node'] = array(
    'title callback' => 'node_page_title',
    'title arguments' => array(1),
    'page callback' => 'node_page_view',
    'page arguments' => array(1),
    'access callback' => 'node_access',
    'access arguments' => array('view', 1),
    'type' => MENU_CALLBACK,
  );
  ...
}

This adds a mapping from the path node/%node to the node_page_view function. The arguments that are passed to it are defined by the 'page arguments' array. As you can see, the array contains one single element: 1.

That 1 refers to position 1 in the array generated from the path. In our case:

  • 0 -> node
  • 1 -> %node

% is used as a wildcard, so it will match anything (except special characters like /, which is used as a separator) after node/. When instead of a bare % you use %something, some magic is performed: something_load() is called with the matched value. Imagine that you type the URL http://www.example.com/node/45; then, instead of passing 45 to the node_page_view function, Drupal passes it node_load(45). Amazing, isn’t it? Furthermore, if node_load(45) fails to load an object from the database because there is no node with a nid (the primary key of the node table) of 45, a Not found page is returned. That is the kind of trick that is difficult to spot but can increase productivity.

In case you need to pass other parameters to the something_load function, you can add a ‘load arguments’ entry to the item array for a path, with the extra arguments to pass to the load function as an array.
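Here is a rough sketch of how all this fits together. The car module, its table and its functions are hypothetical (they are not part of Drupal or of any real module); it just illustrates a wildcard loader, ‘load arguments’ and the automatic Not found behaviour:

//Hypothetical loader for the %car wildcard. The second parameter comes from
//'load arguments'. Returning FALSE makes the menu system render Not found.
function car_load($cid, $mode = 'summary') {
  $car = db_fetch_object(db_query('SELECT * FROM {car} WHERE cid = %d', $cid));
  return $car ? $car : FALSE;
}

function car_menu() {
  $items['car/%car'] = array(
    'page callback' => 'car_page_view',
    'page arguments' => array(1), //position 1 receives the loaded $car object
    'access arguments' => array('access content'),
    //Extra parameter for the loader: car_load($cid, 'full') will be called
    'load arguments' => array('full'),
    'type' => MENU_CALLBACK,
  );
  return $items;
}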

More information in the Drupal documentation.

Handling Drupal form submission is dead simple. You basically write three functions: a form generator, a form validator and a form submitter.

The form generator is called every time the form needs to be built, and it can be used to populate fields with default values or with the values previously entered by the user.
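As an illustration (the car form and its fields are made up for this example), a minimal form generator could look like this, with '#default_value' used to pre-populate a field:

function car_edit_form(&$form_state, $car = NULL) {
  $form['name'] = array(
    '#type' => 'textfield',
    '#title' => t('Name'),
    '#default_value' => $car ? $car->name : '', //populate the default value
    '#required' => TRUE,
  );
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Save'),
  );
  return $form;
}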

In the form generator you can specify the functions that should be called to validate the form and to submit it (to actually do something with the form values, like adding a new row to a table for instance):

$form['#validate'][] = 'car_edit_validate';
$form['#submit'][] = 'car_edit_submit';

Where car_edit_validate is the name of the function that will be called to do the validation and car_edit_submit is the function that will be called after the form validates.

Every time you submit the form the validate function is called first and if the validation does not encounter any errors the submit function is called.
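For example, the validate function for the hypothetical car form could look like the following sketch; calling form_set_error flags the field and prevents the submit function from running:

function car_edit_validate($form, &$form_state) {
  if (empty($form_state['values']['name'])) {
    //An error here means car_edit_submit will not be called
    form_set_error('name', t('You must give the car a name.'));
  }
}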

The problem I had is that I wanted to return an error and keep the form values if an error occurred in my submit function. By default, after the submit function is called an empty form is rendered and the values entered by the user are lost, even if you add an error to the form with form_set_error. The error is displayed, but that does not prevent the form values from being lost. If you want to keep the form values you must also add this line:

$form_state['redirect'] = FALSE;

Example:

function car_edit_submit($form, &$form_state) {
  global $user;
  $car = (object)$form_state['values'];
  $car->uid = $user->uid;
  if(car_save($car)) {
    drupal_set_message(t('car successfully saved'));
    drupal_goto('car/'.$car->cid);
  } else {
    form_set_error('form', t('The car could not be saved, please try again'));
    $form_state['redirect'] = FALSE; //To prevent Drupal from cleaning the form
  }
}

YAML vs Marshal performance

January 29, 2008

A colleague of mine has built a quite sophisticated mechanism that allows components to automatically reload when a user interaction requires it. In order to do that, though, it needs to store a significant amount of information for each reloadable component in a page. This information contains the different parameters needed to reload each component. This context information is associated with a particular browser window and is stored in a hash. The hash is persisted using the serialize method provided by Rails, which uses YAML for serialization.

We are currently working on improving performance, and thanks to the excellent ruby-prof profiler I detected that a significant amount of time was spent serializing the hash before persisting it. I decided to look for alternatives and the first one I came across was Marshal.dump.

I wrote a simple test case:

#!/usr/bin/ruby

require 'yaml'

hash = {:key1 => 'value1', :key2 => 'value2', :key3 => 'value3', :key4 => {:key41 => 'value41', :key42 => 'value42'}}

iterations = 10000

serialized_hash = nil

start = Time.now
1.upto(iterations) { serialized_hash = Marshal.dump(hash) }
puts "Marshal hash: #{Time.now - start} seconds"

start = Time.now
1.upto(iterations) { reloaded_hash = Marshal.load(serialized_hash) }
puts "Reload marshalled hash: #{Time.now - start} seconds"

start = Time.now
1.upto(iterations) { serialized_hash = hash.to_yaml }
puts "YAMLize hash: #{Time.now - start} seconds"

start = Time.now
1.upto(iterations) { reloaded_hash = YAML::load(serialized_hash) }
puts "Reload YAMLized hash: #{Time.now - start} seconds"

The results show that YAML is awfully slow. I will not paste the complete report here, but these are the timings:

Marshal hash: 0.13829 seconds
Reload marshalled hash: 0.184913 seconds
YAMLize hash: 4.792248 seconds
Reload YAMLized hash: 1.046568 seconds

In my tests, YAML is 34.65 times slower in serialization and 5.66 times slower in unserialization.

So be careful when serializing big objects with YAML, as the performance impact can be significant.
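If you want to swap the serialize helper for Marshal, one option is to wrap the attribute yourself. This is just a sketch (the Page model and its binary context column are hypothetical, not the code from our application):

class Page < ActiveRecord::Base
  #Store the hash with Marshal instead of YAML (the column should be binary)
  def context=(hash)
    write_attribute(:context, Marshal.dump(hash))
  end

  def context
    raw = read_attribute(:context)
    raw && Marshal.load(raw)
  end
end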

Ever wondered how to easily add image attachment support to your Rails application? Then you should definitely give attachment_fu a go, a very easy to use Rails plugin by Rick Olson.

(Note: This article would not have been possible without Mike Clark’s excellent attachment_fu tutorial.)

Step 1: Installation (on Ubuntu 6.10)

Installing the plugin is as easy as it gets:
script/plugin install http://svn.techno-weenie.net/projects/plugins/attachment_fu/

In order to do some image processing you need to install one of the following packages as well:

  • ImageScience
  • RMagick
  • minimagick

ImageScience is the simplest of them all, only allowing you to resize images. It depends on FreeImage and RubyInline.
This is the one I have ended up using as it is enough for me.
It is not available in the Ubuntu repositories, so I had to install it manually following the instructions on its website:

sudo gem install -y image_science

which also installs the RubyInline, hoe and rubyforge gems.

Installing FreeImage required me to install cvs (to check out the sources) and g++ first:

sudo apt-get install cvs g++

cvs -z3 -d:pserver:anonymous@freeimage.cvs.sourceforge.net:/cvsroot/freeimage login (just type enter when asked for a password)
cvs -z3 -d:pserver:anonymous@freeimage.cvs.sourceforge.net:/cvsroot/freeimage co -P FreeImage
cd FreeImage
make
sudo make install

Step 2: Preparing your Rails application

In my application I have a Work model to which I want to associate images. Images are submitted by users and are associated with one single Work: a has_many / belongs_to association between a Work and its images. My application also has users, and I want to know who added a particular image (to prevent abuse).

In order to make use of the functionality provided by attachment_fu you need to create an ActiveRecord model with at least the following attributes:

  • content_type: the sort of content you are storing. This is used by web browsers to decide how to present the content to users (open an external application, show it embedded using a plugin, etc.).
  • filename: the name of the stored file, used to locate the image on disk
  • size: the size in bytes of the attachment

When you store images, attachment_fu makes use of some other useful fields:

  • parent_id: if you store thumbnails, this associates them with the parent image (it could actually be used for other types of content as well)
  • thumbnail: as you can have more than one thumbnail, this field contains the identifier assigned to each type of thumbnail.
  • width: the width of the image.
  • height: the height of the image.

In my case I have added the following attributes:

  • work_id: the work that the image is associated to.
  • user_id: the user that added the image
  • default: whether this is the default image to be used when displaying the work
  • created_at: when the image was added

Let’s create the model:

script/generate model WorkImage

My migration file looks like this:



class CreateWorkImages < ActiveRecord::Migration

  def self.up

    create_table :work_images, :options => 'ENGINE=InnoDB DEFAULT CHARSET=utf8' do |t|
      t.column :work_id, :integer, :null => false
      t.column :user_id, :integer, :null => false
      t.column :default, :boolean, :null => false, :default => false
      t.column :created_at, :datetime, :null => false
      t.column :parent_id,  :integer, :null => true
      t.column :content_type, :string, :null => false
      t.column :filename, :string, :null => false
      t.column :thumbnail, :string, :null => true
      t.column :size, :integer, :null => false
      t.column :width, :integer, :null => true
      t.column :height, :integer, :null => true
    end
    execute "alter table work_images add constraint fk_wi_works foreign key (work_id) references works(id)"
    execute "alter table work_images add constraint fk_wi_user foreign key (user_id) references users(id)"
  end

  def self.down
    drop_table :work_images
  end
end

Let’s edit the WorkImage model to make use of the attachment_fu plugin:


class WorkImage < ActiveRecord::Base
  has_attachment :content_type => :image,
                 :storage => :file_system,
                 :max_size => 100.kilobytes,
                 :resize_to => '200x200>',
                 :thumbnails => { :thumb => '50x50>' },
                 :processor => 'ImageScience'

  validates_as_attachment

  belongs_to :work
  belongs_to :user

  #The block will be executed just before the thumbnail is saved.
  #We need to set extra values in the thumbnail class as
  #we want it to have the same extra attribute values as the original image,
  #except for the default flag that is always set to false
  before_thumbnail_saved do |record, thumbnail|
    thumbnail.user_id = record.user_id
    thumbnail.work_id = record.work_id
    thumbnail.default = false
  end
end

I wanted to be able to attach images by providing their URL, rather than asking the user to download the image and upload it to the system. This can also be used when querying e-commerce APIs (like Amazon’s) to retrieve and store the images they return. So I enriched my WorkImage model with an extra method (which I guess would be a good feature to add to the attachment_fu plugin):


#At the top of the model file:
require 'net/http'
require 'uri'

#Virtual attribute: fetches the image at the given URL and hands the
#downloaded bytes over to attachment_fu through temp_data
def source_url=(url)
  return nil if not url
  http_getter = Net::HTTP
  uri = URI.parse(url)
  response = http_getter.start(uri.host, uri.port) {|http|
    http.get(uri.path)
  }
  case response
  when Net::HTTPSuccess
    file_data = response.body
    return nil if file_data.nil? || file_data.size == 0
    self.content_type = response.content_type
    self.temp_data = file_data
    self.filename = uri.path.split('/')[-1]
  else
    return nil
  end
end

I also enriched my Work model to retrieve the associated images. You can add similar relationships for easy access to thumbnails.


class Work < ActiveRecord::Base
...
  has_many :images, :class_name => 'WorkImage', :conditions => ["work_images.parent_id is null"] #The condition avoids retrieving thumbnails
  #Easily retrieve the default image
  has_one  :default_image, :class_name => 'WorkImage', :conditions => ["work_images.default"]
...
end 

Step 3: Make use of the new model in the controller and view

In my controller, when I want to add an image to a model I do something like the following:


def add_image
...
  #Store the image if any
  if params[:image_source_url]
    image = WorkImage.new(:source_url => params[:image_source_url])
    image.work_id = @work.id
    image.user_id = self.current_user.id
    image.default = true if params[:is_default_image]
    image.save!
  end
...
end
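For completeness, attachment_fu also handles files uploaded through a regular multipart form via its uploaded_data attribute, so the same model works for direct uploads. A rough sketch (the action and parameter names are made up for this example):

def upload_image
  #params[:work_image][:uploaded_data] comes from a file_field in a multipart form
  image = WorkImage.new(:uploaded_data => params[:work_image][:uploaded_data])
  image.work_id = @work.id
  image.user_id = self.current_user.id
  image.save!
end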

Images will be saved in public/work_images using something that Jamis Buck from 37signals called id partitioning.
That way you can theoretically store 9999 * 10000 attachments (thumbnails are not counted as attachments), which is enough for standard purposes. Anyway, this can easily be changed to support more files if you need it. Look for a method named partitioned_path in vendor/plugins/attachment_fu/lib/technoweenie/attachment_fu/backends/file_system_backend.rb.
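Roughly speaking (this is a paraphrase of the idea, not the plugin’s exact code), id partitioning zero-pads the attachment id to 8 digits and splits it into two 4-digit directories:

def partitioned_path_sketch(id, *args)
  ("%08d" % id).scan(/..../) + args   #e.g. 1234 => ["0000", "1234"]
end
#an attachment with id 1234 ends up under public/work_images/0000/1234/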

In order to display the default image in a view I just need to do the following:

<%= image_tag(@work.default_image.public_filename()) %>

If what you want to display is the thumbnail, just pass the thumbnail identifier (in our case :thumb) to public_filename:

<%= image_tag(@work.default_image.public_filename(:thumb)) %>

And that should be it really. If you have questions, leave a comment.

Note:
I found a small bug in the plugin: it was not storing resized image sizes properly. I had to edit the vendor/plugins/attachment_fu/lib/technoweenie/attachment_fu/processors/image_science_processor.rb file and set the correct size just after the image is saved in the resize_image method:


...
img.save self.temp_path
self.size = File.size(self.temp_path)
...

I also noticed that something is done even to images that do not need to be resized, because their file size changes although the dimensions remain the same. I have a 5 KB file that weighs 12 KB after the resizing process! The image dimensions are the same and the file should not have been modified. I am not sure what is going on here, but I guess this is an ImageScience issue.

This is my first post sent using Deepest Sender as it will make updating this log much easier.

When a request asks for an action in a controller that does not exist in your application, a not found error page is displayed. You can actually use routes to redirect these requests to a default page.

Just add the following line as the last rule in your config/routes.rb file:

map.connect '*path', :controller => 'main', :action => 'redirect_to_default'

Whenever a request asking for an action or a controller that you have not defined hits your application, Rails will call the 'redirect_to_default' action in the 'main' controller (you can obviously change the controller and the action to fit your needs).

The code for the redirect_to_default action is a simple rails redirect:

def redirect_to_default
  redirect_to :action => 'index'
end

If you want to pass specific options, like the table type and the charset, when creating tables through Rails migrations, pass an options parameter to the create_table method:

create_table :my_table, :options => 'ENGINE=InnoDB DEFAULT CHARSET=utf8', :force => true do |t|
  t.column :column1, :string
  t.column :column2, :string
end

In order to set up a maintenance page in Apache 2 you need to:

  1. Enable mod_rewrite. On my Debian server I just need to do the following:
    ln -s /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load
  2. Create the maintenance page somewhere on your server's disk. I created it under /srv/www/maintenance
  3. Set up apache2 to redirect all requests to your site to the maintenance page (you will need to comment out the current apache2 directives for your website). In my case I have an /etc/apache2/sites-available/mysite file that is linked from /etc/apache2/sites-enabled/mysite.
    #Maintenance page
    <VirtualHost *:80>
    ServerName mysite.com
    ServerAdmin postmaster@mysite.com
    RewriteEngine on
    RewriteCond %{REQUEST_URI} !/index\.html$
    RewriteCond %{REQUEST_URI} !/logo\.gif$
    RewriteRule ^(.*)$ /index.html [L]
    DocumentRoot "/srv/www/maintenance"
    </VirtualHost>
  4. Reload apache2 configuration:
    /etc/init.d/apache2 reload

In my case I only have an index.html and a logo.gif file in the /srv/www/maintenance folder. If you have more files that are needed to render the maintenance page you will need to add some extra “RewriteCond %{REQUEST_URI} !/yourfile\.extension$” rules.

Note that if you do not use the rewrite engine, the maintenance page will show up when your users access http://mysite.com or http://mysite.com/index.html, but if they access http://mysite.com/something_else they will get a nice “Page not found” error.

When one of the hard disks in a RAID 1 drops out of the array because it is no longer in sync with the other disk, you can easily resynchronize it with the following command:

raidhotadd /dev/mdX /dev/sdY

X and Y should be set to the appropriate values.

‘cat /proc/mdstat’ would tell you if your RAID system is healthy.

The configuration of your RAID is set in the file /etc/raidtab, which lists the disks in the array; you can compare it with the output of the cat command above to see which disk is missing.
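As an illustration (the device names and sizes are made up), a degraded RAID 1 looks roughly like this in /proc/mdstat, with only one of the two members up:

$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]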

I use this script to verify if all disks in my RAID 1 are working fine:


#!/bin/bash
#Check if both drives are up: a healthy array shows "[UU]" and "2/2" in /proc/mdstat
if [ `grep "\[UU\]" /proc/mdstat | wc -l` != 2 ] || [ `grep "2/2" /proc/mdstat | wc -l` != 2 ]; then
  cat /proc/mdstat
  exit 1
fi
exit 0

More info here.