Sunday, March 07, 2010

Drupal Hell: A Controlled Descent Story

As I mentioned before, I've been working on integrating Drupal into our (Java) publishing infrastructure. Over the last few weeks, I have been learning about Drupal: what is possible and what is not. This week, I describe two use cases where I had to "extend" Drupal somewhat. In the second case, I came very close to hacking the core, hence the title, which borrows from the advice given in The Road to Drupal Hell.

The setup is that we use Drupal as our blogging platform, and then publish the blog posts over XML-RPC to a Java application. Only the editorial staff and bloggers have access to the Drupal app; readers read the blogs on the Java-based website.

Of course, blogs are interactive by nature, so readers need a way to comment on them. This is done by a Java action that calls Drupal's comment.save XML-RPC service. By default, comments must be moderated, so the editorial staff reviews and publishes them. Publishing a comment results in the node being republished, and the comment then appears on the Java site.
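To make the comment flow a little more concrete, here is a rough sketch of what an XML-RPC call to comment.save looks like on the wire. It is written in PHP for consistency with the rest of this post (in our setup the caller is a Java action), and the struct members (nid, subject, comment) and the helper name build_comment_save_request() are my assumptions for illustration, not a confirmed description of the service's parameters.

```php
<?php
// Sketch only: builds the XML-RPC <methodCall> body that a client (our Java
// action, in the real setup) could POST to Drupal's XML-RPC endpoint.
// The struct members used below (nid, subject, comment) are assumptions,
// not a confirmed description of the comment.save service's parameters.
function build_comment_save_request(array $comment) {
  $members = '';
  foreach ($comment as $name => $value) {
    // PHP integers become <int>; everything else is sent as an escaped <string>.
    if (is_int($value)) {
      $typed = '<int>' . $value . '</int>';
    } else {
      $typed = '<string>' . htmlspecialchars((string) $value, ENT_QUOTES) . '</string>';
    }
    $members .= '<member><name>' . htmlspecialchars((string) $name, ENT_QUOTES) . '</name>'
      . '<value>' . $typed . '</value></member>';
  }
  return '<?xml version="1.0"?>'
    . '<methodCall><methodName>comment.save</methodName>'
    . '<params><param><value><struct>' . $members . '</struct></value></param></params>'
    . '</methodCall>';
}

$xml = build_comment_save_request(array(
  'nid' => 42,
  'subject' => 'Nice post',
  'comment' => 'I <3 this & that',
));
echo $xml, "\n";
```

The escaping matters because reader comments are arbitrary text; an unescaped "<" or "&" would make the request body invalid XML.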

Use Case #1: Batch Publishing

Testing through the Drupal interface is kind of painful (too many mouse clicks), so I built a little PHP script that publishes all publishable blogs in one call. This post from Stonemind Consulting was very helpful; basically, I followed its approach to call the send_request() function from a standalone script. Here is the complete script if you are interested:

<?php
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
error_reporting(E_ALL);

# include local settings for scripts. These define $remote_urls (the remote
# publishing servers) and $preview_url (the preview server URL template).
require_once './sites/default/local_settings.php';

# Drupal always sets the global $user; anonymous visitors have uid 0.
global $user;
if ($user->uid == 0) {
  echo 'Not logged in. Please log in as administrator before performing this task.';
  return;
}
if ($user->uid != 1) {
  echo 'Sorry, only the administrator can perform this task.';
  return;
}

echo '<b>Republishing nodes...</b><br/>';
echo 'This tool republishes ALL nodes with node.status=1 and node.type=blog.<br/><br/>';
echo '<table cellspacing=3 cellpadding=3 border=1>';
# Header
$num_publishers = 1;
echo '<tr><td><b>Node-ID</b></td><td><b>Title</b></td>';
foreach ($remote_urls as $remote_url) {
  echo '<td><b><a href="' . check_url($remote_url) . '">Publisher-' . $num_publishers . '</a></b></td>';
  $num_publishers++;
}
echo '</tr>';
# {node} lets Drupal apply any configured table prefix.
$published_nids = db_query("SELECT nid FROM {node} WHERE status = %d AND type = '%s'", 1, 'blog');
$num_nodes = 0;
$num_attempted = 0;
$num_ok = 0;
$num_failed = 0;
while ($result = db_fetch_object($published_nids)) {
  $node = node_load($result->nid);
  # Use a new variable: overwriting $preview_url would lose the %d placeholder
  # after the first node.
  $node_preview_url = sprintf($preview_url, $node->nid);
  echo '<tr><td>' . $node->nid . '</td><td><a href="' . check_url($node_preview_url) . '">' . check_plain($node->title) . '</a></td>';
  foreach ($remote_urls as $remote_url) {
    # A return value of 0 from dxi_send_request() is treated as success.
    $response = dxi_send_request($remote_url, 'publish', $node);
    if ($response == 0) {
      echo '<td><font color="green">OK</font></td>';
      $num_ok++;
    } else {
      echo '<td><font color="red">Failed</font></td>';
      $num_failed++;
    }
    $num_attempted++;
  }
  $num_nodes++;
  echo '</tr>';
}
echo '</table><br/>';
echo '#-nodes republished: ' . $num_nodes . ', #-publish attempts: ' . $num_attempted . ', #-successes= ' . $num_ok . ', #-failures= ' . $num_failed;

As you can see, most of this is plain old PHP, with a few Drupal API calls thrown in for convenience. You drop the script into Drupal's root directory and call it from a browser, similar to /update.php or /install.php. You need to be logged in as administrator to run it.
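Since most of the script is plain PHP, the republish loop can be pulled out and tested without Drupal at all. The sketch below is one way to do that, under my own assumptions: the function name republish_all() is hypothetical, and the $publish callable stands in for dxi_send_request(), keeping the script's convention that a return value of 0 means success.

```php
<?php
// Sketch of the republish loop with the Drupal calls factored out, so the
// plain-PHP part can be exercised on its own. $publish stands in for a call
// such as dxi_send_request(); as in the script, a return value of 0 is
// treated as success.
function republish_all(array $nids, array $remote_urls, $preview_template, callable $publish) {
  $stats = array('nodes' => 0, 'attempted' => 0, 'ok' => 0, 'failed' => 0, 'previews' => array());
  foreach ($nids as $nid) {
    // Build each preview URL into a new entry; the template itself is never
    // overwritten, so every node gets its own URL.
    $stats['previews'][$nid] = sprintf($preview_template, $nid);
    foreach ($remote_urls as $remote_url) {
      if ($publish($remote_url, 'publish', $nid) == 0) {
        $stats['ok']++;
      } else {
        $stats['failed']++;
      }
      $stats['attempted']++;
    }
    $stats['nodes']++;
  }
  return $stats;
}

// Usage with a fake publisher that fails only for node 2.
$stats = republish_all(
  array(1, 2, 3),
  array('http://host-a/publish.do', 'http://host-b/publish.do'),
  'http://somehost.mycompany.com/myapp/blog/%d',
  function ($url, $op, $nid) { return $nid == 2 ? 1 : 0; }
);
printf("nodes=%d attempted=%d ok=%d failed=%d\n",
  $stats['nodes'], $stats['attempted'], $stats['ok'], $stats['failed']);
```

With two publishers and three nodes, this prints nodes=3 attempted=6 ok=4 failed=2, since node 2 fails against both publishers.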

One problem I had (which I also had in the second use case, where I solved it differently and, I think, a bit more elegantly) was getting the URL of the publisher XML-RPC service and the URL of the preview service (which allows one-click viewing of the published node). Ideally, I should be able to get these from Drupal itself, since they are already configured there, but I couldn't find an API call that provides this information. So I built my own mechanism, similar to the settings.php file: my properties file is called local_settings.php and sits next to settings.php in sites/default.

<?php
$remote_urls = array(
  'http://somehost.mycompany.com/myapp/publish.do'
);
$preview_url = 'http://somehost.mycompany.com/myapp/blog/%d';

I don't really like this approach, as it's not DRY, but it's the best I could come up with at the time. The approach of storing the value in the Drupal variables table (described below) is slightly better, but still not as DRY as I would like.
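One possible way to make this a bit DRYer, sketched under my own assumptions, is to have a single function own the URLs and have both the batch script and the module call it instead of keeping separate copies. The names dxi_get_settings() and dxi_preview_url() are hypothetical; inside Drupal, the body could read variable_get() instead of literals.

```php
<?php
// Hypothetical helpers: one function owns the publisher/preview URLs, and both
// the batch script and the module would call it instead of keeping copies.
// Inside Drupal the body could read variable_get() instead of literals.
function dxi_get_settings() {
  static $settings = NULL;  // built once per request
  if ($settings === NULL) {
    $settings = array(
      'remote_urls' => array('http://somehost.mycompany.com/myapp/publish.do'),
      'preview_url' => 'http://somehost.mycompany.com/myapp/blog/%d',
    );
  }
  return $settings;
}

// Fill in the node id without touching the shared template.
function dxi_preview_url($nid) {
  $settings = dxi_get_settings();
  return sprintf($settings['preview_url'], $nid);
}

echo dxi_preview_url(123), "\n";
```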

Use Case #2: Comment Publish/Unpublish

Wanting to do the right thing as a new Drupal developer, I had read the Pro Drupal book, and understood and heeded its warnings about hacking Drupal's core. In fact, I had structured my module (described earlier) so it exposed a configurable action; that way a site administrator can attach it to whatever trigger they see fit, rather than having it wired up in code through a hook_XXX() implementation.

Drupal exposes triggers for the "insert", "update", "delete" and "view" comment operations. Since publishing or unpublishing a comment changes its status, I figured my action should be attached to the comment update trigger, but apparently I was wrong. For some reason (probably performance), the publish/unpublish operation is done through a straight SQL call (specified in comment_operations()), and does not fire the comment update trigger(s). Actually modifying the comment (such as changing its content or title), however, does.

So my first approach was to hack the core, specifically the comment_hook_info() function, to add 'publish' and 'unpublish' operations to it. This allowed me to hook up my custom action, but the action still did not get called (which was a good thing, since it forced me to re-evaluate my options).

Stepping through the publish/unpublish operation with a debugger, I found that comment_admin_overview_submit() calls comment_invoke_comment() when a comment is published from the Approval Queue (or unpublished from the list of published comments). comment_invoke_comment() invokes hook_comment() in all enabled modules, so I needed to explicitly implement a dxi_comment() function that delegates to my custom action. I had originally used a hook_nodeapi() implementation and removed it in favor of a custom action, so this was kind of a step backward. In any case, this is what I finally came up with:

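<?php
// Implementation of hook_comment(). comment_invoke_comment() calls this for
// every comment operation; only publish and unpublish are handled here.
function dxi_comment($comment, $op) {
  if ($op != 'publish' && $op != 'unpublish') {
    return;
  }
  if (! isset($comment->nid)) {
    watchdog('dxi', 'No node for comment @cid (op=@op)', array('@cid' => $comment->cid, '@op' => $op), WATCHDOG_WARNING);
    return;
  }
  $nid = $comment->nid;
  // Pass $reset = TRUE so node_load() bypasses its static cache.
  $node = node_load($nid, NULL, TRUE);
  if (! $node) {
    watchdog('dxi', 'Node load failed for nid=@nid', array('@nid' => $nid), WATCHDOG_ERROR);
    return;
  }
  // Prepare the context that the custom action would normally receive;
  // remote_url is read from the variables table.
  $context = array(
    'op' => $op,
    'hook' => 'comment',
    'remote_url' => variable_get('dxi_remote_url', DEFAULT_DXI_REMOTE_URL),
  );
  return dxi_call_action($node, $context);
}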

Note how I have to build up the $context array myself. Unlike in the custom action, $context here does not contain the 'remote_url' entry that dxi_call_action_submit() sets. So I made another change: dxi_call_action_submit() now also saves the remote_url value entered by the administrator in the variables table, as shown below, so that dxi_comment() can read it to populate $context.

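<?php
// Submit handler for the configurable action's form. The returned array is
// stored as the action's parameters; the URL is also saved in the variables
// table so dxi_comment() can read it with variable_get().
function dxi_call_action_submit($form, $form_state) {
  $remote_url = $form_state['values']['remote_url'];
  variable_set('dxi_remote_url', $remote_url);
  return array(
    'remote_url' => $remote_url
  );
}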
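To show the data flow between the submit handler and dxi_comment() outside of Drupal, here is a minimal sketch with an in-memory stand-in for variable_set() and variable_get(). The DemoVariables class is hypothetical and exists only for this illustration; real Drupal persists these values in the variables table.

```php
<?php
// Minimal in-memory stand-in for Drupal's variable_set()/variable_get(), only
// to show the flow: the action's submit handler saves the URL, and the comment
// hook later reads it back, falling back to a default. DemoVariables is a
// hypothetical class; real Drupal persists these values in a database table.
class DemoVariables {
  private static $values = array();

  public static function set($name, $value) {
    self::$values[$name] = $value;
  }

  public static function get($name, $default) {
    return array_key_exists($name, self::$values) ? self::$values[$name] : $default;
  }
}

$default_url = 'http://localhost/myapp/publish.do';

// Before the action is configured, the hook sees the default.
$before = DemoVariables::get('dxi_remote_url', $default_url);
// The submit handler stores what the administrator typed in.
DemoVariables::set('dxi_remote_url', 'http://somehost.mycompany.com/myapp/publish.do');
// The comment hook builds its context from the stored value.
$context = array(
  'op' => 'publish',
  'hook' => 'comment',
  'remote_url' => DemoVariables::get('dxi_remote_url', $default_url),
);
echo $before, "\n", $context['remote_url'], "\n";
```

One consequence worth noting: the variables table holds a single site-wide value, so if the action is configured more than once with different URLs, whichever configuration was saved last wins.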

With this approach, I have basically worked around what I think is an oversight in Drupal's comment module. The comment module does pass 'publish' and 'unpublish' in the $op argument when it invokes hook_comment() (and therefore dxi_comment()), so these operations are clearly recognized in the code. I don't see why they are not exposed as triggers that custom actions can be attached to.
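The reason dxi_comment() gets called at all is Drupal's hook naming convention: a function named {module}_comment in an enabled module is treated as that module's hook_comment() implementation. The sketch below is a simplified illustration of that dispatch, not the actual comment_invoke_comment() source; demo_invoke_comment(), demo_dxi_comment() and the module names are hypothetical.

```php
<?php
// Simplified illustration of Drupal's hook naming convention (not the actual
// comment_invoke_comment() source): for each enabled module, call the
// function named {module}_comment if it exists, passing the comment and $op.
function demo_invoke_comment(array $modules, $comment, $op) {
  $results = array();
  foreach ($modules as $module) {
    $function = $module . '_comment';
    if (function_exists($function)) {
      $results[$module] = $function($comment, $op);
    }
  }
  return $results;
}

// A module-style implementation that only reacts to publish/unpublish.
function demo_dxi_comment($comment, $op) {
  if ($op != 'publish' && $op != 'unpublish') {
    return NULL;
  }
  return $op . ' comment ' . $comment->cid . ' on node ' . $comment->nid;
}

$comment = (object) array('cid' => 7, 'nid' => 42);
$modules = array('demo_dxi', 'demo_other');  // demo_other has no hook, so it is skipped
print_r(demo_invoke_comment($modules, $comment, 'publish'));
```

This is also why no trigger is involved: the hook fires for 'publish' and 'unpublish' regardless of what the trigger UI exposes.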

Conclusion

I am still learning Drupal, and my main programming language is Java, not PHP, and this probably shows in my code. I am wondering what the best practices are for this kind of stuff. Specifically, I am looking for answers to:

  • What is the best practice for accessing action configuration variables? I now have two different ways of getting the remote_url value for my custom action. Ideally, I would like to pull it out of the context somehow, or use some Drupal call to read it from the actions table. Does such a method exist?
  • Have others run into this problem with comment publishing and unpublishing? If so, how did you solve it? Is there a reason why the publish and unpublish operations are not exposed as triggers?

I guess I could ask around on the Drupal mailing list, and I probably will, but as someone who cares about Drupal just enough to get it hooked up correctly with the Java application, I will probably hold off until my Java application is more stable. Meanwhile, if you have answers or alternative approaches to this problem, I would appreciate hearing from you.

2 comments (moderated to prevent spam):

Anarchist said...

Before getting into the specifics of what you want to accomplish, how about a little more background? Why are you using a Java application as the view?

Not criticizing, I know my clients have wonky requirements, just want an honest breakdown.

Sujit Pal said...

Hi Anarchist, the requirements aren't quite as wonky as you think. Hopefully you will agree once you read through my response.

We are a Java shop; at most 4-5 of us have programmed in PHP at some point in our careers. Also, in this particular app, the data from Drupal will be a small part of the overall content. The templates for rendering the content already exist in the Java webapp. Finally, we do some processing of the data to extract context from it, for search and advertising purposes, etc., for which Java modules have already been developed and are in use.

Arguably, we could continue to use Drupal to serve the blogs within the overall webapp by proxying through to it from the front end, but then we would also have to interface with Drupal from all the other (post-processing) apps. That would mean multiple PHP-Java integration points instead of one, since the backend modules are not as integrated as the front-end Java webapp.