Filtering Content

A key aspect of all Drupal sites is that they gather text-based input from users and display it in web pages. Whenever user-provided text is included in an HTML document, there is a risk that the text might interfere with the HTML in some way or, even worse, allow attackers to damage a site or render it useless. This can happen in many ways, ranging from malformed or inappropriate HTML tags that break the carefully designed layout of a site to a single line of JavaScript code that redirects the page to a different site selling questionable merchandise. Thankfully, Drupal has a sophisticated tool for handling this threat: content filtering.

The content filtering system allows administrators to decide what type of content each user role is allowed to contribute. This can range from restrictive (no HTML tags and no scripts) to complete freedom (anything goes, even PHP code, which will be executed). Each such profile is called an input format and can consist of as many or as few filters as you choose. The filters are applied when content is displayed, not when it is saved, so the stored text stays unchanged and the same input can be rendered in different ways depending on which filters are applied. To configure input formats, select administer > input formats (admin/filters).
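To make the idea of an input format concrete, here is a minimal Python sketch (Drupal itself is written in PHP, and this is not its API). It assumes, for illustration only, that a filter is a plain function from text to text and that an input format is an ordered list of such functions; the format names and helper functions are hypothetical. The stored text is never modified; the filters run each time it is rendered.

```python
import html
from typing import Callable, Dict, List

# Assumption for this sketch: a filter is simply a function from text to text.
Filter = Callable[[str], str]

def escape_all_markup(text: str) -> str:
    """Hypothetical restrictive filter: every tag becomes visible text."""
    return html.escape(text)

def newlines_to_br(text: str) -> str:
    """Hypothetical filter: turn each newline into an HTML line break."""
    return text.replace("\n", "<br />\n")

# Hypothetical input formats: each one is an ordered list of filters.
INPUT_FORMATS: Dict[str, List[Filter]] = {
    "plain text": [escape_all_markup, newlines_to_br],
    "plain text, reversed": [newlines_to_br, escape_all_markup],
    "unfiltered": [],
}

def render(stored_text: str, format_name: str) -> str:
    """Run the format's filters in order at display time; stored_text is untouched."""
    text = stored_text
    for apply_filter in INPUT_FORMATS[format_name]:
        text = apply_filter(text)
    return text

post = "<b>Hi</b>\nthere"
for name in INPUT_FORMATS:
    print(f"{name!r}: {render(post, name)!r}")
```

In the reversed format, the `<br />` added by the first filter is escaped by the second and shows up as literal text, which is why the order of filters within a format matters.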

The admin/filters page lists all input formats and the user roles allowed to use them. Radio buttons indicate which input format is the default for all newly created content. When a user's role has more than one input format available, he can choose which one to use whenever he creates new content. If you feel that this places too great a burden on your users, enable only one input format for that role, and the choice will be removed.
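The role-to-format rules described above can be modeled with a small Python sketch. The role names, format names, and helper functions are hypothetical, and the sketch assumes that the default format is always available to every role; check your own Drupal version's admin/filters page before relying on that detail.

```python
from typing import Dict, List, Set

# Hypothetical configuration mirroring the admin/filters page:
# which roles may use each input format, plus the site-wide default.
FORMAT_ROLES: Dict[str, Set[str]] = {
    "Filtered HTML": {"anonymous user", "authenticated user"},
    "Full HTML": {"editor"},
    "PHP code": {"administrator"},
}
DEFAULT_FORMAT = "Filtered HTML"

def available_formats(user_roles: Set[str]) -> List[str]:
    """Formats usable by someone holding any of the given roles.

    Assumption for this sketch: the default format is always available.
    """
    formats = [name for name, roles in FORMAT_ROLES.items() if roles & user_roles]
    if DEFAULT_FORMAT not in formats:
        formats.insert(0, DEFAULT_FORMAT)
    return formats

def show_format_choice(user_roles: Set[str]) -> bool:
    """The content form offers a choice only when more than one format applies."""
    return len(available_formats(user_roles)) > 1

print(available_formats({"authenticated user", "editor"}))
print(show_format_choice({"authenticated user"}))
```

Granting a role access to a single format, as suggested above, makes `show_format_choice` return `False`, so the form has nothing to ask.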

An input format consists of zero or more filters, which are applied to content in an order that you specify. To see which filters make up an input format, click Configure for one of the input formats on the admin/filters page. The resulting page lists all of the available filters. Check each filter that should apply to this input format.

In the default Drupal installation, the available filters are the HTML filter, the line break converter, and the PHP evaluator. Modules can add their own filters, which then show up in the list. Module-contributed filters range from fun gimmicks (for example, the Smileys module replaces certain combinations of symbols, like :-), with a small smiley graphic) to useful additions (for example, the Wiki module provides full wiki syntax and functionality, all built into a content filter).
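The line break converter is what lets users type ordinary paragraphs without writing HTML. The Python sketch below is a deliberate simplification of that idea, not Drupal's implementation: it assumes plain-text input, treats a blank line as a paragraph break and any other newline as a `<br />`, and ignores the block-level HTML that a real converter must leave intact. The function name is hypothetical.

```python
import re

def convert_line_breaks(text: str) -> str:
    """Hypothetical, simplified line break converter for plain-text input."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A blank line (possibly containing whitespace) separates paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    # Inside a paragraph, each remaining single newline becomes <br />.
    return "\n".join(
        "<p>" + p.replace("\n", "<br />\n") + "</p>" for p in paragraphs
    )

print(convert_line_breaks("Hello\nworld\n\nSecond paragraph"))
```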

Tip Many contributed modules, such as the Glossary, Textile, and Markdown with Smartypants modules, leverage content filtering. They give the people creating content more flexibility or enhance the quality of their input. For example, some filters simplify the process of generating HTML, and others scan what is written for important vocabulary words or technical terms that have been defined elsewhere.
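A filter of the second kind, one that scans text for terms defined elsewhere, might look roughly like the following Python sketch. It is not the Glossary module's code; the `GLOSSARY` mapping, its URLs, and `link_glossary_terms` are hypothetical. The sketch assumes plain-text input, because it would also rewrite terms that happen to appear inside existing HTML attributes.

```python
import html
import re
from typing import Dict

# Hypothetical glossary: term -> URL of the page where the term is defined.
GLOSSARY: Dict[str, str] = {
    "input format": "/glossary/input-format",
    "XSS": "/glossary/xss",
}

def link_glossary_terms(text: str, glossary: Dict[str, str]) -> str:
    """Wrap each whole-word, case-insensitive occurrence of a term in a link."""
    if not glossary:
        return text
    urls = {term.lower(): url for term, url in glossary.items()}
    # Try longer terms first so they win over shorter terms they contain.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )

    def to_link(match):
        word = match.group(0)
        url = urls[word.lower()]
        return f'<a href="{html.escape(url)}">{word}</a>'

    return pattern.sub(to_link, text)

print(link_glossary_terms("Pick an Input Format to block XSS.", GLOSSARY))
```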

HTML Filter

The HTML filter strips or escapes any tags that are not explicitly allowed, and the administrator controls the list of allowed tags. To see exactly how this filter behaves, click the Configure tab for one of the input formats in which the HTML filter is enabled (administer > input formats > configure > configure). The first option, Filter HTML Tags, specifies whether HTML tags are removed from the output completely or escaped so that the tag itself is visible in the output. Escaping replaces characters that have a special meaning in HTML with their HTML entities; for example, < becomes &lt;, > becomes &gt;, and & becomes &amp;, so the browser displays the tag as text instead of interpreting it.

Escaping has the advantage that if a user enters a tag that isn't allowed, she sees the tag in the output and can conclude that using that tag won't work. The disadvantage is that she might leave the escaped tags there, detracting from the quality of the content.
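The strip-versus-escape choice can be illustrated with a short Python sketch. The `filter_disallowed_tags` function and its crude tag-matching regular expression are hypothetical; Drupal's real HTML filter is written in PHP and copes with malformed markup and dangerous attributes that this sketch ignores, so treat it as an illustration, not as a safe filter. Following the description above, only the disallowed tags are escaped here.

```python
import html
import re
from typing import Set

# Crude tag matcher (assumption for this sketch): "<tag ...>" or "</tag>".
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")

def filter_disallowed_tags(text: str, allowed: Set[str], mode: str = "strip") -> str:
    """Hypothetical helper: keep allowed tags; strip or escape all others."""
    if mode not in ("strip", "escape"):
        raise ValueError("mode must be 'strip' or 'escape'")

    def handle(match):
        tag = match.group(1).lower()
        if tag in allowed:
            return match.group(0)               # allowed tag: leave it alone
        if mode == "escape":
            return html.escape(match.group(0))  # show the tag as visible text
        return ""                               # strip the tag, keep the text inside it

    return TAG_RE.sub(handle, text)

post = "Hi <em>there</em> <blink>now</blink>"
print(filter_disallowed_tags(post, {"em"}, "strip"))   # Hi <em>there</em> now
print(filter_disallowed_tags(post, {"em"}, "escape"))  # Hi <em>there</em> &lt;blink&gt;now&lt;/blink&gt;
```

In escape mode the user sees `<blink>` in the output and can tell that the tag is not allowed, which is exactly the trade-off described above.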

Tip The Codefilter module is ideal for sites that want to discuss PHP code.

The next field for configuring the HTML filter is Allowed HTML Tags, a space-separated list of the HTML tags that you allow for this input format. The HTML Style Attributes field determines whether tags may carry an HTML style attribute. Since it is legal for almost any HTML tag to have this attribute, the HTML filter strips it by default to prevent users from writing things like this:

<a href="path to my site" style="font-size: 50em">check this out</a>

While you might not mind someone linking to his site from within a post, you will probably object if the text appears in gigantic sizes on the screen. Worse than a broken layout, however, are the security risks of allowing anyone with a user account (potentially anyone at all) to enter content on your site without some level of control. The family of attacks that could be perpetrated against your site in this case is referred to as cross-site scripting (XSS), and the HTML filter is your first line of defense against them.
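The two settings discussed here, the list of allowed tags and whether style attributes survive, can be sketched in Python as follows. Everything in the sketch is hypothetical: the function names, the assumption that the allowed-tags setting looks like `<a> <em> <strong>`, and the regular expressions, which are far cruder than a real HTML parser. It does not remove other dangerous content such as `on*` event handlers or `javascript:` URLs, so it is not an XSS defense in itself; it only shows how the two settings act on the example link above.

```python
import re
from typing import Set

# Assumed setting format for this sketch: "<a> <em> <strong>" (brackets optional).
def parse_allowed_tags(setting: str) -> Set[str]:
    return {tok.strip("<>").lower() for tok in setting.split() if tok.strip("<>")}

# Crude patterns (assumptions): a whole tag, and a style="..." attribute inside it.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>")
STYLE_ATTR_RE = re.compile(r"""\s+style\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)

def filter_html(text: str, allowed_setting: str, allow_style: bool = False) -> str:
    """Hypothetical filter: drop disallowed tags and, by default, style attributes."""
    allowed = parse_allowed_tags(allowed_setting)

    def handle(match):
        slash, name, attrs = match.groups()
        if name.lower() not in allowed:
            return ""  # strip the disallowed tag but keep its text
        if not allow_style:
            attrs = STYLE_ATTR_RE.sub("", attrs)  # remove style="..." from allowed tags
        return f"<{slash}{name}{attrs}>"

    return TAG_RE.sub(handle, text)

link = '<a href="path to my site" style="font-size: 50em">check this out</a>'
print(filter_html(link, "<a> <em> <strong>"))
```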

If the Display HTML Help check box is checked, each content creation form displays a link to the Compose Tips page (filter/tips), where users can read instructions for the particular filters that are enabled for them. This can help if your target audience doesn't know HTML and you want to encourage them to mark up their posts with links or formatting. Most users will balk at having to write HTML, however, and several contributed modules are available that address this need.
