Questions? Comments? Email articles-phpbestpractices [at] tucuxi [dot] org. This article in particular is a work in progress, so take it with a grain of salt.

Introduction

Over several years of PHP development, I've come across a number of quirks, in others' code as well as my own, that led me to build a list of practices that are questionable or hard to read, along with general guidance and advice on PHP development. Some items can be generalised to any software development, whereas others are PHP-specific.

Please keep in mind that this is a work in progress, and the statements expressed here are just opinions and observations. No formal research has gone into any of this.

The SDLC

Many a person has parroted on about the SDLC in one form or another - waterfall, RUP and XP are various methodologies that've been pushed during the life of the software engineering industry. None of them are perfect, but the general concepts underlying software engineering in addition to specific principles from each development methodology can have a real positive impact on projects.

Requirements, Design, Implementation

Many a PHP developer dives straight into the implementation phase of a project. Remember - unless you're coding Hello World, you'll probably be doing something that resembles a development effort more than just a simple webpage. Make life easier on yourself - define what you want before you start tooling around in PHP. I'm not talking UML, detailed design documentation or any serious undertaking - just a few paragraphs defining the scope of the system, what you want to achieve, and the context of the system (databases? SOAP interfaces?) will give immediate returns when looking at how to implement the system.

If you're beyond a simple system, you should consider formal requirements and/or design specifications for your web application. For example, if you're designing a CMS or gallery system, or a business web application, these are the right scale of application where formal requirements and design make sense. One caveat to take notice of is that many preexisting design document templates will not be appropriate for web applications. Concurrency and the stateless nature of the web will impose certain restrictions on the behaviour possible, and the requirements/design documentation should explicitly define the tools and/or methods used to address these concerns.

Requirements Elicitation

Frequently, the end-users of a proposed system will not be consulted until the system is well and truly implemented, which can lead to some rather embarrassing consequences, especially if the system is an internal business system. The end-user is typically a key stakeholder in any system being created, and should be, at the very least, informed of any major requirements/design decisions to ensure that key use cases are not missed.

If your system has a large audience, grab a key user or two, and a user who is representative of the typical end-user. While you don't want to miss the advanced use cases, don't make things overly complex. Tailor the system to the typical user, then permit advanced use cases beyond that.

System Context and Architecture

Once requirements have been drawn up, one of the first design tasks is to determine the context that the system will be operating within, that is, the technical environment where the application will reside. Is there a pre-existing farm of web application servers, database servers and middleware that existing systems rely on? Is this the first web application that you've commenced?

After taking into account any preexisting environment, the high-level architecture of the system should be defined. Depending on the type of application, you may wish to separate the system into various layers - for example, data storage, business logic, and presentation would form a three-tier application. Most frequently, web applications will be a front-end to a database, and you need to decide what type of server will be used. This will shape both the application server environment, as well as the server-side scripting language used - is PHP the best choice? Why not ASP? Or, for that matter, Cold Fusion? There are a number of languages and environments to choose from, and it is up to you to determine which is the best choice for the pre-existing skillset, and the problem being tackled.

Design of web applications

Common Design Concerns

Requirements and Standards

If one thing is constant within the IT industry, it's change. One day the dominant browser will be Mosaic. Or Netscape. Or maybe Internet Explorer. If you design a system to take advantage of and depend upon a single platform's features, your system will break when users move to a different platform. The recent upheaval in the browser market between Netscape, Internet Explorer and Firefox reinforces this, and as a consequence, many developers have made the move to specify standards compliance in system requirements specifications. Designing and implementing a system to meet common standards ensures that you can leverage a larger installed base, in addition to new platforms, such as mobile phones that implement HTML browsing.

Multi-tier applications

Graceful Degradation

Graceful degradation is an issue specific to distributed applications that utilise generic clients - for example, the world wide web. Specifically, the multitude of web browsers available each implement a different subset of the published standards, most notably the HTML/XHTML specifications, along with the DOM (as scripted from JavaScript) and CSS. Graceful degradation is the practice of building applications so that core functionality remains available across these varying levels of feature support.

The key issue to consider with graceful degradation is baseline functionality - what is the minimum feature set required for a user to access your web application? If the application is being used in a government or business capacity for the public at large, various pieces of disability and anti-discrimination legislation may apply to your system. In this case, the established 'baseline' for access is typically a text-mode HTML 4.01 browser. Regardless of whether legal issues force compliance with baseline access methods to your web application, you should consider designing your application for the lowest-common-denominator, for reasons described below.

Designing your web application for the most generic client possible not only enables access from a broader subset of the community, but also permits future technologies to access your application without issue. For example, consider an application that requires both JavaScript and Macromedia Flash to be active on the client platform. Such an application would immediately exclude the plethora of mobile devices currently being deployed in the market, such as mobile telephones, PDAs, smartphones and internet tablets which will open up web applications to a new level of accessibility. If the application was instead designed to only require HTML 4.01, and permit JavaScript and Flash to enhance the experience if available, all those new possibilities for client platforms are opened up.

The real-world implementation of this topic is best addressed with a simple example, for which we will consider three levels of functionality. The first, most basic level of functionality is a text-only browser that supports HTML 4.01. The second level of functionality is a graphical web browser on a mobile telephone supporting HTML 4.01 and CSS 2.0. Thirdly, we will consider a graphical web browser supporting HTML 4.01, JavaScript and CSS 2.0, running on a desktop computer.

The example application in question is a client-facing quoting system for a building supply company. The original application was written as a client-side JavaScript-dependent webpage, with all business logic executed in the web browser. Additionally, all navigation and design has been implemented with a table-and-images approach. This style raises the bar of entry to graphical platforms supporting JavaScript, and increases the cost for mobile users to view the site, as such users are often paying for metered data.

An alternative approach to implementing the business logic in a client-side script would be to develop a server-side web application that returns calculations as a pure HTML page. Additionally, redesign of the application's interface to utilise CSS, specifically, media-dependent stylesheets would improve the experience for all users of the application. By separating out the elements as such, we realise the following benefits:

  1. Business logic is moved to a platform controlled by the organisation. This permits greater confidence in the system, as the execution takes place on a platform that may be tested extensively by the organisation.
  2. Business logic is hidden from the client. In the case where volume discounts are to be hidden from low-volume clients, or other trends are to be hidden, moving the business logic to the server is the only sane option.
  3. Moving the business logic to the server permits further interfacing with preexisting systems, both extending the capabilities and leveraging the investment in existing code.
  4. By separating layout into a separate stylesheet file, caching can lower the bandwidth requirements of the site for mobile users, thereby lowering the cost of access.
  5. With the use of device-specific stylesheets, layout can be tailored to specific platforms rather than having a single fixed layout for all devices.
  6. Reformatting of content to use HTML in a logical rather than visual markup role performs two tasks - it makes your site easier to understand for visually-impaired users, and it improves readability for search engine spiders.
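
As a rough sketch of the server-side approach described above, the quote calculation might live in a plain PHP function whose result is returned as ordinary HTML. The function names, the unit price and the discount tiers here are invented for illustration; they are not taken from the original application.

```php
<?php
// Hypothetical pricing rule: the volume discount tiers are made up for
// this sketch. Because they run on the server, the client never sees them.
function calculate_quote_cents(int $quantity, int $unitPriceCents): int
{
    if ($quantity < 1) {
        throw new InvalidArgumentException('Quantity must be at least 1.');
    }

    // Discount percentage by volume (assumed tiers).
    if ($quantity >= 1000) {
        $discountPercent = 10;
    } elseif ($quantity >= 100) {
        $discountPercent = 5;
    } else {
        $discountPercent = 0;
    }

    $gross = $quantity * $unitPriceCents;

    // Integer arithmetic in cents avoids floating-point rounding surprises.
    return intdiv($gross * (100 - $discountPercent), 100);
}

// Render the result as plain HTML that any HTML 4.01 client can display.
function render_quote_html(int $quantity, int $totalCents): string
{
    $total = number_format($totalCents / 100, 2, '.', ',');

    return '<p>Quote for ' . $quantity . ' units: <strong>$'
        . htmlspecialchars($total) . '</strong></p>';
}

echo render_quote_html(200, calculate_quote_cents(200, 250)), "\n";
```

The page that comes back needs no JavaScript at all; a script on the client could still be layered on top to give faster feedback where it is available.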

Accessibility

As mentioned earlier, accessibility is closely tied to graceful degradation, which sits at the core of any accessible web application. However, accessibility also encompasses a number of other areas, most of which are prompted by physical disabilities, and these are addressed in the following paragraphs. The primary concern in web design is visual impairment, for example, colour blindness.

One of the primary tenets of accessible web design is catering to colour-blind and other vision-impaired users. You should not communicate a message solely by using colour, as some of your audience may not be able to discern between the colours chosen, even those that are not affected by colour blindness - colours may have differing meanings in other cultures. Additionally, those users not accessing your application through a graphical web browser, such as those using a screen reader will not be able to notice the difference either. You should instead communicate your message via text, with colour providing a supporting role. For example, if you wanted to show the status of a system (operational or broken), using a red and green circle would not suffice. Instead, using the text 'Operational' with a slight green tint, and 'Broken' with a slight red tint would be more acceptable.
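
The status example might be sketched as follows. The helper name and the CSS class names are assumptions made for this illustration; the point is only that the text carries the meaning and the class adds the tint.

```php
<?php
// Hypothetical helper: the label carries the message, while the CSS class
// only supplies a supporting green or red tint via the stylesheet.
function render_status(bool $operational): string
{
    $label = $operational ? 'Operational' : 'Broken';
    $class = $operational ? 'status-ok' : 'status-fail';

    return '<span class="' . $class . '">' . htmlspecialchars($label) . '</span>';
}

echo render_status(true), "\n";
echo render_status(false), "\n";
```

A text-mode browser or screen reader simply ignores the class and still reports 'Operational' or 'Broken'.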

Moving past the issue of communicating information purely by colour, there is also the issue of contrast in web pages. The needs of users with vision impairments, ranging from poor vision (requiring large, high-contrast text) through to colour blindness, should shape the graphic design of your site or application. Remember to keep a high level of contrast between the background and text, and to view the site through a colour-blindness simulator that shows the effects of protanopia, deuteranopia, and tritanopia.

Blindness issues - shortcomings of screen readers

Animated Images

Keyboard accessibility and timeouts

Auditory disabilities

Implementation Details

OOP: Use of Classes

Use of short tags

When developing any sort of application, you should strive to be strict in what you generate, and liberal in what you accept. This also extends to the source code in PHP - developers are often given a choice between ASP-style tags (<%), short tags (<?), and long tags (<?php). You should always use the latter option - long tags. Only these tags are accepted on every PHP installation (ASP-style tags were removed altogether in PHP 7), and in many deployment environments, you will not have the permission to change the options that determine which tags are accepted. Don't be a lazy developer - always use long tags.

Use of Globals

Like short tags, automatically registered globals ($foo as opposed to $_GET['foo'], or $_POST['foo']) depend upon the settings defined in php.ini. To allow for maximum portability of PHP code, the use of register_globals is highly discouraged (the setting was deprecated in PHP 5.3 and removed in PHP 5.4). Data should only ever be accessed via the relevant superglobal arrays.

Use of the generic global variables can be a security risk, as the source of the information varies depending on the configuration in php.ini. In some instances, the user may be able to override settings such as HTTP_REMOTE_USER by including the data in the URL string, overriding the value set by the web server process. Explicitly using the correct superglobal (in this case, $_SERVER) prevents this sort of attack, whilst at the same time ensuring that your code will be portable.
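
A minimal sketch of reading request data from an explicit source is shown below. The helper name and the 'page' parameter are hypothetical, and the REMOTE_USER key is used as an assumed example of a value that the web server sets.

```php
<?php
// Hypothetical helper: read a page number from the query string only.
// The source array is passed in explicitly, so it is obvious where the
// value comes from (in a real request this would be $_GET).
function page_from_query(array $query): int
{
    // Missing, non-string or non-numeric values fall back to page 1.
    if (!isset($query['page']) || !is_string($query['page'])
        || preg_match('/\A[0-9]+\z/', $query['page']) !== 1) {
        return 1;
    }

    return max(1, (int) $query['page']);
}

$page = page_from_query($_GET);

// A value set by the web server is read from $_SERVER, never from the
// request parameters, so a crafted URL cannot override it.
$remoteUser = isset($_SERVER['REMOTE_USER']) ? $_SERVER['REMOTE_USER'] : null;
```

Because each value names its source, the code behaves the same way regardless of how php.ini is configured.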

Strict Typing

Don't Reinvent the Wheel

A common problem encountered in maintenance programming is the reinvention of the wheel. Often driven by Not-Invented-Here syndrome, developers will tend to reimplement items that have been written and published elsewhere. To borrow a line from Fight Club, you are not a beautiful and unique snowflake. It's likely that somewhere, somebody smarter than you has already implemented part of what you're doing and published it on the internet. Please, do some research, and if the code is of sufficient quality, and it's released under an appropriate license, reuse the code. We'd be getting nowhere if we literally had to reinvent the wheel for every vehicle, and software engineering is no different.

Templating

Templating is one area where there are numerous open-source toolkits that have solved the issue of separating presentation from behaviour, and the entire web development community benefits as a result. Smarty is a widely-used template engine, and as such, has strengthened due to the many-eyes nature of open source projects.

A more formal method of generating presentation from application data is to use XSLT. XSLT, or XSL Transformations, is a language for writing stylesheets: a stylesheet is itself an XML document that describes how to transform one XML document into another, or into any other text format for that matter. Within this site, there are a couple of examples of XSLT with Azureus Statistics which may give an insight into the work involved in developing XSLT documents. XSLT is a case of trading off runtime speed for implementation time and reusability - XML as a document format is well-defined, and the toolkits for manipulating both XML and XSLT have matured over recent years. In business systems used for both user-facing functions and data interchange, machine-generated XML and XSLT processing may be a way to reduce rework, simply by splitting the XML generation and XSLT processing into the business logic and presentation layers, respectively.

There are also a number of templating toolkits on PEAR; however, the maturity of these toolkits is questionable. Thorough research into the capabilities, limitations and quality of the template implementations would be advisable before deciding to use any of these in a production system.

Databases

Database Libraries

Escaping and Parameters

Placement of Business Logic

Business logic should rarely, if ever, live entirely within the presentation tier of an application. PHP should be primarily used as a presentation layer atop another system that permits data-centric transactions to be executed in a reliable manner. Specifically, place your business logic in a middle tier, or in the database if you only have a web front-end. You never know when your system will grow or be linked with another system, and reliable interfaces only permit good data. In this instance, the database (via stored procedures, triggers and referential integrity measures) should only accept data that complies with the business rules.

Security

Sessions and Cookies

Web applications add a new dimension for those who are used to the typical single-user client-side application paradigm; concurrency, locking and other system behaviours are closer to those of a system invoked by RPC or other stateless means. The web operates as a stateless entity, and state is added on top of this model with elements such as cookies, or session data posted with every query. As a PHP developer, you must be aware of both the inner workings of these stateful models, and the limitations imposed. Security is another important aspect when attempting to impose state on a stateless model, and anybody working on a system handling confidential data should be well-versed in generic and specific attacks against these sorts of systems.

Input Validation

Input validation is rarely done well in PHP applications, as many a maintenance programmer and code reviewer would testify. One of the most critical points when dealing with any application, and not just PHP web applications, is that input cannot, and should not, be trusted at all. Users may submit anything they like to an application, and if the system blindly accepts the data, attacks such as SQL injection exploits can take place.

Every facet of the application that accepts data from the client should check that data against a definition of what is acceptable. The following questions should be addressed in every action in the system:

  1. Does the user have the permission to know that this action even exists?
  2. Does the user have permission to use this action? For example, should a guest user be able to delete a page in a CMS?
  3. What data should the user be giving to the application? What fields?
  4. By what means should the user be giving the data to the application? GET variables? POST variables?
  5. Does the data match the formats defined in the requirements? Length? Data type? Range? Defining valid grammars or regular expressions can help to explicitly define what you expect from the user.
  6. What do you do with invalid data?
  7. Does the user have permission to see any objects in the database/system that they're posting to? For example, if they post Book ID 9, do they have access to see that this book exists? Should they be able to see the title?
  8. Does the data contain any escape characters or string delimiters when we don't expect it, for example, single quotes in a numeric field? This is probably a sign of an SQL injection attack, and we should probably notify somebody, to at least keep an eye out for similar attacks.

Use preg_match() to validate length and contents at once (the older ereg() functions were deprecated in PHP 5.3 and removed in PHP 7). Having a single check for field validity is not only efficient, it's safer programming. You're less likely to forget a single check for a field than one of three separate checks - if you eschew checking strlen() elsewhere, and instead specify the field length in the grammar or regular expression, the formal definition of valid input lives in one place, and the implementation is simplified.
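
The single-check idea can be sketched with preg_match(), the PCRE function available in current PHP releases. The field rule here (a username of 3 to 16 lowercase letters, digits or underscores) and the function name are assumptions made purely for illustration.

```php
<?php
// Assumed field rule for this sketch: 3 to 16 characters, each of which
// is a lowercase letter, a digit or an underscore.
function is_valid_username($value): bool
{
    // One pattern checks both the permitted characters and the length.
    // \A and \z anchor the whole string, so a trailing newline is rejected.
    return is_string($value)
        && preg_match('/\A[a-z0-9_]{3,16}\z/', $value) === 1;
}

var_dump(is_valid_username('alice_01'));
var_dump(is_valid_username("o'brien"));
```

There is no separate strlen() call to forget: changing the {3,16} quantifier changes the permitted length everywhere the function is used.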

Using ' and "

Know the difference between ' and " in PHP. Not doing so can cause grave security problems down the track when you forget to correctly escape "special" characters in a string. The bottom line is - a string in ' is taken literally, with no substitution of variables (and is slightly faster), whereas a string in " has variables and escape sequences such as \n expanded, which can pose a security risk if not understood.
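
A short illustration of the difference follows; the variable names are arbitrary.

```php
<?php
$name = 'World';

// Single quotes: the string is taken literally, so $name and \n stay
// in the string exactly as typed.
$literal = 'Hello, $name\n';

// Double quotes: $name is substituted and \n becomes a newline character.
$expanded = "Hello, $name\n";

echo $literal, PHP_EOL;
echo $expanded;
```

The first echo prints the dollar sign and backslash as written, while the second prints the greeting with the variable's value followed by a line break.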

Use of GET and POST

Understanding the difference between GET and POST requests is critical when writing stable, predictable web applications. Read RFC 2616, the specification for HTTP/1.1, and understand the difference. GET should be used for safely-repeatable actions, for example, queries against a database. POST should be used when modifying data, or performing transactions that have adverse effects if the request is repeated, say, a request to debit a credit card in an online store. As the RFC puts it:

Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others.

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.

-- RFC 2616, Section 9.1.1
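
One way to apply this in a PHP handler is to check the request method before doing anything that changes data. The function name and the choice to treat only POST as state-changing are assumptions for this sketch, not a complete request router.

```php
<?php
// Hypothetical guard: only POST requests are allowed to change data.
function may_modify_data(string $method): bool
{
    return strtoupper($method) === 'POST';
}

// In a real request the method is supplied by the web server in $_SERVER;
// on the command line it is absent, so this sketch falls back to GET.
$method = isset($_SERVER['REQUEST_METHOD']) ? $_SERVER['REQUEST_METHOD'] : 'GET';

if (may_modify_data($method)) {
    // Safe place for updates, inserts, or debiting a credit card.
    $action = 'modify';
} else {
    // Retrieval only. A real handler that was asked to modify data here
    // might answer with "405 Method Not Allowed" instead.
    $action = 'read-only';
}

echo $action, "\n";
```

With this guard in place, reloading or prefetching a GET URL can never repeat a transaction.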

Separate Code and Presentation

Separating code and presentation is of paramount importance in well-designed, long-lived web applications. If you want your application to last beyond a few development or design iterations, it is essential that you use a method to separate out the business logic from the presentation of the application. Typically, this will be achieved by use of a templating engine. One such engine that gets quite a bit of press is Smarty, and it seems to be the default templating engine of choice for those starting out on medium-scale OSS projects.

Far from blowing Smarty's horn, I don't think I'd outright recommend Smarty for use in a large web application - it is far too focused on the style of code in PHP4 (i.e. non-OO coding style), and the syntax for accessing objects suffers as a direct consequence. Additionally, mathematical expressions can be susceptible to syntax errors due to a rather lacking parser, so some portion of the template is typically dedicated to assignment statements, rather than focusing directly on the output.

That said, any template engine is going to have its flaws, so it's best to learn and work around said flaws (which I may cover in a future article). So, do a bit of digging around, and find the templating engine which seems the most natural to you before deciding upon a single one.
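
To make the separation concrete without committing to any particular engine, here is a deliberately tiny stand-in for a template engine. It is not Smarty's API; the function name and the {placeholder} syntax are invented for this sketch.

```php
<?php
// A minimal stand-in for a template engine: placeholders of the form
// {name} are replaced with HTML-escaped values.
function render_template(string $template, array $values): string
{
    $replacements = array();
    foreach ($values as $key => $value) {
        // Escape at output time so the template never emits raw data.
        $replacements['{' . $key . '}'] =
            htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }

    return strtr($template, $replacements);
}

// The business logic prepares the data...
$data = array('title' => 'Tools & Timber', 'items' => 3);

// ...and the template alone decides how it is presented.
$template = '<h1>{title}</h1><p>{items} items in stock.</p>';

echo render_template($template, $data), "\n";
```

A designer can rework the template string (or a file it is loaded from) without touching the code that produces $data, which is the property to look for when comparing real engines.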

php.ini settings

Data Objects

Transactions