How’s WordPress even running on .NET?

You may have come across the fact that we managed to run WordPress on .NET. In this article, we take a look under the hood and examine some of the obstacles and the technical solutions behind this project.

WordPress running on .NET: what does that mean? WordPress is a fairly large application written in PHP, which we compiled to .NET using PeachPie, so that it no longer requires PHP and runs purely on the .NET Core runtime. This has received a considerable amount of attention, so we wanted to dive into the guts of how it works.

If you want to skip the entire article, you can find the whole project with some juicy interoperability demos on github.com/iolevel/wpdotnet-sdk.

Visual Studio Solution Explorer with WordPress as a native .NET project.

Since we’re replacing the whole runtime and all the native libraries with something that works completely differently, while still providing a reasonable developer experience, there is a lot to talk about. This article might help you understand what’s going on under the hood of an application written in PHP that still thinks it’s running on PHP, when in fact there’s no PHP at all.

Let’s summarize the main differences we’re tackling:

  1. The app is deployed without sources, compiled to an intermediate language.
  2. Web requests now run within threads instead of processes.
  3. The Server API (SAPI) is now ASP.NET Core.
  4. A completely different base class library (BCL).
  5. Some source files are actually not intended to be used; they depend on missing components or contain errors.
  6. String values are Unicode encoded.
  7. .NET reflection does not account for PHP specifics.
  8. The concept of global code and functions does not exist in .NET.
  9. Static properties behave differently – in PHP they are bound to a request, in .NET they are bound to a process.
  10. The configuration is not hardcoded into the sources anymore but declared in the appsettings.json file.

This gives us roughly the following areas we have to redesign with respect to backward compatibility:

  • compiler: transforms PHP sources to equivalent .NET intermediate code
  • runtime: operators, built-in types, reflection, and supporting APIs
  • base class library (PHP extensions, built-in functions)
  • request lifecycle

Base Class Library

One of the simpler tasks is reimplementing the BCL. In PHP, with the configuration required by WordPress, there are hundreds of classes and interfaces, more than a thousand global functions and more than a thousand global constants. These include functions like strlen() and max(), or the more complicated preg_match() and mysqli_connect(). A WordPress application uses most of them. The BCL was reimplemented in managed C#, following the PHP manual.

Wait, but C# does not have global functions and constants, right? Also, it doesn’t have a few other features that PHP has. In order to satisfy the PHP specs while using the C# language, there are specific rules on how to export a PHP-specific declaration from C# code. This is described at docs.peachpie.io/api/Libraries-Architecture/.

For example, the implementation of the function strlen() would look as follows in C#:

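[assembly: Pchp.Core.PhpExtension]

public static class StandardFunctions {
  /// <summary>String length implementation.</summary>
  public static int strlen(string value) {
    // PHP's NULL arrives as a null reference and has length 0.
    // Note: this counts UTF-16 code units; PHP's strlen() counts bytes (see Unicode below).
    return value != null ? value.Length : 0;
  }
}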

The compiler and runtime understand that public static methods in assemblies marked with the PhpExtension attribute are to be treated as PHP’s global functions. In this way, all the necessary parts of PHP’s BCL have been rewritten in C#, a strongly typed language.
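The following minimal sketch makes the convention concrete. It shows how a runtime could turn such static methods into a table of PHP functions using .NET reflection. It is not PeachPie's implementation. DemoPhpExtensionAttribute, DemoStrings and DemoFunctionTable are hypothetical names, and for simplicity the sketch marks classes rather than the whole assembly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for PeachPie's PhpExtension marker: classes carrying it
// expose their public static methods as PHP global functions.
[AttributeUsage(AttributeTargets.Class)]
public sealed class DemoPhpExtensionAttribute : Attribute { }

[DemoPhpExtension]
public static class DemoStrings
{
    public static int strlen(string value) => value != null ? value.Length : 0;

    // Simplified: ToUpperInvariant also upper-cases non-ASCII letters.
    public static string strtoupper(string value) => value != null ? value.ToUpperInvariant() : "";
}

public static class DemoFunctionTable
{
    // Collects the public static methods of marked classes into a lookup table.
    // PHP function names are case-insensitive, hence OrdinalIgnoreCase.
    public static Dictionary<string, MethodInfo> Build(Assembly assembly)
    {
        var table = new Dictionary<string, MethodInfo>(StringComparer.OrdinalIgnoreCase);
        var extensionTypes = assembly.GetTypes()
            .Where(t => t.GetCustomAttribute<DemoPhpExtensionAttribute>() != null);

        foreach (Type type in extensionTypes)
        {
            foreach (MethodInfo method in type.GetMethods(
                BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly))
            {
                table[method.Name] = method;
            }
        }
        return table;
    }
}
```

A lookup such as table["STRLEN"] finds strlen() because PHP function names are case-insensitive. A real compiler binds most calls at compile time rather than going through reflection on every call.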

Runtime

PHP has the Zend Engine, which is essentially an interpreter with an opcode cache (and, in the future, its own Just-In-Time compiler). It defines all the operators, the garbage collector, error handling, built-in types and structures, the stack implementation, and much more. It’s all written in the C language and compiled for various platforms and CPUs.

Our replacement runtime (Peachpie.Runtime) would seemingly have to provide the same functionality, but in fact it doesn’t. The app runs on the .NET runtime, which already provides many of the operators and basic types, such as double or long, as well as its own garbage collector, Just-In-Time compiler and many other features we can take advantage of. So our runtime provides additional APIs (reflection, dynamic code evaluation, ...), PHP-like operators and the remaining built-in types (resource, array, ...), always trying to take advantage of what’s already implemented in .NET.

There are a few important types implemented in our runtime in C#:

  • class Context is an abstraction of a single PHP request. It manages the application’s lifecycle, provides access to static class properties and understands the compiled PHP assemblies and their additional metadata.
  • class PhpTypeInfo provides access to the additional reflection API.
  • struct PhpValue is a union of any possible value that a variable can have. Think of it as a more performant, allocation-free dynamic. PHP is a loosely typed language and in most cases variables are dynamic. When the compiler (described below) cannot resolve the exact type of a variable or property, PhpValue is used instead.
  • class PhpArray implements PHP’s native array type.
  • class Convert and class Operators define the semantics of various PHP operations. The runtime and the compiled code call those methods in order to perform an operation.
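The sketch below illustrates the idea behind PhpValue. It is a small tagged union in which scalars live inline in the struct, so they need no heap allocation. It also implements PHP's rules for converting a value to bool. DemoPhpValue and DemoTypeCode are hypothetical and far simpler than the real PhpValue, which also covers arrays, objects, aliases and more.

```csharp
using System;

public enum DemoTypeCode { Null, Boolean, Long, Double, String }

// Hypothetical, simplified PhpValue-like struct: scalars are stored inline
// (no heap allocation); only reference values such as strings use the object field.
public readonly struct DemoPhpValue
{
    private readonly long _number; // bool as 0/1, long as-is, double as its raw bits
    private readonly object _obj;  // string (or any other reference value)

    public DemoTypeCode Type { get; }

    private DemoPhpValue(DemoTypeCode type, long number, object obj)
    {
        Type = type;
        _number = number;
        _obj = obj;
    }

    // default(DemoPhpValue) has Type == DemoTypeCode.Null
    public static DemoPhpValue Null => default;

    public static DemoPhpValue Create(bool value) => new DemoPhpValue(DemoTypeCode.Boolean, value ? 1 : 0, null);
    public static DemoPhpValue Create(long value) => new DemoPhpValue(DemoTypeCode.Long, value, null);
    public static DemoPhpValue Create(double value) =>
        new DemoPhpValue(DemoTypeCode.Double, BitConverter.DoubleToInt64Bits(value), null);
    public static DemoPhpValue Create(string value) =>
        value == null ? Null : new DemoPhpValue(DemoTypeCode.String, 0, value);

    // PHP's (bool) conversion: NULL, false, 0, 0.0, "" and "0" are falsy; everything else is truthy.
    public bool ToBoolean()
    {
        switch (Type)
        {
            case DemoTypeCode.Boolean:
            case DemoTypeCode.Long:
                return _number != 0;
            case DemoTypeCode.Double:
                return BitConverter.Int64BitsToDouble(_number) != 0.0;
            case DemoTypeCode.String:
                var s = (string)_obj;
                return s.Length != 0 && s != "0";
            default:
                return false; // Null
        }
    }
}
```

Storing the double as raw bits in the same long field keeps the struct small. Only strings and other reference values touch the object field.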

Compiler

The main difference is the compiler: a standalone code analysis tool that performs ahead-of-time translation of PHP to .NET binary code. You can try it out at https://try.peachpie.io/ and see how it converts various PHP constructs into MSIL bytecode that has the same semantics as if it ran on PHP.

A PHP script compiled to IL and decompiled to the C# language as a demonstration.

The compiler understands the conventions of the BCL, the semantics of operations and the APIs of the runtime. It performs fixed-point type analysis, resolves the CLR types of variables where possible, and reports diagnostics. It then emits compliant IL instructions (making use of the runtime and the BCL), CLI metadata, type definitions in a specific format, and PDB debug information.

For every PHP construct, a corresponding IL sequence is emitted. If possible, the sequence is the same as if it were emitted by the C# compiler. This helps with reverse engineering (that’s why we can show a C# equivalent of the PHP code) and also increases the chances of the .NET JIT understanding the emitted pattern and being able to optimize it better.

A PHP function compiled to IL and decompiled to the C# language.

For example, .NET does not have the concept of script files; it only has methods. Script files are very important in the PHP world, because they represent the entry point of every request. Hence, the compiler treats a script file (global code) as a static method within its own specially named static class (as described at docs.peachpie.io/api/assembly/compiled-assembly). This allows the global code to be called like a static method. The runtime understands this convention and provides the following API: Context.Include(string cd, string path, array locals). It finds the corresponding static method in the compiled assembly and invokes it. The image above depicts how the script file is compiled into MSIL.
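The convention can be sketched as follows. The attribute, DemoContext and DemoIncluder.Include() are hypothetical stand-ins. The real API is Context.Include() with the signature given above, and the real compiled names, such as <Root>.index_php and <Main>, are not valid C# identifiers. The point is that including a script boils down to resolving a path and invoking a static method.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Text;

// Hypothetical attribute recording which PHP file a compiled class was generated from.
[AttributeUsage(AttributeTargets.Class)]
public sealed class DemoScriptAttribute : Attribute
{
    public DemoScriptAttribute(string path) { ScriptPath = path; }
    public string ScriptPath { get; }
}

// Hypothetical per-request state standing in for PeachPie's Context.
public sealed class DemoContext
{
    public StringBuilder Output { get; } = new StringBuilder();
}

// Roughly what the global code of index.php turns into: a static method in its
// own static class (PeachPie itself emits <Root>.index_php.<Main>).
[DemoScript("index.php")]
public static class index_php
{
    public static object Execute(DemoContext ctx)
    {
        ctx.Output.Append("Hello from index.php");
        return 1; // in PHP, an include without an explicit return yields 1
    }
}

public static class DemoIncluder
{
    // Resolves 'path' against the current directory 'cd', makes it relative to the
    // application root and invokes the method compiled from the matching script.
    public static object Include(DemoContext ctx, string root, string cd, string path)
    {
        string full = Path.GetFullPath(Path.Combine(cd, path));
        string relative = Path.GetRelativePath(root, full).Replace('\\', '/');

        foreach (Type type in typeof(DemoIncluder).Assembly.GetTypes())
        {
            var script = type.GetCustomAttribute<DemoScriptAttribute>();
            if (script != null && script.ScriptPath == relative)
            {
                MethodInfo entry = type.GetMethod("Execute", BindingFlags.Public | BindingFlags.Static);
                return entry.Invoke(null, new object[] { ctx });
            }
        }
        throw new FileNotFoundException($"'{relative}' was not compiled into the assembly.");
    }
}
```

For example, including "../index.php" from a wp-admin directory resolves to the same compiled class as including "index.php" from the root.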

Compiler Usage

How is the compiler used? As you can see, the whole project consists of many libraries and APIs. There are many ways to compile code:

  • the web service try.peachpie.io
  • in-memory evaluation
  • Visual Studio
  • the dotnet command-line tool, which is the simplest way

The compiler as a NuGet package involved in the build process. The yellow box is the resulting assembly file.

In the last case, the compiler itself is implemented as an MSBuild Task in its own MSBuild SDK called Peachpie.NET.Sdk. This SDK is available as a NuGet package hosted on nuget.org. Combined with the new MSBuild 15.0 project format, this makes the whole process very simple. MSBuild itself (part of Visual Studio and the .NET SDK) seamlessly downloads the SDK and the corresponding runtime and BCLs when first used. MSBuild (invoked via the msbuild.exe or dotnet build command-line tools) consumes a project file, in which we state which SDK we use and what the parameters of the project are. In our case, it is an XML file with the following content:

<Project Sdk="Peachpie.NET.Sdk/0.9.970">
  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <Compile Include="**/*.php" />
  </ItemGroup>
</Project>

Read more about the project file at docs.peachpie.io/php/msbuild/. With the project file, you simply run dotnet build. What happens next:

  • The Peachpie.NET.Sdk NuGet package is downloaded by the MSBuild SDK resolver.
  • Project dependencies specified implicitly in the SDK are resolved. These include the package Peachpie.App, which depends on Peachpie.Runtime and the Peachpie.Library* packages (the BCL).
  • The MSBuild target Build is invoked, which calls the compiler.
  • The compiler receives a list of <Compile> items (the .php files to be compiled), a list of resolved dependencies (paths to their .dll files) and some other properties, such as the optimization level or the output path.

WordPress to DLL

With all this in place, we can proceed to the compilation of WordPress source code. The following is the simplified project file (peachpied.wordpress.msbuildproj) we use:

<Project Sdk="Peachpie.NET.Sdk/0.9.970">
  <PropertyGroup>
    <TargetFramework>netstandard2.1</TargetFramework>
    <NoWarn>PHP0125,PHP5011,PHP6002,PHP5018,PHP5026</NoWarn>
  </PropertyGroup>
  <ItemGroup>
    <Compile Include="**/*.php" Exclude="
      wp-includes/class-json.php;
      wp-config-sample.php;
      wp-content/uploads/**;
      " />
    <Content Include="**" Exclude="**/*.php" />
  </ItemGroup>
</Project>

Here, we specify that the project targets .NET Standard 2.1. Then we suppress some warnings that we consider not so important for the time being. If you are interested, the list of warnings produced by the compiler is at docs.peachpie.io/php/diagnostics/. The <Compile> section specifies the source files to be compiled; notice that we exclude files that are not meant to be used. The <Content> item includes all the remaining non-PHP files, so that they get deployed together with the assembly.

After a successful compilation, we’ll have a peachpied.wordpress.dll file in the newly created bin/Debug/netstandard2.1/ subfolder. The file is a standard .NET assembly, so we can open it with any IL decompiler, such as ILSpy. The decompiled code is not meant to look nice, but we can still peek inside to get an idea of how it works. The picture below depicts the global code of index.php compiled as a static <Main> method inside a dedicated static class. The method's name is not a valid C# identifier, so it cannot be called directly from C#.

index.php file is compiled as static method <Main> in class <Root>.index_php.
index.php global code decompiled to the C# language.

As you can see, the compiled code contains weirdly named magic identifiers, so that they don’t conflict with the user’s variables and functions. Also, note that this code is not meant to be edited; if you want to make changes, you keep modifying the original PHP code. Be that as it may, the compiled code is CIL compliant, runs on the .NET runtime, and can be used directly from other .NET applications.

Request Life-Cycle

Now we have the index.php global code compiled as a .NET method and we need to make sure it gets called properly in response to a web request. The Server API (SAPI) is responsible for setting up the auto-global variables $_SERVER, $_GET and others. In our case, the SAPI is ASP.NET Core, which of course doesn’t know anything about PHP.

Request handling in a modern ASP.NET application, with the middleware for PHP scripts.

The PHP behavior of a web request under ASP.NET Core is implemented within the Peachpie.AspNetCore.Web library, in particular by the PhpHandlerMiddleware middleware, which implements the PHP-like lifecycle. It sits in the request pipeline waiting for requests to PHP files, and once such a request starts:

  1. A Context is created and initialized from the current Microsoft.AspNetCore.Http.HttpContext with everything the PHP web application expects. This includes the auto-global variables and even the response output stream, which is used by PHP’s echo statement.
  2. <Root>.index_php.<Main>() is found in the assembly peachpied.wordpress.dll and invoked. Notice that the instance of Context is passed to the <Main>() method.
  3. Resources created during the request get disposed.
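The sketch below models the three steps without ASP.NET Core. DemoRequestContext and DemoPhpHandler are hypothetical. In the real middleware the output goes to the HttpContext response and the context is PeachPie's Context. The sketch only mirrors the shape of the lifecycle: create, run, dispose.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical per-request context; not the PeachPie Context API.
public sealed class DemoRequestContext : IDisposable
{
    private readonly List<IDisposable> _resources = new List<IDisposable>();

    public DemoRequestContext(TextWriter output, IDictionary<string, string> server)
    {
        Output = output;
        Server = new Dictionary<string, string>(server); // plays the role of $_SERVER
    }

    public TextWriter Output { get; }                  // target of PHP's echo
    public Dictionary<string, string> Server { get; }
    public bool IsDisposed { get; private set; }

    public void Echo(string text) => Output.Write(text);

    // Resources opened by the script (files, DB connections, ...) are tracked per request.
    public T Track<T>(T resource) where T : IDisposable
    {
        _resources.Add(resource);
        return resource;
    }

    public void Dispose()
    {
        foreach (IDisposable resource in _resources) resource.Dispose();
        _resources.Clear();
        IsDisposed = true;
    }
}

public static class DemoPhpHandler
{
    // 1. create the context, 2. invoke the compiled script, 3. dispose request resources.
    public static DemoRequestContext Handle(string requestUri, TextWriter response, Action<DemoRequestContext> script)
    {
        var ctx = new DemoRequestContext(response, new Dictionary<string, string> { ["REQUEST_URI"] = requestUri });
        using (ctx)
        {
            script(ctx);
        }
        return ctx; // returned only so callers can inspect it; it is already disposed
    }
}
```

The using block guarantees step 3 even if the script throws, which mirrors PHP releasing its resources at the end of every request.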

Configuration in a JSON file

Wouldn’t it be nice to configure WordPress in the standard appsettings.json file? Note that WordPress distinguishes between configurations and options:

  • configurations are placed in the source code, in the wp-config.php file, as global constant definitions, such as DB_NAME, DB_PASSWORD or WP_DEBUG.
  • options are found in the database.

We’ll take care of the first case, the configurations, and replace the global constants defined in the source code with values from the JSON file. First, we remove the configurations from the source code as follows:

/** MySQL database username */
// define('DB_USER', 'root'); // comment this

As you can see, defining the global constant DB_USER in PHP is just a call to a BCL function. It is not a real constant as you know it from .NET; it behaves more like a read-only variable. All the defined constants are stored within the current Context object, and that’s where the define() function puts them internally. Here is a rough implementation of define() in C#:

// the runtime passes the current Context instance ('context') to every library function;
// define('DB_USER', 'root') then stores the constant within a hashtable in that Context
context.DefineConstant("DB_USER", "root");

First, let’s define the model of our JSON configuration (full source on GitHub), which we can load using the IConfiguration service.

/// <summary>
/// WordPress configuration.
/// The configuration is loaded into WordPress before every request.
/// </summary>
public class WordPressConfig
{
  /// <summary>MySQL database user name.</summary>
  public string DbUser { get; set; } = "root";
}

Now let’s get back to our request life-cycle; before step 2, right before invoking index.php, we simulate the call to the define() function on the Context instance:

context.DefineConstant("DB_USER", config.DbUser);
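The following sketch puts these pieces together: it loads the configuration and turns it into constants. It uses System.Text.Json directly so that it is self-contained. In an ASP.NET Core application you would typically bind the model through IConfiguration instead. The "WordPress" section name, the extra DbName property and DemoConstantTable are assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Cut-down copy of the WordPressConfig model, extended with a hypothetical DbName.
public class WordPressConfig
{
    /// <summary>MySQL database user name.</summary>
    public string DbUser { get; set; } = "root";

    /// <summary>MySQL database name.</summary>
    public string DbName { get; set; } = "wordpress";
}

// Hypothetical stand-in for the constant table that Context keeps for every request.
public sealed class DemoConstantTable
{
    private readonly Dictionary<string, object> _constants = new Dictionary<string, object>();

    // Like PHP's define(): an existing constant is not overwritten and false is returned.
    public bool DefineConstant(string name, object value)
    {
        if (_constants.ContainsKey(name)) return false;
        _constants[name] = value;
        return true;
    }

    public object GetConstant(string name) =>
        _constants.TryGetValue(name, out object value) ? value : null;
}

public static class DemoConfigLoader
{
    // Reads the "WordPress" section of an appsettings.json-like document;
    // properties missing from the JSON keep their defaults.
    public static WordPressConfig Load(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.TryGetProperty("WordPress", out JsonElement section)
                ? JsonSerializer.Deserialize<WordPressConfig>(section.GetRawText())
                : new WordPressConfig();
        }
    }

    // Runs right before index.php is invoked, replacing the define() calls in wp-config.php.
    public static void Apply(WordPressConfig config, DemoConstantTable context)
    {
        context.DefineConstant("DB_USER", config.DbUser);
        context.DefineConstant("DB_NAME", config.DbName);
    }
}
```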

Making it even better

The solution above allows us to keep the configuration in the appsettings.json file. You may say that these values never change: they are effectively application-wide constants, so there is no need to redefine them at the start of every request. That’s where the power of the compiler comes into play.

The compiler doesn’t know if and when the wp-settings.php script will get included, so it cannot assume that the constant definitions will always be executed at runtime. But we know that.

Our runtime distinguishes two kinds of global constants: (a) application-wide and (b) context-wide. This is for performance reasons, so that built-in constants are not redefined with every request. Also, the compiler optimizes the usage of built-in constants by inlining them, avoiding a lookup into hashtables (remember, PHP keeps variables and constants in hashtables).

/// <summary>
/// The class serves as a container for implicitly defined global PHP constants and PHP functions.
/// See https://docs.peachpie.io/api/Libraries-Architecture/ for more details about PeachPie library architecture.
/// </summary>
public static class WpStandard
{
  /// <summary>MySQL database username</summary>
  public static string DB_USER { get; set; } = "root";
}

Built-in constants in PeachPie are defined in assemblies marked with the [assembly: PhpExtension] attribute. They can be declared as a constant, a static readonly field, or a static property of a static class. The compiler and runtime understand this convention and treat them as PHP’s global constants (more at docs.peachpie.io/api/Libraries-Architecture/). The code above defines a global PHP constant DB_USER in C# (full source on GitHub). This class is implemented in a C# project that is then referenced by the peachpied.wordpress project. As a result, the compiler turns every usage of such a constant into a direct call to the property getter. This avoids the hashtable lookup and makes request startup lighter.

Then, instead of redefining the constant at every request start within our Context, we only set the value of WpStandard.DB_USER once at the application start.
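The difference between the two kinds of constants can be sketched as follows. All names here (DemoWpStandard, DemoRequestConstants, DemoCompiledAccess, MY_PLUGIN_VERSION) are hypothetical. The sketch only contrasts two access paths. One is a direct static property read, which needs no per-request setup. The other is a hashtable lookup in the current request.

```csharp
using System.Collections.Generic;

// Hypothetical library class: an application-wide constant backed by a static
// property, assigned once at application start (compare WpStandard above).
public static class DemoWpStandard
{
    public static string DB_USER { get; set; } = "root";
}

// Hypothetical per-request context with its own table for constants created by define().
public sealed class DemoRequestConstants
{
    private readonly Dictionary<string, object> _defined = new Dictionary<string, object>();

    public void DefineConstant(string name, object value) => _defined[name] = value;

    // Hashtable lookup, needed for constants the compiler could not bind statically.
    public object GetConstant(string name) =>
        _defined.TryGetValue(name, out object value) ? value : null;
}

// Roughly what the two kinds of constant access could compile to.
public static class DemoCompiledAccess
{
    // DB_USER is known to the compiler as a library constant: a direct getter call.
    public static string ReadDbUser() => DemoWpStandard.DB_USER;

    // MY_PLUGIN_VERSION is only defined at runtime: looked up in the current request.
    public static object ReadPluginVersion(DemoRequestConstants ctx) => ctx.GetConstant("MY_PLUGIN_VERSION");
}
```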

Other differences

As noted at the beginning, the PHP and .NET runtimes are radically different in certain aspects. Sometimes we settle for a compromise; other times we have to implement the behavior in our runtime. Let’s take a look at some of these cases.

Unicode

.NET has great support for Unicode; in fact, all textual values are natively UTF-16 encoded. PHP’s string type, on the other hand, is a binary-safe array of 8-bit chars (bytes). This causes compatibility issues. The PeachPie runtime deals with it by supporting both representations, but where possible it keeps the values as UTF-16 strings. There is the native System.String and also PeachPie’s PhpString object, which can contain a byte array. Both are a string in the context of PHP code. Certain BCL functions expect bytes and others expect strings; the runtime performs conversions using the current locale’s encoding when needed.

Using the native System.String as much as possible is important for three reasons: interoperability, memory-friendly immutable string values, and working with the .NET BCL, which usually expects string. It also prevents a common issue with PHP applications: scrambled characters.
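The following minimal sketch shows the dual representation, assuming UTF-8 as the encoding. The hypothetical DemoPhpString keeps whichever form it was created with and converts lazily. It is not PeachPie's PhpString. It also shows why the two views disagree on length, which matters for byte-oriented functions such as PHP's strlen().

```csharp
using System.Text;

// Hypothetical simplified PhpString-like holder (not PeachPie's PhpString):
// keeps a System.String when possible and produces bytes only when a
// byte-oriented function asks for them, converting with the given encoding.
public sealed class DemoPhpString
{
    private readonly Encoding _encoding;
    private string _text;
    private byte[] _bytes;

    public DemoPhpString(string text, Encoding encoding) { _text = text; _encoding = encoding; }
    public DemoPhpString(byte[] bytes, Encoding encoding) { _bytes = bytes; _encoding = encoding; }

    // Used by string-oriented functions and for interop with .NET APIs.
    public override string ToString() => _text ?? (_text = _encoding.GetString(_bytes));

    // Used by byte-oriented functions.
    public byte[] ToBytes() => _bytes ?? (_bytes = _encoding.GetBytes(_text));

    // What PHP's strlen() would report: the number of bytes.
    public int ByteLength => ToBytes().Length;
}
```

For "h\u00e9llo", the UTF-16 view has 5 code units and the UTF-8 view has 6 bytes, because é encodes as two bytes.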

Static properties

Another difference concerns static properties (and even class constants). PHP has no concept of threads (it comes from the Unix environment), so the value of a static property is only static within the current request. It has to be reinitialized for every request and can have a different value in another request.

<?php
class MyClass {
  static $myStaticProperty;
}
return MyClass::$myStaticProperty;

A .NET application serves many requests on threads within a single process. Compiling the PHP static property as a CIL static field would therefore share its value across requests, an undesirable side effect. That’s why the static property is compiled into a synthesized class _statics, whose instance is created when needed and stored within the Context. Accessing a static property is then a matter of retrieving the lazily created instance of the class _statics within the current Context:

// compiled code:
return context.GetStatic<MyClass._statics>().myStaticProperty;

More at the compiled class reference: docs.peachpie.io/api/assembly/compiled-class/
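The per-request lookup can be sketched as a dictionary of lazily created holders keyed by type. DemoStaticsContext is hypothetical. Only the shape of GetStatic<T>() and of the _statics class mirrors the compiled code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-request context holding lazily created containers of PHP static
// properties. A request runs on one thread at a time, so no locking is used here.
public sealed class DemoStaticsContext
{
    private readonly Dictionary<Type, object> _statics = new Dictionary<Type, object>();

    public T GetStatic<T>() where T : class, new()
    {
        if (!_statics.TryGetValue(typeof(T), out object holder))
        {
            holder = new T();              // first access within this request
            _statics[typeof(T)] = holder;
        }
        return (T)holder;
    }
}

// What the compiler could synthesize for: class MyClass { static $myStaticProperty; }
public class MyClass
{
    public sealed class _statics
    {
        public object myStaticProperty; // PHP's default value is NULL
    }
}
```

Each request owns its own context, so a value assigned in one request is invisible to another, which matches PHP's semantics.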

Source-less deployment

The goal is to deploy only the resulting DLL file onto the server, along with the CSS, JS and image files. The issue is that WordPress programmatically checks whether certain source files are present in the target location. These include the wp-config.php file and the entry points of plugins and themes.

With the help of our own BCL, we can trick WordPress into thinking that a file is there when it actually isn’t. For example, the built-in function file_exists() returns TRUE for a file that does not exist on disk but was compiled into the executing assembly. The same goes for PHP’s file listing functions.
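A file_exists() along these lines could consult the list of compiled scripts before touching the disk. DemoFileFunctions and its RootPath and CompiledScripts members are hypothetical. The sketch only illustrates the trick described above. Like PHP's file_exists(), it also returns true for directories.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: a file_exists() that first consults the set of scripts
// compiled into the assembly and only then falls back to the real file system.
public static class DemoFileFunctions
{
    public static string RootPath { get; set; } = Directory.GetCurrentDirectory();

    // Relative paths (with '/' separators) of PHP scripts compiled into the assembly.
    public static HashSet<string> CompiledScripts { get; } = new HashSet<string>(StringComparer.Ordinal)
    {
        "wp-config.php",
        "wp-content/plugins/akismet/akismet.php",
    };

    public static bool file_exists(string path)
    {
        string full = Path.GetFullPath(Path.Combine(RootPath, path));
        string relative = Path.GetRelativePath(RootPath, full).Replace('\\', '/');

        return CompiledScripts.Contains(relative) // compiled in, no source on disk
            || File.Exists(full)
            || Directory.Exists(full);            // PHP's file_exists() accepts directories too
    }
}
```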

In some cases, WordPress reads the actual content of a plugin source file only to extract a textual comment from it (it contains the plugin’s name, copyright, etc.). We include those required files in the <Content> item group, so they get deployed together with the DLL.

There are a few options to make this even better:

  • strip the file down to that comment, so there is no source code available on the server.
  • embed the file into a resource and implement some sort of virtual content provider (ASP.NET Core already has one). Our BCL I/O functions would then look into the content provider instead of the actual file system.

What about the CSS, JS and image files? They are requested only by the web client and are not necessarily read by the PHP code. So we could embed them all into a resource in a single assembly file as well. This is just an idea we may implement in the future. ASP.NET Core has a middleware for static files that can operate on an abstract file provider, and that provider can use the resource file instead of the actual file system.

Conclusion

Those are the issues, solutions and design decisions on how WordPress is compiled and running natively on the .NET runtime. Why do that in the first place, you ask?

  • We gain some performance by running the application on .NET and its JIT compiler.
  • We take advantage of fast in-memory response caching.
  • We don’t have our precious source code on the webserver.
  • We can implement widgets, shortcodes, and portions of the web in C#, e.g. using Razor templates.
  • We can implement plugins, e.g. authentication, in C#.
  • We can have a WordPress blog as a part of our C# MVC app.
  • We don’t need PHP installed on the webserver if we prefer the .NET runtime.
Posted on May 2, 2020, in category Information.