02/01/2019

June 2016

Volume 31 Number 6

[Azure App Services]

Using Azure App Services to Convert a Web Page to PDF

Converting a Web page to a PDF is nothing new, but my goal (to place a link on my Web site that gave visitors a simple way to convert a specific page to a PDF document in real time) turned out to be somewhat complicated. There are numerous Web sites and open source binaries that let you do this, but I was never able to connect all the dots and get the output I wanted in the way that I wanted it.

The best, or at least my favorite, Web page-to-PDF converter is the open source program called wkhtmltopdf (wkhtmltopdf.org), which uses the command line, as shown in Figure 1.

Figure 1 Running the wkhtmltopdf Converter from the Console

However, running a program from a command line is a long way from real-time conversion with a button on a Web page.

I worked on different portions of this solution over the past months, but the execution of the wkhtmltopdf process stubbornly prevented me from achieving my goal. The question that remained unanswered was: “How can I get Microsoft Azure App Service Web Apps to spawn this process to create the PDF?” App Service Web Apps runs within a sandbox and I knew from the start that I couldn’t do that—there was zero possibility of having a request sent from a client machine starting and running a process on the server. Having worked on the IIS support team for many years, I knew that making this happen even on a standalone version of IIS would require configurations that would make security analysts lose sleep. Then I thought of WebJobs.

WebJobs are made for exactly this situation because they can run executables either continuously or when triggered from an external source; for example, manually from the Azure SDK or by using an Azure Scheduler, CRON or the Azure WebJob API (bit.ly/1SD9gVJ). And, bang, there was the answer. I could call the wkhtmltopdf program from my App Service Web App using the WebJob API. The other components of the solution had already been worked out; I finally had the last piece of the puzzle, as Figure 2 shows.

Figure 2 The Complete Solution

The example code contains an ASP.NET Web site with an index page that allows a user to enter a URL, send that Web page to get converted to a PDF and then download the PDF to a client device. It takes very little effort to dynamically set this URL to the current page and have the button send the page to the WebJob API for conversion and download. The next few sections of this article discuss the technologies used to create the solution, and explain how you can build and utilize them.

HTML-to-PDF Converter Overview

I’ve used numerous technologies to create the real-time HTML-to-PDF App Service Web App solution. The table in Figure 3 presents a brief description of these technologies, and I describe them in more detail in the sections that follow.

Figure 3 Technologies Used in the Solution

Technology	Brief description
Azure App Service Web App (S2 Plan)	Front end that hosts SignalR code
App Service Authentication and Authorization	Confirms client identity
Azure Storage	Stores the PDF document
Azure WebJob	Converts HTML to PDF, uploads PDF to Azure Storage
Azure WebJob API	An interface for triggering a WebJob
ASP.NET SignalR	Manages response from server back to client

Each section includes a functional and technical description of a technology, plus the details of coding and/or configuration requirements. I’ve ordered the different portions of the solution as I created them, but it could be done using a number of different sequences. The technical goal is to pass a URL to the App Service Web App and get back a PDF. Let’s get started.

Azure App Service Web App

Azure App Services lets you work with a variety of app types: Web, Mobile, Logic (preview) and API. All App Services function in the same way in the back end, with each having additional configurable capabilities on the front end. By back end I mean that App Services run in different service plans (Free, Shared, Basic, Standard and Premium) and instance sizes (F1-P4); see bit.ly/1CVtRec for more details. The plans provide features such as deployment slots, disk-space limits, auto-scaling, maximum number of instances and so forth, and the instance sizes describe the number of dedicated CPUs, as well as the memory per App Service Plan (ASP), which is equivalent to a virtual machine (VM). And for the front end, the features for a given App Service provide specially designed capabilities for a particular App Service type to get your application deployed, configured and running in the shortest amount of time.

For the HTML-to-PDF converter, I’ll use an S2 Azure App Service Web App because I don’t need any of the features provided by the other App Service types.

To start, create the Web App within the Azure portal by selecting New | Web + Mobile | Web App, then provide the App name, Subscription, Resource Groups and App Service Plan and press the Create button. Once you’ve created the app, you use this location to deploy the source code contained in the downloadable Visual Studio 2015 solution, convertHTMLtoPDF. Deployment details are provided at the end of the article; you’ll need to make some changes to get the code to work with your particular Web App and WebJob.

Web apps, Mobile apps and API apps include a federated identity-based feature for setting up authentication and authorization with Azure Active Directory and other identity providers like Facebook, Microsoft Live, Twitter and so on, as discussed in the next section.

App Service Authentication and Authorization

I decided to configure the App Service Authentication / Authorization feature for my Web app because it fit nicely into the SignalR scheme, in which a display name or the identity of the client is desirable. SignalR creates a ConnectionId for each client, but it’s friendlier and more personal to use the real name of a visitor when sending or posting messages. This can be done by capturing it from the callback of the Authentication feature and then displaying it using the SignalR code. Because I implemented the Microsoft Account identity provider (IDP), the name of the authenticated visitor is returned in the X-MS-CLIENT-PRINCIPAL-NAME request header. The identity name is also accessible from the System.Security.Principal.IPrincipal.Identity.Name property.
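As a minimal sketch of the lookup described above, the helper below prefers the X-MS-CLIENT-PRINCIPAL-NAME header and falls back to another value such as the SignalR ConnectionId. The class and method names (VisitorName, Resolve) are hypothetical and not part of the downloadable solution, and the headers are passed in as a plain dictionary (an assumption made so the logic can be tried without a Web host).

```csharp
using System;
using System.Collections.Generic;

public static class VisitorName
{
    // Header that App Service Authentication sets for an authenticated visitor.
    public const string PrincipalNameHeader = "X-MS-CLIENT-PRINCIPAL-NAME";

    // Returns the authenticated name when the header is present and non-empty;
    // otherwise returns the supplied fallback (for example, a ConnectionId).
    public static string Resolve(IDictionary<string, string> headers, string fallback)
    {
        string name;
        if (headers != null
            && headers.TryGetValue(PrincipalNameHeader, out name)
            && !string.IsNullOrWhiteSpace(name))
        {
            return name;
        }
        return fallback;
    }
}
```

HTTP header names are case-insensitive, so a caller that copies headers into a dictionary would likely want to construct it with StringComparer.OrdinalIgnoreCase.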

Getting the Authentication / Authorization feature to work requires no code changes on the app back end and you can simply follow the instructions at bit.ly/1MQZZdF. The implementation requires only that you enable App Service Authentication, accessible from the Settings blade for the given App Service, and configure one or more of the Authentication Providers, as shown in Figure 4.

Figure 4 The App Service Authentication / Authorization Feature

The feature offers numerous choices for an “Action to take when request is not authorized.” For example, in order to access the HTML-to-PDF Web app, you must have a Microsoft Account and be authenticated by the identity provider; no Web app code is executed before this IDP authentication takes place. In this case, pre-authentication is required because I selected “Log in with Microsoft Account” from the dropdown. All App Service resources require such authentication once an action is applied. You can configure the authentication feature so that visitors can access a login page or other endpoints of the Azure-hosted App Service, which is accomplished by selecting the Allow request (no action) item from the dropdown. However, it would then be up to the application code to restrict access to protected pages. This more granular approach is commonly achieved by checking the Context.User.Identity.IsAuthenticated Boolean before executing the code within the page.
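The granular check mentioned above can be sketched as a small guard. This is an illustration only: PageGuard and IsSignedIn are hypothetical names, and the method takes an IPrincipal so that it does not depend on a particular page or controller type.

```csharp
using System.Security.Principal;

public static class PageGuard
{
    // True only when a principal is present and its identity reports
    // that the visitor has been authenticated.
    public static bool IsSignedIn(IPrincipal user)
    {
        return user != null
            && user.Identity != null
            && user.Identity.IsAuthenticated;
    }
}
```

A protected page could call PageGuard.IsSignedIn(Context.User) before running its own code and return early when the result is false.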

The last component of the no-code, real-time HTML-to-PDF conversion solution is the creation and configuration of the Azure Storage account and container.

Azure Storage

The Azure Storage container is the location where the PDF file is stored for download. If the storage container is made public, anyone can access the files hosted in the container by referencing the filename using a URL such as https://{storage-account}.blob.core.windows.net/{container-name}/{filename.pdf}. Inserting, updating or removing files from the container requires an access key when performed by code. Doing so via the Azure Management Portal or from within Visual Studio can be restricted using role-based access control (RBAC) or simply by disallowing user access to the Azure subscription.
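The URL pattern above can be assembled with a few lines of code. The sketch below is illustrative: BlobUrl and Build are hypothetical names, and escaping the filename is an added precaution rather than something the article’s solution does.

```csharp
using System;

public static class BlobUrl
{
    // Builds https://{storage-account}.blob.core.windows.net/{container-name}/{filename}.
    // The filename is percent-encoded so that spaces or reserved characters
    // do not produce an invalid URL.
    public static string Build(string storageAccount, string container, string fileName)
    {
        return string.Format(
            "https://{0}.blob.core.windows.net/{1}/{2}",
            storageAccount, container, Uri.EscapeDataString(fileName));
    }
}
```

For example, BlobUrl.Build("converthtmltopdf", "pdf", "1a2b3c4d.pdf") yields https://converthtmltopdf.blob.core.windows.net/pdf/1a2b3c4d.pdf.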

To create the storage account, select New | Data + Storage and then Storage account. The Name attribute becomes the name of the storage account where the container is created, and the first part of the URL: https://{storage-account}.blob.core.windows.net. The Deployment model attribute lets you choose either Resource manager or Classic. Unless you have existing applications deployed into a classic virtual network (VNET), it’s recommended you use Resource manager for all new development activity. The Azure Resource Manager (ARM) is a more declarative approach that uses templates and scripts. In contrast, interfacing with the Classic model, commonly referred to as Azure Service Manager (ASM), is generally performed using code and libraries.

When deciding whether to choose Standard or Premium Performance, you’ll want to consider cost and throughput. Standard is the most cost-effective and is optimal for applications that store infrequently accessed bulk data. Premium storage is backed by solid-state drives (SSD) that offer optimal performance for virtual machines with intensive I/O requirements.

The Replication attribute has numerous options (Locally redundant, Zone-redundant, Geo-redundant and Read-access geo-redundant), each providing a greater level of redundancy and accessibility. I used the default settings for the HTML-to-PDF solution, and selected the same Subscription, Resource group and Location as for the Web app created previously.

Finally, after successfully creating the storage account, select Blobs services from the Storage account General blade, and then add the container.

The Access Type on the New container blade can be either Private (an access key is required for all operations), Blob (allows public read access) or Container (allows public read and list access).

That’s it, that’s all the Azure configuration required for this solution. Let’s jump into some C# code now to see how to get this real-time HTML-to-PDF conversion to work.

Azure WebJob

The Azure WebJob feature supports running a script or executable file in a continuous, triggered or scheduled manner (bit.ly/1Og9P95). Don’t confuse this with a Windows Service; think of it instead as a task or batch job that needs to run at certain times or when a certain event happens. In this case, using the real-time HTML-to-PDF conversion tool triggers the WebJob using the API. Alternatively, WebJobs can be started manually via Visual Studio or by using the Azure Scheduler Job Collections capability.

The Azure App Service platform determines whether the WebJob is triggered or continuous according to the path in which the WebJob is stored. If the WebJob is to be triggered, it should be deployed into the d:\home\site\wwwroot\app_data\jobs\triggered\{job name} directory; if it’s to be continuous, simply replace the triggered directory path with continuous. To deploy the WebJob, add the app_data\jobs\triggered\{job name} directory to a Web site project in Visual Studio, add the script or executable to it, similar to what’s described at bit.ly/1Uczf8L, and publish it to the Azure App Service platform.
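To make the directory convention concrete, here is a small sketch that derives the deployment path from the job name and type. WebJobPath and For are hypothetical names used only for illustration; the root is the path quoted above.

```csharp
public static class WebJobPath
{
    // Root of the jobs folder, as quoted in the article.
    private const string JobsRoot = @"d:\home\site\wwwroot\app_data\jobs";

    // The platform infers the job type from the folder name, so the only
    // difference between the two kinds of job is one path segment.
    public static string For(string jobName, bool continuous)
    {
        string kind = continuous ? "continuous" : "triggered";
        return JobsRoot + @"\" + kind + @"\" + jobName;
    }
}
```

WebJobPath.For("convertToPdf", false) returns d:\home\site\wwwroot\app_data\jobs\triggered\convertToPdf.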

The WebJob I created performs two tasks: converting the page at a given Web address to a PDF file and uploading that PDF file to an Azure Storage container. I could have called wkhtmltopdf.exe directly using the WebJob API, but I would then have had to make a second API call to upload the file to storage, and that would’ve involved a lot of complexity in managing the file and sending the result back to the client. Therefore, I created a console application called convertToPdf (which you can see in the source) that performs these two tasks, one after the other, and returns the location of the PDF file to the client that made the request.

To start wkhtmltopdf.exe and pass it the two required parameters—the Web address and PDF filename—I used System.Diagnostics.ProcessStartInfo, as shown in Figure 5.

Figure 5 Starting wkhtmltopdf.exe

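// Requires: using System; using System.Diagnostics; using static System.Console;
static void Main(string[] args)
{
  // args[0] is the Web address to convert; args[1] is the PDF filename to write.
  var URL = args[0];
  var filename = args[1];
  try
  {
    using (var p = new Process())
    {
      p.StartInfo = new ProcessStartInfo
      {
        FileName = "wkhtmltopdf.exe",
        // Quote both values so a space in either one is not split into extra arguments.
        Arguments = $"\"{URL}\" \"{filename}\"",
        UseShellExecute = false
      };
      p.Start();
      // Block until the conversion finishes so the PDF exists before the upload step.
      p.WaitForExit();
      p.Close();
    }
  }
  catch (Exception ex) { WriteLine($"Something Happened: {ex.Message}"); }
}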

The code creates an instance of the ProcessStartInfo class and sets the FileName, Arguments and other properties of the class. The method then starts the process identified by the FileName property, waits for it to complete and exits the process. By default, when the WebJob is uploaded to the Azure App Service environment, it’s copied by the platform to a temporary local directory, D:\local\temp\jobs\triggered\{job name}\*****\, where ***** is a dynamically generated directory name. This is also where the PDF file is physically stored prior to being uploaded to the Azure Storage container. Because the file exists only on the local disk of the instance that ran the WebJob, it’s not persisted and isn’t accessible to any other instance of the Azure App Service Web App. If you’re running on multiple instances, you might not find the file in the local directory of the instance you’re looking at, whereas the Azure Storage container is accessible from all of them.

Once the PDF file is created, it needs to be uploaded to the Azure Storage container. You’ll find an excellent tutorial that describes in detail how to do this at bit.ly/1OAXIQ0. In summary, the capability to create, read, update and delete content in a container is controlled by two NuGet packages, the Microsoft Azure Configuration Manager library for .NET and the Microsoft Azure Storage Client library for .NET. Both packages are referenced from the convertToPdf WebJob console application. To install them, right-click the console application project and then Manage NuGet Packages. Then search for and install the libraries.

I used CloudConfigurationManager.GetSetting, which is part of the Microsoft Azure Configuration Manager library, to retrieve the storage connection string values for making the connection to the Azure Storage container. The values are the AccountName, which is the Azure Storage Account name (in this case, converthtmltopdf), not the container name, and the AccountKey, which is retrieved from the Storage Account blade by clicking on Settings | Access keys. Figure 6 shows how to upload the PDF file to the Azure Storage container created previously.

Figure 6 Uploading the PDF to an Azure Storage Container

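// Requires: using System; using Microsoft.Azure; using Microsoft.WindowsAzure.Storage;
//           using Microsoft.WindowsAzure.Storage.Blob; using static System.Console;
static void Main(string[] args)
{
  // Same filename that was passed to wkhtmltopdf.exe in Figure 5.
  var filename = args[1];
  try
  {
    // Read the connection string (AccountName and AccountKey) from configuration.
    CloudStorageAccount storageAccount =
      CloudStorageAccount.Parse(
      CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    // "pdf" is the name of the container created in the storage account.
    CloudBlobContainer container =
      blobClient.GetContainerReference("pdf");
    CloudBlockBlob blockBlob = container.GetBlockBlobReference(filename);
    // The PDF sits in the WebJob's working directory, so no path is needed.
    using (var fileStream = System.IO.File.OpenRead(filename))
    {
      blockBlob.UploadFromStream(fileStream);
    }
  }
  catch (StorageException ex) { WriteLine($"StorageException: {ex.Message}"); }
  catch (Exception ex) { WriteLine($"Exception: {ex.Message}"); }
}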

This configuration information is used as input for the CloudStorageAccount class, which is part of the Microsoft Azure Storage Client library. As an alternative to the CloudConfigurationManager to retrieve the StorageConnectionString from the App.config file, you can use System.Configuration.ConfigurationManager.AppSettings["StorageConnectionString"].
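Whichever lookup is used, the value it returns is a storage connection string that carries the AccountName and AccountKey. The sketch below shows the commonly used shape of that string; StorageConnection and Build are hypothetical names, and in practice the finished string would be stored in configuration rather than assembled at run time.

```csharp
public static class StorageConnection
{
    // Composes a connection string in the standard Azure Storage format:
    // DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...
    public static string Build(string accountName, string accountKey)
    {
        return "DefaultEndpointsProtocol=https"
            + ";AccountName=" + accountName
            + ";AccountKey=" + accountKey;
    }
}
```

The result is the kind of value that would be placed under the StorageConnectionString key and handed to CloudStorageAccount.Parse.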

I use an instance of the CloudStorageAccount class to create a CloudBlobClient, then use the blobClient to get a reference to the Azure Storage container with the GetContainerReference method. Then, using the GetBlockBlobReference method of the CloudBlobContainer class, I create a CloudBlockBlob containing the name of the file being uploaded. Both of the executable files, as previously noted, are located in the D:\local\temp\jobs\triggered\convertToPdf\***\ directory, the same place where the PDF file is stored and referenced. This is why no path to the filename is required: the file is created in the same temporary directory as the executables. Last, I open the file as a System.IO.FileStream using the System.IO.File.OpenRead method, and upload it to the container using the UploadFromStream method of the CloudBlockBlob class.

Once the code is complete and compiles, add both wkhtmltopdf.exe and the convertToPdf.exe to the \app_data\jobs\triggered\convertToPdf directory of the Visual Studio solution that will be published to the Azure App Service Web App. You can also publish just the WebJob files using an FTP tool, transferring the code directly to the Web site.

Now that the convertToPdf WebJob that creates and stores the PDF is complete, let’s look at how to call the WebJob from C# code using the HttpClient. After that, all that remains is creating a SignalR-based Azure App Service Web App front end to allow a visitor to send a URL to the WebJob and get back the URL to the PDF for download.

Azure WebJob API

I wrote an article about the Azure WebJob API (bit.ly/1SD9gVJ) in which I discussed how to call the API that triggers the WebJob. In essence, the WebJob API is a Web interface that executes a script or executable using the arguments passed in the URL.

Prior to creating the SignalR Hub that triggers the WebJob API, I created a simple console application consumer, shown in Figure 7, that calls the WebJob API. It’s included in the downloadable solution and is called convertToPDF-consumer. This console application simplified the coding, troubleshooting and testing as it removed the SignalR feature from the scenario.

Figure 7 The Simple Console Application Consumer

// Requires: using System; using System.Net.Http; using System.Net.Http.Headers;
//           using System.Text; using System.Threading.Tasks;
static async Task<string> ConvertToPDFWebJobAPIAsync(string Url)
{
  try
  {
    using (var client = new HttpClient())
    {
      // The WebJob API is served from the SCM (KUDU) site of the Web App.
      client.BaseAddress = new Uri(
        "https://converthtmltopdf.scm.azurewebsites.net/");
      client.DefaultRequestHeaders.Accept.Clear();
      // Publish Profile credentials; hardcoded here only for simplicity.
      var userName = "your userName";
      var password = "your userPWD";
      var encoding = new ASCIIEncoding();
      var authHeader =
        new AuthenticationHeaderValue("Basic",
          Convert.ToBase64String(
          encoding.GetBytes($"{userName}:{password}")));
      client.DefaultRequestHeaders.Authorization = authHeader;
      var content = new System.Net.Http.StringContent("");
      // Eight characters of a dash-free GUID give a short, effectively unique filename.
      string filename = Guid.NewGuid().ToString("N").Substring(0, 8) + ".pdf";
      // Escape the argument list so characters such as & or ? in the URL
      // are not read as part of the API's own query string.
      var arguments = Uri.EscapeDataString($"{Url} {filename}");
      HttpResponseMessage response =
        await client.PostAsync(
        $"api/triggeredwebjobs/convertToPDF/run?arguments={arguments}", content);
      if (!response.IsSuccessStatusCode)
      {
        return $"Conversion for {Url} {filename} failed: " +
          DateTime.Now.ToString();
      }
      return $"{response.StatusCode.ToString()}: " +
        "your PDF can be downloaded from here:";
    }
  }
  catch (Exception ex) { return ex.Message; }
}

Use the System.Net.Http.HttpClient class to make the request, with the Source Control Management (SCM)-based Azure App Service Web App URL as the BaseAddress property. As you might know, each Azure App Service Web App comes with an SCM URL (aka the KUDU console) that’s accessible using https://{appname}.scm.azurewebsites.net and is the URL used for calling the WebJob API. Appending /basicAuth to the end of the URL allows the calling client to authenticate using a basic challenge-and-response handshake. The userName and password are the Publish Profile credentials, which are downloadable from the Azure Management Portal by navigating to the Azure App Service Web App and selecting Get publish profile. Within the downloaded *.PublishSettings file you’ll find the userName and userPWD to use in the code. For simplicity, I hardcoded the userName and password into the application, but in the real world these should be placed in a safe location and retrieved from code, so they can be changed if desired by selecting Reset publish profile in the Azure Management Portal. You don’t want to have to deploy updated code every time something changes.

Basic authentication requires an Authorization header whose value is the word Basic followed by a Base64-encoded string of userName:password. Build that string by using the GetBytes method of the System.Text.ASCIIEncoding class together with the ToBase64String method of the System.Convert class, and pass it to a new instance of the System.Net.Http.Headers.AuthenticationHeaderValue class along with the Basic scheme name. Then assign the AuthenticationHeaderValue to the DefaultRequestHeaders.Authorization property of the HttpClient instance created in the using statement; DefaultRequestHeaders is of type System.Net.Http.Headers.HttpRequestHeaders.
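The header construction can be isolated into a small helper, sketched below. BasicAuth and Create are hypothetical names; the framework calls are the same ones named in the text.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

public static class BasicAuth
{
    // Joins the credentials with a colon, ASCII-encodes them, Base64-encodes
    // the bytes and wraps the result in a header value with the Basic scheme.
    public static AuthenticationHeaderValue Create(string userName, string password)
    {
        byte[] raw = Encoding.ASCII.GetBytes(userName + ":" + password);
        return new AuthenticationHeaderValue("Basic", Convert.ToBase64String(raw));
    }
}
```

For the placeholder credentials user and pass, the header value is Basic dXNlcjpwYXNz. Base64 is an encoding, not encryption, which is one reason the request should only ever be sent over HTTPS.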

For the filename I used the first eight characters of a GUID. The GUID is created using the NewGuid method of the System.Guid class, and passing an “N” parameter to the ToString method formats it without dashes; the Substring method of the String class then takes the first eight characters. Finally, I asynchronously posted to the WebJob API using the PostAsync method of the System.Net.Http.HttpClient class, passing the URL and filename as arguments of the WebJob and awaiting its completion. When the process completes successfully, the URL to the Azure Storage container with the concatenated filename is displayed to the console; otherwise, a notification is sent that the creation of the PDF failed.
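The two string-building steps just described (the filename and the relative URL that is posted to) can be sketched on their own. WebJobRequest and its method names are hypothetical. Percent-encoding the argument list is an assumption added here so that characters such as & or ? inside the page URL are not mistaken for part of the API’s own query string.

```csharp
using System;

public static class WebJobRequest
{
    // Eight hexadecimal characters from a dash-free GUID, plus the extension.
    public static string NewPdfFileName()
    {
        return Guid.NewGuid().ToString("N").Substring(0, 8) + ".pdf";
    }

    // Relative URL for triggering the job; the WebJob receives the page URL
    // and the filename as two space-separated arguments.
    public static string RunPath(string jobName, string url, string fileName)
    {
        string arguments = url + " " + fileName;
        return "api/triggeredwebjobs/" + jobName
            + "/run?arguments=" + Uri.EscapeDataString(arguments);
    }
}
```

WebJobRequest.RunPath("convertToPDF", "http://example.com", "abc.pdf") returns api/triggeredwebjobs/convertToPDF/run?arguments=http%3A%2F%2Fexample.com%20abc.pdf, which is appended to the SCM BaseAddress when the request is posted.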

To see the status of the WebJob, go to the Azure Management Portal, navigate to the Azure App Service Web App running the WebJob, and select Settings | WebJobs. The WebJobs blade contains Name, Type, Status, and a very useful link to the WebJob execution logs. Click on the link to access a WebJob-specific KUDU console to see recent job runs, their status and a link to the actual log output of the WebJob, as shown in Figure 8. For example, if the WebJob is a console application, when you use the System.Console.WriteLine method to write the state of the execution to the console output window, this information is also written to the WebJob log and is viewable via the link from the Azure Management Portal.

Figure 8 Azure WebJob Output Log

Once this part was working as expected, all that remained was just a simple copy and paste into the SignalR solution, discussed next.

ASP.NET SignalR

ASP.NET SignalR is an open source library for ASP.NET developers to ease the sending of real-time notifications to browser-based, mobile or .NET client applications. The server to client remote procedure call (RPC) makes use of an API that calls JavaScript functions on the client from server-side .NET code. Prior to the existence of this technology, a common approach for achieving a similar solution was using an ASP.NET UpdatePanel control that would frequently refresh itself by making a request to the server to check if there was any change in the state of the data. This was much more a PULL approach, where the client triggered the request to the server instead of the server PUSHing the real-time data to the client as soon as it became available.

The client-side JavaScript code instantiates a Hub proxy, exposes the methods the server can trigger and identifies the server-side method to call (Send) when a click event occurs:

var pdf = $.connection.pDFHub;
pdf.client.broadcastMessage = function (userId, message) {};
pdf.client.individualMessage = function (userId, message) {};
$('#sendmessage').click(function () {
  pdf.server.send($('#displayname').val(), $('#message').val());
});

The name of the Hub proxy in the client-side JavaScript is the name of the Hub created to run on the server side; in this example, the Hub is named PDFHub, and it inherits from the Microsoft.AspNet.SignalR.Hub class. The two methods exposed by the client are broadcastMessage and individualMessage; each has a function with parameters that match the pattern of the server-side Send method, userId and message. The Send method is called on the server when the send button is clicked by a visitor on the Web app. The ConvertToPDFWebJobAPIAsync method is a cut and paste of the console application created in the previous section that calls the Azure WebJob API to convert the provided Web page into a PDF file and load it into an Azure Storage container. Last, the server-side Send method uses the Clients property of the Microsoft.AspNet.SignalR.Hub class, which implements the IHubCallerConnectionContext interface. The Clients property is linked to the two client-side methods and provides the information sent from the server to the appropriate clients (see Figure 9).

Figure 9 The PDFHub Class

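public class PDFHub : Hub
{
  // SignalR hub methods can return Task, so the WebJob call is awaited
  // instead of blocking a thread with Task.Run(...).Wait().
  public async Task Send(string userId, string message)
  {
    // Name of the visitor as validated by the identity provider.
    string name = Context.User.Identity.Name;
    string convertMessage = await ConvertToPDFWebJobAPIAsync(message);
    // Tell every connected client that a conversion happened (no download link).
    Clients.All.broadcastMessage(userId, "just converted: " + message +
      " to a pdf");
    // Send the conversion status and link only to the caller.
    Clients.Client(Context.ConnectionId).individualMessage(
      name, convertMessage);
  }
}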

You might be wondering why I chose SignalR to consume the Azure WebJob API instead of just a simple ASP.NET Web Form or ASP.NET MVC Web application. It’s true, there are numerous ways to consume an API. For example, the downloadable code for this solution contains a console application that consumes the Azure WebJob API, so why do I use SignalR?

To answer that question, notice in Figure 9 that when the server has a message for the connected clients, two client-side methods are invoked. First, the broadcastMessage method notifies all the connected clients that a specific person converted a given URL to a PDF, but it doesn’t provide the link to the Azure Storage container and PDF file for download. The second client-side method is individualMessage, which sends the status of the HTML to PDF conversion and the link to the Azure Storage container with the concatenated PDF filename. The reason for using SignalR is to give the consuming clients a sense of social interaction by providing all the connected clients information about what’s happening on the Azure App Service Web app.

Recall that previously I mentioned the System.Security.Principal.IPrincipal.Identity.Name property and noted how it made the Web app much more friendly because it could render a visitor’s name to the client instead of, for example, a unique but generic ConnectionId. The Context.User.Identity.Name property is used to set the name of the visitor, validated by their Microsoft Account, which adds to the social friendliness of the client.

Now all that needs to happen is to deploy the code (client code, server code and Azure WebJob) to the Azure App Service Web App platform using Visual Studio or an FTP application and test it out. Detailed instruction on how to deploy to an Azure App Service Web App can be found at bit.ly/1nXnhmB.

Software as a Service

While writing this article I started thinking about Software as a Service (SaaS) and whether this real-time HTML-to-PDF converter is a SaaS solution or simply an API-accessible app running in the cloud. I decided that the exposure of the Azure WebJob API, by the name itself, makes it an API and not SaaS. For my solution, the WebJob API is exposed through a URL and protected by Basic authentication. The API is available for other consumers to build on top of it or to add functionality to their applications, which is the definition of an API. However, as soon as there’s a consumer for the API, additional features are added around the consumed API that can be used by multiple online users, so it matches the definition of SaaS. Therefore, the Azure WebJob API alone is simply an API, while my ASP.NET SignalR client running on the Azure App Service Web App platform is a SaaS solution. Sure it’s not OneDrive, Office 365, CRM Dynamics Online or Hotmail, but if you need to convert a Web site to a PDF really quick, you know where to come.

Wrapping It up

This article explored three Azure features: an Azure App Service Web App; App Service Authentication and Authorization; and an Azure Storage account and container. These features form the platform that supports the Azure WebJob, exposes the Azure WebJob API and hosts the ASP.NET SignalR browser-based consumer. I discussed each of the features and the steps needed to configure them. I also described the code for the Azure WebJob, the code for calling the Azure WebJob API, and the ASP.NET SignalR client.

Benjamin Perkins is an escalation engineer at Microsoft and author of four books on C#, IIS, NHibernate and Microsoft Azure. He recently coauthored Beginning C# 6 Programming with Visual Studio 2015 (John Wiley & Sons). Reach him at benperk@microsoft.com.

Thanks to the following Microsoft technical expert for reviewing this article: Richard Marr
Richard Marr is a Senior Escalation Engineer at Microsoft. He has worked in Microsoft support organizations for 16 years, supporting IIS, ASP.NET and, currently, Azure App Services.
