Please note that I am no longer working on this library - you may want to consider using something else for new projects. :)

ASIHTTPRequest documentation

Last updated: 15th May 2011 (v1.8.1)

ASIWebPageRequest

The ASIWebPageRequest class included with ASIHTTPRequest lets you download complete web pages, including external resources like images and stylesheets.

Features

  • Load complete web pages by creating a single request
  • Cache and reuse web pages and external resources of any size, indefinitely
  • Implement offline browsing for complex web pages
  • Store a complete web page in a single string, or with each external resource in a separate file referenced from the page

Using ASIWebPageRequest in your own projects

  1. Add ASIWebPageRequest.h and ASIWebPageRequest.m to your project
  2. ASIWebPageRequest requires libxml. You’ll need to add the libxml dylib to the Linked Libraries in your target settings (see the setup instructions for more information on how to do this).
  3. You may also need to adjust your Header Search Paths to point at the libxml headers - this seemed to work for me:
${SDK_DIR}/usr/include/libxml2

See the sample projects that come with ASIWebPageRequest for working examples for both Mac OS X and iOS.

How it works

ASIWebPageRequest is a subclass of ASIHTTPRequest, and can be started asynchronously just like a regular ASIHTTPRequest. When an ASIWebPageRequest completes its download, it looks at the response headers to see if the content is HTML/XHTML or CSS. If it is, it will parse the content to find external resources and download those.
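
To make the content type check described above a little more concrete, here is a minimal sketch of how such a test could look. This is not ASIWebPageRequest's actual implementation: ContentTypeIsParseable is a hypothetical helper, and the list of MIME types is my assumption about what counts as HTML/XHTML or CSS.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of ASIWebPageRequest.
// Returns YES when a Content-Type header value looks like HTML, XHTML or CSS.
// The MIME types below are an assumption; adjust them to suit your content.
static BOOL ContentTypeIsParseable(NSString *contentType)
{
   // Header values may include a charset ("text/html; charset=utf-8"),
   // so we compare prefixes rather than whole strings
   NSString *type = [contentType lowercaseString];
   NSArray *parseableTypes = [NSArray arrayWithObjects:
      @"text/html", @"application/xhtml+xml", @"text/css", nil];
   for (NSString *prefix in parseableTypes) {
      if ([type hasPrefix:prefix]) {
         return YES;
      }
   }
   // A nil or unrecognised content type falls through to here
   return NO;
}
```

In a delegate method, you could pass it the value of [[theRequest responseHeaders] objectForKey:@"Content-Type"].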

Supported external resources

The following external resources are supported by ASIWebPageRequest and will be downloaded when they are encountered in HTML / CSS:

  • Images referenced with the HTML image tag (<img src="/path/to/image.jpg">)
  • Images referenced in CSS declarations (background: url('/path/to/image.png')) - inline CSS and CSS inside <style> tags are supported. Additionally, ASIWebPageRequest will download images referenced in external stylesheets
  • External stylesheets, including both stylesheets referenced with <link rel="stylesheet" href="/path/to/styles.css"> and stylesheets imported by other stylesheets
  • External JavaScript files. Note that these are not parsed, and resource URLs referenced from JavaScript will not be modified or downloaded
  • Content in frames and iframes
  • HTML 5 audio and video content referenced using a <source> tag. Note that ASIWebPageRequest explicitly disables downloading files with certain extensions (.ogg, .ogv, .webm) because these formats are not supported on iOS. It should be easy to remove this check by modifying ASIWebPageRequest if you need to download these formats.
  • HTML 5 Audio content referenced from inside an audio tag (<audio src="/path/to/audio.mp3">)
  • HTML 5 Video poster images

Note that ASIWebPageRequest parses external resources recursively - any external resources that are web pages or stylesheets will be parsed to find new external resource URLs. So, if a web page includes an iframe that references an external stylesheet that imports another stylesheet that references an image, all those resources will be downloaded.

Using ASIWebPageRequest to load data into a WebView / UIWebView

This example shows the simplest way to use ASIWebPageRequest. By setting the request’s urlReplacementMode, we are telling ASIWebPageRequest to replace external URLs with the actual data they’re pointing to. This means that when the request completes, the file at the request’s downloadDestinationPath will contain the complete web page, including all supported external resources, as a single string.

Note that the example code below is for iOS, but it should be easy to adapt to Mac OS X - see the Mac example project.

- (IBAction)loadURL:(NSURL *)url
{
   // Assume request is a property of our controller
   // First, we'll cancel any in-progress page load
   [[self request] setDelegate:nil];
   [[self request] cancel];
 
   [self setRequest:[ASIWebPageRequest requestWithURL:url]];
   [[self request] setDelegate:self];
   [[self request] setDidFailSelector:@selector(webPageFetchFailed:)];
   [[self request] setDidFinishSelector:@selector(webPageFetchSucceeded:)];
 
   // Tell the request to embed external resources directly in the page
   [[self request] setUrlReplacementMode:ASIReplaceExternalResourcesWithData];
 
   // It is strongly recommended you use a download cache with ASIWebPageRequest
   // When using a cache, external resources are automatically stored in the cache
   // and can be pulled from the cache on subsequent page loads
   [[self request] setDownloadCache:[ASIDownloadCache sharedCache]];
 
   // Ask the download cache for a place to store the cached data
   // This is the most efficient way for an ASIWebPageRequest to store a web page
   [[self request] setDownloadDestinationPath:
      [[ASIDownloadCache sharedCache] pathToStoreCachedResponseDataForRequest:[self request]]];
 
   [[self request] startAsynchronous];
}
 
- (void)webPageFetchFailed:(ASIHTTPRequest *)theRequest
{
   // Obviously you should handle the error properly...
   NSLog(@"%@",[theRequest error]);
}
 
- (void)webPageFetchSucceeded:(ASIHTTPRequest *)theRequest
{
   NSString *response = [NSString stringWithContentsOfFile:
      [theRequest downloadDestinationPath] encoding:[theRequest responseEncoding] error:nil];
   // Note we're setting the baseURL to the url of the page we downloaded. This is important!
   [webView loadHTMLString:response baseURL:[theRequest url]];
}

You should always use ASIWebPageRequest in conjunction with a download cache, as this makes it possible for requests to reuse previously downloaded resources. Also, make sure you set a downloadDestinationPath for your ASIWebPageRequests to prevent large web pages eating up all your available RAM.
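
If you want cached pages to stick around for offline browsing, a rough sketch along these lines may be a starting point. ConfigureForOfflineBrowsing is a hypothetical helper name. The cache policy and storage policy constants come from ASIHTTPRequest's download cache support, but whether this particular combination suits your content is an assumption you should test.

```objc
#import "ASIWebPageRequest.h"
#import "ASIDownloadCache.h"

// Hypothetical helper, not part of ASIHTTPRequest.
// Call it on a request before starting it.
static void ConfigureForOfflineBrowsing(ASIWebPageRequest *request)
{
   ASIDownloadCache *cache = [ASIDownloadCache sharedCache];
   [request setDownloadCache:cache];

   // Ask the server for a new copy only when the cached one is stale,
   // and fall back to the cached copy if the load fails (e.g. when offline)
   [request setCachePolicy:
      ASIAskServerIfModifiedWhenStaleCachePolicy | ASIFallbackToCacheIfLoadFailsCachePolicy];

   // Keep cached data until it is explicitly cleared,
   // rather than only for the current session
   [request setCacheStoragePolicy:ASICachePermanentlyCacheStoragePolicy];

   // As in the example above, write the page to the path the cache gives us
   [request setDownloadDestinationPath:
      [cache pathToStoreCachedResponseDataForRequest:request]];
}
```

You would call ConfigureForOfflineBrowsing([self request]) after setting the delegate and urlReplacementMode, just before startAsynchronous.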

A more complex example

Rather than replacing external URLs with the data they represent, we can tell ASIWebPageRequest to replace external URLs with local file:// URLs pointing at data on disk. To do this, we set the request’s urlReplacementMode to ASIReplaceExternalResourcesWithLocalURLs.

In this example, we’re also trapping any clicks on links in the WebView. When a user taps on a link to another page, we stop the load, then load the page using ASIWebPageRequest.

- (IBAction)loadURL:(NSURL *)url
{
   // Again, make sure we cancel any in-progress page load first
   [[self request] setDelegate:nil];
   [[self request] cancel];
 
   [self setRequest:[ASIWebPageRequest requestWithURL:url]];
   [[self request] setDelegate:self];
   [[self request] setDidFailSelector:@selector(webPageFetchFailed:)];
   [[self request] setDidFinishSelector:@selector(webPageFetchSucceeded:)];
 
   // Tell the request to replace urls in this page with local urls
   [[self request] setUrlReplacementMode:ASIReplaceExternalResourcesWithLocalURLs];
 
   // As before, tell the request to use our download cache
   [[self request] setDownloadCache:[ASIDownloadCache sharedCache]];
   [[self request] setDownloadDestinationPath:
      [[ASIDownloadCache sharedCache] pathToStoreCachedResponseDataForRequest:[self request]]];
 
   [[self request] startAsynchronous];
}
 
- (void)webPageFetchFailed:(ASIHTTPRequest *)theRequest
{
   // Make sure you handle this error properly...
   NSLog(@"%@",[theRequest error]);
}
 
- (void)webPageFetchSucceeded:(ASIHTTPRequest *)theRequest
{
   // The page has been downloaded with all external resources. Now, we'll load it into our UIWebView.
   // This time, we're telling our web view to load the file on disk directly.
   [webView loadRequest:
      [NSURLRequest requestWithURL:[NSURL fileURLWithPath:[theRequest downloadDestinationPath]]]];
 
}
 
// We've set our controller to be the delegate of our web view
// When a user clicks on a link, we'll handle loading with ASIWebPageRequest
- (BOOL)webView:(UIWebView *)webView shouldStartLoadWithRequest:(NSURLRequest *)theRequest 
   navigationType:(UIWebViewNavigationType)navigationType
{
   if (navigationType == UIWebViewNavigationTypeLinkClicked) {
      [self loadURL:[theRequest URL]];
      return NO;
   }
   return YES;
}
 

Note that when using ASIReplaceExternalResourcesWithLocalURLs, ASIWebPageRequest will also modify all relative hyperlinks (<a href="">) to make them absolute. So, a link to ‘/news/’ might become ‘http://mywebsite.com/news/’.
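
The effect on hyperlinks can be illustrated with plain Foundation calls. This is only a sketch of how a relative link resolves against the page URL, not the code ASIWebPageRequest itself runs; AbsoluteLinkForHref is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of ASIWebPageRequest.
// Resolves the href from a link against the URL of the page it appears in.
static NSString *AbsoluteLinkForHref(NSString *href, NSURL *pageURL)
{
   // NSURL resolves relative strings against the base URL we pass in
   NSURL *resolved = [NSURL URLWithString:href relativeToURL:pageURL];
   // Returns nil if href could not be parsed as a URL
   return [[resolved absoluteURL] absoluteString];
}
```

For example, resolving @"/news/" against a page at http://mywebsite.com/index.html gives http://mywebsite.com/news/. A link that is already absolute comes back unchanged.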

Customising ASIWebPageRequest

ASIWebPageRequest is quite a new class, and it doesn’t provide many options to control how content is replaced, so if you need different behaviour, you’ll need to subclass or modify ASIWebPageRequest directly.

Customising which external content is downloaded

ASIWebPageRequest uses an XPath query to find external resources in HTML content. If you want to add support for other resources, or prevent some of the above content being downloaded, the best place to start is by modifying the XPath query at the top of ASIWebPageRequest.m.

The XPath query matches attributes that point at external URLs. Once the query has been performed, the method readResourceURLs loops over the matches in the XML, and adds them to the list of files to fetch by calling addURLToFetch:. If you need to make decisions about whether to download an external resource in a more complex fashion than XPath will allow, modify readResourceURLs. If you want to substitute a different external resource located remotely for a particular URL or set of URLs globally, modify addURLToFetch:.

Customising replacement content

Modify (or override in a subclass) contentForExternalURL: to insert different content into webpages and stylesheets in place of the URL. You can also modify updateResourceURLs to control content on a per tag/attribute basis.
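
As a hedged sketch of the subclassing route, the class below leaves URLs on one particular host untouched and defers to the default behaviour for everything else. MyWebPageRequest and cdn.example.com are made-up names. I'm also assuming that contentForExternalURL: takes the original URL as an NSString and returns the replacement as an NSString, so check ASIWebPageRequest.h for the exact signature before relying on this.

```objc
#import "ASIWebPageRequest.h"

// Hypothetical subclass, for illustration only
@interface MyWebPageRequest : ASIWebPageRequest
@end

@implementation MyWebPageRequest

// Assumed signature: original URL in, replacement content out
- (NSString *)contentForExternalURL:(NSString *)theURL
{
   // Leave resources on our (made-up) CDN pointing at their remote location.
   // If theURL is relative or unparseable, host is nil and we fall through.
   NSString *host = [[NSURL URLWithString:theURL] host];
   if ([host isEqualToString:@"cdn.example.com"]) {
      return theURL;
   }
   // Everything else gets the default replacement (data or a local URL)
   return [super contentForExternalURL:theURL];
}

@end
```

You would then create requests with [MyWebPageRequest requestWithURL:url] in place of ASIWebPageRequest in the examples above. Note that this only changes what is written into the page; it does not by itself stop the resource being downloaded.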

Limitations of ASIWebPageRequest

ASIWebPageRequest is an experimental class, and you should not use it to replace the default loading mechanism for WebView / UIWebView for loading user-specified web pages. It is suitable for loading and caching web content that you control, or content you have tested with ASIWebPageRequest.

Some important things to bear in mind:

  • Currently, only certain external resources are downloaded. A modern web page may reference many others that ASIWebPageRequest doesn’t find (plugin content, for example). Additionally, many scripts fetch more data when they run. This means there’s no guarantee that all required external resources will be cached. However, on a typical page, ASIWebPageRequest should at least cut loading times if most images are pulled from a previously cached version.
  • ASIWebPageRequest modifies HTML and CSS, so the end result may appear different from the original in some cases (in particular, pages with malformed CSS may break).
  • When using ASIReplaceExternalResourcesWithLocalURLs, scripts that rely on relative URLs won’t work.
  • Loading uncached content into a WebView / UIWebView will be slower with ASIWebPageRequest than asking the webview to load the content.
  • ASIWebPageRequests do not currently support progress tracking for an entire request with its external resources. However, progress updates from external resource requests are passed on to your delegate, so with a bit of work it may be possible to implement this yourself.
  • ASIWebPageRequests cannot currently be run as synchronous requests.