Azure App Services have a lot of very cool features and abstract away many of the responsibilities that more traditional virtual machines place on their owners. However, one thing you still have to concern yourself with is the disk space used by your application. Normally a Sitecore application is not going to be that large, since we're not counting OS-level files and directories, and most of your media assets are going to live in blob storage or on a CDN. However, if you run your App Service for long enough, you may eventually see storage usage grow to a point where it becomes dangerous. Similar to how a memory leak holds on to memory without releasing it and balloons over time, in this case we're talking about a filesystem leak.
SYMPTOMS:
- App Service backups begin to fail
- You notice the disk space usage of your App Services continually growing past what you'd normally expect
- You experience "System.IOException: There is not enough space on the disk" in your App Services
- You notice log files older than 30 days even though you're using the default Cleanup Agent configuration
THE BREAKDOWN:
Normally a Sitecore application writes its local log files to <webroot>\App_Data\Logs\ and creates them directly in that folder with a date-time stamp in the name, but on Azure App Services the behavior is a bit different.
So let's step back a minute and talk about App Services. Azure provides these as a managed platform where developers can just worry about the application and forget about the underlying infrastructure. To that end, the file structure of an App Service is mostly provided through a UNC share. You also have some access to the local file system of the host, but you're discouraged from relying on it because it's transient: if your app migrates to a new host, your locally saved data goes bye bye. Here's some great information from Microsoft that goes into much more detail on how App Services use the file system ( https://docs.microsoft.com/en-us/azure/app-service/operating-system-functionality#file-access ).
With that out of the way, let's consider Sitecore logs, which are only really "unique" by the date-time stamps in their names. If multiple instances tried to create a Sitecore log file named only by date and time, you'd quickly run into file conflicts, because they'd all be trying to create a file with exactly the same name in the same directory. In Windows this is not possible; file names within a directory must be unique. Even if we look at Sitecore's log appender (via showconfig), we see that it doesn't seem to account for this scenario:
<appender name="LogFileAppender" type="Sitecore.Cloud.ApplicationInsights.Logging.Log4NetAppender, Sitecore.Cloud.ApplicationInsights" patch:source="Sitecore.Cloud.ApplicationInsights.config">
  <file value="D:\\home\\site\\wwwroot\\App_Data/logs/log.{date}.{time}.txt"/>
  <appendToFile value="true"/>
  <rollingStyle value="Size"/>
  <maxSizeRollBackups value="-1"/>
  <maximumFileSize value="10MB"/>
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%4t %d{ABSOLUTE} %-5p %m%n"/>
  </layout>
  <encoding value="utf-8"/>
  <category value="log" patch:source="Sitecore.Cloud.ApplicationInsights.config"/>
</appender>
Despite this, Microsoft seems to have accounted for this particular issue and resolves it by simply creating an instance-specific folder and placing the log files inside. So instead of your logs being directly in the logs folder, you'll see a sub-folder named with a string of letters and numbers. Pretty slick, Microsoft.
Fixed by Microsoft, right? Well, the problem comes when Sitecore tries to clean up these log files with the Cleanup Agent. By default this agent is not set up to recurse within the log directory and is only going to clean up the base log folder. So in essence, Microsoft's clever solution has introduced a new problem: the logs in the instance sub-folders are never deleted.
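To make the gap concrete, here is roughly what the stock log rule on the Cleanup Agent looks like. This is a sketch from memory of a default Sitecore.config, so treat the exact attribute values as assumptions and check your own instance via showconfig. The thing to notice is that the rule has no recursive attribute, so only files sitting directly in the logs folder are considered.

```xml
<!-- Assumed shape of the default rule; values may differ by Sitecore version -->
<agent type="Sitecore.Tasks.CleanupAgent" method="Run" interval="06:00:00">
  <files hint="raw:AddCommand">
    <!-- No recursive="true" here, so sub-folders of logs are never visited -->
    <remove folder="$(dataFolder)/logs" pattern="log.*.txt" maxAge="30.00:00:00" />
  </files>
</agent>
```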
THE FIX:
Luckily, the fix for all of this is quite simple: it just requires a Sitecore patch include that performs the cleanup recursively. Here's a simple example that cleans up log files older than 30 days:
<?xml version="1.0" encoding="utf-8" ?>
<!--
  Purpose: This is a patch file to extend what files the CleanupAgent removes for PaaS environments as the defaults are not sufficient.
-->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/"
               xmlns:env="http://www.sitecore.net/xmlconfig/env/" xmlns:set="http://www.sitecore.net/xmlconfig/set/" >
  <sitecore>
    <scheduling>
      <!-- The type attribute makes the patch match the existing CleanupAgent
           rather than some other agent under scheduling. -->
      <agent type="Sitecore.Tasks.CleanupAgent">
        <!-- Specifies files to be cleaned up.
             If rolling="true", [minCount] and [maxCount] will be ignored.
             [minAge] and [maxAge] must be specified as [days.]hh:mm:ss. The default value
             of [minAge] is 30 minutes.
             [strategy]: number of files within hour, day, week, month, year
             [recursive=true|false]: descend folders?
        -->
        <files hint="raw:AddCommand">
          <remove folder="$(dataFolder)/logs" pattern="*log.*.txt*" maxAge="30.00:00:00" recursive="true" />
        </files>
      </agent>
    </scheduling>
  </sitecore>
</configuration>
With a patch like that in place, the Cleanup Agent should start cleaning up your log files normally again, and log file bloat should become a thing of the past.