Sitecore App Services Out of Disk Space or: The case of the ever growing logs!

Azure App Services have a lot of very cool features and abstract away many of the responsibilities that more traditional Virtual Machines bestow upon their owners.  However, one of the things you still have to concern yourself with is the disk space used by your application.  Normally a Sitecore application is not going to be that huge, since we're not counting any of the OS-level files and directories, and most of your media assets are going to be put into blob storage or a CDN somewhere else.  However, if you run your App Service for long enough you may eventually see storage usage grow to a point where it becomes dangerous.  Similar to how a memory leak grabs memory without letting it go and balloons over time, in this case we're talking about a filesystem leak.

SYMPTOMS:
- App Service backups begin to fail
- You notice the disk space usage of your app services continually growing past what you'd expect normally
- You experience "System.IOException: There is not enough space on the disk" in your App Services
- You notice log files older than 30 days and you're using default Cleanup Agent configurations.

THE BREAKDOWN:
Normally a Sitecore application will write its local log files to <webroot>\App_Data\Logs\ and create the log files directly in there with a date-time stamp, but when we look at Azure App Services the behavior is a bit different.

So let's step back a minute and talk about App Services.  Azure provides these as a managed platform where developers can just worry about the application and forget about the underlying infrastructure.  To that effect, the file structure of an App Service is mostly provided through a UNC share.  You also have some access to the local file system of the host, but you are mostly discouraged from using it because it's transient: if your app migrates to a new host, your locally saved data goes bye bye.  Here's some great information from Microsoft going into much more detail about how App Services use the file system ( https://docs.microsoft.com/en-us/azure/app-service/operating-system-functionality#file-access ).

With that out of the way, let's consider Sitecore logs, which are only really "unique" by the date-time stamps they use.  If we had multiple instances attempting to create a Sitecore log file named using just the date and time, we'd quickly run into file conflicts, as they'd be trying to create two different files with exactly the same name in the same folder.  In Windows this is not possible; file names within a folder need to be unique.  Even if we look at Sitecore's log appender (via showconfig), we see that it doesn't seem to account for this scenario:

<appender name="LogFileAppender" type="Sitecore.Cloud.ApplicationInsights.Logging.Log4NetAppender, Sitecore.Cloud.ApplicationInsights" patch:source="Sitecore.Cloud.ApplicationInsights.config">
  <file value="D:\\home\\site\\wwwroot\\App_Data/logs/log.{date}.{time}.txt"/>
  <appendToFile value="true"/>
  <rollingStyle value="Size"/>
  <maxSizeRollBackups value="-1"/>
  <maximumFileSize value="10MB"/>
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%4t %d{ABSOLUTE} %-5p %m%n"/>
  </layout>
  <encoding value="utf-8"/>
  <category value="log" patch:source="Sitecore.Cloud.ApplicationInsights.config"/>
</appender>

Despite this, Microsoft seems to have accounted for this particular issue and resolves it by simply creating an instance-specific folder and placing the log files inside it.  So instead of your logs sitting directly in the logs folder, you'll see a sub-folder named with a string of letters and numbers.  Pretty slick, Microsoft.

Fixed by Microsoft, right?  Well, the problem comes when Sitecore tries to clean up these log files with the Cleanup Agent.  By default this agent is not set up to recurse within the log directory and is only going to clean up files sitting directly in the base log folder.  So in essence, Microsoft's clever solution has introduced a new problem: the logs in the instance sub-folders are never removed.
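
To make the gap concrete, here is a minimal Python sketch that contrasts a top-level-only scan with a recursive one.  It is only an illustration of the behavior described above, not Sitecore's actual Cleanup Agent code, and the folder layout and the instance folder name (a1b2c3d4) are made up for the demo.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def matching_logs(log_dir: Path, recursive: bool) -> list[str]:
    """Return matching log files as paths relative to log_dir."""
    pattern = "log.*.txt"
    # glob() only looks at log_dir itself; rglob() also descends into sub-folders.
    found = log_dir.rglob(pattern) if recursive else log_dir.glob(pattern)
    return sorted(p.relative_to(log_dir).as_posix() for p in found)


def build_sample_tree(root: Path) -> Path:
    """Create a hypothetical App Service log layout for the demo."""
    log_dir = root / "App_Data" / "logs"
    instance_dir = log_dir / "a1b2c3d4"  # hypothetical instance folder name
    instance_dir.mkdir(parents=True)
    (log_dir / "log.20200101.120000.txt").write_text("top level\n")
    (instance_dir / "log.20200101.120000.txt").write_text("instance\n")
    return log_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = build_sample_tree(Path(tmp))
        print("top level only:", matching_logs(log_dir, recursive=False))
        print("recursive:     ", matching_logs(log_dir, recursive=True))
```

The non-recursive scan never sees the file inside the instance folder, which is exactly why those files pile up.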

THE FIX:

Luckily, the fix for all of this is quite simple: it just requires a Sitecore patch include that tells the Cleanup Agent to perform its cleanup recursively.  Here's a simple example that would clean up log files older than 30 days:

<?xml version="1.0" encoding="utf-8" ?>
<!--

Purpose: This is a patch file to extend what files the CleanupAgent removes for PaaS environments as the defaults are not sufficient.

-->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/"
               xmlns:env="http://www.sitecore.net/xmlconfig/env/" xmlns:set="http://www.sitecore.net/xmlconfig/set/" >
  <sitecore>
    <scheduling>
      <!-- Match the CleanupAgent by its type so the rule below is added to that agent. -->
      <agent type="Sitecore.Tasks.CleanupAgent">
        <!-- Specifies files to be cleaned up.
              If rolling="true", [minCount] and [maxCount] will be ignored.
              [minAge] and [maxAge] must be specified as [days.]hh:mm:ss. The default value
              of [minAge] is 30 minutes.
              [strategy]: number of files within hour, day, week, month, year
              [recursive=true|false]: descend folders?
          -->
        <files hint="raw:AddCommand">
          <remove folder="$(dataFolder)/logs" pattern="*log.*.txt*" maxAge="30.00:00:00" recursive="true" />
        </files>
      </agent>
    </scheduling>
  </sitecore>
</configuration>

With a patch like that in place, the Cleanup Agent should start cleaning up your log files normally again, and log file bloat should become a thing of the past.
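
If you want to double-check that the cleanup is working, a small script along these lines can list any log files that are still older than the cutoff.  This is a rough sketch under a few assumptions: it is run against a copy of the logs folder (or anywhere Python is available), it reuses the same file pattern as the patch above, and it uses each file's last-modified time as its age.  The function name stale_logs is my own.

```python
from __future__ import annotations

import sys
import time
from pathlib import Path


def stale_logs(log_dir, max_age_days=30.0, now=None):
    """Return sorted (path, size_in_bytes) pairs for old log files at any depth."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    stale = []
    # Same pattern as the patch file; rglob() descends into instance sub-folders.
    for path in Path(log_dir).rglob("*log.*.txt*"):
        if path.is_file():
            info = path.stat()
            if info.st_mtime < cutoff:
                stale.append((path, info.st_size))
    return sorted(stale)


if __name__ == "__main__":
    # Pass the logs folder as the first argument; defaults to the current folder.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    leftovers = stale_logs(target)
    for path, size in leftovers:
        print(f"{size:>12} bytes  {path}")
    total = sum(size for _, size in leftovers)
    print(f"{len(leftovers)} stale file(s), {total} bytes in total")
```

An empty result a day or so after deploying the patch is a good sign that the agent is now reaching the instance sub-folders.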
