So you've gone through the process of building a brand new, totally pristine Sitecore environment. You've installed your favorite APM (Application Performance Monitor) tool and made sure it's working. You're about ready to let your development team rip into this work of digital craftsmanship when you notice it: a flood of exceptions across multiple application services, all seemingly tied together.
Has it clicked yet? Azure has a feature built in that makes a request every so often to the web root of the App Service in order to keep it awake and prevent it from going to sleep. Sitecore's install enables this feature, as it doesn't want some backend component going to bed and slowing down the rest of your application. The end result is a GET request to the App Service's web root "/" EVERY 5 MINUTES to keep it alive. The xConnect services then throw back that 403.14 error, as "/" is not a valid query to the API and directory browsing is not enabled. Funnily enough, the bad request is enough to keep the App Service active, if only so it can tell the client they've made a bad request.
SYMPTOMS:
- Clean environment without any custom configs/code
- Exception being reported every 5 minutes, almost like clockwork
- The same 403.14 error appears across all of the xConnect Services (xConnect Collection, xConnect Search, xConnect Reference Data, Marketing Automation Operations, Marketing Automation Reporting, Cortex Processing, Cortex Reporting).
- 50% to 100% error rate reported by the APM (New Relic in my case)
- No Sitecore errors captured in Application Insights
THE BREAKDOWN:
Relax, take a moment to breathe in ... then breathe out, everything will be OK. Let's start with the basics.
The error code 403.14 resolves to "Forbidden: The Web server is configured to not list the contents of this directory". It comes up when a request hits a path such as "/" that no handler is set up to serve: if directory browsing were enabled you'd get a list of the files and folders in there, and since it isn't, you get the 403.14 instead. As you may be familiar, a 400 series error is generally a client-side error; in other words, the server is saying "hey, that's not how you ask nicely, try again". The site root is not a valid query path for the xConnect services, since these are APIs that expect properly structured queries ( https://doc.sitecore.com/developers/93/sitecore-experience-platform/en/xconnect-and-the-xdb.html ). It's not a web site where you'd navigate through some GUI and click around; you have to be specific in what you're asking for. To sum it all up, valid requests to the xConnect services never target "/", and any attempt to request it will result in our 403.14 error.
So we've found out the why, but who exactly is making all of these bad requests, and every 5 minutes to boot? Well, that answer lies in understanding some of Azure's App Service configurations. This document goes into it a fair bit, https://docs.microsoft.com/en-us/azure/app-service/configure-common , but the feature we want to focus on is called "Always On". The documentation reads thusly:
Always On: Keeps the app loaded even when there's no traffic. It's required for continuous WebJobs or for WebJobs that are triggered using a CRON expression.
Oh and check that out, there's a note about it below:
Note: With the Always On feature, the front end load balancer sends a request to the application root. This application endpoint of the App Service can't be configured.
THE FIX:
So it's not as simple as it seems. We can't "fix" it from Azure's end because, as their documentation states, "This application endpoint of the App Service can't be configured". Additionally, we don't want to fix it from the Sitecore side, as enabling directory browsing for your App Services is a great way to expose what files and folders are in your web root, which can be tedious to manage properly and disastrous if managed poorly. We do not want to enable directory browsing just for Azure's Always On feature.
Surprisingly, the best way to get these errors to go away is to just ignore them. Since it's a known problem and we know the exact error code we're trying to make disappear, we can lean on the fact that most APM solutions allow you to filter out exceptions of a certain type. Sitecore is already filtering out these errors (you'll find nothing about them in Application Insights), so why not remove this noise from New Relic as well? It's easy enough to do: https://docs.newrelic.com/docs/agents/net-agent/configuration/net-agent-configuration# .
If you're using server-side configuration, you can't be as granular, but you can just add "403" to the "HTTP code" field in the "Error Collection" section. I like to snuggle it right between the default 401 and 404 error codes.
Alternatively, if you're using Application Settings with a standard New Relic config, you just need to add another child to the "ignoreStatusCodes" node.
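As a rough sketch, assuming a stock newrelic.config where the errorCollector section already ignores 401 and 404, the change might look something like the fragment below. The 403.14 entry relies on the .NET agent accepting Microsoft-style status codes with a decimal sub-status; if your agent version doesn't, a plain 403 is the broader fallback. The surrounding attributes and existing codes are assumed defaults, so check them against your own file.

```xml
<!-- Sketch of the errorCollector section in newrelic.config.
     The enabled attribute and the 401/404 entries are assumed defaults. -->
<errorCollector enabled="true">
  <ignoreStatusCodes>
    <code>401</code>
    <!-- Added: the Always On pings to "/" on the xConnect services -->
    <code>403.14</code>
    <code>404</code>
  </ignoreStatusCodes>
</errorCollector>
```

Using the full 403.14 code rather than 403 keeps other "Forbidden" responses visible in the APM, which is the extra granularity the config file gives you over the server-side setting.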
Just like that all the errors will disappear. Granted we didn't exactly solve a major issue here, but what's important is that now you know what the error is, why it occurs, and what you can do about it.