Continuing on from Part One.
The first big change with Project Parra will be the introduction of a modular, role-based architecture using dedicated controller and worker roles. Controllers (XML Brokers and Data Collectors, STA Servers, and possibly other functions) will not serve applications and so will not need to be configured in Terminal Services mode. As a result we can expect greater performance for any given server specification compared to today’s environment. Orestes didn’t share what the base platform for a XenApp Controller would be, but I would not be surprised to hear that it will be built on Windows Server 2008 Server Core. At the same time, with XenApp management functions assigned to dedicated Controllers, we should expect the XenApp ‘Workers’ to see performance gains as well. Workers will still require the IMA and XTC services, but they won’t participate in ZDC elections and won’t need to run the XML or STA service. Having a central core of dedicated controllers will also provide opportunities for greater reliability and should make troubleshooting significantly easier.
The next big change we should expect is the set of technologies currently being demonstrated as Project Litoria. If you have seen the XenApp server power management demonstration, you’ve already seen one of the capabilities that the Litoria technologies introduce, but there’s much more to Litoria than power management.
In order to fully understand Project Litoria, we first need to recap how XenApp assigns sessions today. One of the main strengths behind the initial Published Application paradigm was the separation of service (the application) from server. With Published Applications, all the users needed to know was that the application existed in the farm; MetaFrame/Presentation Server/XenApp did the hard work of connecting users to servers. The associated load balancing mechanisms ensure that connection requests are assigned to the least busy server capable of accepting the request. This works very well; in fact, if anything it works too well. By ensuring that users always get the best possible performance, any defined service level will usually be exceeded rather than just met. Exceeding an SLA might be good from the user’s perspective, but from an IT perspective it is needlessly expensive. In addition to cost considerations, assigning sessions to the least busy server introduces operational issues. Taking servers out of service for maintenance is significantly complicated by a load balancing system that spreads load as evenly as possible: additional steps are needed to quiesce servers and allow connections to drain before maintenance can begin, which clearly complicates life for IT operations.
In order to enable IT to meet an SLA more efficiently, what is needed is a change to the load balancing mechanism so that connections are assigned not to the least busy server, but instead to the most appropriate server, taking into consideration both the user’s needs and IT operations’ needs. To define what is meant by most appropriate in this context, we need to look at session assignment from the perspective of both the user and IT operations. If, say, we have a pool of 10 servers hosting MS Office, each capable of supporting 100 users, there should be no negative consequences to delivering 50 sessions from a single server and holding the load level on all the others at zero. If a single server is capable of hosting 100 concurrent users, it should be safe to say that there will be no user impact if all 50 user sessions are hosted on the one server (ignoring startup load biasing for the sake of simplicity at this point). At the same time the benefits to IT operations should be clear: running all 50 sessions on a single server would mean that we could take the idle servers off-line for maintenance or shut them down to save power.
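To make the 10-server example concrete, here is a minimal Python sketch comparing the two assignment styles. It is purely illustrative: the function names (`least_busy`, `fill_first`, `assign`) and the server names are hypothetical, a simple session count stands in for XenApp's real load evaluators, and nothing here reflects how Litoria is actually implemented.

```python
from typing import Callable, Dict, Optional

Policy = Callable[[Dict[str, int], int], Optional[str]]

def least_busy(loads: Dict[str, int], capacity: int) -> Optional[str]:
    """Spread load: pick the server with the fewest sessions (ties by name)."""
    candidates = [s for s, n in loads.items() if n < capacity]
    return min(candidates, key=lambda s: (loads[s], s)) if candidates else None

def fill_first(loads: Dict[str, int], capacity: int) -> Optional[str]:
    """Pack load: pick the busiest server that still has room (ties by name)."""
    candidates = [s for s, n in loads.items() if n < capacity]
    return min(candidates, key=lambda s: (-loads[s], s)) if candidates else None

def assign(policy: Policy, servers: int = 10, capacity: int = 100,
           sessions: int = 50) -> Dict[str, int]:
    """Assign `sessions` connections one at a time using the given policy."""
    loads = {f"srv{i:02d}": 0 for i in range(1, servers + 1)}
    for _ in range(sessions):
        target = policy(loads, capacity)
        if target is None:
            raise RuntimeError("no server has spare capacity")
        loads[target] += 1
    return loads

if __name__ == "__main__":
    for policy in (least_busy, fill_first):
        loads = assign(policy)
        idle = sum(1 for n in loads.values() if n == 0)
        print(f"{policy.__name__}: {loads} (idle servers: {idle})")
```

With the spreading policy every server ends up carrying a handful of sessions, so none can be taken off-line without draining users first. With the packing policy one server carries all 50 sessions and the other nine stay empty.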
And this is where Project Litoria comes in. Litoria introduces two fundamental changes to XenApp: the ability to manage application delivery against a service level, and a change to the way that sessions are assigned to servers to make the most appropriate use of resources. So with Litoria it should be possible to create an SLA that defines application availability in terms of the minimum number of servers that must be kept on-line at any moment in time. Tie in XenApp Server Health Monitoring and Recovery so that bad servers can be automatically taken off-line and you have a full availability-based SLA management solution. Of course, in order to take advantage of this you need the ability to power servers on or off on demand. Citrix TV are showing a short video of Sridhar Mullapudi demonstrating this capability, and I chatted with him to gather some more detail on how it works. Because Litoria ensures that sessions are assigned to the minimum number of servers necessary, whenever a server’s session count drops to zero Litoria can kick off a script to perform any necessary housekeeping tasks, such as notifying a network management system that the server is going off-line, and then power it down under OS control. Subsequent server startup is currently managed by Wake on LAN, but it’s reasonable to assume, given what has been demonstrated with Citrix’s PowerSmart utility, that direct support for server vendors’ proprietary hardware management systems, such as HP’s iLO, will be provided in the future.
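The power-down decision described above can be sketched in a few lines of Python. This is my own illustration rather than anything Citrix has shown: `plan_power_down` and its parameters are hypothetical names, and the housekeeping script and the actual shutdown are deliberately left out.

```python
from typing import Dict, List

def plan_power_down(loads: Dict[str, int], min_online: int) -> List[str]:
    """Return the idle servers that can be powered down.

    `loads` maps server name to current session count. At least
    `min_online` servers are always left running, which models an
    availability-based SLA expressed as a minimum server count.
    """
    online = len(loads)
    to_stop: List[str] = []
    for name in sorted(loads):
        # Only servers with no sessions are candidates, and only while
        # shutting one down would not breach the SLA minimum.
        if loads[name] == 0 and online > min_online:
            to_stop.append(name)
            online -= 1
    return to_stop

if __name__ == "__main__":
    farm = {"srv01": 50, "srv02": 0, "srv03": 0, "srv04": 0}
    print(plan_power_down(farm, min_online=2))
```

In this example the SLA minimum of two servers means one idle server is kept running as headroom, even though only one server is actually carrying sessions.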
Looking beyond a simple availability-based SLA, if application performance data is being gathered by the EdgeSight for Load Testing components that were bundled into XenApp 5.0 Feature Pack 1 as XenApp Application Performance Monitoring (XAMP), it should also be possible to offer true performance-based SLAs. If XAMP determines that an application is running slowly, Litoria can assign more servers to active duty in an attempt to address the problem. This could be even more valuable than availability-based SLA management. I’ve lost count of the number of times an urgent hotfix, application update, or anti-malware signature has resulted in significantly degraded performance, and we have had to manually add more servers to a silo to provide coverage until the problem was addressed.
Finally for now, Zone Preference Fail-Over (a model I was never comfortable with) is going, to be replaced by a more general preferencing service that will direct connections to a preferred resource and fall back to a less preferred resource if the preferred one does not have the capacity to service a request. Again, this is all based on Litoria.
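As a rough illustration of that preferencing idea, the sketch below walks an ordered list of resources and returns the first one with enough spare capacity. The function name `pick_resource`, the site names, and the notion of a simple "free capacity" number are all assumptions of mine; Citrix has not described how the real service will decide.

```python
from typing import List, Optional, Tuple

def pick_resource(groups: List[Tuple[str, int]], needed: int = 1) -> Optional[str]:
    """Pick the most preferred resource that can service the request.

    `groups` is ordered most preferred first; each entry is
    (name, free_capacity). Returns None if nothing has room.
    """
    for name, free in groups:
        if free >= needed:
            return name
    return None

if __name__ == "__main__":
    # Hypothetical sites: the preferred one is full, so we fall back.
    sites = [("london", 0), ("paris", 3), ("new-york", 10)]
    print(pick_resource(sites))
```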
I’ll cover the last of the changes in Part Three of this series, where I’ll also offer some thoughts on where this might take us next.