The little series I am currently writing here on my blog has inspired me to write way more code than is actually necessary to get my point across ;-) By now I've got my own MSMQ transport for WSE 2.0 (yes, I know that others have written one already, but I am shooting for an "enterprise strength" implementation), a WebRequest/WebResponse pair to smuggle under arbitrary ASMX proxies, and I am more than halfway done with a server-side host for MSMQ-to-ASMX (spelled out: ASP.NET Web Services).

What bugs me is that WSE 2.0's messaging model is "asynchronous only", that it always performs a push/pull translation, and that there is no way to push a message through to a service on the receiving thread. Whenever I grab a message from the queue and put it into my SoapTransport's "Dispatch()" method, the message gets queued up in an in-memory queue, from which it is then pulled on a concurrent thread (OnReceiveComplete) by the SoapReceivers collection and submitted into ProcessMessage() of the SoapReceiver (like any SoapService-derived implementation) matching the target endpoint. So while I can dequeue from MSMQ within a transaction scope (ServiceDomain), that transaction scope doesn't make it across onto the thread that will actually execute the action inside the SoapReceiver/SoapService.

So now I am sitting here, contemplating and trying to figure out a workaround that doesn't require me to rewrite a big chunk of WSE 2.0 (which I am totally not shy of doing if that is what it takes). Transaction marshaling, thread synchronization, ah, I love puzzles. Once I know how to solve this and have made the adjustments, I'll post the queue listener I promised to wrap up the series. The other code I've written in the process will likely surface in some other way.

See Part 1

Before we can do anything about deadlocks or deal with similar troubles, we first need to be able to tell that we indeed have a deadlock situation. Finding this out is a matter of knowing the respective error codes that your database gives you and of having a mechanism to bubble that information up to some code that will handle the situation. So before we can think about and write the handling logic for failed/failing but safely repeatable transactions, we need to build a few little things. The first thing we’ll need is an exception class that wraps the original exception indicating the reason for the transaction failure. The new exception class’s identity will later serve to filter out exceptions in a “catch” statement and take the appropriate actions.

using System;
using System.Runtime.Serialization;

namespace newtelligence.EnterpriseTools.Data
{
    /// <summary>
    /// Wraps an exception that indicates a failed transaction which can
    /// safely be repeated (for instance, a deadlock victim or lock timeout).
    /// </summary>
    [Serializable]
    public class RepeatableOperationException : Exception
    {
        public RepeatableOperationException() : base()
        {
        }

        public RepeatableOperationException(Exception innerException)
            : base(null, innerException)
        {
        }

        public RepeatableOperationException(string message, Exception innerException)
            : base(message, innerException)
        {
        }

        public RepeatableOperationException(string message) : base(message)
        {
        }

        protected RepeatableOperationException(
            SerializationInfo serializationInfo,
            StreamingContext streamingContext)
            : base(serializationInfo, streamingContext)
        {
        }

        public override void GetObjectData(
            SerializationInfo info, StreamingContext context)
        {
            base.GetObjectData(info, context);
        }
    }
}

Having an exception wrapper with the desired semantics, we now need to be able to figure out when to replace the original exception with this wrapper and re-throw it up the call stack. The idea is that whenever you execute a database operation – or, more generally, any operation that might be repeatable on failure – you catch the resulting exception and run it through a factory, which analyzes the exception and wraps it in a RepeatableOperationException if the issue at hand can be resolved by re-running the transaction. The (still a little naïve) code below illustrates how to use such a factory in application code. Later we will flesh out the catch block a little more, since we will lose the original call stack if we end up re-throwing the original exception as shown here:

try
{
   dbConnection.Open();
   sprocUpdateAndQueryStuff.Parameters["@StuffArgument"].Value = argument;
   result = this.GetResultFromReader( sprocUpdateAndQueryStuff.ExecuteReader() );
}
catch( Exception exception )
{
   throw RepeatableOperationExceptionMapper.MapException( exception );
}
finally
{
   dbConnection.Close();
}

The factory class itself is rather simple in structure, but a bit tricky to put together, because you have to know the right error codes for all resource managers you will ever run into. In the example below I put in what I believe to be the appropriate codes for SQL Server and Oracle (corrections are welcome) and left the ODBC and OLE DB mappers (for which one would have to inspect the driver type and the respective driver-specific error codes) blank. The factory inspects the exception's data type and delegates the mapping to a private method that is specialized for a specific managed provider.

using System;
using System.Data.SqlClient;
using System.Data.OleDb;
using System.Data.Odbc;
using System.Data.OracleClient;

namespace newtelligence.EnterpriseTools.Data
{
   public class RepeatableOperationExceptionMapper
   {
        /// <summary>
        /// Maps the exception to a Repeatable exception, if the error code
        /// indicates that the transaction is repeatable.
        /// </summary>
        /// <param name="sqlException"></param>
        /// <returns></returns>
        private static Exception MapSqlException( SqlException sqlException )
        {
            switch ( sqlException.Number )
            {
                case -2: /* Client Timeout */
                case 701: /* Out of Memory */
                case 1204: /* Lock Issue */
                case 1205: /* Deadlock Victim */
                case 1222: /* Lock Request Timeout */
                case 8645: /* Timeout waiting for memory resource */
                case 8651: /* Low memory condition */
                    return new RepeatableOperationException(sqlException);
                default:
                    return sqlException;
            }
        }

        private static Exception MapOleDbException( OleDbException oledbException )
        {
            switch ( oledbException.ErrorCode )
            {
                default:
                    return oledbException;
            }
        }

        private static Exception MapOdbcException( OdbcException odbcException )
        {
            return odbcException;           
        }

        private static Exception MapOracleException( OracleException oracleException )
        {
            switch ( oracleException.Code )
            {
                case 104:  /* ORA-00104: Deadlock detected; all public servers blocked waiting for resources */
                case 1013: /* ORA-01013: User requested cancel of current operation */
                case 2087: /* ORA-02087: Object locked by another process in same transaction */
                case 60:   /* ORA-00060: Deadlock detected while waiting for resource */
                    return new RepeatableOperationException( oracleException );
                default:
                    return oracleException;
            }
        }

        public static Exception MapException( Exception exception )
        {
            if ( exception is SqlException )
            {
                return MapSqlException( exception as SqlException );
            }
            else if ( exception is OleDbException )
            {
                return MapOleDbException( exception as OleDbException );
            }
            else if (exception is OdbcException )
            {
                return MapOdbcException( exception as OdbcException );
            }
            else if (exception is OracleException )
            {
                return MapOracleException( exception as OracleException );
            }
            else
            {
                return exception;
            }
        }
   }
}

With that little framework of two classes, we can now selectively throw exceptions that convey whether a failed/failing transaction is worth repeating. Next step: How do we actually run such repeats and make sure we neither lose data nor make the user unhappy in the process? Stay tuned.

Categories: Architecture | SOA | Enterprise Services | MSMQ

Deadlocks and other locking conflicts that cause transactional database operations to fail are things that puzzle many application developers. Sure, proper database design and careful implementation of database access (and appropriate support by the database engine) should take care of that problem, but they cannot do so in all cases. Sometimes, especially under stress and in other situations with high lock contention, a database has little choice but to pick at least one of the transactions competing for the same locks as the victim of the deadlock resolution and to abort the chosen transaction. Generally speaking, transactions that abort and roll back are a good thing, because this behavior guarantees data integrity. In the end, we use transaction technology for those cases where data integrity is at risk. What’s interesting is that even though transactions are a technology that is explicitly about things going wrong, the strategy for dealing with failing transactions is often not much more than to bubble the problem up to the user and say “We apologize for the inconvenience. Please press OK”.

The appropriate strategy for handling a deadlock or some other recoverable reason for a transaction abort on the application level is to back out of the entire operation and to retry the transaction. Retrying is a gamble that the next time the transaction runs, it won’t run into the same deadlock situation again or that it will at least come out victorious when the database picks its victims. Eventually, it’ll work. Even if it takes a few attempts. That’s the idea. It’s quite simple.

What is not really all that simple is the implementation. Whenever you are using transactions, you must make your code aware that such “good errors” may occur at any time. Wrapping your transactional ODBC/OLEDB/ADO/ADO.NET code or calls to transactional Enterprise Services or COM+ components with a try/catch block, writing errors to log-files and showing message boxes to users just isn’t the right thing to do. The right thing is to simply do the same batch of work again until it succeeds.
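
To make that concrete, here is a minimal sketch of what “just do it again” can look like with a wrapper exception like the RepeatableOperationException shown earlier in this series. The UpdateAndQueryStuff() helper and the retry limit are made up for illustration, not part of the mechanism I’ll describe later:

// Minimal sketch (hypothetical UpdateAndQueryStuff() helper and retry limit):
// repeat the whole transactional unit of work whenever it fails with an
// exception that the mapper classified as repeatable.
const int maxAttempts = 5;
for (int attempt = 1; ; attempt++)
{
    try
    {
        UpdateAndQueryStuff(argument); // runs the complete transaction
        break;                         // success, we are done
    }
    catch (RepeatableOperationException)
    {
        if (attempt == maxAttempts)
        {
            throw; // give up and let the caller deal with it
        }
        System.Threading.Thread.Sleep(100 * attempt); // brief back-off before retrying
    }
}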

The problem that some developers seem to have with “just retry” is that it’s not so clear what should be retried. It’s a problem of finding and defining the proper transaction scope. Especially when user interaction is in the picture, things easily get very confusing. If a user has filled in a form on a web page or some dialog window and all of his/her input is complete and correct, should the user be bothered with a message that the update transaction failed due to a locking issue? Certainly not. Should the user know when the transaction fails because the database is currently unavailable? Maybe, but not necessarily. Should the user be made aware that the application he/she is using is for some sudden reason incompatible with the database schema of the backend database? Maybe, but what does Joe in the sales department do with that valuable piece of information?

If stuff fails, should we just forget about Joe’s input and tell him to come back when the system is happier to serve him? So, in other words, do we have Joe retry the job? That’s easy to program, but that sort of strategy doesn’t really make Joe happy, does it?

So what’s the right thing to do? One part of the solution is a proper separation between the things the user (or a program) does and the things that the transaction does. This will give us two layers and “a job” that can be handed from the presentation layer down to the “transaction layer”. Once this separation is in place, we can come up with a mechanism that will run those jobs in transactions and will automate how and when transactions are to be retried. Transactional MSMQ queues turn out to be a brilliant tool to make this very easy to implement. More tomorrow. Stay tuned.
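
As a teaser for why transactional queues help here, consider this minimal sketch (the private queue path and the plain-string job payload are assumptions, not the final design): a message received with MessageQueueTransactionType.Automatic enlists in the surrounding DTC transaction, so when the transaction aborts, the job message simply returns to the queue and gets picked up and retried later.

using System;
using System.EnterpriseServices;
using System.Messaging;

// Sketch only: the queue path is an assumption, and the "job" is just a string here.
[Transaction(TransactionOption.Required)]
public class JobProcessor : ServicedComponent
{
    [AutoComplete]
    public void ProcessNextJob()
    {
        using (MessageQueue queue = new MessageQueue(@".\private$\jobs"))
        {
            queue.Formatter = new XmlMessageFormatter(new Type[] { typeof(string) });

            // The receive enlists in the surrounding DTC transaction. If the
            // transaction aborts (deadlock victim, etc.), the message goes back
            // to the queue and the job is retried on a later receive.
            Message message = queue.Receive(MessageQueueTransactionType.Automatic);
            string job = (string)message.Body;

            // ... run the transactional database work for this job here ...
        }
    }
}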

Categories: Architecture | SOA | Enterprise Services | MSMQ

Microsoft urgently needs to consolidate all the APIs that are required for provisioning services or sites. The amount of knowledge you need to have and the number of APIs you need to use in order to lock down a Web service or Enterprise Services application programmatically at installation time, so that it runs under an isolated user account (with a choice of local or domain account) that has the precise rights to do what it needs to do (but nothing else), is absolutely insane.

You need to set ACLs on the file system and the registry, you need to modify the local machine's security policy, you need to create accounts and add them to local groups, you must adhere to password policies with your auto-generated passwords, you need to configure identities on Enterprise Services applications and IIS application pools, you need to set ACLs on Message Queues (if you use them), and you need to write WS-Policy documents to secure your WS front. Every single one of these tasks uses a different API (and for writing policies there is none), and most of these jobs require explicit Win32 or COM interop. I have a complete wrapper for that functionality for my app now (which took way too long to write), but that really needs to be fixed on a platform level.

Categories: Technology | ASP.NET | Enterprise Services

One of the reasons why I run Windows Server 2003 on my notebook is that "Services without Components" (managed incarnation is System.EnterpriseServices.ServiceDomain) didn't work on XP. If you just touch the ServiceConfig or ServiceDomain classes on XP, you get rewarded with a PlatformNotSupportedException, because the unmanaged implementation of that feature was present, but not quite-as-perfect-as-it-should-be on XP. That will soon be history. Windows XP SP2 and the COM+ 1.5 Rollup Package 6 will fix that and will bring COM+ 1.5 pretty much on par with Windows Server 2003.
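
For readers who haven't touched the feature yet, here is a minimal sketch of what "Services without Components" looks like in managed code; the work in the middle is whatever you need to run inside the COM+ context, and the wrapper class is made up for illustration:

using System;
using System.EnterpriseServices;

public class SwcExample
{
    // Minimal sketch of "Services without Components": pull a transactional
    // COM+ context over a stretch of code without writing a ServicedComponent.
    // On plain XP this throws PlatformNotSupportedException; with XP SP2 and
    // the COM+ 1.5 Rollup Package 6 (or Windows Server 2003) it works.
    public static void DoTransactionalWork()
    {
        ServiceConfig config = new ServiceConfig();
        config.Transaction = TransactionOption.Required;

        ServiceDomain.Enter(config);
        try
        {
            // ... transactional work (database calls, etc.) goes here ...
            ContextUtil.SetComplete();
        }
        catch
        {
            ContextUtil.SetAbort();
            throw;
        }
        finally
        {
            TransactionStatus status = ServiceDomain.Leave();
            Console.WriteLine("Transaction outcome: " + status);
        }
    }
}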

Categories: Enterprise Services

I am writing a very, very, very big application at the moment and I am totally swamped in a 24/7 coding frenzy that’s going to continue for the next week or so, but here’s one little bit to think about and for which I came up with a solution. It’s actually a pretty scary problem.

Let’s say you have a transactional serviced component (or make that a transactional EJB) and you call an HTTP web service from it that forwards some information to another service. What happens if the transaction fails for any reason? You’ve just produced a phantom record. The web service on the other end should never have seen that information. In fact, that information doesn’t exist from the viewpoint of your rolled-back local transaction. And of course, as of yet, there is no infrastructure in place that gives you interoperable transaction flow. And even if there were, the other web service might not support it. What should you do? Panic?

There is help right in the platform (Enterprise Services that is). Your best friend for that sort of circumstance is System.EnterpriseServices.CompensatingResourceManager.

The use case here is to call another service to allocate some items from an inventory service. The call is strictly asynchronous, and the remote service will eventually turn around and call an action on my service (they have a “duplex” conversation using asynchronous calls going back and forth). Instead of calling the service from within my transactional method, I am deferring the call until the transaction is being resolved. Only when the DTC is sure that the local transaction will go through is the web service call made. There is no way to guarantee that the remote call succeeds, but this does at least eliminate the very horrible side effects on overall system consistency caused by phantom calls. It is in fact quite impossible to implement “Prepare” correctly here, since the remote service may fail processing the (one-way) call on a different thread and hence I might never get a SOAP fault indicating failure. Because that’s so, and because I really don’t know what the other service does, I am not writing any specific recovery code for the “Commit” phase. Instead, my local state for the conversation indicates the current progress of the interaction between the two services and logs an expiration time. Once that expiration time has passed without a response from the remote service, a watchdog will pick up the state record, create a new message for the remote service and replay the call.

For synchronous call scenarios, you could implement (not shown here) a two-step call sequence to the remote service, which the remote service needs to support, of course. In “Prepare” (or in the “normal code”) you would pass the data to the remote service and hold a session state cookie. If that call succeeds, you vote “true”. In “Commit” you would issue a call to commit that data on the remote service for this session, on “Abort” (remember that the transaction may fail for any reason outside the scope of the web service call), you will call the remote service to cancel the action and discard the data of the session. What if the network connection fails between the “Prepare” phase call and the “Commit” phase call? That’s the tricky bit. You could log the call data and retry the “Commit” call at a later time or keep retrying for a while in the “Commit” phase (which will cause the transaction to hang). There’s no really good solution for that case, unless you have transaction flow. In any event, the remote service will have to default to an “Abort” once the session times out, which is easy to do if the data is kept in a volatile session store over there. It just “forgets” it.

However, all of this is much, much better than making naïve, simple web service calls that fan out intermediate data from within transactions. Fight the phantoms.

At the call location, write the call data to the CRM transaction log using the Clerk:

AllocatedItemsMessage aim = new AllocatedItemsMessage();
aim.allocatedAllocation = <<< copy that data from elsewhere>>>

Clerk clerk = new Clerk(
    typeof(SiteInventoryConfirmAllocationRM),
    "SiteInventoryConfirmAllocationRM",
    CompensatorOptions.AllPhases );

SiteInventoryConfirmAllocationRM.ConfirmAllocationLogRecord rec =
    new RhineBooks.ClusterInventoryService.SiteInventoryConfirmAllocationRM.ConfirmAllocationLogRecord();
rec.allocatedItemsMessage = aim;
clerk.WriteLogRecord( rec.XmlSerialize() );
clerk.ForceLog();

Write a compensator that picks up the call data from the log and forwards it to the remote service. In the “Prepare” phase, the minimum work that can be done is to check whether the proxy can be constructed. You could also make sure that the call URL is valid and that the server name resolves, and you could even try a GET on the service’s documentation page or call a “Ping” method the remote service may provide. All of that serves to verify, as well as you can, that the “Commit” call has a good chance of succeeding:


using System.EnterpriseServices.CompensatingResourceManager;
using …

/// <summary>
/// This class is a CRM compensator that will invoke the allocation confirmation
/// activity on the site inventory service if, and only if, the local transaction
/// enlisting it is succeeding. Using the technique is a workaround for the lack
/// of transactional I/O with HTTP web services. While the compensator cannot make
/// sure that the call will succeed, it can at least guarantee that we do not produce
/// phantom calls to external services.
/// </summary>

public class SiteInventoryConfirmAllocationRM : Compensator
{
  private bool vote = true;

  [Serializable]
  public class ConfirmAllocationLogRecord
  {
    public SiteInventoryInquiries.AllocatedItemsMessage allocatedItemsMessage;           

    internal string XmlSerialize()
    {
      StringWriter sw = new StringWriter();
      XmlSerializer xs = new XmlSerializer(typeof(ConfirmAllocationLogRecord));
      xs.Serialize(sw,this);
      sw.Flush();
      return sw.ToString();
    }

    internal static ConfirmAllocationLogRecord XmlDeserialize(string s)
    {
      StringReader sr = new StringReader(s);
      XmlSerializer xs = new XmlSerializer(typeof(ConfirmAllocationLogRecord));
      return xs.Deserialize(sr) as ConfirmAllocationLogRecord;
    }
  }

  // PrepareRecord is called for each log record during the prepare phase.
  // Returning false keeps the record for delivery in the commit/abort phase;
  // returning true tells the CRM to forget it. Here we only verify that a
  // proxy for the target site can be constructed; the actual call is deferred
  // to CommitRecord.
  public override bool PrepareRecord(LogRecord rec)
  {
    try
    {
      SiteInventoryInquiriesWse sii;
      ConfirmAllocationLogRecord calr = ConfirmAllocationLogRecord.XmlDeserialize((string)rec.Record);
      sii = InventoryInquiriesInternal.GetSiteInventoryInquiries( calr.allocatedItemsMessage.allocatedAllocation.warehouseName );
      vote = sii != null;
      return false;
    }
    catch( Exception ex )
    {
      ExceptionManager.Publish( ex );
      vote = false;
      return true;
    }
  }

  // EndPrepare delivers the compensator's vote. Returning false causes the
  // surrounding transaction to abort, so no phantom call is ever made.
  public override bool EndPrepare()
  {
    return vote;
  }

  // CommitRecord runs only once the local transaction is committing; this is
  // where the deferred web service call is actually made. Returning true
  // tells the CRM that the record has been handled and can be forgotten.
  public override bool CommitRecord(LogRecord rec)
  {
    SiteInventoryInquiriesWse sii;
    ConfirmAllocationLogRecord calr = ConfirmAllocationLogRecord.XmlDeserialize((string)rec.Record);
    sii = InventoryInquiriesInternal.GetSiteInventoryInquiries( calr.allocatedItemsMessage.allocatedAllocation.warehouseName );

    try
    {
      sii.ConfirmAllocation( calr.allocatedItemsMessage );
    }
    catch( Exception ex )
    {
      ExceptionManager.Publish( ex );
    }
    return true;
  }
}

Brad More is asking whether and why he should use Enterprise Services.

Brad, if you go to the PDC, you can get the definitive, strategic answer on that question in this talk:

“Indigo”: Connected Application Technology Roadmap
Track: Web/Services   Code: WSV203
Room: Room 409AB   Time Slot: Wed, October 29 11:30 AM-12:45 PM
Speakers: Angela Mills, Joe Long

Joe Long is Product Unit Manager for Enterprise Services at Microsoft, a product unit that is part of the larger Indigo group. The Indigo team owns Remoting, ASP.NET Web Services, Enterprise Services, all of COM/COM+ and everything that has to do with Serialization.

And if you want to hear the same song sung by the technologyspeakmaster, go and hear Don:

“Indigo": Services and the Future of Distributed Applications
Track: Web/Services   Code: WSV201
Room: Room 150/151/152/153   Time Slot: Mon, October 27 4:45 PM-6:00 PM
Speaker: Don Box

If you want to read the core message right now, just scroll down here. I've been working directly with the Indigo folks on the messaging for my talks at TechEd in Dallas earlier this year as part of the effort of setting the stage for Indigo's debut at the PDC.

I'd also suggest that you don't implement your own ES clone using custom channel sinks, context sinks, or formatters, and that you ignore the entire context model of .NET Remoting, if you want to play in Indigo-Land without having to rewrite a great deal of your apps. The lack of security support in Remoting is not a missing feature; Enterprise Services is layered on top of Remoting and provides security. The very limited scalability of Remoting on any transport but cross-appdomain is not a real limitation; if you want to scale, use Enterprise Services. Check out this page from my old blog for a few intimate details on transport in Enterprise Services.

ASMX is the default, ES is the fall-back strategy if you need the features or the performance, and Remoting is the cheap, local ORPC model.

If you rely on ASMX and ES today, you'll have a pretty smooth upgrade path. Take that expectation with you and go to Joe's session.

[PS: What I am saying there about ES marshaling not using COM/Interop is true except for two cases that I found later: Queued Components and calls with isomorphic call signatures where the binary representation of COM and the CLR is identical - like with a function that passes and returns only ints. The reason why COM/Interop is used in those cases is very simple: it's a lot faster.] 

Categories: PDC 03 | Technology | COM | Enterprise Services | Indigo

Steve Swartz, who is one of my very good personal friends and who is, that “personal function” aside, a Program Manager on Microsoft’s Indigo team and who was also the lead architect for a lot of the new functionality that we got in the Windows Server 2003 version of Enterprise Services (COM+ 1.5 for the old-fashioned folks), wrote a comment on my previous post on this topic, where I explained how you can get the XML configuration story of the Framework to work with Enterprise Services using the .NET Framework 1.1.

In response to what I wrote, someone asked whether this would also work on Windows XP, because I was explicitly talking about Windows Server 2003. Steve’s answer to that question completes the picture and therefore it shouldn’t be buried in the comments. Steve writes:

In fact, this will work on XP and Windows Server 2003 so long as you have NETFX 1.1. The field has been there since XP; in NETFX 1.1, we set the current directory for the managed app domain.

This field was originally added to configure unmanaged fusion contexts. In that capacity, the field works with library and server apps alike. In its capacity as a setter of current appdomain directory, it works less well with library apps (natch).

Categories: Enterprise Services

A long while back, I wrote about a hack to fix the dllhost.exe.config dilemma of Enterprise Services. That hack no longer works due to changes in the Framework, but the good news is there is an “official” and very easy solution for this now. Unfortunately there is no documentation on this (or at least it’s not easy enough to find that I could locate it) and Google only yields a few hints if you know exactly what you are looking for. So, index this, Google!

What I call the “config dilemma” of Enterprise Services is this: because all out-of-process ES applications are executed using the surrogate process provided by %SystemRoot%\System32\dllhost.exe, and the runtime is loaded into that process, the default application configuration file is dllhost.exe.config, which must reside right next to dllhost.exe (in System32) and is therefore shared across all out-of-process Enterprise Services applications.

That makes using the XML configuration infrastructure for Enterprise Services very unattractive, to say the least.

Now, with COM+ 1.5 (Windows Server 2003) and the .NET Framework 1.1, things did change in a big way.

To use per-application configuration files, all you have to do is create a (possibly otherwise empty) “application root directory” for your application in which you place two files: an application.manifest file (that exact name) and an application.config file. Once your application is registered (lazily, using the RegistrationHelper class, or through regsvcs.exe), you will have to configure the application’s root directory in the catalog – that can be done either programmatically using the catalog admin API (ApplicationDirectory property) or through the Component Services explorer as shown above.
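
If you'd rather set that property from setup code than click through the Component Services explorer, here's a minimal sketch of what that might look like with the catalog admin API, assuming a reference to the COMAdmin interop assembly; the EsConfigSetup class name is made up, and the application name and path are simply the ones from this example:

using COMAdmin;

public class EsConfigSetup
{
    // Sketch: set the ApplicationDirectory catalog property programmatically
    // via the COM+ catalog admin API (COMAdmin interop).
    public static void SetApplicationDirectory(string applicationName, string directory)
    {
        COMAdminCatalog catalog = new COMAdminCatalogClass();
        COMAdminCatalogCollection applications =
            (COMAdminCatalogCollection)catalog.GetCollection("Applications");
        applications.Populate();

        foreach (COMAdminCatalogObject application in applications)
        {
            if ((string)application.Name == applicationName)
            {
                // Point the application at the directory that holds
                // application.manifest and application.config.
                application.set_Value("ApplicationDirectory", directory);
                applications.SaveChanges();
                break;
            }
        }
    }
}

// e.g.: EsConfigSetup.SetApplicationDirectory("ESTest", @"c:\Development\ES\ConfigTest\ESTest");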

The picture shows that the example that you can download using the link below is installed at “c:\Development\ES\ConfigTest\ESTest” on my machine and has those two files sitting right there.

The application.manifest file content is embarrassingly simple:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
</assembly>

and the application.config isn’t complicated either:

<?xml version="1.0"?>
<configuration>
  <appSettings>
     <add key="configBit" value="This rocks!"/>
  </appSettings>
</configuration>

These two files, placed into the same directory and properly configured as shown in the above picture, let this class

public class SimpleComponent : ServicedComponent
{
    public string GetConfigBit()
    {
        return ConfigurationSettings.AppSettings["configBit"];
    }
}

yield the expected result for GetConfigBit(): “This rocks!”

Download: ESTest.zip

Categories: CLR | Enterprise Services