Monday, 13 December 2010

"I get the message!" says customer sent 25,000 texts

Do you get the impression from this blog that R&D always gets it right?

I thought so. Well, we do try to look for all the possible technical pitfalls. "Hope for the best, Plan for the worst" is a slogan I cling to dearly whenever we unleash an application or service on customers while it is still in R&D mode. We hold some good project risk assessment meetings in which each of us is asked to think of all the bad things an R&D application or service could possibly do to a customer!

All our R&D projects that are tried out by customers have various counter-measures in place to protect them from "things going wrong". These include (but are not limited to) bandwidth restrictions, data limits, encryption and security measures, no privileged access to Tesco's production network or data services, and a code review to give us assurance that none of our R&D applications will damage the environment in which it runs.

The danger lies in the Unknown Unknowns: problems we could not predict, and which we therefore had not put any risk-mitigating counter-measures in place to combat.

One project passing through R&D about three years ago was "Where's My Shopping?" (WMS), an SMS text-based service that sent a text message whenever a grocery delivery van left its store, telling each customer when their delivery would actually arrive. It's now a production service running in 60 Tesco branches and will roll out to further branches in 2011.

When WMS was in my hands, we sent messages to customers of one particular Tesco branch in north London to see whether they liked the service and appreciated the information. To make this happen, we sent text messages to a server in our telephone switch-room, which forwarded them over a dedicated Link-60 network connection to our texting partner. That link was so reliable that the project risk assessment did not consider every form that 'going wrong' could take. Normally a server in distress does less work (or no work at all); one day three years ago, our SMS server went in exactly the opposite direction.

On this day, the Link-60 connection went slow. Not so slow that we couldn't send messages, but slow enough that the communication between us and our SMS partner fell out of synchronisation. The system normally worked like this: we sent a text message over the Link-60 and expected an acknowledgement (ACK) from our SMS partner, within a set time frame, that it had been received. If the ACK did not arrive within that time, the message was re-sent.

Only this time, the Link-60's slow response meant that the message was indeed received but the ACK did not arrive within the time frame, so the message was sent again... and again... until one customer had received it 25,000 times.

Oh yes.

(You're probably thinking, "Why not put a re-send limit on non-ACKed messages?" Alas, the software that performed this work belonged to a third party and was a 'black box' into which we pushed our text messages alongside all the other text messages sent out by Tesco. I had neither detailed knowledge of its operation nor any ability to adjust it. Our R&D system only sent the message 'once'!)
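For readers wondering what such a re-send limit might look like, here is a minimal Python sketch. It is not the third-party software we actually used (that was a black box to us); the `SlowPartner` class, the `send_with_retries` function, the message ID and all the timings are hypothetical. It models the failure described above: every copy of the message arrives, but the ACK comes back too late. Two counter-measures are shown: capping the number of attempts, and having the receiving side drop duplicate copies that carry the same message ID.

```python
from dataclasses import dataclass, field


@dataclass
class SlowPartner:
    """Hypothetical SMS partner reached over a slow link.

    Every message arrives, but the ACK takes ack_delay seconds to come back.
    If dedupe is True, the partner forwards each message_id to the phone
    only once, however many copies of it are received.
    """
    ack_delay: float
    dedupe: bool = True
    received: int = 0
    forwarded_to_phone: list = field(default_factory=list)
    _seen: set = field(default_factory=set)

    def receive(self, message_id: str, text: str) -> float:
        self.received += 1
        if not self.dedupe or message_id not in self._seen:
            self._seen.add(message_id)
            self.forwarded_to_phone.append(text)
        return self.ack_delay  # how long our side waits to see the ACK


def send_with_retries(partner, message_id, text, ack_timeout, max_attempts):
    """Re-send until an ACK arrives within ack_timeout, at most max_attempts times.

    Returns True if the message was ACKed in time, False if we gave up.
    """
    for _ in range(max_attempts):
        if partner.receive(message_id, text) <= ack_timeout:
            return True
        # ACK arrived too late: assume the message was lost and try again.
    return False  # give up; the caller can alert someone instead of re-sending forever


# Usage: a link where every ACK is late, with a cap of three attempts.
partner = SlowPartner(ack_delay=30.0)
ok = send_with_retries(partner, "wms-0001", "Your delivery arrives 6:30-6:45pm",
                       ack_timeout=10.0, max_attempts=3)
print(ok, partner.received, len(partner.forwarded_to_phone))  # False 3 1
```

Capping retries alone would not have been enough: without deduplication, a cap of three would still put three texts on the customer's phone, because each copy genuinely arrived. Pairing the cap with a per-message ID that the receiver deduplicates on means a late ACK costs a wasted re-send rather than another text on somebody's phone.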

The customer was very pleasant in the circumstances. His mobile phone was a BlackBerry and could easily store 100 messages. The trouble was that as soon as he deleted some messages and freed up memory, down would come more.

This particular customer quickly got the message that his delivery was indeed going to arrive between 6:30pm and 6:45pm, and just as quickly decided that he didn't need to be told again and again. He gave our customer service centre a call.

The message passed to me was, "a customer has called in to say they are getting endless text messages about their delivery!". I'll never forget receiving that message. I slammed the phone down, jumped up from my desk, tore across Tesco's Shire Park campus from one building to another looking like someone possessed, entered the switch-room and unceremoniously yanked the network and power cables from the back of the Link-60 SMS server.

Thanks to the instant and sterling efforts of our SMS partner and the customer's mobile phone company, the queue of messages was removed (with the customer's readily given permission). My coveted R&D budget was reduced by a suitable amount to compensate the customer for the misadventure he had suffered at my hands.

These days, as a direct result of that experience, we use a reliable, secure web service interface to our partner's SMS service over the internet. It works a treat, and the considerable counter-measures now in place at both ends prevent a customer from ever receiving more than three SMS messages from the WMS service in any one day.
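A rule like "no more than three WMS messages per customer per day" could be enforced with something as simple as the following Python sketch. The post doesn't describe how our actual counter-measures are built, so the `DailySmsCap` class, its in-memory counter and the example phone number are all assumptions for illustration; a real service would also need the counts to survive restarts and be shared across servers.

```python
from collections import defaultdict
from datetime import date


class DailySmsCap:
    """Hypothetical per-customer daily limit on outgoing SMS messages.

    The limit of three matches the WMS rule; the class name, method name
    and in-memory counter are assumptions made for this sketch.
    """

    def __init__(self, limit_per_day: int = 3):
        self.limit = limit_per_day
        # (customer phone number, calendar day) -> messages already sent
        self._sent = defaultdict(int)

    def allow(self, phone: str, today: date) -> bool:
        """Return True and count the message if the customer is under the cap."""
        key = (phone, today)
        if self._sent[key] >= self.limit:
            return False  # cap reached: drop (or log) instead of sending
        self._sent[key] += 1
        return True


cap = DailySmsCap()
day = date(2010, 12, 13)
print([cap.allow("07700900123", day) for _ in range(5)])  # [True, True, True, False, False]
print(cap.allow("07700900123", date(2010, 12, 14)))  # True: a new day brings a new allowance
```

Keying the count on the phone number and the calendar day keeps the check independent of whatever went wrong upstream: even if something re-sends endlessly, only three texts a day can leave.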

Unknown Unknowns stalk R&D projects in every company, every day. Only skill, knowledge and the wisdom gained from experience allow us to grow our list of "Unknowns that have become Known" each time we hold a project risk assessment meeting.

It's a list that we treasure. 


As this blog grows in readership - and because it carries the Tesco brand - I have had to become more careful about the sort of comments that are acceptable. The good news is that I'm a champion of free speech so please be as praising or as critical as you wish! The only comments I DON'T allow through are:

1. Comments which criticise an individual other than myself, or which are critical of an organisation other than Tesco. This is simply because they cannot defend themselves, so it would be unfair and possibly libellous. Comments about some aspect of Tesco being better or worse than an equivalent organisation are allowed, as long as you start by saying "In my personal opinion..." or "I think that...", followed by "...because..." and some reasoned argument.

2. Comments which are totally unrelated to the context of the original article. If I have written about a mobile app and you start complaining about the price of potatoes, then your comment isn't going to stay for long!

3. Advertising / web links / spam.

4. Insulting / obscene messages.

Ok, rules done - now it's your go: