Showing posts with label tesco instant. Show all posts
Showing posts with label tesco instant. Show all posts

Wednesday, 29 December 2010

R&D Christmas Experiment: Tesco Instant - Write-up and Play

OK so it's live - and achieved using no more than some C#/Java and SQL database skills! It helps me answer the question:  Can I get to a particular product faster than via our standard web offering (of departments, aisles and shelves, or text searching)? Does it for you?

Go play at http://www.techfortesco.com/tescoinstant !!

The rest of this article is about the technical design and implementation, of interest to anyone wishing to look at similar solutions.

Step 1 - Design!

I created a database with the following tables and developed code that used the API to extract the products for a particular store (Greenford Dotcom Store which serves all homes in north London) and perform 'lexicon analysis' to make the data available efficiently. 

My theory was that, if I broke all products down into their constituent words (called the 'Lexicon'), and indexed those words efficiently using all possible key combinations up to 5 characters, then I could make product searching really fast!

Pencil, paper and SQL Enterprise Manager brought me to this design (note that there are extra clustered indices applied to the final design that are not shown in this diagram):






Step 2 - Get The Data
Using the Tesco grocery API I parsed through all the departments, aisles and shelves and copied some essential product data - just ProductID, BaseProductID (known as TPNB for future use when I hook up the application to the Tesco API for 'add to basket') and the product's description name into a table called ProductsTable - totalling 21,638 rows.



Step 3 - Extract and store individual words to create the Lexicon.
I took each product description and extracted individual words thanks to C#'s String.Split(' ') method using space ' ' as a delimiter. Each word was inserted into the LexiconTable (unless it was already there, in which case I just incremented the ProductCount value at the rate of one per product - a value which would be useful later).

Since I was creating a many-to-many relationship between ProductTable and LexiconTable (that is, products have more than word and each word can be used by more than one product), I also created a lookup table - LexiconProductTable. That way I could get to list the products quickly using any particular word much because it would be indexed (the alternative would have been to search for all products containing a particular word using the 'LIKE' keyword in SQL, which is highly inefficient and slow).

LexiconTable ended up with 11,237 unique words that described all the Tesco products, and LexiconProductTable had inserted 124,413 lookup rows.

If I sort by descending product count, I can uncover the top words used in most Tesco products! Intrigued? Here is the top 20:


10
Tesco
6288
4
And
2337
14
Pack
1320
136
Chicken
873
238
75Cl
732
311
500G
714
275
Sauce
665
150
Finest
661
49
400G
660
29
In
638
199
200G
627
60
Chocolate
623
47
250Ml
568
21
Value
532
24
1
517
465
2
517
266
250G
508
37
4
498
120
Organic
486
25
Litre
475


The left column is the LexiconID unique identifier, and the right column is the count of the number of products in which the word is used.


Step 4 - Build Indexed Searching based in anticipated input
Now to get this application fast-acting when anyone typed in a search character. I needed a table that had a unique match for up to the first 5 letters typed in. That's ANY 1, 2, 3, 4, or 5 characters from 'aaaaa' to 'zzzzz' as long as there at is at least one lexicon word that satisfies the particular combination. Index this match and the system should react like lightning!

To do this I wrote some code to look at each word in the LexiconTable that had an 'a' in it and, if so, inserted an entry into a new table, LexiconSearchCharTable. I then searched 'b', 'c'' and so on. Indeed I had a 5-nested search loop that efficiently checked for up to 5-letter combinations from 'aaaaa' to 'zzzzz' like this:

Does the current Lexicon word I am studying have an 'a'?
If yes, does it have an 'aa'?
    If yes, does it have an 'aaa'?
        If yes, does it have an 'aaaa'?
            If yes, does it have an 'aaaaa'?

In other words, I iterated through the alphabet from a-z, and if I got a 'hit' I then iterated again through the alphabet for a second character 'aa-az'. Any hit I got I iterated through the alphabet using a third character 'aaa'-'azz', then a fourth 'aaaa'-'azzz' and finally a fifth character 'aaaaa-azzzz'. This worked really well without going through every combination of 'aaaaa'-'zzzzz' since of course words that have no characters 'ab' could not possibly have characters 'aba' through 'abz' either!

LexiconSearchCharTable ended up with 211,561 rows. Here is what the first 20 rows look like:

1
E
8973
2
H
8973
3
HE
8973
4
T
8973
5
TH
8973
6
THE
8973
7
B
7136
8
BE
7136
9
BER
7136
10
BERR
7136
11
BERRY
7136
12
C
7136
13
E
7136
14
ER
7136
15
ERR
7136
16
ERRY
7136
17
R
7136
18
RR
7136
19
RRY
7136
20
RY
7136

The left column is the table's unique identifier for each row, the middle row contains the successfully tested characters, and the third column is the LexiconTable unique identifier for a word that matched. You can probably deduce that Lexicon ID 8973 is "THE" and 7136 has "BERRY" and a mysterious "C" - the word actually stored is "C/BERRY" (cranberry).

A user typing in 'B' will have row 7 returned. There will be other rows with just 'B' in them but they will be returned instantly since that column is indexed. Only rows with just 'B' in this column will be returned. Type in 'BE' and now row 8 is returned instead, really fast along with the other 'BE'-only rows,


Step 5 - Extract appropriate data on demand QUICKLY via a stored procedure
So now I have my data in a good place. I just needed a stored procedure to act swiftly on the tables to bring back the list of products quickly. Here we go:



CREATE PROCEDURE [dbo].[TescoGroceryInstantSearch]
(
@SearchCharacters varchar(5)
)
as
SELECT TOP 20
pt.ProductID,
pt.Description
FROM
ProductsTable pt
INNER JOIN  LexiconProductTable lpt ON pt.ProductID = lpt.ProductID
INNER JOIN LexiconTable lt ON lt.LexiconID = lpt.LexiconID
INNER JOIN LexiconSearchCharTable lsct on lsct.LexiconID = lt.LexiconID
WHERE
SearchCharacters = @SearchCharacters
ORDER BY lt.ProductCount DESC


Nice - an easy to read and a quick way to grab the data! As you type each characters into the search box on the web page, the list of products results quickly from this stored procedure. This is confirmed by the SQL Execution Plan revealed by my SQL Server - all the heavy percentage CPU costs are borne by clustered index seeking.

I may as well make the maximum use of the data so I have a second stored procedure that will run in parallel with the first one in my web application, revealing word suggestions to help the user get to the product even quicker. Here it is:


CREATE PROCEDURE [dbo].[TescoGroceryInstantSearchSuggestions]
(
@SearchCharacters varchar(5)
)
as
SELECT TOP 15
lt.LexiconID,
lt.LexiconWord
FROM
LexiconTable lt 
INNER JOIN LexiconSearchCharTable lsct on lsct.LexiconID = lt.LexiconID
WHERE
lsct.SearchCharacters = @SearchCharacters
ORDER BY lt.ProductCount DESC

Again its clustered index seeking all the way. Nice! No need for third party software - just some intelligent thinking around data storage, formatting and extraction. I always like to try stuff out myself before I recommend buying anything, as I prefer to use the tools I already have.

Step 6 - Build a simple web application to make it all happen!

Now back to the whole reason for doing this. Can you get to a chosen product quicker?
Go play at http://www.techfortesco.com/tescoinstant and let me know.

Thursday, 23 December 2010

The R&D Christmas Project: Tesco.com Grocery Instant

That’s it! The final group of customers are having their groceries delivered before Christmas as this entry publishes - and once again they have entrusted us with delivering their Christmas Day dinner.

Yes, Tesco.com grocery customers have once again trusted us provide the ingredients for their most important meal of the year. It’s a huge honour and a huge responsibility. The weather hasn’t exactly helped but staff across the country are doing their best for customers this evening.

So finally I can draw breath and begin the process of planning the delivery of some exciting projects for the coming three months.

However, during the three working days of next week I’m not going to work. I’m going to play! I love working between Christmas and New Year because the office is mostly empty and the phone rarely rings. It’s the opportunity to start and finish a small proof of concept quickly, so let me explain it:

I love Google Instant, where search results start appearing with each key press (and it's not just the fact that Tesco has the kudos of being the first thing Google thinks of when I type 't'!). So next week I’m going to publish a proof of concept called, unless I think of a better name, Tesco.com Grocery Instant.

The objective is simple: on a web page running on our R&D server you start typing in a grocery product name. At the first key press, grocery product search results instantly appear that become more relevant with each subsequent key press. Could a user of this service find the product they are seeking more quickly than standard text search or following the department / aisle / shelf taxonomy?

To make it work I intend to follow these steps:

  1. Take a copy of our product range and import it into a SQL database server table.
  2. Create a second 'index' table which will have, in its first rows, a single letter of the alphabet and all the product ids from the first table that have that letter in their descriptions. I'll write a process that will carefully examine each product description and choose whether to add it to this 'index'. In total for this part I will look for characters a-z and numbers 0-9.
  3. Continue the indexing, this time adding “aa” and finding all the products that match, then “ab”, “ac”, etc through to “zz” - that’s 26 x 26 letter combinations (no need for digits) = 676.
  4. Continue the indexing, this time adding triple letters from “aaa” to “zzz” and finding all the products that match, That’s 26 x 26 x 26  letter combinations = 17,576.
  5. Continue the indexing, this time adding four-letter combinations from “aaaa” to “zzzz” and finding all the products that match, That’s 26 x 26 x 26 x2 6 letter combinations = 456,976.

By doing this, you should always get an instant response from the database because there is always a spot-on data match for each key press as the SQL server engine searches for the correct products.

Hopefully by the time you have reached four characters then the instant product listing should have narrowed down to a reasonably small number of products. If not I’ll simply go to five-letter combinations which will add another 11,881,376 such combinations to the database.

As ever I’ll put this proof of concept onto our public internet facing labs server so you can have a go. Stay tuned to see how you can access the page.

In the meantime, may I wish you and your loved ones Yuletide Greetings and leave you with this seasonal message: On December 25th a saviour was born. He revealed eternal truth, bringing joy to millions. He astonished the world with his command of nature. He changed history forever. Yes Happy Birthday Sir Isaac Newton, born 25 December 1642.