March 29, 2004
Large Data Sets (Part 1 of Many)
This is the first of what will most likely be many postings related to data management and Flex. Our opening topic is a brief discussion on large data sets and how to efficiently transfer them from your application server to the client. I'm going to assume you are familiar with MXML and the components and services that are a part of the Flex runtime.
This post will refer to many examples that are found here. Instructions for installing the samples can be found in a README included in the zip.
The Problem
Our problem is simple. We need to display a lot of data, and the user does not want to wait too long for it to appear. The more data a user requests, the longer it takes for the data to display. What we need to do is find a way to download data in pieces so that the user can view a smaller amount of data immediately and then only wait a brief amount of time as they request additional data. To keep the problem domain simple we are only going to discuss improving the actual retrieval of data. We'll leave discussions on sorting, modification, filtering, etc. for another time.
Example 1 can help you visualize this problem. The data is census information that I got from The UCI KDD Archive and reduced to 20,000 records. Start by downloading a few records, perhaps 10. This doesn't take long at all. If we increase that number to 100 the response is usually a little slower (these numbers can vary depending on your system setup, for example whether the database and application servers are on the same machine in addition to where you run the Flex app). Increase the number to 1000 and you'll see a bigger jump. Finally 5000 and the time is becoming painful. Painful enough that an end-user might give up.
The Solution: Paging
Smarter people than I solved this problem a long time ago by displaying data in pages. A page allows the user to see a discrete number of entries, and then when ready move on to the next batch. A typical example would be a search engine. Go to your favorite search engine and notice that whenever you search for something you only have a small number of results (10 or maybe 20) before you reach the bottom and can move to the next page. Even if a search returned 3000 results I've personally found it unlikely to ever look at anything past the third page. So if that search engine had returned all 3000 results at once to my browser, I'd be waiting forever for it to come down over the wire and render on screen, when really I only cared about those first 30.
The Server Component
Before getting into writing paging support in Flex, let's talk about how you retrieve pages on the server. I've written some very simple classes to retrieve my census data. The first is an interface (just to show that I'm thinking generically), the ValueListService. As you might have guessed, this is a very simple adaptation of the ValueListHandler described at the Sun Core J2EE Patterns site. The original ValueListHandler is treated as an Iterator which means that it must maintain state about its current position. Since my Flex application is going to do that for me, all I need from the ValueListService is the number of elements, and the ability to retrieve an arbitrary number of elements starting from a certain position. Therefore my ValueListService looks like this:
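    import java.util.List;

    public interface ValueListService
    {
        // total number of elements available
        public int getNumElements();
        // up to count elements, starting at position begin
        public List getElements(int begin, int count);
    }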
I've then implemented this interface using a CensusService which is capable of using a CensusDAO to retrieve the relevant data, which is populated into CensusEntryVO objects. The CensusDAO is currently only tested on MS SQLServer but the SQL is pretty standard. It's also not very robust given that this is sample code; all exceptions are simply caught and printed. So now when the Flex client wishes to view census data it will ask the CensusService for a List of CensusEntryVOs.
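To make the contract concrete, here is a minimal sketch of what an implementation might look like if the data lived in memory instead of a database. InMemoryValueListService is a made-up name for illustration and is not part of the samples; the real CensusService goes through the CensusDAO. The interface is repeated (with a generic List) only so the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for CensusService (not part of the samples).
class InMemoryValueListService implements ValueListService {
    private final List<Object> rows;

    InMemoryValueListService(List<?> rows) {
        this.rows = new ArrayList<>(rows);
    }

    @Override
    public int getNumElements() {
        return rows.size();
    }

    @Override
    public List<Object> getElements(int begin, int count) {
        // Clamp the window so a request for the last, partial page
        // does not run past the end of the data.
        int from = Math.max(0, Math.min(begin, rows.size()));
        int to = (int) Math.min((long) rows.size(), (long) from + Math.max(0, count));
        // Copy the slice so callers never hold a view onto the backing list.
        return new ArrayList<>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 25; i++) data.add(i);
        ValueListService service = new InMemoryValueListService(data);
        System.out.println(service.getNumElements());    // 25
        System.out.println(service.getElements(20, 10)); // [20, 21, 22, 23, 24]
    }
}

// Local copy of the post's interface so the sketch is self-contained.
interface ValueListService {
    int getNumElements();
    List<Object> getElements(int begin, int count);
}
```

The one design choice worth noting is the clamping: the client can always ask for a full page and simply gets fewer rows back at the end of the list.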
The RemoteObject Tag
We're going to use the RemoteObject tag to access the CensusService. Flex offers three data services: HTTPService, WebService, and RemoteObject. HTTPService is useful for getting information from plain XML files or perhaps servlets that serve XML. WebService is self-explanatory: it is used to call web services, which return data over the wire as XML. RemoteObject allows you to access Java objects on your application server and by default uses a binary format called AMF for efficient data transfer. Since this "article" is about transferring large amounts of data, RemoteObject is a good choice of service because of its efficient transfer mechanism. However, the concepts discussed can easily be adapted to the other services.
Explicit Paging
Our first solution follows the same idea as a search engine. Return data one page at a time and let the user request the next page. I'm going to call this technique explicit paging. The explicit paging solution is ideal for when a user requests a number of records but may only care about a specific subset rather than the entire result set. Searching is a primary example; an address book could be another. Pages do not have to be sized according to the number of records; they can meet other search criteria like last names beginning with C.
Example 2 lets you see how this works. Assuming you ran Example 1 you got a sense of how many records could be downloaded before the wait became unbearable. On my particular setup about 1000 records was the max I wanted to put myself through, but I think in reality that number would be closer to 100 for the app to appear speedy. There is a slider on the app that allows you to change the number of records in a page. From there you can see that the page selector is created to allow access to whichever page the user desires. The page selector uses a Repeater to create a number of Links which are then used to select the pages.
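The arithmetic behind a page selector is small enough to sketch. This is not the code from Example 2; PageMath and its method names are made up for illustration, and the sketch is in Java only to keep it separate from any UI. It shows how the record count and page size turn into a number of pages and into the begin/count values you would pass to getElements.

```java
// Hypothetical helper: the arithmetic a page selector needs, independent of any UI.
final class PageMath {
    private PageMath() {}

    // Number of pages needed to show totalRecords at pageSize records per page.
    static int pageCount(int totalRecords, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        if (totalRecords <= 0) return 0;
        return (totalRecords - 1) / pageSize + 1; // ceiling division without overflow
    }

    // Index of the first record on a zero-based page: the 'begin' for getElements.
    static int firstIndex(int page, int pageSize) {
        return page * pageSize;
    }

    // Records on a given page: the 'count' for getElements. Only the last page can be short.
    static int recordsOnPage(int page, int totalRecords, int pageSize) {
        int remaining = totalRecords - firstIndex(page, pageSize);
        return Math.max(0, Math.min(pageSize, remaining));
    }

    public static void main(String[] args) {
        // 20,000 census records at 100 per page.
        System.out.println(pageCount(20000, 100));          // 200
        System.out.println(firstIndex(3, 100));             // 300
        System.out.println(recordsOnPage(199, 20000, 100)); // 100
    }
}
```

When the slider changes the page size, only pageCount needs to be recomputed; the selector then draws that many links.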
This solution is pretty straightforward. You'll see that most of my code is actually related to the page selector itself, not the data. This means that someone with better UI skills can write a whizbang page selector that can be re-used and all you'll need to do is configure it. Flex is new, but I have no doubt that we'll see robust components sprouting up soon for these kinds of purposes.
So an explicit paging solution is useful when a user wants to see pieces of data. However we might need another solution if it's important for the user to see all the data at once.
Implicit Paging
Our second approach is a technique I'll call implicit paging. Here we don't want the end-user thinking about viewing finite pieces of data; instead it should appear that all of the data is available. Rather than downloading all of the data at once, we'll download a small piece and then as the user comes to data we haven't downloaded we'll go ahead and get it. If you use a mailreader and store your mail on a server this concept will be familiar to you. View one of your mailbox folders and begin to scroll down. As you scroll there may be a little pause as the reader gathers more data for you to look at.
This problem might sound hard; how do you configure the DataGrid control to only get small chunks of data at a time? It's actually pretty simple! The DataGrid takes a dataProvider that must conform to an "interface." Interface is in quotes because unfortunately we don't actually use an interface due to the need to mix in functionality to classes whose signature we can't modify. So all we need to do is implement the DataProvider interface using a class that knows how to retrieve data in pages instead of all at once.
Example 3 shows my simple implementation. The SimplePagingDataProvider (SPDP) is given a reference to a RemoteObject that will communicate with a ValueListService. Whenever the DataGrid asks for data (using the getItemAt method) the SPDP will check to see if the page for that item is loaded. If so the data is returned; if not the SPDP will ask the RemoteObject to download the data and in the meantime return a dummy value. This is pretty straightforward:
    public function getItemAt(index : Number)
    {
        var item = data[index];
        if (item == null)
        {
            item = miss(index);
        }
        return item;
    }

    private function miss(index : Number) : Object
    {
        var page : Number = Math.floor(index / pageSize);
        //if the page was already loaded then the value actually is null
        if (pagesLoaded[page] == true) return null;
        //it's possible that the page is already being loaded
        if (pagesPending[page] == true)
        {
            //this miss event is useful for just monitoring what's going on
            dispatchEvent({type: "miss", index: index, alreadyPending: true});
            return "loading";
        }
        //if the page is not loaded call for it
        var call = dataService.getElements(page * pageSize, pageSize, this);
        call.page = page;
        //we want to keep track of how long it takes to load
        call.startTime = getTimer();
        pagesPending[page] = true;
        dispatchEvent({type: "miss", index: index, alreadyPending: false});
        return "loading";
    }
The DataGrid will show blank rows while the data is loading in the background but it will continue to function (i.e., no hanging). When the data is downloaded the rows will be filled in. In the example I used a page size of 100 because on my setup there were only brief periods where the DataGrid appeared not filled in. Note that this solution works both when a user slowly moves down the list (perhaps using the PgDn key) and when dragging the ScrollBar.
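The listing above only shows the miss path. To see the whole cycle, including what happens when a page arrives, here is a rough Java analogue. It is not the SPDP itself: PagingCache, PageLoader, and onPageLoaded are names I made up for the sketch, and the loader callback stands in for whatever asynchronous call your service layer provides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical Java analogue of a paging data provider; all names are illustrative.
class PagingCache {
    // Hypothetical hook that starts an asynchronous fetch of one page.
    interface PageLoader {
        void requestPage(int page, int begin, int count);
    }

    static final Object LOADING = "loading"; // placeholder returned until the page arrives

    private final int pageSize;
    private final PageLoader loader;
    private final Map<Integer, Object> data = new HashMap<>();
    private final Set<Integer> pagesLoaded = new HashSet<>();
    private final Set<Integer> pagesPending = new HashSet<>();

    PagingCache(int pageSize, PageLoader loader) {
        this.pageSize = pageSize;
        this.loader = loader;
    }

    Object getItemAt(int index) {
        Object item = data.get(index);
        if (item != null) return item;
        int page = index / pageSize;
        // If the page was already loaded then the value really is null.
        if (pagesLoaded.contains(page)) return null;
        // add() returns true only on the first miss, so each page is requested once.
        if (pagesPending.add(page)) {
            loader.requestPage(page, page * pageSize, pageSize);
        }
        return LOADING;
    }

    // Called when the fetch for a page completes: store the rows and mark the page loaded.
    void onPageLoaded(int page, List<?> rows) {
        int begin = page * pageSize;
        for (int i = 0; i < rows.size(); i++) {
            data.put(begin + i, rows.get(i));
        }
        pagesPending.remove(page);
        pagesLoaded.add(page);
    }

    public static void main(String[] args) {
        List<Integer> requested = new ArrayList<>();
        PagingCache cache = new PagingCache(100, (page, begin, count) -> requested.add(page));

        cache.getItemAt(150);          // miss: page 1 is requested
        cache.getItemAt(151);          // same page, already pending: no second request
        System.out.println(requested); // [1]

        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) rows.add("row " + (100 + i));
        cache.onPageLoaded(1, rows);
        System.out.println(cache.getItemAt(150)); // row 150
    }
}
```

In a real grid the completion handler would also need to tell the view that those rows changed so the placeholders get redrawn; that part is left out here.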
Conclusion
This is just the first step in addressing the problem of dealing with large datasets in your Flex RIA. I've introduced the concept of paging (though I doubt it's new to you) and showed two different techniques for integrating a paging solution into your application. The explicit paging solution is useful when there is a lot of data to be shown but it is unlikely that the user wants to view all of it at once. The implicit paging solution acknowledges that a user wants to see a lot of data, but we bring it across the wire incrementally so that performance is acceptable. In future posts I'll try to talk about expanding these solutions to improve performance (both real and perceived), take into account dynamic data, allow sorting, and more. If you have any thoughts on this, including topics you'd like to see discussed in the future, please drop me a comment.
Some resources that I've begun looking at and will come into play more in future entries:
- The Distributed ResultSet Iterator Pattern
- The ValueListHandler Pattern
- The Data List Handler Pattern
Posted by mchotin at 12:00 AM
March 26, 2004
Introducing Me
Well hi there! Welcome to my little corner of the blog universe. I'm Matt Chotin, a software engineer on the Flex team. Those of you who participated in the beta know why I'm here. I basically can't keep my damn mouth shut. Since the beta's coming to a close Macromedia management decided they needed an outlet for my ramblings, so here we are. Hope you might find something interesting.
The current plan is for me to talk about Flex in the context of real world problems. Specifically I'm going to try to focus on the data side of the Flex app, the M in MVC if you will. I've also got some level of experience in the C, but I won't do you much good when it comes to the V. We're going to start off with a discussion on dealing with large amounts of data. In talking with customers I've had questions come up like: "what if I need to display 20,000 records?" So we're gonna think about ways to deal with that situation. As time passes I hope to delve into other topics like general architecture, performance, and the secret to a good fruit crisp (hint: extra crisp).
So what do I bring to the table? For one, I worked a lot on Flex, so I know what it can and can't do. Beyond that I've primarily done J2EE development. First I worked on a standard web application where we thought generating our UI entirely using XSL at runtime was a good idea. It was not a good idea (at least in 2000). The second round of that product was much better and though we didn't know it at the time utilized a lot of the best practices you'll find out there, especially in its data architecture. After a stint integrating that system into another product and working on some smaller projects I came to Macromedia where I began working on the server portion of an RIA. This was my introduction to Flash (MX) and boy was it ugly. So when I found out that Macromedia was working on a product meant for people like me I knew that I had to participate. I'm pretty pleased with what we've put out, and I hope that many of you will be able to play with it and eventually develop real applications that provide the kind of user experience your customers deserve.
I'm not sure how often this blog will be updated as I'll be working on the next version of Flex, but I hope to put up something new every week or so. In the meantime check out FlexCoders as a way to chat with fellow Flex developers; and of course keep an eye on the Flex homepage for the latest news and information.
Posted by mchotin at 11:00 AM