Monday, July 6, 2009

Data center issues: Now is a good time for a pre-postmortem review!

In the past week there have been a handful of significant issues at a few large datacenters, including an 11 hour outage at Authorize.NET that left thousands of websites unable to process credit card transactions. The company that I work for narrowly missed the 45 minute outage at Rackspace, but it looks like Justin Timberlake wasn’t quite so lucky:

Since it is my job to make sure that the servers that run our software stay online as much as possible, I enjoy reading about problems with hosting providers and making sure that I would know what to do if/when it happens to us. The new cloud services offered by Amazon, Google, and Microsoft aim to give everyone the scalability and reliability that used to be available only to large companies with thousands of servers, but these new computing services are still in their infancy and have had more than their fair share of downtime recently as well.

I very much enjoyed reading Dyn Inc’s analysis of the Authorize.NET outage, as it gave a good mix of the gory details of the failure and the best practice techniques that could be used to prevent or mitigate this type of extended outage. Usually you have to learn these kinds of things first hand, having gotten burned by it once and swearing to never let it happen again. Instead I prefer to try and learn from other people’s mistakes, as you usually spend much less time in the hospital that way!

Sunday, June 14, 2009

NBA Finals on ABC – Blurry Picture and Sub-Par HD

Tonight the Los Angeles Lakers beat the Orlando Magic 99 to 86 to win the 2009 NBA Finals. I am not a huge Lakers fan, but I do enjoy watching a good basketball game and did manage to catch the first and last games of the series. While the games were good, I was a bit distracted because both games that I watched had pretty poor picture quality, and the majority of the wide-angle shots were really bad for a major sporting event. Not at all what I had expected by today’s High Definition standards.

The first game I watched was on a projector at a sports bar, and I had just figured that there was an issue with the projector, but the same thing happened tonight while watching at home on my new computer. After upgrading and purchasing a new antenna I have been able to receive all of the local HD broadcast channels (also called Over The Air or OTA) in full digital HD format. I prefer watching recorded TV from a DVR so that I don’t have to watch commercials, but for sports I will usually switch to the OTA HD signal and watch it live. In tonight’s game though I noticed that while the Cadillac commercials and commentary shots were sharp, a lot of the actual game was pretty blurry and hard to follow.

It sounds like other people have noticed this too, and it appears to be an issue with how ABC broadcasts the event to TV stations and cable providers. While their other shows look great, the sporting events that involve a lot of fast motion do not end up looking very good. My guess is that they try and compress the signal too much, because if you combine a low bit rate signal with a lot of fast movement and flash photography you are bound to get poor picture quality. Luckily the original recording is kept at a higher quality, so all of the highlights and re-broadcasts appear much better than the live event.

The funny thing was that even the overlay graphics like the scoreboard would appear blurry at times. Check out this 10 second video of the scoreboard that I made, where you can see the whole thing (including the ABC logo) goes blurry and then takes a few seconds to recover. Gets pretty annoying when this happens throughout the entire game. If this had happened during the commercials you can guarantee the sponsors would be asking for their money back!

Tuesday, June 9, 2009

Google Translator Toolkit: Makes me wish I knew another language

Google recently launched the Translator Toolkit, which helps users translate and edit documents or web pages from one language to another. This is industrial scale crowd sourcing and machine learning at its finest and is a rare win-win-win situation:

  • Users get tools that help make their job easier. With a few clicks a document gets processed and translated automatically. The user then has to proofread, edit, and publish the results, but they get access to tools like dictionaries, glossaries, and previous translations that may be related.
  • Partner websites like Wikipedia and Knol will get better translations for articles because they are edited and approved by humans.
  • Google gets to train its translation algorithms using data from the human edits to improve their accuracy. I took a basic natural language processing class a few years back, and getting access to mistakes and their correlated corrections is invaluable for refining and improving these types of algorithms.

If you are interested check out http://translate.google.com/toolkit or just watch the video:

Monday, June 1, 2009

Google Wave - High Frequency Innovation in Online Communication

About 5 days ago Google previewed their new Wave communication service at the Google I/O conference, and already the web is abuzz about it. A search for "Google Wave" returns over 33 million results, and there are over 30 tweets for "Google Wave" posted in the last 5 minutes (and another 20 while I wrote this sentence). Needless to say this is one of the biggest product announcements to come from Google in the last few years, and it has completely outshone Microsoft's announcement that it is rebranding its search engine from Live to Bing. While Microsoft tries to gain ground in online search, Google has been planning ways to change how people communicate and collaborate online.

Google Wave is set to be released later this year and will combine features commonly found in IM, Email, Forums, Social Websites, and Wikis to create a unified communication platform designed for the modern web. It is based on HTML 5, which means it will run in a web browser but should feel a bit more like a desktop app in terms of response and the way a user interacts with the service. It also has some cool features like real time updates from multiple users, conversation playback, and advanced spellchecking using natural language processing.

The product is currently in a closed beta stage that is only open to developers that want to build applications or extensions on top of the Wave platform and protocol. The Google team thinks that one of the big drivers for success of email over the last 25 years has been that it was built using an open protocol that anyone could implement. The Wave protocol was designed to mimic the same type of system where servers from different providers can interoperate with each other using an open source federation protocol. Google has a vision for how to improve online communication, but they also are aware that to succeed they need to share with all the other kids in the neighborhood.

It is way too early to separate the hype from reality, but if Google's track record is any indication then this has a lot of potential to have a large impact one way or another. I use Gmail, Google Docs, and Google Calendar on a regular basis nowadays, and it is hard for me to envision working without those types of tools without taking a large drop in productivity. Also the Google Wave team is led by the same two guys that created Google Maps, so they have a lot of experience in creating these types of highly interactive applications.

If you want more information you can watch this 80 minute presentation demonstrating Google Wave or go to http://wave.google.com to sign up to be notified when it is publicly available. I guarantee that you will hear more about this in the future, as it truly does show potential for a new way to communicate online.

Friday, May 8, 2009

Lucky 7s? GB7, Core i7, and Windows 7

I have a lot of computers in my life, and usually end up working with anywhere from 2 to 10 different machines a day. At home there are 2 computers: a main workstation that serves as a media center and a laptop in the bedroom for when I have to answer the red telephone at 6am. I recently added a Hauppauge WINTV HVR 1600 digital TV tuner card to the media center computer, which combined with GBPVR provides TiVo-like features, commercial skipping using Comskip, and the ability to watch live TV. The card has both an analog and a digital tuner, but the computer was too slow to keep up with the high definition format, so I was stuck with basic analog cable channels. Even with analog TV the AMD Athlon XP 2000+ processor could barely keep up.

Since the media center computer was about 6 years old, I decided that it was time to do a complete upgrade. This will be the 7th computer and will be replacing a computer named GB1, so it only makes sense to call it GB7. When purchasing the components from Newegg, I decided that the platform should last at least another 6 years, which meant buying the latest and greatest. I am a big fan of the Intel Core 2 and Core 2 Duo line of processors, but the most recent release from Intel is the Core i7 processor line, which costs a lot more but should be in active development for at least the next 5 years.

The Core i7 line is currently based on a quad core design with hyper threading, which presents 8 logical processors to the operating system so that 8 threads can be scheduled at once. The CPU also includes a new bus architecture and an integrated 3 channel DDR3 memory controller, which gives it a screaming 25.6 GB/sec of peak memory bandwidth. The speed comes at a high price tag though, as a 2.66 GHz CPU, an Asus P6T motherboard, and 6GB of RAM cost around $630. Throw in a PCI Express 16x video card, a full tower case, and two 250GB SATA drives, and the build costs just under $900 without a monitor or any peripherals.
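For anyone curious where a figure like 25.6 GB/sec comes from, here is a rough sketch of the arithmetic in Python. It assumes DDR3-1066 memory (about 1066.67 million transfers per second) and a 64 bit wide channel; neither number is stated above, so treat them as my assumptions rather than a spec sheet. The function name is made up for illustration.

```python
# Peak theoretical bandwidth of a multi-channel DDR memory controller.
# Assumptions (not from the post): DDR3-1066 modules and 64-bit channels.

def peak_bandwidth_gb_per_sec(channels, mega_transfers_per_sec, bus_width_bits=64):
    """Return peak bandwidth in GB/sec (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_sec = channels * mega_transfers_per_sec * 1_000_000 * bytes_per_transfer
    return bytes_per_sec / 1_000_000_000

# Three channels of (assumed) DDR3-1066.
triple_channel = peak_bandwidth_gb_per_sec(channels=3, mega_transfers_per_sec=1066.67)
print(f"{triple_channel:.1f} GB/sec")  # prints "25.6 GB/sec"
```

This is a theoretical ceiling, so real world throughput will come in lower.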

Since the new machine is supposed to last for a while, I decided to try and use the new Windows 7 as the operating system. Windows 7 is supposedly going to be released later this year, but Microsoft just opened up the Windows 7 Release Candidate for public consumption. This means that you can download and install Windows 7 and use it for free until March 2010, at which point the computer will start rebooting every 2 hours. I figure that 10 months is more than enough time to decide if it is worth purchasing a license or not, so I might as well give it a go.

I got all the parts yesterday and, having built 10-20 machines a day when working for a computer wholesaler, I was able to finish the build during my lunch hour. It helps when the motherboard has everything built right in, and a lot of the old headaches with installing CPU fans and hard drives are much easier nowadays. It was interesting that just about every single interface has been upgraded since my last build: ATA to SATA, PCI to PCI-Express, DDR to DDR2 to DDR3, 2 channel audio to 8 channel audio now with digital and optical outputs. These were not very popular the last time that I built a machine, which makes me wonder what each of the interfaces will look like 5 years from now.

Sadly some of the old headaches are still there. I spent a few hours trying to get the Windows installation disk to recognize the SATA drives that I had set up in a RAID 1 array. In the old days you had to put the controller drivers on a floppy disk and press F6 to load them into the setup program. Now the setup will prompt you for drivers if no hard drives are found and can load them from a USB flash drive, but the board has 2 SATA controllers and the DVD comes with 4-6 versions of each, so it took a lot of fiddling with cables and drivers before I finally found one that worked. If anyone else is interested, here are instructions for setting up Windows 7 64 Bit on an ASUS P6T Motherboard using an Intel ICH10R RAID array.

I finally got Windows 7 loaded around 11PM last night, but I haven't had much time to play around with it yet. The next steps will be to install the tuner card and start reloading all of the programs that I use, but that will have to wait for the weekend. I have however been very impressed with how quiet the machine runs. There are a total of 4 fans (2 case, 1 power supply, and 1 for the CPU) and all of them are larger than 80mm, which gives them great air flow at a whisper quiet speed. I also opted for a video card with a passive cooling design, since the lack of a fan keeps the noise down and is one less moving part that might break. Hopefully that means the system will be running strong for a long time.

Monday, May 4, 2009

Torpig Botnet Analysis - When was the last time you changed your passwords?

Like many people, I spend a lot of time online. According to Rescuetime.com, I have averaged over 6 hours a day online, just at work. Add another 4-6 hours a day surfing at home, and my life is pretty well intertwined with the World Wide Web. I would consider myself smarter than the average bear when it comes to "safe-surfing" online, but even so I have on occasion found myself removing spyware or malware from my computer. These programs live on the Internet, so pretty much anyone that spends any significant amount of time online will run into them eventually.

With that in mind, it is very interesting to read about a team of researchers from the University of California, Santa Barbara that took over and analyzed the Torpig botnet for a period of 10 days. During the takeover they analyzed the data being sent to the control servers and made some very interesting observations. A full report of their findings is available here, and it is a fascinating read for anybody interested in the activities of online criminals. Once a machine was infected, the control servers started receiving usernames and passwords for email accounts and banking websites, as well as all other HTML form data such as webmail and posts on forums or social websites. Over the course of 10 days the researchers collected over 70 GB of total raw data from over 180,000 infected computers.

One of the interesting findings was how much personal information they were able to piece together by combining the online identities sent from individual machines. "For example, Torpig records a user logging into his LinkedIn account. His profile presents him as the CEO of a tech company with a large number of professional contacts. Torpig also records the same user logging into three sexually explicit web sites." Most people think that their identity is private when surfing online, but if your machine gets infected with a virus or Trojan, all privacy goes out the window.

In their conclusions they state that while better relations between security researchers and domain registrars could help solve some of the problems, the issue is fundamentally a cultural problem, since many people use the same weak password for all of their online activities. Also, there needs to be better education about how to be safe online. "Even though people are educated and understand well concepts such as the physical security and the necessary maintenance of a car, they do not understand the consequences of irresponsible behavior when using a computer." Of course, some of the criminals are getting pretty good at creating exploits that look legitimate. For instance, Torpig can be used to inject HTML forms into real banking webpages to create very legitimate looking exploits like the one shown here:

[Image: wellsfargo-injection]

I come across these types of stories every once in a while, and they remind me to do simple things like periodically change the password to my email account and high security websites. This is something that everybody should do at least once a year, if not more frequently than that. When combined with a firewall, decent antivirus software, and safe browsing habits, changing your password periodically should help keep you from becoming a victim online.

Friday, May 1, 2009

Resolver One Review - 4 months with a .NET enabled spreadsheet and IronPython scripting

Spreadsheet programs were one of the first killer applications for the personal computer, as they allowed users to quickly calculate a large number of values using a simple and easy to use grid system. In 1979 VisiCorp released VisiCalc for the Apple II, and while this was way before my time, I have heard it revered as one of the major turning points into the era of computers being used for day to day business use. Since 1979 few changes have been made to the spreadsheet paradigm. Lotus created the Pivot Table in 1991 (Pivot charts and graphs came later), which greatly helped in creating interactive tables and graphs for analyzing data. Many mathematical and statistical functions are available in modern spreadsheet applications, but most of them rely on old and outdated scripting languages that can be a hindrance when analyzing complex data.

Resolver One is a new program that aims to make things easier by building the spreadsheet using the IronPython 2.0 scripting language. IronPython is an implementation of the Python scripting language built on top of the Microsoft .NET CLR and DLR runtimes, which means that it is able to easily interact with code from C# and other .NET languages and can make use of any existing .NET library or assembly. It also can perform complex operations such as web service calls, interact with external files or databases, and make use of the Ironclad NumPy library for complex data manipulation. Since it is based on the .NET framework you can embed custom IronPython or .NET objects into the cells instead of just storing numbers, dates and text. This can be very helpful as it allows you to apply Object Oriented Programming techniques to create powerful and easy to maintain spreadsheets.

I first came across Resolver One last year when Larry O'Brien mentioned the $25,000 Resolver One Challenge in one of his posts. I don't use spreadsheets much in my day to day work, as most of the work that I do is with databases. I do however enjoy playing poker, and the opportunity to learn IronPython, play with Resolver One, and possibly win some money was too good to pass up. Initially things got off to a bit of a rough start, as dynamic Python is a vastly different programming model from the static .NET that I am used to using. Also IDE support for IronPython in Resolver One is pretty basic, and there aren't a lot of advanced features in external IDEs either. However after a week or two of prototyping I had enough knowledge to put together a basic spreadsheet for evaluating Texas Holdem poker hands. The spreadsheet uses a .NET library that treats each hand as a 52 bit binary mask and allows you to quickly score and rank hands accordingly. Using this library in Resolver One was as simple as adding an import statement, and by creating a custom GridHand object I was able to group the 1,326 unique starting hands into 169 distinct groups displayed in a grid. The spreadsheet could then use colors and cell comments to highlight which hands would be able to beat your pocket cards and give a full listing of win/tie/loss statistics in a matter of seconds.
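To make the 52 bit mask and the 1,326 to 169 grouping a bit more concrete, here is a small standalone Python sketch. It is not the .NET library or the GridHand object from the spreadsheet: the bit layout (suit index times 13 plus rank index), the labels like "AKs" and "AKo", and all of the function names are assumptions that I picked for illustration.

```python
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for suit in SUITS for rank in RANKS]

def card_bit(card):
    """Map a card like 'As' to one bit (assumed layout: suit * 13 + rank)."""
    return 1 << (SUITS.index(card[1]) * 13 + RANKS.index(card[0]))

def hand_mask(cards):
    """OR the card bits together so a whole hand fits in one 52 bit integer."""
    mask = 0
    for card in cards:
        mask |= card_bit(card)
    return mask

def group_label(card_a, card_b):
    """Collapse two pocket cards into a group: pair, suited, or offsuit."""
    hi, lo = sorted((card_a, card_b), key=lambda c: RANKS.index(c[0]), reverse=True)
    if hi[0] == lo[0]:
        return hi[0] + lo[0]
    return hi[0] + lo[0] + ("s" if hi[1] == lo[1] else "o")

starting_hands = list(combinations(DECK, 2))
groups = Counter(group_label(a, b) for a, b in starting_hands)

print(len(starting_hands))                         # 1326
print(len(groups))                                 # 169
print(groups["AA"], groups["AKs"], groups["AKo"])  # 6 4 12
```

The 169 groups break down as 13 pairs, 78 suited combinations, and 78 offsuit combinations, which is why they fit so nicely into a 13 by 13 grid.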

When I first submitted it to the competition, the spreadsheet required a full 7 card hand (2 player pocket cards, and 5 on the board) to display the win percentages. Pre-calculated Preflop odds were also displayed before the hand was dealt, but other than that the features were quite basic. For the second submission, I altered the spreadsheet to use a Monte Carlo algorithm to allow predicting the win percentages on the Flop and Turn after running a few hundred trials on each group. On slower machines this might take up to 10 seconds to complete, but on a modern dual core machine it would only take a second or two to recalculate the spreadsheet. I now thought that this was the most amazing spreadsheet ever created, but sadly it did not win the competition. In my rush to win glory and fame I had neglected to pay any attention to the UI design or add any documentation about how to use the spreadsheet. It turns out I wasn't the only one.
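The Monte Carlo idea is easier to see in code than in prose, so here is a rough sketch in plain Python. It is not the code from the spreadsheet: it deals one random opponent hand per trial instead of running trials per group, and it uses a slow brute force evaluator (best 5 of 7 cards) instead of the bitmask library. All of the names are made up for this example.

```python
import random
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
DECK = [rank + suit for rank in RANKS for suit in "cdhs"]

def score_five(cards):
    """Return a tuple that compares higher for a stronger 5 card hand."""
    ranks = sorted((RANKS.index(c[0]) for c in cards), reverse=True)
    counts = Counter(ranks)
    # Put paired/tripled ranks ahead of kickers for tie breaking.
    ordered = sorted(counts, key=lambda r: (counts[r], r), reverse=True)
    shape = sorted(counts.values(), reverse=True)
    flush = len({c[1] for c in cards}) == 1
    straight = len(counts) == 5 and ranks[0] - ranks[4] == 4
    if ranks == [12, 3, 2, 1, 0]:  # A-2-3-4-5, where the ace plays low
        straight, ordered = True, [3, 2, 1, 0, -1]
    if straight and flush:
        category = 8
    elif shape == [4, 1]:
        category = 7
    elif shape == [3, 2]:
        category = 6
    elif flush:
        category = 5
    elif straight:
        category = 4
    elif shape == [3, 1, 1]:
        category = 3
    elif shape == [2, 2, 1]:
        category = 2
    elif shape == [2, 1, 1, 1]:
        category = 1
    else:
        category = 0
    return (category, ordered)

def best_hand(cards):
    """Best 5 card score out of 5 to 7 cards, by trying every combination."""
    return max(score_five(combo) for combo in combinations(cards, 5))

def estimate_equity(pocket, board, trials=500, rng=None):
    """Estimate (win, tie, loss) rates against one random opponent hand."""
    rng = rng or random.Random()
    remaining = [c for c in DECK if c not in pocket and c not in board]
    wins = ties = 0
    for _ in range(trials):
        # Deal the opponent 2 cards plus whatever the board still needs.
        draw = rng.sample(remaining, 2 + (5 - len(board)))
        opponent, runout = draw[:2], draw[2:]
        full_board = list(board) + runout
        mine = best_hand(list(pocket) + full_board)
        theirs = best_hand(opponent + full_board)
        if mine > theirs:
            wins += 1
        elif mine == theirs:
            ties += 1
    return wins / trials, ties / trials, (trials - wins - ties) / trials

# Example: a set of aces on the flop, a few hundred trials.
win, tie, loss = estimate_equity(["As", "Ah"], ["Ad", "7c", "2h"],
                                 trials=500, rng=random.Random(1))
print(f"win {win:.1%}  tie {tie:.1%}  loss {loss:.1%}")
```

With only a few hundred trials the numbers will bounce around by a couple of percent from run to run, which is the usual trade of accuracy for speed with this kind of simulation.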

At first I was a bit distraught, but after taking a step back it was clear to see that there was still work that needed to be done before it was a respectable spreadsheet. I was happy with how the spreadsheet worked, so I turned my focus to creating a better User Interface and adding a built in help system. Since Resolver One is built on the .NET framework, you can add things like Windows Forms and controls. After a bit of fiddling with this project I was able to create a basic multi-tab embedded browser that would appear when clicking a Help button on the sheet. The browser ran in its own thread, and you could program which pages would appear in each tab. This allowed creating HTML based help files that could be posted online or embedded directly into the spreadsheet. I also attempted to create a full tutorial system where clicking a button on a help page would perform an action on the spreadsheet (change value, highlight cell, etc), but I ran into some problems with changing the spreadsheet from a separate thread and was not able to finish this feature. I submitted the new Texas Holdem Monte Carlo Simulator spreadsheet to the competition with documentation, and was very happy with the results!

Overall I have to say that Resolver One is a very innovative program, and while there are still some rough edges I can see a lot of potential for this to be a new killer application for spreadsheets. The basic features are very easy to use, and if you know Python or have a programming background you should be able to code up whatever special functionality you want. It reminds me a lot of when I first started using PowerShell, and it is definitely something that I will keep in my arsenal of tools for attacking complex problems.