You might think that this could be avoided by having a professionally managed environment. At Notre Dame, we have a site license for Red Hat Linux, and our staff are pretty rigorous in keeping everything up to date and on track. But even then, you can't assume everything is identical: there is no way to upgrade everyone simultaneously, and every machine operates on a different schedule (and discipline) for picking up automatic updates. For example, we are currently in the tail of a general campus migration from Red Hat 4 to Red Hat 5.
Here is some hard evidence. We recently started using the neat 'cron' feature in Condor to make a daily observation of the operating system version, kernel version, and C library version of each machine. With a few variations on condor_status, we can see the upgrade status of the whole system.
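To make the daily observation concrete, here is a minimal sketch of the kind of probe such a cron job might run. It is not our actual script: the function names are made up for illustration, the attribute names are simply the ones queried with condor_status in this post, and it assumes the probe reports results by printing `name = "value"` lines. How the script gets registered with Condor is left out.

```python
#!/usr/bin/env python3
"""Hypothetical probe: print host version facts as ClassAd-style attributes."""
import platform
import subprocess


def classad_string(name, value):
    """Format one attribute as name = "value", escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '{} = "{}"'.format(name, escaped)


def read_release(path="/etc/redhat-release"):
    """Return the distribution release line, or "unknown" if the file is absent."""
    try:
        with open(path) as f:
            return f.readline().strip() or "unknown"
    except OSError:
        return "unknown"


def glibc_package():
    """Ask rpm for the installed glibc package; "unknown" if that fails."""
    try:
        out = subprocess.run(
            ["rpm", "-q", "glibc"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    lines = out.splitlines()
    return lines[0].strip() if lines else "unknown"


def probe():
    """Collect the three observations discussed in this post."""
    return [
        classad_string("redhat_version", read_release()),
        classad_string("kernel_version", platform.release()),
        classad_string("glibc_version", glibc_package()),
    ]


if __name__ == "__main__":
    print("\n".join(probe()))
```

Falling back to "unknown" rather than failing means an oddball machine still shows up in the tallies instead of silently disappearing from them.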
The major release numbers (below) aren't too bad. About nine in ten of our cores (782 of 859) are running the latest Red Hat, but another 73 machines are behind by a version or two. And, oops, looks like someone plugged in their own personal CentOS machine. Not too hard to deal with, if you are careful to put 'redhat_version' in your requirements:
% condor_status -format "%s\n" redhat_version | sort | uniq -c | sort -rn
782 Red Hat Enterprise Linux Server release 5.4 (Tikanga)
27 Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
26 Red Hat Enterprise Linux Server release 5.3 (Tikanga)
10 Red Hat Enterprise Linux AS release 4 (Nahant Update 8)
10 Red Hat Enterprise Linux WS release 4 (Nahant Update 7)
4 CentOS release 5.3 (Final)
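One wrinkle with matching on redhat_version is that the advertised value is the whole release string, so 5.3 and 5.4 look like unrelated systems. Below is a minimal sketch, using only the counts listed above, of collapsing those strings into a distribution family and a major release number. The helper name distro_and_major and the two family labels are invented for this example.

```python
import re
from collections import Counter

# Counts copied from the condor_status output above.
observed = {
    "Red Hat Enterprise Linux Server release 5.4 (Tikanga)": 782,
    "Red Hat Enterprise Linux AS release 4 (Nahant Update 7)": 27,
    "Red Hat Enterprise Linux Server release 5.3 (Tikanga)": 26,
    "Red Hat Enterprise Linux AS release 4 (Nahant Update 8)": 10,
    "Red Hat Enterprise Linux WS release 4 (Nahant Update 7)": 10,
    "CentOS release 5.3 (Final)": 4,
}


def distro_and_major(release_line):
    """Split e.g. 'CentOS release 5.3 (Final)' into ('CentOS', 5)."""
    match = re.match(r"(.+?) release (\d+)", release_line)
    if match is None:
        raise ValueError("unrecognized release line: " + release_line)
    family = "CentOS" if match.group(1).startswith("CentOS") else "Red Hat"
    return family, int(match.group(2))


# Sum the per-string counts into per-(family, major) counts.
by_major = Counter()
for line, count in observed.items():
    by_major[distro_and_major(line)] += count

total = sum(observed.values())

for (family, major), count in by_major.most_common():
    print("{:4d} {} {}".format(count, family, major))
```

Grouped this way, the pool is 808 on Red Hat 5, 47 on Red Hat 4, and 4 on CentOS 5, out of 859 in total.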
If we go a little deeper, the picture gets murkier. Below is the distribution of Linux kernel versions. Interesting to note that a few are hand-modified for some unusual hardware, and only two are Xen virtualized. Hope that you don't have any code sensitive to the kernel version.
% condor_status -format "%s\n" kernel_version | sort | uniq -c | sort -rn
342 2.6.18-164.9.1.el5
294 2.6.18-164.el5
94 2.6.18-164.10.1.el5
32 2.6.18-164.11.1.el5
32 2.6.9-78.0.13.ELsmp
14 2.6.18-128.7.1.el5
12 2.6.18-164.6.1.el5
10 2.6.18-128.2.1.el5
6 2.6.18-164.2.1.el5
5 2.6.9-78.0.17.ELsmp
4 2.6.27.8-md-microway
4 2.6.9-89.0.20.ELsmp
2 2.6.18-128.4.1.el5
2 2.6.18-164.9.1.el5xen
2 2.6.9-78.0.5.ELsmp
2 2.6.9-89.0.16.ELsmp
2 2.6.9-89.0.9.ELsmp
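If you do have code that cares about the kernel version, one trap is comparing these strings as text. Here is a small sketch, with a made-up helper name kernel_base, that pulls out the leading upstream version as a tuple of integers so that comparisons behave numerically. It deliberately ignores the vendor suffix (the part after the first dash), which is an assumption that may not suit every use.

```python
import re


def kernel_base(release):
    """Return the leading dotted version as a tuple of ints.

    For example, '2.6.18-164.9.1.el5xen' becomes (2, 6, 18).
    """
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError("unrecognized kernel release: " + release)
    return tuple(int(part) for part in match.group(0).split("."))


old = "2.6.9-78.0.13.ELsmp"
new = "2.6.18-164.el5"

# Plain string comparison gets the order wrong, because "9" sorts after "1".
print(old < new)

# Tuple comparison is numeric, so 2.6.9 correctly precedes 2.6.18.
print(kernel_base(old) < kernel_base(new))
```

The first comparison reports the older kernel as the newer one; the second gets the order right.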
For completeness, here is the distribution of glibc versions, which has much the same story:
% condor_status -format "%s\n" glibc_version | sort | uniq -c | sort -rn
452 glibc-2.5-42.el5_4.2
296 glibc-2.5-42
34 glibc-2.5-42.el5_4.3
24 glibc-2.3.4-2.41
16 glibc-2.5-34.el5_3.1
14 glibc-2.5-34
13 glibc-2.3.4-2.41.el4_7.1
6 glibc-2.3.4-2.43
4 glibc-2.3.4-2.43.el4_8.1
In the good old days, you could just indicate that a program required OpSys=="LINUX" and more or less expect it to run. That certainly isn't possible now. Perhaps we are misleading users by talking about this thing called Linux, which doesn't really exist in any consistent form. Instead, we should be telling our users that a new operating system gets invented every week, and is usually named after a team on Survivor.
The good folks at Sun tried to solve this problem almost 20 years ago with Java. The idea was that they would create a stable platform that could be implemented on any machine. Then, you could write programs that would be universally portable. The problem was, well...
% condor_status -format "%s " JavaVendor -format "%s\n" JavaVersion | sort | uniq -c | sort -rn
308 Sun Microsystems Inc. 1.6.0
222 Sun Microsystems Inc. 1.6.0_15
174 Sun Microsystems Inc. 1.6.0_17
52 Free Software Foundation, Inc. 1.4.2
28 Sun Microsystems Inc. 1.6.0_18
3 Sun Microsystems Inc. 1.5.0_17
2 Apple Computer, Inc. 1.5.0_19
Many people think the grand solution to this problem is virtual machines. Perhaps, but more on that next time.