Archived posting to the Leica Users Group, 2014/07/01
Way too technical for me and on the verge of poetry as a result ;-)
But OMG thank you Brian for restoring the community to its normal state.
Well, if the LUG ever has a normal state :-)

A thousand thanks and bravo squared.

Best regards,
Philippe

On 2 Jul 2014, at 06:24, Brian Reid wrote:

> In case you care.
>
> Server computers that are engineered for reliability have two power
> supplies and two power cords. Power supplies are the most frequent
> component to fail in server computers, so having two of them lets
> the machine survive the outage of one.
>
> The server computer that had supported the LUG had two power
> supplies. They were stacked vertically, one on top of the other.
> Both power supplies had been running 24x7 for about 9 years, and
> their fans had sucked in a certain amount of lint. Lint is
> flammable. The bottom power supply failed, and the lint caught fire.
> The flame rose to the upper power supply and ignited its stored-up
> lint also. Like firestarters in a Franklin stove, the 20-second
> burst of flame was enough to ignite the various flammable items
> (including lint) in the main enclosure. The flash fire probably only
> lasted 40 or 50 seconds, but it was hot enough to destroy most of
> the solder traces that were near the power supplies on the circuit
> boards. There were various plastic tags on some of the cables, which
> added flammable material.
>
> You can go to the store and buy a laptop or a desktop computer, but
> you really can't go buy a server computer. Yes, this being Silicon
> Valley, there are stores around that sell server computers (Central
> Computer is the best of the lot) but buying a server computer at a
> retail store is like buying a bicycle at a department store. It's
> just not the same thing.
> Server computers are special-order, because
> there are so many variations on how they are built that no one can
> afford to keep good ones in inventory.
>
> The fire was on a Saturday morning, and I knew that the soonest I
> could even place an order for a replacement server was Monday, and
> even at rush-rush prices I wouldn't get it until Thursday. At the
> time a Saturday-to-Thursday outage seemed unconscionable. So I
> decided to move the LUG and its supporting software to the newest
> and emptiest of my half-dozen servers. It wasn't exactly a spare--it
> was running a few little things--but mostly it was idle.
>
> The LUG server had been running software from the era of its
> installation, about 2005. The new server was built with chips and
> components that the old software didn't understand, so I couldn't
> just restore the LUG server backups onto the new server. They
> wouldn't run. I had to get the new software working on the
> replacement server and then manually move over each piece.
>
> I made the mistake of believing the operating system documentation,
> which detailed a function called "system upgrade". It was supposed
> to work the way Mac or Windows updates work--you let it do its
> thing for a while, and then you reboot and all is well. After
> running the system upgrade, nothing worked any more, including the
> few services that had been on that machine. After asking the
> experts, I realized that I was going to have to wipe the machine, do
> a clean install, get all of the necessary apps installed, and then
> restore both sets of backups (LUG server and previous contents of
> that server) to the clean system.
>
> So far this is not a crazy plan. I've done things like it many times
> before, though the 9-year software update gap made for a few
> challenges.
>
> Once I got all of the apps installed and the backups restored, I
> immediately typed the command to turn it all on
>     /local/mailman/bin/mailmanctl start
> and nothing happened.
> The error log showed a preposterous, deeply
> hard to believe error message.
>
> The wise person's first step in debugging strange failures on
> computers is to type the error message into a search engine (I use
> Bing) to see if other people had asked about it. To my great
> astonishment, no one had. This never happens. Somebody else *always*
> has the same problem and has asked about it.
>
> I then started reading the source code of Mailman, trying to see
> what circumstances would cause it to generate that message. Mailman
> is written in a language called Python. When you are having trouble
> like this, a good step is to explore "version skew". Mailman Version
> XXX works only with Python Version YYY. The versions of Python that
> are extant just now are 2.5, 2.6, 2.7, 3.2, 3.3, and 3.4. This is
> an abnormally large spread of "current" versions, which usually
> means that the language developers have made incompatible changes
> and have to keep old versions around for apps that have come to
> depend on them.
>
> I tried all 6 of those Python versions. I got the same odd error in
> the 2.* versions, and absolute chaos in the 3.* versions. Since the
> version of Mailman that I wanted to use (2.1.18) failed the same way
> with all of the 2.* Python versions, I wiped the slate clean one
> last time and installed Python 2.7.
>
> Gonna have to find this problem the old-fashioned way.
>
> Many days pass as I read documentation, run tests, explore the
> software, use debuggers, create and read log files, all to no avail.
>
> Then I decided to instrument and log what was happening when
> Mailman/Python started up. Figuring out how much information to put
> in a log file is a black art. If you log too much, you will never
> find what you are looking for in the swamp of details. If you log
> too little, you probably won't log what you're looking for.
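
The kind of startup instrumentation described here can be sketched in a few lines. This is only an illustration, not what Brian ran: it assumes a current Python 3 interpreter (the original work was on Python 2.7), and the function names and the logger name are made up for the example. It logs the module search path once and then the file each named module would be loaded from, which keeps the log small enough to read.

```python
import importlib.util
import logging
import sys

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("startup-trace")  # hypothetical logger name


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def log_origins(names):
    """Log the search path once, then where each named module resolves."""
    log.info("sys.path = %s", sys.path)
    origins = {name: module_origin(name) for name in names}
    for name, origin in origins.items():
        log.info("%-12s -> %s", name, origin)
    return origins


if __name__ == "__main__":
    # "email" is the package name that appears later in the story;
    # any top-level module names could be listed here.
    log_origins(["email", "json"])
```

A path in that output that points somewhere other than the interpreter's own library directory is the sort of detail worth a second look.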
>
> After far too much time staring at the logs, I saw that Python was
> initializing from a library that was not listed in the Mailman
> documentation.
>
> An aside: language systems like Python tend to be aggressive in how
> they find libraries. They look around and if they find something
> that looks like a library, they use it. I'm sure the Python
> designers (none of whom is named Monty) thought they were doing the
> world a favor by making it go out and find its own libraries.
> "Autoconfiguration" run amok. Bad idea.
>
> This library was obsolete. In the 9 years of not upgrading, the
> Mailman software had changed the place where it kept certain library
> functions, and both of them were present in the version I was trying
> to run. The "wipe clean and reinstall" function only wiped the
> directories that it knew about, and this obsolete directory was not
> on its list -- it had been retired years ago -- so it didn't get
> removed by the "wipe clean" function.
>
> If I had run all 12 of the upgrades between Mailman 2.1.6 and
> 2.1.18, one of them would surely have deleted that newly-obsolete
> directory. But I didn't, so it was still there.
>
> When a complex computer system is using two different versions of
> the same library, with creation dates 7 years apart, it doesn't
> stand a chance of working.
>
> I typed the Unix command
> "rm -rf /local/mailman/Mailman/pythonlib/email"
> which got rid of the ancient and incompatible library
> and everything started working. Perfectly.
>
> There were hundreds of loose ends, and I spent the next week hunting
> them down, but it wasn't taking 18 hours a day and LUG mail was
> flowing while I did it.
>
> Thanks for listening.
> Brian Reid
> LUG Saloonkeeper and server wrangler
>
> _______________________________________________
> Leica Users Group.
> See http://leica-users.org/mailman/listinfo/lug for more information

One sees clearly only with the heart.
What is essential is invisible to the eye.
Antoine de Saint Exupéry in Le Petit Prince.
NO ARCHIVE
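
For readers curious about the library problem in Brian's message: Python uses the first matching package it finds on its search path, so a leftover copy in an earlier directory silently wins over the intended one. Below is a minimal sketch of that effect, assuming Python 3. The package name demo_lib, the helper names, and the throwaway directories are all made up for the example and have nothing to do with Mailman's real layout.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, name, version):
    """Create root/name/__init__.py defining VERSION; return root."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write(f"VERSION = {version!r}\n")
    return root


def which_version(search_dirs, name="demo_lib"):
    """Import `name` with search_dirs first on sys.path; return (VERSION, file)."""
    saved_path = list(sys.path)
    sys.modules.pop(name, None)      # forget any earlier import of this name
    importlib.invalidate_caches()    # make sure new directories are noticed
    try:
        sys.path[:0] = search_dirs
        module = importlib.import_module(name)
        return module.VERSION, module.__file__
    finally:
        sys.path[:] = saved_path     # restore the original search path
        sys.modules.pop(name, None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = make_package(os.path.join(tmp, "old"), "demo_lib", "old")
        new = make_package(os.path.join(tmp, "new"), "demo_lib", "new")
        # Same two copies on disk; only the search order differs.
        print(which_version([old, new]))
        print(which_version([new, old]))
```

Both copies stay on disk in both calls; only the order of the search path decides which one is imported. Removing the stale directory, as Brian did, takes the order out of the question.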