08.17.11
Posted in Uncategorized at 7:59 pm by david
Looking at the Microsoft Portable Executable and Commmon Object File Specification rev 8.2 (Sept 21, 2010)—which is the definition of the PE file format—I’m confused about the alignment of the entries in the Attribute Certificate Table.
- Page 58, section 4.7: “The attribute certificate table is composed of a set of contiguous, octaword-aligned attribute certificate entries.”
- Page 59, first paragraph: “Subsequent entries are accessed by advancing that entry’s dwLength bytes, rounded up to an 8-byte multiple, …”
- Page 59, algorithm step 2: “Round the value from step 1 up to the nearest 8-byte multiple …”
- Page 59, algorithm step 3: “… and round up to the nearest 8-byte multiple …”
- Page 60, last paragraph before the bullets: “If the bCertificate does not end on an octaword boundary, the attribute certificate table is padded with zeros, from the end of the bCertificate to the octaword boundary.”
So the documentation is confused. But, it also clearly says, on page 59, “If the sum of the rounded dwLength values does not equal the Size value, then either the attribute certificate table or the Size field is corrupted.” And on my sample, signed, executable, the Size field is a multiple of 8 but not 16, and WinVerifyTrust() says that the executable is authentic (and of course the loader will load and execute it).
So on the basis of this experimental evidence (one sample) I think we can conclude that the Attribute Certificate Table is octobyte aligned, not octoword aligned.
Permalink
07.11.11
Posted in Uncategorized at 1:19 pm by david
Here’s an example of documenting a broken feature as normal behavior - furthermore, hiding it in the documentation as a “remark”.
The issue is the SQL Server “TOP” feature to return the “top” rows of a query result. Consider the MSDN documentation here. The “TOP” keyword in a SELECT statement is used to return only the first N rows of a query, for some N (or alternatively, N%).
Naturally—and I really mean, of course, there’s no other reasonable way for it to work—if the SELECT query includes an ORDER BY clause then it is respected and the sort occurs before the “top N” rows are selected.
Now for some reason they let you put the TOP keyword in an INSERT, UPDATE, or DELETE statement after the main keyword. But in these cases, the ORDER BY clause in the subordinate SELECT is ignored!
E.g., given
INSERT TOP (2) INTO Table2 (ColumnB)
SELECT ColumnA FROM Table1
ORDER BY ColumnA;
an arbitrary two rows are inserted, not the first two rows returned after the sort! That’s REMARKable! But documented!
(There’s a workaround they show you: Put the TOP keyword after the SELECT keyword, not after the INSERT keyword. But why allow the other syntax to be at all if it leads to such obviously broken behavior? Alternatively: Why not fix it if you’re going to continue allowing that syntax?)
Permalink
05.03.11
Posted in Data Structures at 7:30 pm by david
Have you ever designed and coding something and got it working and felt really satisfied and then went to bed and woke up then next morning with the sparkling clear thought in your mind that all of that work was unneeded, that there was a much better and much simpler way to do the exact same thing? Did that happen after you had blogged about your cool piece of work? Well, it just happened to me.
Well, in my previous post I carefully justified a queue that directly contained primitive types, in order to have a queue that could hold hundreds of millions of elements. That’s alright, and the discussion of memory consumption and when to optimize it is good, but …
It is now clear as can be to me that the right way to handle this queue is to write my tuples-of-primitive-types to a file, not keep them in memory at all, and to run the file as a queue. The code is very simple - here it is: ExternalQueue.java. Basically, I open a RandomAccessFile and keep track of head and tail filePointers. The queue class is parameterized by a small interface that the caller must pass in to the constructor, where the caller takes responsibility for serializing and deserializing his element objects to via a DataInput or DataOutput. (This isn’t official Java serialization, where objects are tracked. This queue is meant largely for “struct-like” things.)
This simple class is enough for my needs but it has a limitation that might need to be fixed, depending on the usage. The external file grows to contain all the elements that have ever been written to the queue. It is only truncated when the queue becomes empty. For my application (the retrograde analysis of the 15-puzzle) the queue never becomes empty until the algorithm terminates, yet this is fine since all the enqueued elements will still fit on disk (it should take no more than 5-7Gb). But in other applications it might be desirable to enhance this queue to start using alternate files once the files reach a certain threshold. Then you can start deleting files once the head pointer (next element to be dequeued) advances past the end of a file and you rotate to the next one.
Permalink
05.01.11
Posted in Data Structures at 8:53 pm by david
Data structures in Java can take more memory than equivalent structures in C++ or C#, for various reasons, including general per-object overhead, and the dichotomy between primitive types and objects. For many applications that doesn’t really matter, but for some the excess memory usage in Java is critical and can mean the difference between success and failure.
I’m studying combinatorial search techniques now, using (for some reason) Java. At this point I’m using retrograde analysis to compute pattern databases for the 15-puzzle. (Retrograde analysis = searching backward from the goal state.) The easiest algorithm for this uses a breadth-first search of all positions from the goal.
Breadth-first search is done by enqueuing states-next-to-search onto a queue, and processing them one by one off the queue. For a lot of problems the queue size gets prohibitively large and can’t be used, which is why IDA* and other algorithms that go depth-first have been developed.
For building pattern databases for the 15-puzzle the queue can get quite long, but should still be tractable if care is taken. For Java, in particular, you can’t just blindly use the standard collection classes.
The two main Java classes that implement the interface Queue<E> are ArrayDeque, which is based on a Java array, and LinkedList, which is a typical linked list with external links.
Suppose, for the sake of argument, that we’re working with a 32-bit OS, and with queues of 100 million elements.
Well, if you’re talking 100 million distinct objects you’re already in trouble. Each object takes 16 bytes, minimum, and your 100M objects will need at least 1.6G of memory, more heap than you can get (with the Sun Hotspot VM). But maybe you’re talking primitive types, like long. (A 15-puzzle state will fit into a long - a 64-bit word: 16 tiles positions of 4 bits each.) 100M longs will fit in 800M bytes of memory, if placed in an array, so your queue could work.
Except for a few things. First, both ArrayDeque and LinkedList use as their representation type the type that they’re instantiated with. Generics in Java can’t be instantiated with primitive types, so you need to use, e.g., Long instead of long. This means boxing all of the longs you want to put in the queue, which means you’re back to separate objects of 16 bytes instead of array slots of 8 bytes. (And of course, things are worse, proportionally, if you want a queue of int or byte. (Why would you want a queue of 100M bytes given that there are only 256 unique values for a byte? Answer: If you really want to have a queue of a tuple of long and byte, and you’re going to run it as two queues of primitive objects, rather than one queue of a reference type. Which is the case for the breath-first search in the 15-puzzle.)
In addition to your boxed elements, the LinkedList has a 16-byte object for every element in the queue. So each element in the queue actually takes 32-bytes, and furthermore, is a separate object to manage.
That point is also important: In an experiment I ran with a 1Gb heap space, the test program started thrashing in the garbage collector and made no further progress after only 33,740,000 Longs were allocated and put into an ArrayDeque. (That’s only 540Mb, there should have been plenty of space left.)
Anyway, to make a long story short, I implemented a class, CompactQueue, that works with either primitive types or reference types, and, if using a primitive type, stores the enqueued elements directly into an array without boxing them. The operations of add() and remove() are constant time. (By the way, this isn’t true of ArrayDeque where add() is only amortized constant time because of the need to occasionally reallocate the array holding the elements, if the queue size increases.)
Using this class, and the heap size set to 1000Mb, I was able to put 126M longs, or 1012M bytes, into the queue. And there’s no GC performance problem caused by having 126M separate objects to manage (252M for the LinkedList!).
The class uses two techniques: First, it is parameterized by a array wrapper class that provides a factory for arrays of primitive (or reference) type, and array-like get() and set() operations. And second, it uses many smaller arrays (”blocks”) to store the elements of the queue, instead of one large array (as in ArrayDeque) or an object-per-element (as in LinkedList). This means that it can smoothly expand to take all necessary (available?) memory without hitting a roadblock when the array size doubles (as in ArrayDeque).
Code is provided here. (Note that only the array wrapper classes for byte and long are provided, array wrapper classes for the other primitive types are left as an exercise for the reader.)
Permalink
03.26.11
Posted in Uncategorized at 5:05 pm by david
Some time ago I read Jeremy Gibbons’ article “Unbounded Spigot Algorithms for the Digits of Pi” and liked it a lot- the problem was amusing, and the article was easy to underestand.
A spigot algorithm is an algorithm for producing digits of an unbounded sequence without reusing digits after they have been computed. The digits are produced “one by one, as if from a leaky tap” (Gibbons). An unbounded spigot algorithm doesn’t need to have some predetermined number of digits to run on—it’ll just keep going and going until you run out of memory.
I was recently experimenting with programming an FPGA and thought of this paper; my idea was that a little coprocessor (the FPGA) hanging off my server could just start producing digits of pi and keep running indefinitely. Even though it isn’t the fastest way go generate digits of pi (not by a longshot) it still seemed like fun. And I thought of a nice “agent” architecture where I’d take the digits and run them past an array of recognizers, each one to do some kind of statistics on the digits of pi, or look for special sequences like ten digits in a row made of one each of the ten digits. Those recognizers would all run in hardware, in parallel.
I remembered there was a “catch” about spigot algorithms, so I reread the paper. And the catch is that you obviously can’t generate digits of pi, an irrational (also transcendental of course), from a finite process. So even though the algorithm can be easily specified and for sure will generate one digit at a time, the implementation might need to use arbitrary amounts of memory to produce that digit. In fact, the memory complexity is hidden inside the bignum rational arithmetic that the algorithm uses.
Now, this wasn’t necessarily a stopper. I thought that maybe the problem was that as digits were generated the bignum rational arithmetic was generally done on reasonably sized bignums (maybe several thousand bits of numerator or denominator), and only every once in awhile did you have a few terms which blew up to much larger sizes. In that case you could mainly run the algorithm on the FPGA but which, when it discovered a multiplication that resulted in a too-large-for-the-hardware result, would trap back to the server which would run that particular multiplication with much more resources available, and would then pass the result back to the FPGA, which could continue.
But no, that wasn’t to be. I experimented with an implementation in Java and discovered that the bignums get larger and larger from the very beginning, and even if you not only reduce each bignum to smallest terms, but also do the same for the 2×2 matrix that the algorithm uses, there’s no use. You’re doing really big bignum arithmetic after the first few terms.
Anyway. I was happy to see that the Haskell algorithms in the paper transferred very easily and very directly to Java (once I supplied a BigRational class). Here’s the code: spigot.tar
One curious bug kept me puzzled for hours: I got bit by a 32-bit integer overflow. Turns out if you’re computing 3(3i+1)(3i+2)(5i-2) for i from 1 to ∞ … well, that simple expression overflows along around i = 280. I didn’t expect it at all, and looked everywhere else, in my bignum-using expressions for it.
At this time I’d like to point out a couple of typographical errors in the programs in the Gibbons’ paper cited above, “An Unbounded Spigot Algorithm for Pi”, so you can save some time if you want to investigate this algorithm yourself.
- pg 7, definition of
extr—the / should be %
- more crucially, pg 9, definition of
next—+15 should be -12, also in conjecture 1, definition of n
Permalink
Posted in Uncategorized at 2:39 pm by david
A classic interview coding question is—or at least used to be—to write a program to generate all the permutations of a string.
I used to like to get it because it had a simple and elegant recursive solution that was easy to right the first time, when writing it on the whiteboard. And coming up with a good recursive solution in an interview used to impress interviewers, especially if you explained it nicely in a “mathematical induction” sort of way.
Recently I had two additional thoughts about that easy solution. The first was that recursive solutions were sort of the epitome of procedural programming, but what would be the object-oriented variant, and would it be as easy to write it correctly at the whiteboard?
The second thought was a bit more interesting: suppose you actually needed a permutation generator in your program (I never have) then you probably would not want the recursive solution anyway, which would generate all permutations—and process them—before returning to the caller. Instead, you’d want a demand-oriented, that is, iterator-style, solution that you could pull solutions out of when you wanted to. That sort of solution would be more difficult to express in a procedural language, and you couldn’t easily change a recursive solution into an demand-oriented solution, but the object-oriented solution might be easier to adapt. And could you still do it at the whiteboard?
I was going to make a blog post about this—but it turned into quite the megilla and I put it on CodeProject as a tutorial article, here.
In that article I not only show the object-oriented data-pipeline version that corresponds directly to the procedural (recursive) solution, but I also show two variants of it in “pull-style” as a iterator.
However, here is the object-oriented solution to generating permutations. It is in fact as simple as the recursive solution, and perhaps even easier to get right the first time while standing at the whiteboard. I don’t know why it hasn’t been seen, as an example, earlier.
private static final class PermuteStep implements Action {
private final Action a;
private final char c;
private PermuteStep(char c, Action a) {
this.c = c;
this.a = a;
}
public void next(String s) {
for (int i = 0; i <= s.length(); i++) {
String n = s.substring(0, i)
+ c
+ s.substring(i, s.length());
a.next(n);
}
}
}
public static void Permute(Action a, String p) {
// (Argument validation elided for clarity)
Action as = a;
for (char c : p.toCharArray()) {
as = new PermuteStep(c, as);
}
as.next("");
}
Permalink
04.13.09
Posted in Uncategorized at 6:29 pm by david
Long time no update this blog. I intend to fix that. But for now, I’d like to point to two articles I posted at CodeProject:
- Tracing Events Raised by Any C# Object—in which I describe a technique for tracing the events of any C# object using a very simple helper class, using .NET Reflection to get the event handlers of an arbitrary object.
- Password Field Unhider (and some C++ utility classes)—first I present a small utility that lives in the Windows notification area and stands ready at any time to unhide (that is, unmask) any password field on the screen, so you can see what you’re typing. And second, I describe some very simple yet useful C++ utility classes: a general message pump, an IPC mechanism using
WM_COPYDATA, and a work item dispatcher.
I intend to post more articles at CodeProject, the kind of useful tips, tutorial, explanation things, with source code, that are longer than the typical blog post.
Permalink
11.11.07
Posted in Uncategorized at 5:43 pm by david
You can use Word 2007 features to generate very nice looking mathematical notation. This feature is, for all practical purposes, completely undocumented by Microsoft. However, some information has been published by Microsoft employees and others on the web. This post is meant to serve as a convenient directory of that information. (This post will be updated as I learn more about equations in Word 2007.)
- To enter equation mode: Insert|Equation or Alt+= shortcut. Note that the Insert ribbon doesn’t have the Equation item when in Blog mode. Why not? (Question: How do I get it to appear in Blog mode?)
- Dataninja: Undocumented Word 2007 Equation Shortcuts—John Gardner created this very nice reference card with some useful equation formatted tips.
- Word Team Blog: Equations in 2007—an introductory post, with links to two screenshot-videos on how to use linear method. Unfortunately – she doesn’t explain, while she is typing, how to enter equation mode or how use the keyboard to move the cursor from one insertion field to the next. (This is currently the only post on the Word Team blog with the tag “equations”.)
- Word 2007 Help: Math AutoCorrect Symbols—I cut&pasted this from the Word 2007 Help—it is more easily used in this format.
- TechRepublic: Microsoft Office Word 2007 Inside and Out sample chapter on Building Blocks—TechRepublic offers this sample chapter of Microsoft Office Word 2007 Inside and Out—the chapter is all about Building Blocks, which is what the equations gallery is made of. It explains a bit about Math Autocorrect mode, linear equation entry, and how to add your own equations to the gallery. This chapter is pretty good—the book might be worth getting.
- Word Team Blog: Equation Numbering—a post on how to number the equations in your document. There is a video here too. (This is currently the only post on the Word Team blog with the tag “equations video”.)
- Murray Sargent: Math in Office: Using Math Italic and Bold in Word 2007—How using the ribbon’s italic and bold formatting buttons provides the proper math italic and bold characters for variables.
- UTN 28: Unicode Nearly Plain-Text Encoding of Mathematics—this document, an Unicode consortium Technical Note written by the Microsoft developer who implemented the feature, is a complete description of the linear entry method.
- Murray Sargent: Math Selection—a brief note on how selection works inside an equation, and the related post Murray Sargent: Using Left/Right Arrow Keys in Mathematical Text on how the insertion points works inside an equation.
- Murray Sargent: Breaking Equations Into Multiple Lines—A nice description of how to break equations onto multiple lines, and also how to align multiple equations on a specific character.
- David Carlisle: XHTML and MathML from Office 2007—David Carlisle provides instructions and an XSL stylesheet so you can take the HTML output of Word 2007 and run it through his process to get an XHTML document that has the math equations in MathML format (normally Word 2007 saves equations in “ECMA Math” format, OMML—apparently a Microsoft invention). Note that Word allows you to cut/paste MathML to/from the Clipboard (so you get get equations into or out of Mathematica, for example).
- Murray Sargent: User Spaces in Math Zones—On typing spaces into equations: Just don’t do it!
Interesting, but not as practical:
General places to look for information:
Permalink
11.08.07
Posted in Concurrency, Race Conditions at 6:05 pm by david
Here is a useful paper that characterizes race conditions formally: What Are Race Conditions? Some Issues and Formalizations (Robert H. B. Netzer, Barton P. Miller) [ACM Letters on Programming Languages and Systems v1n1, March 1992).
Netzer (who wrote his PhD thesis on this subject) classifies data races into two categories: general races, which pertain to programs which are meant to be deterministic, and data races, which pertain to programs which are non-deterministic. Then he also presents an orthogonal classification of races: feasible races which “capture the intuitive notions desired for debugging” but which are hard to compute completely and accurately, and apparent races which “capture less accurate notions” which can be detected in practice, but which are sufficiently less accurate that they tend to swamp the user with false positives.
So: general races cause non-deterministic execution in programs intended to be deterministic, and data races cause non-atomic execution of critical sections in non-deterministic programs. Thus both kind of races can cause failures in programs.
In fact, in my most recent job, I dealt with both kinds of races in the application program we were building.
In fact, both general races and data races are important concepts. Your typical Windows application program is non-deterministic – execution depends on the precise timing and ordering of input events (keystroke, mouse movement, and system messages) – but also contains large sections that are intended to operated deterministically (e.g., if in Photoshop you load a certain image file, and execute a specific filter with specific parameters on it, and then you save the resulting image in another file, then the result should be the same each time you do it even if the timing of your mouse movements differs from run to run.)
(Note that the “critical sections” that are violated by data races need not be the operating system-provided primitive, like CRITICAL_SECTION objects in Windows. It just means any bit of code that implements—or is supposed to implement—mutual exclusion.)
(By the way, I found this article somewhat difficult to understand: the differences between the feasibledata races and apparent races, which are the key to Netzer’s classification, were hard to grasp. Also it wasn’t really clear what he meant by feasible execution. Finally, the writing was repetitive in places.)
09.07.07
Posted in Bakin's Bits at 3:36 pm by david
I was configuring a new computer to be used for testing concurrent software, and was using my standard self-guidelines: second-fastest processor available (a nod to economy), as much DRAM as I can jam on a motherboard, and the latest dual-graphics card technology. Whohoo! But then I found this site on building a very economical cluster system and I realized my guidelines were old-fashioned. I’m now in the mood to build my own micro-Beowulf, so I can experiment with parallel clusters as well as multicore concurrency.
Check it out: The system described produces 26Gflops at a cost (August 2007) of $1256! It consists of 4 microATX motherboards, each with a dual core CPU and 2GB RAM, 4 power supplies, 1 hard disk, and 1 8-port gigabit switch. The “structure” is scrap plexiglass and threaded rods – definitely minimal! – and the whole thing is 11″ x 12″ x 17″! Kudos to Professor Joel Adams and his student Tim Brom for designing, building, configuring, and benchmarking this small Beowulf.
Here’s another such system – LittleFe. And here is a homebrew 10-node system from 2000, with the same idea w.r.t. minimal packaging.
(My main conclusion about my self-guidelines: I don’t need even the second-fastest processor anymore. Nearly any current processor is fast enough for development purposes, compiler and system bloat notwithstanding. This system uses cheap multicore processors, a reasonable amount of memory for each node, and doesn’t need anything more than the built-in motherboard graphics. I would still like a system with a hot new graphics card however, so I can experiment with GPGPU.)
Update Sept 18 2007: Lot’s of people are doing work in this area—which will make it easy to get started! Here are some more links:
ParallelKnoppix - A LiveCD that let’s you boot up an MPI cluster in 5 minutes!
And on the ParallelKnoppix site, some user’s have sent in pictures of their clusters—lot’s of different (and primitive, yet working) building techniques here!
And this page from Dec 2005 describes how some guy built a “mobile wireless linux cluster” (2 nodes) in order to have access to “big computer resources” while exploring a cave, mountain climbing, a weekend trip to the mountains, or who knows what else.
Permalink
Posted in Book Review at 3:21 pm by david
The typical data structures most programmers know and use require imperative programming: they fundamentally depend on replacing the values of fields with assignment statements, especially pointer fields. A particular data structure represents the state of something at that particular moment in time, and that moment only. If you want to know what the state was in the past you needed to have made a copy of the entire data structure back then, and kept it around until you needed it. (Alternatively, you could keep a log of changes made to the data structure that you could play in reverse until you get the previous state - and then play it back forwards to get back to where you are now. Both these techniques are typically used to implement undo/redo, for example.)
Or you could use a persistent data structure. A persistent data structure allows you to access previous versions at any time without having to do any copying. All you needed to do at the time was to save a pointer to the data structure. If you have a persistent data structure, your undo/redo implementation is simply a stack of pointers that you push a pointer onto after you make any change to the data structure.
This can be quite useful—but it is typically very hard to implement a persistent data structure in an imperative language, especially if you have to worry about memory management1. If you’re using a functional programming language—especially a language with lazy semantics like Haskell—then all your data structures are automatically persistent, and your only problem is efficiency (and of course, in your functional languages, the language system takes care of memory management). But for practical purposes, as a hardcore C++ programmer for professional purposes, I was locked out of the world of persistent data structures.
Now, however, with C# and C++/CLI in use (and garbage collection coming to C++ any time now …2) I can at last contemplate the use of persistent data structures in my designs. And that’s great, because it gave me an excuse to take one of my favorite computer science books off the shelf and give it another read.
The book is Purely Functional Data Structures, by Chris Okasaki. I find it to be a very well written and easy to understand introduction to the design and analysis of persistent data structures—or equivalently—for the design and analysis of any data structure you’d want to use in a functional language.
There are two key themes of the book: First, to describe the use and implementation of several persistent data structures, such as different kinds of heaps, queues, and random-access lists, and second, to describe how to create your own efficient persistent data structures.
Read the rest of this entry »
Permalink
08.21.07
Posted in Bakin's Bits at 3:22 pm by david
I—David Bakin—am an experienced software developer with over 25 years experience developing system and application software.
My aim with Bakin’s Bits is to write occasional blog posts, articles, and book reviews related to my software interests, and whatever I’m working on understanding.
My main interests are:
- issues in concurrency, especially correctness, debugging, and appropriate design
- functional programming—I know it is fun, I know it is powerful, but can I construct a business case that will convince a development manager to let me put it into a product?
- GPGPU programming and other forms of stream programming, including
- massively scalable systems that fit in Google’s map-reduce paradigm or another similar functional pipeline
- fun in programming—not just fun as in functional, but also fun problems, and fun algorithms.
Thanks for checking in! — Dave
Permalink