trip logs / gnuvola

Trip Log 2022-01-02 h13 -- Indices Style Upgrade Part 12

The Indices Style Upgrade series has finally reached the Last Patch, number 8, “Compute tags sort keys exactly once”.  This is another “internal” change, very focused.  You don't need to grab the accompanying tarball since we are again preferring context diffs to the unified diffs in the tarball.  (But, hey, do what you want — maybe look at them both!) 

We can guess the nature of the patch from its title.  Whenever you see “exactly once” billed as an improvement, that usually means the previous way wastefully redid some computation over and over again.  Basically, the change is to rework things to do the computation once and save the result again for later use.  That's exactly the case here. 

As you may recall, the procedure ‘most-impactful-first’ does a lot of stuff, but in the end, it is the ‘more-impactful’ procedure that does the two steps: (a) extract aspect information (the so-called “sort keys”) from the items; (b) use that information to compare them.  If extraction were cheap, maybe it would not be a big deal, but we know that one of the aspects, the “count”, requires calling procedure ‘length’, which has O(n) complexity.  Thus, it is (a) that patch 8 attacks.  The reasoning is: Why extract the same (unchanging) aspect information again and again for each comparison?  Better to do (a) once, and leave the bulk of the effort to be applied towards (b). 

Patch 8 has two hunks, so let's look at hunk 1, prefixed w/ the corresponding change description.  Lines 25-34 is the before portion and lines 35-55 is the after portion. 

 9	* sub/tl-mkindex (most-impactful-first wrap): New proc.
10	(most-impactful-first unwrap): Likewise.
11	(most-impactful-first most-impactful): Don't use
12	‘tag+count+latest’ on proc args to obtain key fields;
13	instead, take args as a thunk that yields key fields.
25	*** 219,227 ****
27	  (define (most-impactful-first tag+count+latest)
29	    (define (more-impactful a b)
30	!     (let-values (((a-tag a-count a-latest) (tag+count+latest a))
31	!                  ((b-tag b-count b-latest) (tag+count+latest b)))
32	        (if (not (= a-count b-count))
33	            ;; more refs
34	            (> a-count b-count)
35	--- 219,238 ----
37	  (define (most-impactful-first tag+count+latest)
39	+   (define (wrap ls)
40	+     (map (lambda (x)
41	+            (let-values (((tag count latest) (tag+count+latest x)))
42	+              (cons (lambda ()
43	+                      (values tag count latest))
44	+                    x)))
45	+          ls))
46	+ 
47	+   (define (unwrap ls)
48	+     (map cdr ls))
49	+ 
50	    (define (more-impactful a b)
51	!     (let-values (((a-tag a-count a-latest) (a))
52	!                  ((b-tag b-count b-latest) (b)))
53	        (if (not (= a-count b-count))
54	            ;; more refs
55	            (> a-count b-count)
56	***************

We see that there are two new procedures, ‘wrap’ (lines 39-45) and ‘unwrap’ (lines 47-48), which checks out w/ the description (lines 9-10).  Their name suggests a symmetrical operation.  We see that ‘more-impactful’ changes (before: lines 30-31, after: lines 51-52) to no longer call ‘tag+count+latest’.  In fact, it is now ‘wrap’ that calls ‘tag+count+latest’ (line 41) and eventually ‘cons’ (line 42) to return a pair ‘(VALUES-PRODUCING-THUNK . x)’ for each item in ‘ls’.  It's no surprise, then, that ‘unwrap’ calls ‘cdr’ (line 48) for each member in its ‘ls’, to get back the ‘x’ from the pair. 

So, really, it's just a question of when all this happens.  This is where the hunk 2 enters the picture.  The description (lines 14-15) mentions “rv proc” — remember that?  Yep, there it is (“rv” comment on line 59, 67), with the procedure body on line 61 (before) and 69 (after). 

14	(most-impactful-first): Rewrite rv proc
15	to ‘wrap’, use ‘sort/car’, and ‘unwrap’.
57	*** 233,239 ****
59	    ;; rv
60	    (lambda (ls)
61	!     (sort ls more-impactful)))
63	  (define (consult-db upath-ignored where-clause)
65	--- 244,250 ----
67	    ;; rv
68	    (lambda (ls)
69	!     (unwrap (sort/car more-impactful (wrap ls)))))
71	  (define (consult-db upath-ignored where-clause)
73	-- 

Indeed, the order of operations is wrap, sort, unwrap.  Pretty straightforward, wouldn't you say? 

Well, with that, we conclude the Indices Style Upgrade series.  I was hoping to be inspired to write a few words more, but as it turns out, the muses are out to lunch today...  Feedback on this series or ideas for others always welcome! 

Copyright (C) 2022 Thien-Thi Nguyen