<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" gd:etag="W/&quot;CkYHQ3k7fSp7ImA9WhBaFEs.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286</id><updated>2013-05-24T22:35:32.705-07:00</updated><title>cbloom rants</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>3844</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/CbloomRants" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="cbloomrants" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;Ak4ASHk9eip7ImA9WhBaFEk.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-7189554588489063361</id><published>2013-05-24T19:28:00.000-07:00</published><updated>2013-05-24T19:29:09.762-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-24T19:29:09.762-07:00</app:edited><title>05-24-13 - Hunger Cues</title><content type="html">I'm so exhausted that I've been eating from fatigue, which is terrible, so a note for myself.
I very much believe in listening to your body, but you have to know what to listen to, and hunger cues can be confusing.

&lt;P&gt;

1. Tiredness is not a hunger cue.  Yes, sure popping some sugar will give you a boost, but that is not the right solution.
When you are tired you need sleep, not food.  This is always a huge problem for me in work crunches, I start jamming candy
down my throat to try to keep my energy up, and I'm tempted to do it now for baby, so hey self - no, sleepiness is not
solved by eating.  Go sleep.

&lt;P&gt;

2. Your belly is actually not a great cue.  Sure extreme belly ache and rumbling means you need to eat, but a slight
hollow empty feeling, which most people take as a "I must eat now" is not actually a good hunger cue.  Humans are not
meant to feel full in the belly all the time, but in our culture we get used to that feeling and so it feels strange
when it's gone and you think something is wrong that you have to fix by putting more food in.  It's really not; you need
to try to detach the mental association of "belly empty" with "eat now".

&lt;P&gt;

3. The actual correct hunger cue is light headedness, dizzyness, or weakness.  That means you really do need to eat something now,
but perhaps not a lot.  (getting quantities right for yourself takes some experimentation over time to figure out)

&lt;P&gt;

I believe that the primary goal of food consumption portioning and scheduling should be to eat as little as possible, without ever going
into that red zone of critically low blood sugar.  (of course I'm assuming that you want to be near your "ideal" body weight, with "ideal"
being the modern standard of trim; if your ideal is to be as large as possible then you would do something different).  
Note that belly feelings show up nowhere in the "primary goal", you just ignore them.  Perhaps
even learn to enjoy that slight hollow feeling in your gut, which gives you a bit of a hungry wolf feeling, it's sort of energizing.
(if I'm slightly hungry, slightly horny, and slightly angry, good god, get out of the way!)

&lt;P&gt;

I'm convinced that the right way to eat is something like 5 small meals throughout the day.
Long ago when I was single and quite fit, I ate like that and was quite successful at meeting the "primary goal of food consumption portioning and scheduling"
(minimal eating without going into the red zone).
It's very hard to keep that up in a relationship, because eating a big meal is such a key part of our social conventions.  When I was single I
would very rarely eat a proper dinner; I would eat a sandwich at 4 PM or something so that I wouldn't really be too hungry at 8 when Fifth Meal
came around, so I might just eat a salad and some canned tuna.  It is possible to do in a relationship, and I'm sort
of trying it now.  You have to just make sure that you eat less at the standard meal times, or eat more low-cal food like
cooked veg and salads, and then go ahead and eat the intermediate meals yourself.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/ggfB_jrcsd8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/7189554588489063361/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=7189554588489063361" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/7189554588489063361?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/7189554588489063361?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/05/05-24-13-hunger-cues.html" title="05-24-13 - Hunger Cues" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;Ak4ARnc7eCp7ImA9WhBaFEk.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-6622326726350957111</id><published>2013-05-22T00:00:00.000-07:00</published><updated>2013-05-24T19:29:07.900-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-24T19:29:07.900-07:00</app:edited><title>05-22-13 - Baby Work</title><content type="html">Jesus christ it's a lot of work.  I was hoping to get back to doing a little bit of RAD work by Monday (two days ago), but it's
just not possible yet.  I'm doing all the work for Tasha and baby and it's completely swamping me.  I get little moments of
free time (that I oddly use to write things like this), but never a solid enough block to actually do programming work, and you
can never predict when that solid block of time will be, which is so hard for programming.  A few times I tried to get going on
some actual coding, and then baby wakes up and I have to stop and run around changing diapers and getting mom snacks.  I give up.

&lt;P&gt;

(even when I do get a bit of solid time, I'm just too tired to be productive; I stare blankly at the code.
Actually I've had a few funny "tired dad" moments already.  I went to the grocery store and was shopping along,
and all of a sudden I noticed my cart was full of stuff that wasn't mine.  Oh crap I took someone else's cart
at some point because I was so asleep on my feet.  I remember all through my childhood my mom would make shopping
mistakes that I found so infuriating (Mom, I said corn chex and you got wheat chex, omg mom how could you be so
daft!) and now I finally have some sympathy; you just get so exhausted that you can't even perform the most
rudimentary of tasks without spacing out and making mistakes).

&lt;P&gt;

If you haven't had your baby yet, get some help, don't try to do it alone.  (our relief should arrive tomorrow, and it's
getting easier each day anyway; the hardest days were the beginning when we were still exhausted from labor and cleaning up after
the home birth, and baby hadn't figured out nursing very well yet, that was some serious crunch time).  We thought it would be
sweet to have some time for just the three of us, and it has been, actually it's been really nice just being alone
together, but it's too much work, I don't recommend it.

&lt;P&gt;

(we've been incredibly fortunate so far that our baby is super easy, really well behaved, a good sleeper and nurser; we just
have none of the problems that you hear about.  Of course that's not entirely luck, we're both super healthy people and we've done what
we believe is right to make a happy newborn (singing to the womb, immediate mommy skin contact after birth, never separating baby from parents,
cosleeping, breatfeeding, etc.).  But it's been so hard even with a well-behaved baby I now have new respect for the parents that
go through a baby with colic or feeding difficulties or one that doesn't sleep, you guys are heros).

&lt;P&gt;

House work is so annoying in the way that you can't just get it all done at once.  Even during this time when the house work is
so much harder than usual, it's still only something like 4-6 hours total of work, but it's all spread out over the day; you work
for a while, then you have a 15 minute break (waiting for laundry to finish or the baby to poop again), then you do more work, then
another tiny break, etc.  I don't do well with work like that; I'm a work sprinter (actually I'm an everything sprinter), I want to get all my tasks
on a list and then I'm just gonna strap on the gusto and knock it out as fast as possible so that I can be done and have a deep rest.

&lt;P&gt;

I'm sad that I'm so busy running around doing chores that I can't just lie in bed with mom and baby very much.  I've always used
doing work for people as a way of showing love, and it's fucking retarded.  It's not what they really want, and it's not perceived
as love.  I'm sure that if I had someone else do all the work and I just spent more time cuddling, I would be a better dad.

&lt;P&gt;

At family gatherings, there are always those people who disappear from the group to help out in the kitchen; they might help cook, then help set up,
help clean up; they do a lot of work and show their love for the family that way.  There are other people who just hang out with
the group the whole time and chat and smile and are more overtly interested in being there.  Of course the hard worker is subconsciously
perceived negatively, as standoffish, while the smiler is a "good family man" that everyone enjoys.  In my youth I would rage about
the unfairness of it all.  I'm past that and can see things more clearly :

&lt;P&gt;

1. If you have a choice, then of course it's better to be the smiler who does no work and everyone loves.  There is no reward in
the real world for being a hard worker; not in social situations nor in the workplace.  It's much better to be
perceived as nice than to actually do deeply giving nice things.

&lt;P&gt;

2. Being a friendly, funny social creature is of course a type of "work" that's contributing value to the social situation.  Think of them as an MC if
you like; they're providing a service to the group, telling stories or laughing or whatever.  That's valuable work as well.

&lt;P&gt;

3. The people who are really stealing from the group are the ones that don't do the work, and also don't provide laughs and good vibes.
They are energy vampires and you should minimize your contact with them.

&lt;P&gt;

4. If you're a "smiler", the hard-working types will give you dirty looks or even drop passive aggressive shitty remarks about how
"some people don't contribute" or whatever.  Fuck them.  They're just morons who have chosen a bad path for themselves and are trying
to bring you down.  Just laugh at them inside your head.  Foolish hard-workers, your judgement has no sway over me, I don't need your
approval.  If someone else wants to do all the hard work for you, and then make themselves feel all sour about the fact that you didn't
do the work, fantastic.

&lt;P&gt;

5. If you're a hard-worker, don't despair about the unfair world.  You have found your lot in life.  Maybe you are simply incapable
of being a smiler.  That's too bad for you, but we all have our place, and it's better to accept it and be content than to rage about
what cannot be yours.

&lt;P&gt;

In my youth I used to struggle with trying it both ways.  One of the nice things about age is you figure out your lot in life and
just accept it; some years ago I concluded that I would never really contribute great social energy to groups, so I should just
be a dish-doer in order to avoid being an energy vampire.

&lt;P&gt;

Anyway.

&lt;P&gt;

I'm a bit worried that I will never be able to really concentrate in my home office the way I have in the past.  It's been a wonderful
sanctuary of peace and alone time for me, where I can dive into my work and there's nobody making noise or peeking in at me (the way
people do in offices).  But now even just knowing that my baby is in the house, my mind is partly on the baby; is she okay, should I go
check on her now?  I'm sure that will decrease over time, but never go away.  And of course once the child is running around
making noise that will be a new distraction.  Oh well, I guess we'll see how it goes.

&lt;P&gt;

Programming is such hard work to mix with anything else because you really need that solid block of uninterrupted time.  You can't just
pause and resume; or I can't anyway, I need to get into this sort of trance, which takes a while to acheive, and is quite draining.
I feel a bit like a wizard in a fantasy novel; I can cast this amazing spell ("write code"), but to do so drains a bit of my life force,
and if I do it more than once per day I bring myself close to death; if you're interrupted in coding, the spell is cancelled.  I can
write code without the spell, but then progress is slow and difficult, just like a normal human trying to write code, it's so frustrating
for me to write code without the spell that I don't like to do it at all.

&lt;P&gt;

Anyway.

&lt;P&gt;

The actual point of this post is that I feel this need to get back to doing RAD work right away, and it's making me angry.  Why do I have
to feel that way?  Fuck RAD work, I need to be with family.  But my god damn hard-working WASPy martyr upbringing makes me feel like I
can't ever take anything for myself, that I need to go and sacrifice and work for other people.

&lt;P&gt;

The whole time was Tasha was pregnant I was crunching like crazy trying to finish Oodle 1.1 and get the real public release done, and to
just get as much work done then as I could so that I would feel better about taking time off after the birth.  I neglected Tasha when she
needed me and she was really upset at me for it.  I wanted to get ahead on my schedule, and I did, but now that I'm here my brain won't
let me have that and wants me to go back to work.

&lt;P&gt;

One of the things I've really struggled with at RAD is the lack of structure and the self-scheduling.  There's
never a point where I can get ahead of an official schedule; I can't hit all my milestones for the month and
then feel okay about taking it easy for a while.  Any time I do take it easy because I just need a break, I feel
bad about it.  In general, my stupid brain makes me productive as a programmer, but also miserable as a
programmer.

&lt;P&gt;

People who have a job where they just have a list of things to do and they can actually do them all and
then go "I'm done, I'm going home" are very fortunatey.  In programming, the todo list is always effectively
infinite (it's finite, but always longer than what you can ever accomplish).  You might make a schedule and
set a target set of tasks for a given month, but if you get them done sooner you don't go "great, I'm done
for the month, time for a few days off", you go "oh, I went faster than expected, I guess I'll adjust the schedule
and start on next month's tasks".

&lt;P&gt;

Even in a structured programming work environment, if you do your tasks faster than scheduled, you don't get
sent home - you get given more tasks.  In the traditional producer/team work system, your reward for being
the fastest on the team is not more free time or even more pay, it's more work.  Yay.  Cynical "realist" programmers
learn this at some point and many of them start to sandbag.  They might finish their tasks quickly, but don't
report it to production until their previously alotted time.  Or they will intentionally take "slow work" breaks,
like browing the web or watching videos while working.

&lt;P&gt;

Anyway.  I guess this post is all just my way of trying to convince myself that it's okay for me to take a few
more days off.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/yvq7zKGEk-0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/6622326726350957111/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=6622326726350957111" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/6622326726350957111?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/6622326726350957111?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/05/05-22-13-baby-work.html" title="05-22-13 - Baby Work" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>3</thr:total></entry><entry gd:etag="W/&quot;Ak4ARn86eSp7ImA9WhBaFEk.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-5896102974593120761</id><published>2013-05-20T00:00:00.000-07:00</published><updated>2013-05-24T19:29:07.111-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-24T19:29:07.111-07:00</app:edited><title>05-20-13 - Thoughts on Data Compression for MMOs</title><content type="html">I've been thinking about this a bit and thought I'd scribble some ideas publicly.
(MMO = not necessarily just MMOs but any central game server with a very large number of clients, that
cares about total bandwidth).

&lt;P&gt;

The situation is roughly this : you have a server and many clients (let's say 10k clients per server just to be concrete).
Data is mostly sent server-&gt;client , not very much is client-&gt;server.  Let's say 10k bytes per second per channel from server-&gt;client,
and only 1k bytes per second from client-&gt;server.  So the total data rate from the server is high (100 MB/s) but the data rate
on any one channel is low.  The server must send packets more than once per second for latency reasons; let's say 10 times
per second, so packets are only 1k on average; server sends 100k packets/second.  You don't want the compressor to add any
delay by buffering up packets.

&lt;P&gt;

I'm going to assume that you're using something like TCP so you have gauranteed packet order and no loss, so that you can use
previous packets from any given channel as encoder history on that channel.  If you do have an error in a connection you have to reset the encoder.

&lt;P&gt;

This is a rather unusual situation for data compression, and the standard LZ77 solutions don't work great.  I'm going to talk
only about the server-&gt;client transmission for now; you might use a completely different algorithm for the other direction.
Some properties of this situation :

&lt;P&gt;

1. You care more about encode time than decode time.  CPU time on the server is one of the primary limiting factors.  The client machine has
almost no compression work to do, so decode time could be quite slow.

&lt;P&gt;

2. Per-call encode time is very important (not just per-byte time).  Packets are small and you're doing lots of them (100k packets/sec),
so you can't afford long startup/shutdown times for the encoder.  This is mostly just an annoyance for coding, it means you
have to be really careful about your function setup code and such.

&lt;P&gt;

3. Memory use on the server is a bit limited.  Say you allow 4 GB for encoder states; that's only 400k per client.
(this is assuming that you're doing per-client encoder state, which is certainly not the only choice).

&lt;P&gt;

4. Cache misses are much worse than a normal compression encoder scenario.  Say you have something like a 256k hash table
to accelerate match finding.  Normally when you're compressing you get that whole table into L2 so your hash lookups are in cache.
In this scenario you're jumping from one state to another all the time, so you must assume that every memory lookup is a cache miss.

&lt;P&gt;

5. The standard LZ77 thing of not allowing matches at the beginning or end is rather more of a penalty.  In general all those
inefficiencies that you normally have on tiny buffers are more important than usual.

&lt;P&gt;

6. Because clients can be added at any time and connections can be reset, encoder init/reset time can't be very long.  This is
another reason aside from memory use that encoder state must be small.

&lt;P&gt;

7. The character of the data being sent doesn't really vary much from client to client.  This is one way in which this scenario
differs from a normal web server type of situation (in which case, different clients might be receiving vastly different types of data).
The character of the data can change from packet to packet; there are sort of a few different modes of the data and the stream switches
between then, but it's not like one client is usually receiving text and another one is receiving images.  They're all generally
receiving bit-packed 3d positions and the same type of thing.

&lt;P&gt;

And now some rambling about what encoder you might use that suits this scenario :

&lt;P&gt;

A. It's not clear that adaptive encoding is a win here.  You have to do the comparison with CPU use held constant, if you
just compare an encoder running adaptive vs the same encoder with a static model, that's not fair, because the static model
can be so much faster you should use a more sophisticated encoder.  The static model can also use vastly more memory.
Maybe not a whole 4G, but a lot more than 400k.

&lt;P&gt;

B. LZ77 is not great here.  The reason we love LZ77 is the fast, simple decoder.  We don't really care about that here.
An LZP or ROLZ variant would be better, that has a slightly slower and more memory-hungry decoder, but a simpler/faster encoder.

&lt;P&gt;

C. There are some semi-static options.  Perhaps a static match dictionary with something like LZP, and then an adaptive simple
context model per channel.  That makes the per-channel adaptive part small in memory, but still allows some local adaptation
for packets of different character.  Another option would be a switching static-model scheme.  Do something like train 4 different
static models for different packet types, and send 2 bits to pick the model then encode the packet with that static model.

&lt;P&gt;

D. Static context mixing is kind of appealing.  You can have static hash tables and a static mixing scheme, which eliminates a
lot of the slowness of CM.  Perhaps the order-0 model is adaptive per channel, and perhaps the secondary-statistics table is adaptive per channel.
Hitting 100 MB/s might still be a challenge, but I think it's possible.  One nice thing about CM here is that it can have the idea of
packets of different character implicit in the model.

&lt;P&gt;

E. For static-dictionary LZ, the normal linear offset encoding doesn't really make a lot of sense.  Sure, you could try to
optimize a dictionary by laying out the data in it such that more common data is at lower offsets, but that seems like a
nasty indirect way of getting at the solution.  Off the top of my head, it seems like you could use something like an LZFG
encoding.  That is, make a Suffix Trie and then send match references as node or leaf indexes; leaves all have equal
probability, nodes have a child count which is proportional to their probability (relative to siblings).

&lt;P&gt;

F. Surely the ideal solution is a blended static/dynamic coder.  That is, you have some large trained static model
(like a CM or PPM model, or a big static dictionary for LZ77) and then you also run a local adaptive model in a 
circular window for each channel.  Then you compressing using a mix of the two models.  There are various options on
how to do this.  For LZ77 you might send 0-64k offsets in the local adaptive window, and then 64k-4M offsets as indexes
into the static dictionary.  Or you could more explicitly code a selector bit to pick one of the two and then an offset.
For CM it's most natural, you just mix the result of the static model and the local adaptive model.

&lt;P&gt;

G. What is not awesome is model preconditioning (and it's what most people do, because it's the only thing available with
off-the-shelf compressors like zlib or whatever).  Model precondition means taking an adaptive coder and initially loading its model (eg. an LZ
dictionary) from some static data; then you encode packets adaptively.  This might actually offer excellent compression,
but it has bad channel startup time, and high memory use per channel, and it doesn't allow you to use more efficient
algorithms that are possible with fully static models (such as different types of data structures that provide fast lookup
but not fast update).

&lt;P&gt;

Obviously if you're doing UDP or some other unreliable packet scheme, then static-model compression is the only
way to go.  Also if clients are very frequently joining and leaving or moving servers, then they will never build
up much channel history and static model is the way to go there as well.  If streams vary vastly in size, like
if they're usually less than 1k but occasionally you do large 1M+ transfers (like for content serving as opposed
to updating game state) you would use totally different schemes.

&lt;P&gt;

I'd like to do some tests.  If you work on an MMO or similar game situation and can give me some real-world test data, please
be in touch.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/zAscOHFZmaY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/5896102974593120761/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=5896102974593120761" title="14 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/5896102974593120761?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/5896102974593120761?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/05/05-20-13-thoughts-on-data-compression.html" title="05-20-13 - Thoughts on Data Compression for MMOs" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>14</thr:total></entry><entry gd:etag="W/&quot;DUENQXc-cCp7ImA9WhBaEk4.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-9222819158185609984</id><published>2013-05-17T11:22:00.000-07:00</published><updated>2013-05-22T08:48:10.958-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-22T08:48:10.958-07:00</app:edited><title>05-17-13 - Cloth Diapering</title><content type="html">Oh yes indeed, you are in for a spate of baby-related blogging.

&lt;P&gt;

I'm pretty sure clother diapers are bullshit.  I'm about to cancel my diaper service.  In this first week I've been
using a semi-alternating mix of cloth and disposable.  I assumed that I would start out with disposables just for
ease in the first few days and then switch to cloth because it's "better", but I don't think I will.

&lt;P&gt;

(I make all my decisions now based only on 1. personal observations
and 2. serious scientific studies where I can read the original papers.  I try to avoid and discount 3. journalism
4. hearsay 5. the internet 6. mass-market nonfiction.  I think they are garbage and mental poison.)

&lt;P&gt;

What I'm seeing is :

&lt;P&gt;

Disposable diapers actually work the way they claim to.  The seal around the borders is good.  The entire
diaper itself has a nice low profile so is not too bulky or uncomfortable.  But most importantly, they actually
do trap and absorb moisture.  When baby has a heavy pee in a disposable diaper, the moisture stays right in
one little spot and doesn't spread all over.  When I remove the diaper I can feel her skin all over the nether regions
is pretty dry.

&lt;P&gt;

Cloth diapers don't.  The worst aspect is that when baby has a heavy pee, the cloth soaks it up, and because it's
cloth and wicks moisture, the pee is spread all over her entire lower parts.  When I get the diaper off, she's soaking
wet all over.  (and yes of course I'm changing her almost instantly after peeing because at this point we're watching
her constantly).  That alone is enough to turn me off cloth diapers, but there's lots more that sucks about them.
It's really hard to get the diaper cover on such that it actually makes a water-tight seal, so leakage is much more likely
(and if you do try to make it water tight, it's easy to make it too tight and cut off circulation, which I accidentally
did once).  The cloth diaper alone looks pretty comfortable on her, but the diaper cover is much rougher and more bulky than a
dispoable; the result is that she has this huge awkward thing on.

&lt;P&gt;

When you add the inconvenience of cloth diapers (longer changing times, having to store poop in your house, taking the pail
in and out for pickup), it just seems like a massive lose.

&lt;P&gt;

The only possible argument pro-cloth that makes sense to me is the reduction of the landfill load.  Now, environmental
arguments are always complicated; there are arguments for the other side based on the environmental cost of washing (though I
think they're bogus).  But even assuming that the environmental case is clear, being a hypocritical liberal I wouldn't actually
inconvenience myself and discomfort my baby for the benefit of the landfill.

&lt;P&gt;

Eh, actually I take back that false self-accusation.  That's a retarded Fox News style "gotcha" that's based on misrepresentation
and not understanding.  I've never advocated the standard liberal martyrdom (and if I once did, I certainly don't now).  
I don't believe in choosing to undermine yourself because you believe the world would be
better if everyone did it.  I believe in changing the laws such that they encourage you to make the choice that is better
for the world.  eg. people who don't drive because they believe it's evil, even if it would be much to their benefit,
are just being dumb martyrs.  The US government massively subsidizes driving, so if you don't take advantage of that you
are essentially paying for other people to drive.  I would love it if the government would subsidize *not driving* rather
than the other way around, but until they do I'm driving up a storm.  (tangent : the massive subsidies for Teslas
is a great example of the way that Dems and Reps are in fact both really working for the same cause : creating
loop holes and kick backs so that they can give money to rich people).

&lt;P&gt;

I'm a big tangent wanderer.  My political philosophy in a nutshell :

&lt;P&gt;

Government's role is to create a market structure (through laws, regulation, the Fed, direct market action, etc)
such that when each actor maximimizes their own personal utility, the net result is as good for the entire world (nation) as possible.  

&lt;P&gt;

(if you're out of high school (or the 18th century) you should know that a free market does not do that on its own)

&lt;P&gt;

(And crucially, "good for" must be defined on something like a sum-of-logs scale, or perhaps just maximize the median, or
minimize the number in poverty; if you maximize
the sum (basically GDP) then giving huge profits to Larry Ellison and fucking everyone else looks like it's "good for the world")

&lt;P&gt;

And, uh, oh yeah, cloth diapers suck.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/a0F7m5bGKmc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/9222819158185609984/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=9222819158185609984" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/9222819158185609984?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/9222819158185609984?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/05/05-17-13-cloth-diapering.html" title="05-17-13 - Cloth Diapering" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total></entry><entry gd:etag="W/&quot;CEEEQ3w6eCp7ImA9WhBbGEw.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-423403794999956147</id><published>2013-05-15T00:00:00.000-07:00</published><updated>2013-05-17T10:43:22.210-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-17T10:43:22.210-07:00</app:edited><title>05-15-13 - Baby</title><content type="html">I suppose this is the easiest way to announce to various friends and semi-friends rather than trying to mass-email.
I have a new baby, yay!  No pictures, you paparazzos.  She's adorable and healthy.
I love how simple and direct her communication is.  Suckling lips = needs to nurse.  Squirming =
needs a diaper change.  Fussing = cold or gassy.  Everything else = needs to sleep.  I wish all humans communicated so clearly.

&lt;P&gt;

I want to write about the wonderful experience of having a home birth (see *2), but don't want to intrude on
Tasha's privacy.  Suffice it to say it was really good, so good to be home and have everything at hand to
make Tasha comfortable, and then be able to take baby in our arms and settle into bed right away.  We spent the
first 36 hours after birth all in bed together and I think that time was really important.

&lt;P&gt;

I've always wanted to have kids, but I'm (mostly) glad that I waited this long.  For one thing Tasha is a wonderful mom
and I'm glad I found her.  But also, I realize now that I wasn't ready in my twenties.  I've changed a lot in
the last five years and I'm a much better person now.  I've learned important lessons that are helping me a lot
in this challenging time, like to do hard work correctly you have to not only complete the task but also
keep a good attitude and be nice to the people around you while you do it.  And that when you are tired and hungry is when you
can really show your character; anyone can have a good attitude when they're fresh, but if you get nasty when
the going gets tough then you are nasty.  etc. standard cbloom slowly figures out things that most people learned in their teens.

&lt;P&gt;

Now for some old-style ranting.

&lt;P&gt;

1. "We had a baby".  No you fucking did not.  Your wife had a baby.  If you were a really good husband,
you held her hand and got her snacks.  She squeezed a watermelon out of her vagina.  You do not get to
take any credit for that act, it was all her.  It's a bit like Steve Jobs saying "we invented" anything;
no you did not you fucking credit-stealing douchebag, your company didn't even invent it, much less you.

&lt;P&gt;

(tangent : I can't stand the political correctness in sport post-game interviews these days;
they're all so sanitized and formulaic.  They must go to interview coaching classes or something because
everyone says exactly the same things.  Of course it's not the athlete's fault, they would love to have
emotional honest outbursts, it's the god damn stupid public who throw a coniption if anybody says anything
remotely true.  In particular this post reminds me of how athletes always immediately go "it wasn't just me, it was
the team"; no it was not, Kobe, you just had an 80 point game, it was all fucking you, don't give me this
bullshit credit to the team stuff.   Be a man and say "*I* won this game".)

&lt;P&gt;

2. People are busy-body dicks.  When we would tell acquaintances about our plans to have a home birth,
a good 25% would feel like they had to tell us what a bad idea that was and nag us about the dangers of
childbirth.  Shut the fuck up you stupid asshole.  First of all, don't you think that maybe we've researched
that more than you before making our decision, so you don't know WTF you're talking about?  Second of all,
we're not going to change our mind because of your nagging, so all you're doing is being nasty about something
you're not going to change.  We didn't ask for your opinion, you can just stay the hell out of it.
(The doctors that we would occasionally see for tests were often negative and naggy as well, which only made
us more confident in our choice).

&lt;P&gt;

It's a bit like if a friend tells you they're marrying someone and you go "her?".  Even if the marriage is
a questionable choice, they're not going to stop it due to your misgivings, so all you're doing is adding
some unpleasantness to their experience.

&lt;P&gt;

You always run into these idiots when you do software reviews or brainstorming sessions.  You'll call a meeting
to discuss revisions to the boss fight sequence, and some asshole will always chime in with "I really think the
whole idea of boss fights sucks and we should start over".  Umm, great, thanks, very helpful.  We're not going
to tear up the whole design of the game a few months from shipping, so maybe you could stick to the topic
at hand and get some kind of clue about what things are reasonable to change and which need to be taken as a given
and worked within as constraints.

&lt;P&gt;

Like when I'd ask for reviews of Oodle, a few of the respondents would give me something awesomely unhelpful like
"I don't like the entire style of the API, and I'd throw it out and do a new one" , or "actually I think a paging +
data compression library is a bad idea and I'd just start over on something else".  Great, thanks; I might agree
with you but obviously you must know that that is not going to happen and it's not what I was asking for, so if
you don't want to say anything helpful then just say "no".

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/SpzzRmYj2-U" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/423403794999956147/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=423403794999956147" title="12 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/423403794999956147?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/423403794999956147?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/05/05-15-13-baby.html" title="05-15-13 - Baby" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>12</thr:total></entry><entry gd:etag="W/&quot;Dk4NQn48eyp7ImA9WhBbGEw.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-960380042636942989</id><published>2013-05-08T19:33:00.001-07:00</published><updated>2013-05-17T11:23:13.073-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-17T11:23:13.073-07:00</app:edited><title>05-08-13 - A Lock Free Weak Reference Table</title><content type="html">It's very easy (almost trivial (*)) to make the table-based {index/guid} style of weak reference lock free.

&lt;P&gt;

(* = obviously not trivial if you're trying to minimize the memory ordering constraints, as evidenced by the
revisions to this post that were required; it is trivial if you just make everything seq_cst)

&lt;P&gt;

Previous writings on this topic :

&lt;P&gt;

&lt;A HREF="http://www.cbloom.com/3d/techdocs/smart_pointers.txt"&gt; Smart &amp; Weak Pointers - valuable tools for games - 03-27-04 &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2008/03/03-22-08-6.html"&gt; cbloom rants 03-22-08 - 6 &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2010/07/07-05-10-counterpoint-2.html"&gt; cbloom rants 07-05-10 - Counterpoint 2 &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2011/08/08-01-11-game-threading-model_01.html"&gt; cbloom rants 08-01-11 - A game threading model &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/03/03-05-12-oodle-handle-table.html"&gt; cbloom rants 03-05-12 - Oodle Handle Table &lt;/A&gt; &lt;BR&gt;

&lt;P&gt;

The primary ops conceptually are :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

Add object to table; gives it a WeakRef id

WeakRef -&gt; OwningRef  (might be null)

OwningRef -&gt; naked pointer

OwningRef construct/destruct = ref count inc/dec

&lt;/PRE&gt;&lt;/FONT&gt;

&lt;P&gt;

The full code is in here : &lt;A HREF="http://www.cbloom.com/src/cbliblf.zip"&gt; cbliblf.zip &lt;/A&gt; , but you can get a
taste for how it works from the ref count maintenance code :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

    // IncRef looks up the weak reference; returns null if lost
    //   (this is the only way to resolve a weak reference)
    Referable * IncRef( handle_type h )
    {
        handle_type index = handle_get_index(h);
        LF_OS_ASSERT( index &gt;= 0 &amp;&amp; index &lt; c_num_slots );
        Slot * s = &amp;s_slots[index];

        handle_type guid = handle_get_guid(h);

        // this is just an atomic inc of state
        //  but checking guid each time to check that we haven't lost this slot
        handle_type state = s-&gt;m_state.load(mo_acquire);
        for(;;)
        {
            if ( state_get_guid(state) != guid )
                return NULL;
            // assert refcount isn't hitting max
            LF_OS_ASSERT( state_get_refcount(state) &lt; state_max_refcount );
            handle_type incstate = state+1;
            if ( s-&gt;m_state.compare_exchange_weak(state,incstate,mo_acq_rel,mo_acquire) )
            {
                // did the ref inc
                return s-&gt;m_ptr;
            }
            // state was reloaded, loop
        }
    }

    // IncRefRelaxed can be used when you know a ref is held
    //  so there's no chance of the object being gone
    void IncRefRelaxed( handle_type h )
    {
        handle_type index = handle_get_index(h);
        LF_OS_ASSERT( index &gt;= 0 &amp;&amp; index &lt; c_num_slots );
        Slot * s = &amp;s_slots[index];
        
        handle_type state_prev = s-&gt;m_state.fetch_add(1,mo_relaxed);
        state_prev;
        // make sure we were used correctly :
        LF_OS_ASSERT( handle_get_guid(h) == state_get_guid(state_prev) );
        LF_OS_ASSERT( state_get_refcount(state_prev) &gt;= 0 );
        LF_OS_ASSERT( state_get_refcount(state_prev) &lt; state_max_refcount );
    }

    // DecRef
    void DecRef( handle_type h )
    {
        handle_type index = handle_get_index(h);
        LF_OS_ASSERT( index &gt;= 0 &amp;&amp; index &lt; c_num_slots );
        Slot * s = &amp;s_slots[index];
        
        // no need to check guid because I must own a ref
        handle_type state_prev = s-&gt;m_state.fetch_add((handle_type)-1,mo_release);
        LF_OS_ASSERT( handle_get_guid(h) == state_get_guid(state_prev) );
        LF_OS_ASSERT( state_get_refcount(state_prev) &gt;= 1 );
        if ( state_get_refcount(state_prev) == 1 )
        {
            // I took refcount to 0
            //  slot is not actually freed yet; someone else could IncRef right now
            //  the slot becomes inaccessible to weak refs when I inc guid :
            // try to inc guid with refcount at 0 :
            handle_type old_guid = handle_get_guid(h);
            handle_type old_state = make_state(old_guid,0); // == state_prev-1
            handle_type new_state = make_state(old_guid+1,0); // == new_state + (1&lt;&lt;code&gt;&lt;&lt;/code&gt;handle_guid_shift);
  
            if ( s-&gt;m_state($).compare_exchange_strong(old_state,new_state,mo_acq_rel,mo_relaxed) )
            {
                // I released the slot
                // cmpx provides the acquire barrier for the free :
                FreeSlot(s);
                return;
            }
            // somebody else mucked with me
        }
    }

&lt;/PRE&gt;&lt;/FONT&gt;

&lt;P&gt;

The maintenance of ref counts only requires relaxed atomic increment &amp; release atomic decrement (except when the pointed-at object is
initially made and finally destroyed, then some more work is required).  Even just the relaxed atomic incs
could get expensive if you did a ton of them, but my philosophy for how to use this kind of system is that you inc &amp; dec refs
as rarely as possible.  The key thing is that you don't write functions that take owning refs as arguments, like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

void bad_function( OwningRefT&lt;code&gt;&lt;&lt;/code&gt;Thingy&gt; sptr )
{
    more_bad_funcs(sptr);
}

void Stuff::bad_caller()
{
    OwningRefT&lt;code&gt;&lt;&lt;/code&gt;thingy&gt; sptr( m_weakRef );
    if ( sptr != NULL )
    {
        bad_function(sptr);
    }
}

&lt;/PRE&gt;&lt;/FONT&gt;

hence doing lots of inc &amp; decs on refs all over the code.  Instead you write all your
code with naked pointers, and only use the smart pointers where they are needed to ensure ownership for the lifetime
of usage.  eg. :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

void good_function( Thing * ptr )
{
    more_good_funcs(ptr);
}

void Stuff::bad_caller()
{
    OwningRefT&lt;code&gt;&lt;&lt;/code&gt;thingy&gt; sptr( m_weakRef );
    Thingy * ptr = sptr.GetPtr();
    if ( ptr != NULL )
    {
        good_function(ptr);
    }
}

&lt;/PRE&gt;&lt;/FONT&gt;

&lt;P&gt;

If you like formal rules, they're something like this :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

1. All stored variables are either OwningRef or WeakRef , depending on whether it's
an "I own this" or "I see this" relationship.  Never store a naked pointer.

2. All variables in function call args are naked pointers, as are variables on the
stack and temp work variables, when possible.

3. WeakRef to pointer resolution is only provided as WeakRef -&gt; OwningRef.  Naked pointers
are only retrieved from OwningRefs.

&lt;/PRE&gt;&lt;/FONT&gt;

&lt;P&gt;

And obviously there are lots of enchancements to the system that are possible.  A major one that I recommend is to put more information in
the reference table state word.  If you use a 32-bit weak reference handle, and a 64-bit state word, then you have 32-bits of extra
space that you can check for free with the weak reference resolution.  You could put some mutex bits in there
(or an rwlock) so that the state contains the lock for the object, but I'm not sure that is a big win (the only
advantage of having the lock built into the state is that you could atomically get a lock and inc refcount in a single
op).  A better usage is to put some
object information in there that can be retrieved without chasing the pointer and inc'ing the ref and so on.

&lt;P&gt;

For example in Oodle I store the status of the object in the state table.  (Oodle status is a progression
through Invalid-&gt;Pending-&gt;Done/Error).  That way I can take a weak ref and query status in one atomic load.
I also store some lock bits, and you aren't allowed to get back naked pointers unless you have a lock on them.

&lt;P&gt;

The code for the weak ref table is now in the cbliblf.zip that I made for the last post.
Download : &lt;A HREF="http://www.cbloom.com/src/cbliblf.zip"&gt; cbliblf.zip &lt;/A&gt;

&lt;P&gt;

( The old cblib has a non-LF weak reference table that's similar for comparison.  It's also more developed
with helpers and fancier templates and such that could be ported to this version.  Download :
&lt;A HREF="http://www.cbloom.com/src/cblib.zip"&gt; cblib.zip &lt;/A&gt; )

&lt;P&gt;

ADDENDUM : alternative DecRef that uses CAS instead of atomic decrement.  Removes the two-atomic free path.
Platforms that implement atomic add as a CAS loop should probably just use this form.  Platforms that have
true atomic add should use the previously posted version.

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

    // DecRef
    void DecRef( handle_type h )
    {
        handle_type index = handle_get_index(h);
        LF_OS_ASSERT( index &gt;= 0 &amp;&amp; index &lt; c_num_slots );
        Slot * s = &amp;s_slots[index];
        
        // no need to check guid because I must own a ref
        handle_type state_prev = s-&gt;m_state($).load(mo_relaxed);
        
        handle_type old_guid  = handle_get_guid(h);

        for(;;)
        {
            // I haven't done my dec yet, guid must still match :
            LF_OS_ASSERT( state_get_guid(state_prev) == old_guid );
            // check current refcount :
            handle_type state_prev_rc = state_get_refcount(state_prev);
            LF_OS_ASSERT( state_prev_rc &gt;= 1 );
            if ( state_prev_rc == 1 )
            {
                // I'm taking refcount to 0
                // also inc guid, which releases the slot :
                handle_type new_state = make_state(old_guid+1,0);

                if ( s-&gt;m_state($).compare_exchange_weak(state_prev,new_state,mo_acq_rel,mo_relaxed) )
                {
                    // I released the slot
                    // cmpx provides the acquire barrier for the free :
                    FreeSlot(s);
                    return;
                }
            }
            else
            {
                // this is just a decrement
                // but have to do it as a CAS to ensure state_prev_rc doesn't change on us
                handle_type new_state = state_prev-1;
                LF_OS_ASSERT( new_state == make_state(old_guid,  state_prev_rc-1) );
                
                if ( s-&gt;m_state($).compare_exchange_weak(state_prev,new_state,mo_release,mo_relaxed) )
                {
                    // I dec'ed a ref
                    return;
                }
            }
        }
    }

&lt;/PRE&gt;&lt;/FONT&gt;

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/gKh7_t2u_Og" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/960380042636942989/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=960380042636942989" title="12 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/960380042636942989?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/960380042636942989?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/05/05-08-13-lock-free-weak-reference-table.html" title="05-08-13 - A Lock Free Weak Reference Table" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>12</thr:total></entry><entry gd:etag="W/&quot;DE8GR3c5fCp7ImA9WhBbEEs.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-8093180259164328649</id><published>2013-05-02T12:12:00.000-07:00</published><updated>2013-05-08T19:33:46.924-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-08T19:33:46.924-07:00</app:edited><title>05-02-13 - Simple C++0x style LF structs and adapter for x86-Windows</title><content type="html">I've seen a lot of lockfree libraries out there that are total crap.  Really weird non-standard ways
of doing things, or overly huge and complex.

&lt;P&gt;

I thought I'd make a super simple one in the correct modern style.  
Download : &lt;A HREF="http://www.cbloom.com/src/cbliblf.zip"&gt; cbliblf.zip &lt;/A&gt;

&lt;P&gt;

(If you want a big fully functional much-more-complete library, Intel TBB is the best I've seen.
The problem with TBB is that it's huge and entangled, and the license is not clearly free for all use).

&lt;P&gt;

There are two pieces here : 

&lt;P&gt;

"cblibCpp0x.h" provides atomic and such in C++0x style for MSVC/Windows/x86 compilers
that don't have real C++0x yet.  I have made zero attempt to make this header syntatically identical to C++0x,
there are various intentional and unintentional differences.

&lt;P&gt;

"cblibLF.h" provides some simple lockfree utilities (mostly queues) built on C++0x atomics.

&lt;P&gt;

"cblibCpp0x.h" is kind of by design not at all portable.  "cblibLF.h" should be portable to any C++0x platform.

&lt;P&gt;

WARNING : this stuff is not super well tested because it's not what I use in Oodle.  I've mostly copy-pasted
this from my Relacy test code, so it should be pretty strong but there may have been some copy-paste errors.

&lt;P&gt;

ADDENDUM : In case it's not clear, you do not *want* to use "cblibCpp0x.h".  You want to use real Cpp0x atomics
provided by your compiler.  This is a temporary band-aid so that people like me who use old compilers can get a cpp0x stand-in,
so that they can do work using the modern syntax.  If you're on a gcc platform that has the __atomic extensions but not C1X, use that.

&lt;P&gt;

You should be able to take any of the C++0x-style lockfree code I've posted over the years and use it
with "cblibCpp0x.h" , perhaps with some minor syntactic fixes.  eg. you could take
&lt;A HREF="http://cbloomrants.blogspot.com/2011/12/12-08-11-some-semaphores.html"&gt; the fastsemaphore wrapper &lt;/A&gt;
and put the "semaphore" from "cblibCpp0x.h" in there as the base semaphore.

&lt;P&gt;

Here's an example of what the objects in "cblibLF.h" look like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

//=================================================================         
// spsc fifo
//  lock free for single producer, single consumer
//  requires an allocator
//  and a dummy node so the fifo is never empty
template &lt;code&gt;&lt;&lt;/code&gt;typename t_data&gt;
struct lf_spsc_fifo_t
{
public:

    lf_spsc_fifo_t()
    {
        // initialize with one dummy node :
        node * dummy = new node;
        m_head = dummy;
        m_tail = dummy;
    }

    ~lf_spsc_fifo_t()
    {
        // should be one node left :
        LF_OS_ASSERT( m_head == m_tail );
        delete m_head;
    }

    void push(const t_data &amp; data)
    {
        node * n = new node(data);
        // n-&gt;next == NULL from constructor
        m_head-&gt;next.store(n, memory_order_release); 
        m_head = n;
    }

    // returns true if a node was popped
    //  fills *pdata only if the return value is true
    bool pop(t_data * pdata)
    {
        // we're going to take the data from m_tail-&gt;next
        //  and free m_tail
        node* t = m_tail;
        node* n = t-&gt;next.load(memory_order_acquire);
        if ( n == NULL )
            return false;
        *pdata = n-&gt;data; // could be a swap
        m_tail = n;
        delete t;
        return true;
    }

private:

    struct node
    {
        atomic&lt;code&gt;&lt;&lt;/code&gt;node *&gt;      next;
        nonatomic&lt;code&gt;&lt;&lt;/code&gt;t_data&gt;   data;
        
        node() : next(NULL) { }
        node(const t_data &amp; d) : next(NULL), data(d) { }
    };

    // head and tail are owned by separate threads,
    //  make sure there's no false sharing :
    nonatomic&lt;code&gt;&lt;&lt;/code&gt;node *&gt;   m_head;
    char                m_pad[LF_OS_CACHE_LINE_SIZE];
    nonatomic&lt;code&gt;&lt;&lt;/code&gt;node *&gt;   m_tail;
};

&lt;/PRE&gt;&lt;/FONT&gt;

&lt;P&gt;

Download : &lt;A HREF="http://www.cbloom.com/src/cbliblf.zip"&gt; cbliblf.zip &lt;/A&gt;

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/SfoZJrSPH6Q" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/8093180259164328649/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=8093180259164328649" title="12 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/8093180259164328649?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/8093180259164328649?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/05/05-02-13-simple-c0x-style-lf-structs.html" title="05-02-13 - Simple C++0x style LF structs and adapter for x86-Windows" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>12</thr:total></entry><entry gd:etag="W/&quot;CEYNQHYzfSp7ImA9WhBUFU8.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-7875219573451423741</id><published>2013-04-30T12:01:00.001-07:00</published><updated>2013-05-02T12:16:31.885-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-02T12:16:31.885-07:00</app:edited><title>04-30-13 - Packing Values in Bits - Flat Codes</title><content type="html">One of the very simplest forms of packing values in bits is simply to store a value with non-power-of-2
range and all values of equal probability.

&lt;P&gt;

You have a value that's in [0,N).  Ideally all code lengths would be the same ( log2(N) ) which is fractional
for N not a power of 2.
With just bit output, we can't write fractional bits, so we will lose some efficiency.  But how much exactly?

&lt;P&gt;

You can of course trivially write a symbol in [0,N) by using log2ceil(N) bits.  That's just going up to the next
integer bit count.  But you're wasting values in there, so you can take each wasted value and use it to reduce
the length of a code that you need.  eg. for N = 5 , start with log2ceil(N) bits :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
0 : 000
1 : 001
2 : 010
3 : 011
4 : 100
x : 101
x : 110
x : 111
&lt;/PRE&gt;&lt;/FONT&gt;

The first five codes are used for our values, and the last three are wasted.
Rearrange to interleave the wasted codewords :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
0 : 000
x : 001
1 : 010
x : 011
2 : 100
x : 101
3 : 110
4 : 111
&lt;/PRE&gt;&lt;/FONT&gt;

now since we have adjacent codes where one is used and one is not used, we can reduce the length of
those codes and still have a prefix code.  That is, if we see the two bits "00" we know that it must
always be a value of 0, because "001" is wasted.  So simply don't send the third bit in that case :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
0 : 00
1 : 01
2 : 10
3 : 110
4 : 111
&lt;/PRE&gt;&lt;/FONT&gt;

&lt;P&gt;

(this is a general way of constructing shorter prefix codes when you
have wasted values).  You can see that the number of wasted values we
had at the top is the number of codes that can be shortened by one bit.

&lt;P&gt;

A flat code is written thusly :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

void OutputFlat(int sym, int N)
{
    ASSERT( N &gt;= 2 &amp;&amp; sym &gt;= 0 &amp;&amp; sym &lt; N );

    int B = intlog2ceil(N);
    int T = (1&lt;&lt;code&gt;&lt;&lt;/code&gt;B) - N;
    // T is the number of "wasted values"
    if ( sym &lt; T )
    {
        // write in B-1 bits
        PutBits(sym, B-1);
    }
    else
    {
        // write in B bits
        // push value up by T
        PutBits(sym+T, B);
    }
}

int InputFlat(int sym,int N)
{
    ASSERT( N &gt;= 2 &amp;&amp; sym &gt;= 0 &amp;&amp; sym &lt; N );

    int B = intlog2ceil(N);
    int T = (1&lt;&lt;code&gt;&lt;&lt;/code&gt;B) - N;

    int sym = GetBits(B-1);
    if ( sym &lt; T )
    {
        return sym;
    }
    else
    {
        // need one more bit :
        int ret = (sym&lt;&lt;1) - T + GetBits(1);        
        return ret;
    }
}

&lt;/PRE&gt;&lt;/FONT&gt;

That is, we write (T) values in (B-1) bits, and (N-T) in (B) bits.
The intlog2ceil can be slow, so in practice you would want to precompute that
or pass it in as a parameter.

&lt;P&gt;

So, what is the loss vs. ideal, and where does it occur?  Let's work it out :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

H = log2(N)  is the ideal (fractional) entropy

N is in (2^(B-1),2^B]
so H is in (B-1,B]

The number of bits written by the flat code is :

L = ( T * (B-1) + (N-T) * B ) / N

with T = 2^B - N

Let's set

N = f * 2^B

with f in (0.5,1] our fractional position in the range.

so T = 2^B * (1 - f)

At f = 0.5 and 1.0 there's no loss, so there must be a maximum in that interval.

Doing some simplifying :

L = (T * (B-1) + (N-T) * B)/N
L = (T * B - T + N*B - T * B)/N
L = ( N*B - T)/N = B - T/N
T/N = (1-f)/f = (1/f) - 1
L = B - (1/f) + 1

The excess bits is :

E = L - H

H = log2(N) = log2( f * 2^B ) = B + log2(f)

E = (B - (1/f) + 1) - (B + log2(f))
E = 1 - (1/f) - log2(f)

so find the maximum of E by taking a derivative :

d/df(E) = 0
d/df(E) = 1/f^2 - (1/f)/ln2
1/f^2 = (1/f)/ln2
1/f = 1/ln(2)
f = ln(2)
f = 0.6931472...

and at that spot the excess is :

E = 1 - (1/ln2) - ln(ln2)/ln2
E = 0.08607133...

&lt;/PRE&gt;&lt;/FONT&gt;

The worst case is 8.6% of a bit per symbol excess.  The worst case
appears periodically, once for each power of two.

&lt;P&gt;

The actual excess bits output for some low N's :

&lt;P&gt;

&lt;IMG SRC="http://chart.googleapis.com/chart?cht=bvg&amp;chs=1000x256&amp;chbh=4,0,0&amp;chxt=x,y&amp;chco=3072F3&amp;chxr=0,0,256,8|1,0,100,10&amp;chxs=0,000000,12,0,lt|1,000000,10,1,lt&amp;chdl=z&amp;chd=e:AAAAAAFIAAFIFIDNAADNFIFxFIEfDNB7AAB7DNEfFIFIFxFxFIFIEfD2DNCkB7ApAABSB7CkDND2EfEfFIFIFIFxFxFxFxFIFIFIFIEfEfD2D2D2DNCkCkB7B7BSApApAAApBSBSB7CkCkDNDND2D2D2EfEfEfFIFIFIFIFIFIFIFxFxFxFxFxFxFxFIFIFIFIFIFIFIFIEfEfEfEfEfD2D2D2D2D2DNDNDNCkCkCkCkB7B7B7BSBSBSApApApAAAAAAApApBSBSBSB7B7B7CkCkCkCkDNDNDNDND2D2D2D2D2EfEfEfEfEfEfEfFIFIFIFIFIFIFIFIFIFIFIFIFIFxFxFxFxFxFxFxFxFxFxFxFxFxFxFIFIFIFIFIFIFIFIFIFIFIFIFIFIFIFIFIEfEfEfEfEfEfEfEfEfEfD2D2D2D2D2D2D2D2D2DNDNDNDNDNDNDNCkCkCkCkCkCkCkB7B7B7B7B7B7BSBSBSBSBSBSBSApApApApApApAAAAAA&amp;chg=0,10,5,5&amp;chtt=percent+of+bits+excess"&gt;

&lt;P&gt;

The worst case actually occurs as N-&gt;large, because at higher N you can get f closer to
that worst case fraction (ln(2)).  At lower N, the integer steps mean you miss the worst
case and so waste less.  This is perhaps a bit surprising, you might think that
the worst case would be at something like N = 3.

&lt;P&gt;

In fact for N = 3 :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

H = l2(3) = 1.584962 ...

L = average length written by OutputFlat

L = (1+2+2)/3 = 1.66666...

E = L - H = 0.08170421...

&lt;/PRE&gt;&lt;/FONT&gt;

(obviously if you measure the loss as a percentage of the output length, the worst case
is at N=3, and there it's 5.155% of the entropy).

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/s-sahLkJI2o" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/7875219573451423741/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=7875219573451423741" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/7875219573451423741?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/7875219573451423741?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/04/04-30-13-packing-values-in-bits-flat.html" title="04-30-13 - Packing Values in Bits - Flat Codes" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;CEYERn44cSp7ImA9WhBUFU8.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-2428755579913151009</id><published>2013-04-21T00:00:00.000-07:00</published><updated>2013-05-02T12:15:07.039-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-02T12:15:07.039-07:00</app:edited><title>04-21-13 - How to grow Linux disk under VMWare</title><content type="html">There's a lot of these guides around the net, but I found them all a bit confusing to follow, so my experience :

&lt;UL&gt;
&lt;LI&gt;
1. Power off the VM.
&lt;P&gt;&lt;LI&gt;
2. Make a backup of the whole VM in case something goes wrong.  Just find the dir containing it and copy the whole thing.
&lt;P&gt;&lt;LI&gt;
3. VMWare Settings -&gt; Hardware -&gt; Hard Disk -&gt; Utilities -&gt; Expand
change to whatever size you want.
&lt;P&gt;&lt;LI&gt;
4. This has just expanded the virtual disk, now you must grow the partition on the disk.  Linux does not have
good tools to grow a partition that is running the OS, so don't boot your VM back into Linux.
&lt;P&gt;
(there is some LVM stuff that lets you make multiple partitions and then treat them as a single one,
but for a Unix newb like myself that looks too scary).
&lt;P&gt;&lt;LI&gt;
5. Download GParted ISO.  VMWare Settings -&gt; Hardware -&gt; CD/DVD -&gt; Use ISO -&gt; point it at GParted.
&lt;P&gt;
Also make sure "Connect at Power On" is checked.
&lt;P&gt;&lt;LI&gt;
6. Now you have to get the virtual machine to boot from CD.  Getting into the BIOS interactively was impossible for me.
Fortunately VMWare has a solution.  Find your VM files and edit the .vmx config file.  Add this line :
&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
bios.forceSetupOnce = "TRUE"
&lt;/PRE&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;LI&gt;
7. Power on the VM and you should enter the BIOS.  Go to "Boot" and put the CD first.  Save and Exit and you should now
boot into GParted.
&lt;P&gt;&lt;LI&gt;
8. Using GParted is pretty self explanatory.  It's a good tool.  When you're done, shut down and Power Off the VM.
&lt;P&gt;
My VM was set up with a swap partitition, so I had to move that to the end before I could grow the primary partition.
I hear that you can set up Linux with a swap file instead of a swap partition; that would be preferable.  A swap partition
makes zero sense in a VM where the disks are virtualized anyway (so the advantage of keeping the swap thrashing off your
main disk doesn't exist).  Not something I want to change though.
&lt;P&gt;&lt;LI&gt;
9. VMWare Settings -&gt; Hardware -&gt; CD/DVD -&gt; turn off "Connect at Power On"
&lt;/UL&gt;

It seems to me it's a good idea to leave it this way - BIOS is set to boot first from CD, but the VM is set with no CD hardware
enabled.  This makes it easy to change the ISO and just check that box any time you want to boot from an ISO, rather than
having to go into that BIOS nightmare again.

&lt;P&gt;&lt;HR&gt;&lt;P&gt;

More generally, what have I learned about multi-platform development from working at RAD ?

&lt;P&gt;

That it's horrible, really horrible, and I pray that I never have to do it again in my life.  Ugh.

&lt;P&gt;

Just writing cross-platform code is not the issue (though that's horrible enough, solely due to stupid non-fundamental
issues like the fact that struct packing isn't standardized, adding signed ints isn't standardized, restrict/noalias isn't
standardized, inline linkage varies greatly, etc. urg wtf etc etc).
If you're just releasing some code on the net and offering it for many platforms (leaving it up to the downloaders to actually
build it and test it), your life is easy.  The horrible part
is if you actually have to maintain machines and build systems for all those platforms, test them, be able to debug on them,
keep all the sdk's up to date, etc. etc.

&lt;P&gt;

(in general coding is easy when you don't actually test your code and make sure it works well, which a surprising number
of people think is "done"; hey it compiles, I'm done! umm, no...)

&lt;P&gt;

(I guess that's a more general life thing; I observe a lot of people who just do things and don't actually measure
whether the "doing" was successful or done well, but they just move on and are generally happy.  People who stress over
whether what they're doing is actually a good job or not are massively less happy but also actually do good work.)

&lt;P&gt;

I feel like I spend 90% of my time on stupid fucking non-algorithmic issues like this Linux partition resizing shit
(probably more like 20%, but that's still frustratingly high).  The regression tests are failing on Linux,
okay have to figure out why, oh it's because the VM disk is too small, okay how do I fix that; or the PS4 compiler has a
bug I have to work around, or the system software on this system has a bug, or the Mac clang wants to spew pointless warnings about
anonymous namespaces, or my tests aren't working on Xenon .. spend some time digging .. oh the console is just turned off, or the IP changed
or it got reflashed and my SDK doesn't work anymore, and blah blah fucking blah.  God dammit I just want to be able to write algorithms.
I miss coding, I miss thinking about hard problems.  Le sigh.

&lt;P&gt;

I've written before about how in my imagination I could hire some kid for $50k to do all this shit work for me and it would be a huge win
for me overall.  But I'm afraid it's not that easy in reality.

&lt;P&gt;

What really should exist is a "coder cloud" service.  There should be a bunch of VMs of different OS'es with various compilers and SDKs
installed, so I can just say "build my shit for X with Y".  Of course you need to be able to run tests on that system as well, and if
something goes wrong you need remote desktop for interactive debugging.  It's got to have every platform, including things like game consoles
where you need license agreements, which is probably a no-go in reality because corporations are jerks.  There's got to be superb customer
service, because if I can't rely on it for builds at every moment of every day then it's a no-go.  Unfortunately, programmers are almost
uniformly moronic about this kind of thing (in that they massively overestimate their own ability to manage these things quickly) so wouldn't
want to pay what it costs to run that service.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/WQaFUHwxpT8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/2428755579913151009/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=2428755579913151009" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/2428755579913151009?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/2428755579913151009?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/04/04-21-13-how-to-grow-linux-disk-under.html" title="04-21-13 - How to grow Linux disk under VMWare" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;DE8GR384eSp7ImA9WhBbEEs.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-80694978702544931</id><published>2013-04-10T12:26:00.000-07:00</published><updated>2013-05-08T19:33:46.131-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-08T19:33:46.131-07:00</app:edited><title>04-10-13 - Waitset Resignal Notes</title><content type="html">I've been redoing my low level waitset and want to make some notes.
Some previous discussion of the same issues here :

&lt;P&gt;

&lt;A HREF="http://cbloomrants.blogspot.com/2011/11/11-28-11-some-lock-free-rambling.html"&gt; cbloom rants 11-28-11 - Some lock-free rambling &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2011/11/11-30-11-some-more-waitset-notes.html"&gt; cbloom rants 11-30-11 - Some more Waitset notes &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2011/12/12-08-11-some-semaphores.html"&gt; cbloom rants 12-08-11 - Some Semaphores &lt;/A&gt; &lt;BR&gt;

&lt;P&gt;

In particular, two big things occurred to me :

&lt;P&gt;

&lt;B&gt;1.&lt;/B&gt; I talked before about the "passing on the signal" issue.  See the above posts for more in depth details,
but in brief the issue is if you are trying to do NotifyOne (instead of NotifyAll), and you have a double-check
waitset like this :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

{
waiter = waitset.prepare_wait(condition);

if ( double check )
{
    waiter.cancel();
}
else
{
    waiter.wait();
    // possibly loop and re-check condition
}

}

&lt;/PRE&gt;&lt;/FONT&gt;

then if you get a signal between prepare_wait and cancel, you didn't need that signal, so a wakeup of
another thread that did need that signal can be missed.

&lt;P&gt;

Now, I talked about this before as an "ugly hack", but over time thinking about it, it doesn't seem so bad.
In particular, if you put the resignal inside the cancel() , so that the client code looks just like the above,
it doesn't need to know about the fact that the resignal mechanism is happening at all.

&lt;P&gt;

So, the new concept is that cancel atomically removes the waiter from the waitset and sees if it got a signal
that it didn't consume.  If so, it just passes on that signal.  The fact that this is okay and not a hack
came to me when I thought about under what conditions this actually happens.  If you recall from the earlier
posts, the need for resignal comes from situations like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

T0 posts sem , and signals noone
T1 posts sem , and signals T3
T2 tries to dec count and sees none, goes into wait()
T3 tries to dec count and gets one, goes into cancel(), but also got the signal - must resignal T2

&lt;/PRE&gt;&lt;/FONT&gt;

the thing is this can only happen if all the threads are awake and racing against each other (it requires
a very specific interleaving); that is,
the T3 in particular that decs count and does the resignal had to be awake anyway (because its first check
saw no count, but its double check did dec count, so it must have raced with the sem post).  It's not like you
wake up a thread you shouldn't have and then pass it on.  The thread wakeup scheme is just changed
from :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

T0 sem.post --wakes--&gt; T2 sem.wait
T1 sem.post --wakes--&gt; T3 sem.wait

to :

T0 sem.post
T1 sem.post --wakes--&gt; T3 sem.wait --wakes--&gt; T2 sem.wait

&lt;/PRE&gt;&lt;/FONT&gt;

that is, one of the consumer threads wakes its peer.  This is a tiny performance loss, but it's a pretty
rare race, so really not a bad thing.

&lt;P&gt;

The whole "double check" pathway in waitset only happens in a race case.  It occurs when one thread sets the
condition you want right at the same time that you check it, so your first check fails and after you prepare_wait,
your second check passes.  The resignal only occurs if you are in that race path, and also the setting thread
sent you a signal between your prepare_wait and cancel, *and* there's another thread waiting on that same signal
that should have gotten it.  Basically this case is quite rare, we don't care too much about it being fast or
elegant (as long as it's not disastrously slow), we just need behavior to be correct when it does happen - and the "pass on the signal" mechanism gives
you that.

&lt;P&gt;

The advantage of being able to do just a NotifyOne instead of a NotifyAll is so huge that it's worth adopting
this as standard practice in waitset.

&lt;P&gt;

&lt;B&gt;2.&lt;/B&gt; It then occurred to me that the waitset PrepareWait and Cancel could be made lock-free pretty trivially.

&lt;P&gt;

Conceptually, they are made lock free by turning them into messages.  "Notify" is now the receiver of messages
and the scheme is now :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

{
waiter w;
waitset : send message { prepare_wait , &amp;w, condition };

if ( double check )
{
    waitset : send message { cancel , &amp;w };
    return;
}

w.wait();
}

-------

{
waitset Notify(condition) :
first consume all messages
do prepare_wait and cancel actions
then do the normal notify
eg. see if there are any waiters that want to know about "condition"
}

&lt;/PRE&gt;&lt;/FONT&gt;

The result is that the entire wait-side operation is lock free.  The notify-side still uses a lock to
ensure the consistency of the wait list.

&lt;P&gt;


This greatly reduces contention in the most common usage patterns :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

Mutex :

only the mutex owner does Notify
 - so contention of the waitset lock is non-existant
many threads may try to lock a mutex
 - they do not have any waitset-lock contention

Semaphore :

the common case of one producer and many consumers (lots of threads do wait() )
 - zero contention of the waitset lock

the less common case of many producers and few consumers is slow

&lt;/PRE&gt;&lt;/FONT&gt;

Another way to look at it is instead of doing little bits of waitlist maintenance in three
different places (prepare_wait,notify,cancel) which each have to take a lock on the list,
all the maintenance is moved to one spot.

&lt;P&gt;

Now there are some subtleties.

&lt;P&gt;

If you used a fresh "waiter" every time, things would be simple.  But for efficiency you don't
want to do that.  In fact I use one unique waiter per thread.  There's only one OS waitable handle
needed per thread and you can use that to implement every threading primitive.  But now you have
to be able to recycle the waiter.  Note that you don't have to worry about other threads using your
waiter; the waiter is per-thread so you just have to worry about when you come around and use it
again yourself.

&lt;P&gt;

If you didn't try to do the lock-free wait-side, recycling would be easy.  But with the lock-free
wait side there are some issues.

&lt;P&gt;

First is that when you do a prepare-then-cancel , your cancel might not actually be done for a long time
(it was just a request).  So if you come back around on the same thread and call prepare() again,
prepare has to check if that earlier cancel has been processed or not.  If it has not, then you just have
to force the Notify-side list maintenance to be done immediately.

&lt;P&gt;

The second related issue is that the lock-free wait-side can give you spurious signals to your waiter.
Normally prepare_wait could clear the OS handle, so that when you wait on it you know that you got the
signal you wanted.  But because prepare_wait is just a message and doesn't take the lock on the waitlist,
you might actually still be in the waitlist from the previous time you used your waiter.  Thus you can
get a signal that you didn't want.  There are a few solutions to this; one is to allow spurious signals
(I don't love that); another is to detect that the signal is spurious and wait again (I do this).  Another
is to always just grab the waitlist lock (and do nothing) in either cancel or prepare_wait.

&lt;P&gt;&lt;HR&gt;&lt;P&gt;

Ok, so we now have a clean waitset that can do NotifyOne and gaurantee no spurious signals.  Let's use it.

&lt;P&gt;

You may recall we've looked at a simple waitset-based mutex before :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

U32 thinlock;

Lock :
{
    // first check :
    while( Exchange(&amp;thinlock,1) != 0 )
    {
        waiter w; // get from TLS
        waitset.PrepareWait( &amp;w, &amp;thinlock );

        // double check and put in waiter flag :
        if ( Exchange(&amp;thinlock,2) == 0 )
        {
            // got it
            w.Cancel();
            return;
        }

        w.Wait();
    }
}

Unlock :
{
    if ( Exchange(&amp;thinlock,0) == 2 )
    {
        waitset.NotifyAll( &amp;thinlock );
    }
}
&lt;/PRE&gt;&lt;/FONT&gt;

This mutex is non-recursive, and of course you should spin doing some TryLocks before going into the wait loop
for efficiency.

&lt;P&gt;

This was an okay way to build a mutex on waitset when all you had was NotifyAll.  It only does the notify
if there are waiters, but the big problem with it is if you have multiple waiters, it wakes them all and
then they all run in to try to grab the mutex, and all but one fail and go back to sleep.  This is a common
type of unnecessary-wakeup thread-thrashing pattern that sucks really bad.

&lt;P&gt;

(any time you write threading code where the wakeup means "hey wakeup and see if you can grab an atomic"
(as opposed to "wakeup you got it"), you should be concerned (particularly when the wake is a broadcast))

&lt;P&gt;

Now that we have NotifyOne we can fix that mutex :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

U32 thinlock;

Lock :
{
    // first check :
    while( Exchange(&amp;thinlock,2) != 0 ) // (*1)
    {
        waiter w; // get from TLS
        waitset.PrepareWait( &amp;w, &amp;thinlock );

        // double check and put in waiter flag :
        if ( Exchange(&amp;thinlock,2) == 0 )
        {
            // got it
            w.Cancel(waitset_resignal_no); // (*2)
            return;
        }

        w.Wait();
    }
}

Unlock :
{
    if ( Exchange(&amp;thinlock,0) == 2 ) // (*3)
    {
        waitset.NotifyOne( &amp;thinlock );
    }
}
&lt;/PRE&gt;&lt;/FONT&gt;

We changed the NotifyAll to NotifyOne , but two funny bits are worth noting : (*1) - we must now immediately
exchange in the waiter-flag here; in the NotifyAll case it worked to put a 1 in there for funny reasons
(see
&lt;A HREF="http://cbloomrants.blogspot.com/2011/07/07-15-11-review-of-many-mutex.html"&gt; cbloom rants 07-15-11 - Review of many Mutex implementations &lt;/A&gt; ,
where this type of mutex is discussed as "Three-state mutex using Event" ), but it doesn't work with the NotifyOne.
(*2) - with a mutex you do not need to pass on the signal when you stole it and cancelled.  The reason is just that
there can't possibly be any more mutex for another thread to consume.  A mutex is a lot like a semaphore with a maximum count of 1
(actually it's exactly like it for non-recursive mutexes);
you only need to pass on the signal when it's possible that some other thread needs to know about it.
(*3) - you might think the check for == 2 here is dumb because we always put in a 2, but there's code you're not seeing.
TryLock should still put in a 1, so in the uncontended cases the thinlock will have a value of 1 and no Notify is done.  The thinlock
only goes to a 2 if there is some contention, and then the value stays at 2 until the last unlock of that contended sequence.

&lt;P&gt;

Okay, so that works, but it's kind of silly.  With the mechanism we have now we can do a much neater mutex :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

U32 thinlock; // = 0 initializes thinlock

Lock :
{
    waiter w; // get from TLS
    waitset.PrepareWait( &amp;w, &amp;thinlock );

    if ( Fetch_Add(&amp;thinlock,1) == 0 )
    {
        // got the lock (no need to resignal)
        w.Cancel(waitset_resignal_no);
        return;
    }

    w.Wait();
    // woke up - I have the lock !
}

Unlock :
{
    if ( Fetch_Add(&amp;thinlock,-1) &gt; 1 )
    {
        // there were waiters
        waitset.NotifyOne( &amp;thinlock );
    }
}
&lt;/PRE&gt;&lt;/FONT&gt;

The mutex is just a wait-count now.  (as usual you should TryLock a few times before jumping in to the PrepareWait).
This mutex is more elegant; it also has a small performance advantage in that it only calls NotifyOne when it really needs to;
because its gate is also a wait-count it knows if it needs to Notify or not.  The previous Mutex posted will always Notify on
the last unlock whether or not it needs to (eg. it will always do one Notify too many).

&lt;P&gt;

This last mutex is also really just a semaphore.  We can see it by writing a semaphore with our waitset :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

U32 thinsem; // = 0 initializes thinsem

Wait :
{
    waiter w; // get from TLS
    waitset.PrepareWait( &amp;w, &amp;thinsem );

    if ( Fetch_Add(&amp;thinsem,-1) &gt; 0 )
    {
        // got a dec on count
        w.Cancel(waitset_resignal_yes); // (*1)
        return;
    }

    w.Wait();
    // woke up - I got the sem !
}

Post :
{
    if ( Fetch_add(&amp;thinsem,1) &lt; 0 )
    {
        waitset.NotifyOne( &amp;thinsem );
    }
}

&lt;/PRE&gt;&lt;/FONT&gt;

which is obviously the same.  The only subtle change is at (*1) - with a semaphore we must do the resignal,
because there may have been several posts to the sem (contrast with mutex where there can only be one Unlock at a time;
and the mutex itself serializes the unlocks).

&lt;P&gt;&lt;HR&gt;&lt;P&gt;

Oh, one very subtle issue that I only discovered due to relacy :

&lt;P&gt;

waitset.Notify requires a #StoreLoad between the condition check and the notify call.  That is, the standard
pattern for any kind of "Publish" is something like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

Publish
{
    change shared variable
    if ( any waiters )
    {
        #StoreLoad
        waitset.Notify()
    }
}

&lt;/PRE&gt;&lt;/FONT&gt;

Now, in most cases, such as the Sem and Mutex posted above, the Publish uses an atomic RMW op.  If that
is the case, then you don't need to add any more barriers - the RMW synchronizes for you.  But if you do
some kind of more weakly ordered primitive, then you must force a barrier there.

&lt;P&gt;

This is the exact same issue that I've run into before and forgot about again :

&lt;P&gt;

&lt;A HREF="http://cbloomrants.blogspot.com/2011/07/07-31-11-example-that-needs-seqcst_31.html"&gt; cbloom rants 07-31-11 - An example that needs seq_cst - &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2011/08/08-09-11-threading-links.html"&gt; cbloom rants 08-09-11 - Threading Links &lt;/A&gt; (see threads on eventcount) &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/06/06-01-12-on-c-atomic-fences-part-3.html"&gt; cbloom rants 06-01-12 - On C++ Atomic Fences Part 3 &lt;/A&gt; &lt;BR&gt;

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/N_LXXgceDKs" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/80694978702544931/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=80694978702544931" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/80694978702544931?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/80694978702544931?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/04/04-10-13-waitset-resignal-notes.html" title="04-10-13 - Waitset Resignal Notes" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;CEcGSH87fSp7ImA9WhBWFk8.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-4952912724879107195</id><published>2013-04-04T17:37:00.001-07:00</published><updated>2013-04-10T12:27:09.105-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-10T12:27:09.105-07:00</app:edited><title>04-04-13 - Tabdir</title><content type="html">I made a fresh build of "tabdir" my
&lt;A HREF="http://cbloomrants.blogspot.com/2008/07/07-31-08-1.html"&gt; old &lt;/A&gt;
(&lt;A HREF="http://cbloomrants.blogspot.com/2008/08/08-16-08-1_9818.html"&gt; old &lt;/A&gt;) utility that does
a recursive dirlisting in tabbed-text format for "tabview".

&lt;P&gt;

Download : &lt;A HREF="http://www.cbloom.com/exe/tabdir.zip"&gt; tabdir 320k zip &lt;/A&gt;

&lt;P&gt;

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
tabdir -?
usage : tabdir [opts] [dir]
options:
 -v  : view output after writing
 -p  : show progress of dir enumeration (with interactive keys)
 -w# : set # of worker threads
 -oN : output to name N [r:\tabdir.tab]
&lt;/PRE&gt;&lt;/FONT&gt;

&lt;P&gt;

This new tabdir is built on Oodle so it has a multi-threaded dir lister for much greater speed. (*)

&lt;P&gt;

Also note to self : I fixed tabview so it works as a shell file association.  I hit this all the time and
always forget it : if something works on the command line but not as a shell association, it's probably
because the shell passes you quotes around file names, so you need a little code to strip quotes from args.

&lt;P&gt;

Someday I'd like to write an even faster tabdir that reads the NTFS volume directory information directly,
but chances are that will never happen.

&lt;P&gt;

One odd thing I've spotted with this tabdir is that the Windows SxS Assembly dirs take a ton of time to
enumerate on my machine.  I dunno if they're compressed or WTF the deal is with them (I pushed it on the
todo stack to investigate), but they're like 10X slower than any other dir.  (could just be the larger number
of files in there)

&lt;P&gt;

I never did this before because I didn't expect multi-threaded dir enumeration to be a big win; I thought it
would just cause seek thrashing, and if you're IO bound anyway then multi-threading can't help, can it?  Well,
it turns out the Win32 dir enum functions have quite a lot of CPU overhead, so multi-threading does in fact help
a bit :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
# workers | elapsed time
1 | 12.327
2 | 10.450
3 | 9.710
4 | 9.130
&lt;/PRE&gt;&lt;/FONT&gt;

&lt;P&gt;

(* = actually the big speed win was not multi-threading, it's that the old tabdir did something rather
dumb in the file enum.  It would enum all files, and then do GetInfo() on each one to get the file sizes.
The new one just uses the file infos that are returned as part of the Win32 enumeration, which is
massively faster).

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/gI84R2OVBy0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/4952912724879107195/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=4952912724879107195" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/4952912724879107195?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/4952912724879107195?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/04/04-04-13-tabdir.html" title="04-04-13 - Tabdir" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>8</thr:total></entry><entry gd:etag="W/&quot;CEcHQX47fyp7ImA9WhBWFk8.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-5595958967324891130</id><published>2013-04-04T16:15:00.000-07:00</published><updated>2013-04-10T12:27:10.007-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-10T12:27:10.007-07:00</app:edited><title>04-04-13 - Worker Thread Forward Permit Delay-Kicks</title><content type="html">I've had this small annoying issue for a while, and finally thought of a pretty simple solution.

&lt;P&gt;

You may recall, I use a 
&lt;A HREF="http://cbloomrants.blogspot.com/2011/12/12-03-11-worker-thread-system-with.html"&gt; worker thread system with forward "permits" (reversed dependencies)  &lt;/A&gt;.  When any handle
completes it sees if that completion should trigger any followup handles, and if so
those are then launched.  Handles may be SPU jobs or IOs or CPU jobs or whatever.  The problem I will
talk about occurred when the predessor and the followup were both CPU jobs.

&lt;P&gt;

I'll talk about a specific case to be concrete : decoding compressed data while reading it from disk.

&lt;P&gt;

To decode each chunk of LZ data, a chunk-decompress job is made.  That job depends on the IO(s) that
read in the compressed data for that chunk.  It also depends on the previous chunk if the chunk is
not a seek-reset point.  So in the case of a non-reset chunk, you have a dependency on an IO and a
previous CPU job.  Your job will be started by one or the other, whichever finishes last.

&lt;P&gt;

Now, when decompression was IO bound, then the IO completions were kicking off the decompress jobs,
and everything was fine.

&lt;P&gt;

In these timelines, the second line is IO and the bottom four are workers.  (click images for higher res)

&lt;P&gt;

LZ Read and Decompress without seek-resets, IO bound :

&lt;P&gt;

&lt;A HREF="http://www.cbloom.com/blogimages/lz_overlap_iobound.png"&gt;
&lt;IMG SRC="http://www.cbloom.com/blogimages/lz_overlap_iobound.png" WIDTH=800&gt;
&lt;/A&gt;

&lt;P&gt;

You can see the funny fans of lines that show the dependency on the previous decompress job and also the IO.
Yellow is a thread that's sleeping.

&lt;P&gt;

You may notice that the worker threads are cycling around.  That's not really ideal, but it's not
related to the problem I'm talking about today.  (that cycling is caused by the fact that the OS semaphore
is FIFO.  For something like worker threads, we'd actually rather have a LIFO semaphore, because it makes
it more likely that you get a thread with something useful still hot in cache.  Someday I'll replace my
OS semaphore with my own LIFO one, but for now this is a minor performance bug).  (Win32 docs say that they
don't gaurantee any particular order, but in my experience threads of equal priority are always FIFO in
Win32 semaphores)

&lt;P&gt;

Okay, now for the problem.  When the IO was going fast, so we were CPU bound, it's the prior decompress
job that triggers the followup work.

&lt;P&gt;

But something bad happened due to the forward permit system.  The control flow was something like this :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

On worker thread 0

wake from semaphore
do on an LZ decompress job
mark job done
completion change causes a permits check
permits check sees that there is a pending job triggered by this completion
  -&gt; fire off that handle
   handle is pushed to worker thread system
   no worker is available to do it, so wake a new worker and give him the job
finalize (usually delete) job I just finished
look for more work to do
   there is none because it was already handed to a new worker

&lt;/PRE&gt;&lt;/FONT&gt;

And it looked like this :

&lt;P&gt;

LZ Read and Decompress without seek-resets, CPU bound, naive permits :

&lt;P&gt;

&lt;A HREF="http://www.cbloom.com/blogimages/lz_deps_old.png"&gt;
&lt;IMG SRC="http://www.cbloom.com/blogimages/lz_deps_old.png" WIDTH=800&gt;
&lt;/A&gt;

&lt;P&gt;

You can see each subsequent decompress job is moving to another worker thread.
Yuck, bad.

&lt;P&gt;

So the fix in Oodle is to use the "delay-kick" mechanism, which I'd already been using
for coroutine refires (which had a similar problem; the problem occurred when you yielded a
coroutine on something like an IO, and the IO was done almost immediately; the coroutine would get
moved to another worker thread instead of just staying on the same one and continuing from
the yield as if it wasn't there).

&lt;P&gt;

The scheme is something like this :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

On each worker thread :

Try to pop a work item from the "delay kick queue"
  if there is more than one item in the DKQ,
    take one for myself and "kick" the remainder
    (kick means wake worker threads to do the jobs)

If nothing on DKQ, pop from the main queue
  if nothing on main queue, wait on work semaphore

Do your job

Set "delay kick" = true
  ("delay kick" has to be in TLS of course)
Mark job as done
Permit system checks for successor handles that can now run 
  if they exist, they are put in the DKQ instead of immediately firing
Set "delay kick" = false

Repeat

&lt;/PRE&gt;&lt;/FONT&gt;

In brief : work that is made runnable by the completion of work is not fired until the worker
thread that did the completion gets its own shot at grabbing that new work.  If the completion made 4 jobs
runnable, the worker will grab 1 for itself and kick the other 3.  The kick is no longer in the completion
phase, it's in the pop phase.

&lt;P&gt;

And the result is :

&lt;P&gt;

LZ Read and Decompress without seek-resets, CPU bound, delay-kick permits :

&lt;P&gt;

&lt;A HREF="http://www.cbloom.com/blogimages/lz_deps_new.png"&gt;
&lt;IMG SRC="http://www.cbloom.com/blogimages/lz_deps_new.png" WIDTH=800&gt;
&lt;/A&gt;

&lt;P&gt;

Molto superiore.  

&lt;P&gt;

These last two timelines are on the same time scale, so you can see just from the visual that eliminating the
unnecessary thread switching is about a 10% speedup.

&lt;P&gt;

Anyway, this particular issue may not apply to your worker thread system, or you may have other solutions.
I think the main take-away is that while worker thread systems seem very simple to write at first, there's actually
a huge amount of careful fiddling required to make them really run well.  You have to be constantly vigilant
about doing test runs and checking threadprofile views like this to ensure that what's actually happening
matches what you think is happening.
Err, oops, I think I just accidentally wrote an advertisement for
&lt;A HREF="http://www.radgametools.com/telemetry.htm"&gt; Telemetry &lt;/A&gt;.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/rz6IYnlHgOY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/5595958967324891130/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=5595958967324891130" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/5595958967324891130?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/5595958967324891130?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/04/04-04-13-worker-thread-forward-permit.html" title="04-04-13 - Worker Thread Forward Permit Delay-Kicks" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;DUQMQXY6eSp7ImA9WhBWEUw.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-5531425803741843817</id><published>2013-04-04T14:20:00.000-07:00</published><updated>2013-04-04T16:16:20.811-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-04T16:16:20.811-07:00</app:edited><title>04-04-13 - Oodle Compression on BC1 and WAV</title><content type="html">I put some stupid filters in, so here are some results for the record and my reference.

&lt;P&gt;

BC1 (DXT1/S3TC) DDS textures : &lt;P&gt;

&lt;IMG SRC="http://chart.apis.google.com/chart?cht=bvg&amp;chs=900x320&amp;chbh=10,0,3&amp;chxt=x,y&amp;chco=3072F3,00aaaa,f0da00,aa00aa&amp;chxr=1,0,100,10&amp;chxs=0,000000,12,0,lt|1,000000,10,1,lt&amp;chdl=Zip|RAR|LZMA|Oodle&amp;chd=e:rqrn0liNaum7rApn,rqrI1dXVaHlmonoJ,k9mwtXboVrgSk8h8,lCl6soYUXAgnkwh8&amp;chxl=0:|a.dds|b.dds|c.dds|d.dds|e.dds|f.dds|g.dds|h.dds&amp;chg=0,10,5,5&amp;chtt=percent+of+raw"&gt;

&lt;P&gt;

All compressors run in max-compress mode.  Note that it's not entirely fair because Oodle has the BC1 swizzle
and the others don't.

&lt;P&gt;

Some day I'd like to do a BC1-specific encoder.  Various ideas and possibilities there.  Also RD-DXTC.

&lt;P&gt;

I also did a WAV filter.  This one is particularly ridiculous because nobody uses WAV, and if you want to
compress audio you should use a domain-specific compressor, not just OodleLZ with a simple delta filter.
I did it because I was annoyed that RAR beat me on WAVs (due to its having a multimedia filter), and RAR should
never beat me.

&lt;P&gt;

WAV compression : &lt;P&gt;

&lt;IMG SRC="http://chart.apis.google.com/chart?cht=bvg&amp;chs=900x320&amp;chbh=10,0,3&amp;chxt=x,y&amp;chco=3072F3,00aaaa,f0da00,f080f0,aa00aa&amp;chxr=1,0,100,10&amp;chxs=0,000000,12,0,lt|1,000000,10,1,lt&amp;chdl=RARnofilt|RARfilt|FLAC|Oodlenofilt|Oodlefilt&amp;chd=e:7p6p7P,wCtXxJ,m5kJp3,7l6k7I,rzpDtP&amp;chxl=0:|H|I|S&amp;chg=0,10,5,5&amp;chtt=percent+of+raw"&gt;

&lt;P&gt;

See also : 
&lt;A HREF="http://chart.apis.google.com/chart?cht=bvg&amp;chs=900x320&amp;chbh=10,0,3&amp;chxt=x,y&amp;chco=3072F3,00aaaa,f0da00,ff0000,f080f0,aa00aa&amp;chxr=1,0,100,10&amp;chxs=0,000000,12,0,lt|1,000000,10,1,lt&amp;chdl=7z|RARnofilt|RARfilt|FLAC|Oodlenofilt|Oodlefilt&amp;chd=e:5O0844,7p6p7P,wCtXxJ,m5kJp3,7l6k7I,rzpDtP&amp;chxl=0:|H|I|S&amp;chg=0,10,5,5&amp;chtt=percent+of+raw"&gt; same
chart with 7z &lt;/A&gt; (not really fair cuz 7z doesn't have a WAV filter)

&lt;P&gt;

Happy to see that Oodle-filt handily beats RAR-filt.  I'm using just a trivial linear gradient predictor :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

out[i] = in[i] - 2*in[i-1] + in[i-2]

&lt;/PRE&gt;&lt;/FONT&gt;

this could surely be better, but whatever, WAV filtering is not important.

&lt;P&gt;

I also did a simple BMP delta filter and EXE (BCJ/relative-call) transform.  I don't really want to get into
the business of offering all kinds of special case filters the way some of the more insane modern archivers
do (like undoing ZLIB compression so you can recompress it, or WRT), but anyhoo there's a few.

&lt;P&gt;&lt;HR&gt;&lt;P&gt;

ADDED : I will say something perhaps useful about the WAV filter.

&lt;P&gt;

There's a bit of a funny issue because the WAV data is 16 bit (or 24 or 32), and the back-end entropy
coder in a simple LZ is 8 bit.

&lt;P&gt;

If you just take a 16-bit delta and put it into bytes, then most of your values will be around zero,
and you'll make a stream like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
[00 00] [00 01] [FF FF] [FF F8] [00 04] ...
&lt;/PRE&gt;&lt;/FONT&gt;

The bad thing you should notice here are the high bytes are switching between 00 and FF even though the
values have quite a small range.  (Note that the common thing of centering the values with +32768 doesn't
change this at all).

&lt;P&gt;

You can make this much better just by doing a bias of +128.  That makes it so the most important range
of values (around zero (specifically [-128,127])) all have the same top byte.

&lt;P&gt;

I think it might be even slightly better to do a "folded" signed-&gt;unsigned map, like

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
{ 0,-1,1,-2,2,-3,...,32767,-32768 }
&lt;/PRE&gt;&lt;/FONT&gt;

The main difference being that values like -129 and +128 get the same high byte in this mapping, rather
than two different high bytes in the simple +128 bias scheme.

&lt;P&gt;

Of course you really want a separate 8-bit huffman for alternating pairs of bytes.  One way to get that
is to use a few bottom bits of position as part of the literal context.  Also, the high byte should really
be used as context for the low byte.  But both of those are beyond the capabilities of my simple LZ-huffs
so I just deinterleave the high and low bytes to two streams.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/ThkklL3DfWM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/5531425803741843817/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=5531425803741843817" title="11 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/5531425803741843817?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/5531425803741843817?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/04/04-04-13-oodle-compression-on-bc1-and.html" title="04-04-13 - Oodle Compression on BC1 and WAV" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>11</thr:total></entry><entry gd:etag="W/&quot;CkMDRng4fyp7ImA9WhBWEUw.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-4118978187851925946</id><published>2013-04-02T00:00:00.000-07:00</published><updated>2013-04-04T14:21:17.637-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-04T14:21:17.637-07:00</app:edited><title>04-02-13 - The Method of Random Holdouts</title><content type="html">Quick and dirty game-programming style hacky way to make fitting model parameters somewhat better.

&lt;P&gt;

You have some testset {T} of many items, and you wish to fit some heuristic model M over T which has some parameters.
There may be multiple forms of the model and you aren't sure which is best, so you wish to compare models against
each other.

&lt;P&gt;

For concreteness, you might imagine that T is a bunch of images, and you are trying to make a perceptual DXTC coder;
you measure block error in the encoder as something like (SSD + a * SATD ^ b + c * SSIM_8x8 ) , and the goal is to minimize the
total image error in the outer loop, measured using something complex like IW-MS-SSIM or "MyDCTDelta" or whatever.  So you
are trying to fit the parameters {a,b,c} to minimize an error.

&lt;P&gt;

For reference, the naive training method is : run the model on all data in {T}, optimize parameters to minimize error over {T}.

&lt;P&gt;

The method of random holdouts goes like this :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

Run many trials

On each trial, take the testset T and randomly separate it into a training set and a verification set.
Typically training set is something like 75% of the data and verification is 25%.

Optimize the model parameters on the {training set} to minimize the error measure over {training set}.

Now run the optimized model on the {verification set} and measure the error there.
This is the error that will be used to rate the model.

When you make the average error, compensate for the size of the model thusly :
average_error = sum_error / ( [num in {verification set}] - [dof in model] )

Record the optimal parameters and the error for that trial

&lt;/PRE&gt;&lt;/FONT&gt;

Now you have optimized parameters for each trial, and an error for each trial.  You can take the average over
all trials, but you can also take the sdev.  The sdev tells you how well your model is really working - if it's
not close to zero then you are missing something important in your model.  A term with a large sdev might just be
a random factor that's not useful in the model, and you should try again without it.

&lt;P&gt;

The method of random holdouts reduces over-training risk, because in each run you are measuring error only
on data samples that were not used in training.

&lt;P&gt;

The method of random holdouts gives you a decent way to compare models which may have different numbers of DOF.
If you just use the naive method of training, then models with more DOF will always appear better, because they
are just fitting your data set.

&lt;P&gt;

That is, in our example say that (SSD + a * SATD ^ b) is actually the ideal model and the extra term ( + c * SSIM_8x8 )
is not useful.  As long as it's not just a linear combo of other terms, then naive training will find a "c" such
that that term is used to compensate for variations in your particular testset.  And in fact that incorrect "c" can be quite a large value
(along with a negative "a").

&lt;P&gt;

This kind of method can also be used for fancier stuff like building complex models from ensembles of simple models, "boosting" models, etc.
But it's useful even in this case where we wind up just using a simple linear model, because you can see how it varies over the random
holdouts.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/8NVQcIrv8GM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/4118978187851925946/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=4118978187851925946" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/4118978187851925946?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/4118978187851925946?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/04/04-02-13-method-of-random-holdouts.html" title="04-02-13 - The Method of Random Holdouts" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;AkcGRXk7cCp7ImA9WhBXGEg.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-6880504964376050184</id><published>2013-03-31T10:50:00.003-07:00</published><updated>2013-04-01T16:13:44.708-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-01T16:13:44.708-07:00</app:edited><title>03-31-13 - Some GDC Observations</title><content type="html">From my very limitted view of GDC standing at the RAD booth.

&lt;P&gt;

1. Programming is dead.  There were basically zero programming talks at GDC this year.  That's sad, but
also perfectly reasonable since programming is not the problem any more (*).  (* = assuming that you just
want to make the same old shit with different graphics)

&lt;P&gt;

2. Piece of shit mobile games that people have thrown together in a month look better than AAA games 10 years
ago.  It's not just that GPU's are so much better, but the free engines are really amazing these days, and
the content pipes are so much better, and there are so many more decent 3d artists that can just make tons
of content.

&lt;P&gt;

3. Game developers look like human beings now.  If you looked at a GDC when I first started going, we were
all classic troglodyte nerds; unwashed sweatshirts and open backpacks with slide-rules falling out.
We were all vampirically pale from being locked in a dark box surrounded by our giant CRTs.  (more generally
I'm noticing that the average fitness level (on the west coast anyway) is way up in the past 5 years or so).

&lt;P&gt;

4. Mobile is dead, downloadable is king.  I do an unscientific random sampling every year just by
asking the people who stop by the RAD booth what they're working on.  For the past few years it has
been mobile mobile "we're making a game for ios and android", tons of kids and startups and indies
trying to get into mobile.  That seems to be gone, and the new gold rush
is "downloadable" (PC, XBLA, etc).

&lt;P&gt;

5. Games are tacky and tasteless.  One of the worst things for me standing at the booth is just
hearing and seeing games all day.  I don't play games much, I never watch TV with commercials, and I never
watch things like cable news with all the excessive HUD and overstimulation, I find all that stuff
abusive of my senses.  Games are stuck in this awful "bling bling whoosh blammo" flashing and fast-cuts
and just really tacky aesthetic.  It's just like TV ads, or a bit like standing in the slot machine
section of a casino (which is surely some level of hell).

&lt;P&gt;

6. I saw one really amazing game at GDC that stood out from the rest.
It had all the players instantly smiling and laughing.  It was
fun for kids and adults.  It created a feeling of group affinity.  Everyone around wanted to join in.
It was even beneficial to the body.  It was an inflatable ball.  Personally I had the "holy shit what
we make is total crap" (actually worse than crap, because it's actively harmful to the body and mind)
epiphany some 10+ years ago, but it just struck me so hard standing there with all these
shit games around and people having so much more fun in the most basic game in the non-electronic world.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/2q9L6FO8_7E" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/6880504964376050184/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=6880504964376050184" title="15 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/6880504964376050184?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/6880504964376050184?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-31-13-some-gdc-observations.html" title="03-31-13 - Some GDC Observations" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>15</thr:total></entry><entry gd:etag="W/&quot;DUQDSXY9cCp7ImA9WhBWEUw.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-2753030058507776305</id><published>2013-03-31T10:50:00.001-07:00</published><updated>2013-04-04T16:16:18.868-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-04T16:16:18.868-07:00</app:edited><title>03-31-13 - Market Equilibrium</title><content type="html">I'm sure there's some standard economic theory of all this but hey let's reinvent the wheel without
any background.

&lt;P&gt;

There's a fundamental principal of any healthy (*) market that the reward for some labor is equal across
all fields - proportional only to standard factors like the risk factor, the scarcity of labor, the
capital required for entry, etc. (* = more on "healthy" later).
The point is that those factors have *nothing* to do with the details of the field.

&lt;P&gt;

The basic factor at play is that if some field changes and suddenly becomes much more profitable, then
people will flood into that field, and the risk-equal-capital-return will keep going down until it becomes equal to
other fields.  Water flows downhill, you know.

&lt;P&gt;

When people like Alan Greenspan try to tell you that oh this new field is completely unlike anything we've seen
in the past because of blah blah - it doesn't matter, they may have lots of great points that seem reasonable
in isolation, but the equilibrium still applies.  The pay of a computer programmer is set by the pay of a farmer,
because if the difference were out of whack, the farmer would quit farming and start programming; they pay of
programmers will go down and the wages of farmers
will go up, then the price of lettuce will go up, and in the end a programmer
won't be able to buy any more lettuce than anyone else in a similar job.  ("similar" only in terms of risk,
ease of entry, rarity of talent, etc.)

&lt;P&gt;

We went through a drive-through car wash yesterday and Tasha idly wondered how much the car wash operator makes
from an operation like that.  Well, I bet it's about the same as a quick-lube place makes, and that's about the
same as a dry cleaner, and it's about the same as a pizza place (which has less capital outlay but more risk), because if one of them was much more profitable,
there would be more competition until equilibrium was reached.

&lt;P&gt;

Specifically I've been thinking about this because of the current indie game boom on the PC, which seems to be a
bit of a magic gold rush at the moment.  That almost inevitably has to die out, it's just a question of when.
(so hurry up and get your game out before it does!).

&lt;P&gt;

But of course that leads us into the issue of broken markets, since all current game sales avenues are deeply
broken markets.

&lt;P&gt;

Equilibrium (like most naive economic theory) only applies to markets where there's fluidity, robust competition,
no monopolistic control, free information, etc.  And of course those don't happen in the real world.

&lt;P&gt;

Whenever a market is not healthy, it provides an opportunity for unbalanced reward, well out of equilibrium.

&lt;P&gt;

Lack of information can be particularly be a factor in small niches.  There can be a company that does
something random like make height-adjustable massage tables.  If they're a private operation and nobody really
pays attention to them, they can have super high profit levels for something that's not particularly difficult -
way out of equilibrium.  If other people knew how easy that business was, lots of others would enter, but due
to lack of information they don't.

&lt;P&gt;

Patents and other such mechanisms that create legally enforced distortions of the market.  Of course things
like the cable and utility systems are even worse.

&lt;P&gt;

On a large scale, government distortion means that huge fields like health care, finance, insurance, oil,
farming, etc. are all more profitable than they should be.

&lt;P&gt;

Perhaps the biggest issue in downloadable games is the oligopoly of App Store and Steam.  This creates an unhealthy
market distortion and it's hard to say exactly what the long term affect of that will be.  (of course you don't see
it as "unhealthy" if you are the one benefiting from the favor of the great powers; it's unhealthy in a market
fluidity and fair competition sense, and may slow or prevent equilibrium)

&lt;P&gt;

Of course new fields are not yet in equilibrium, and one of the best ways to "get rich quick" is to chase new fields.
Software has been out of equilibrium for the past 50 years, and is only recently settling down.  Eventually software
will be a very poorly paid field, because it requires very little capital to become a programmer, it's low risk,
and there are lots of people who can do it.

&lt;P&gt;

Note that in *every* field the best will always rise to the top and be paid accordingly.

&lt;P&gt;

Games used to be a great field to work in because it was a new field.  New fields are exciting, they offer
great opportunities for innovation, and they attract the best people.  Mature industries are well into equilibrium
and the only chances for great success are through either big risk, big capital investment, or crookedness.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/t86fll2J89M" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/2753030058507776305/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=2753030058507776305" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/2753030058507776305?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/2753030058507776305?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-31-13-market-equilibrium.html" title="03-31-13 - Market Equilibrium" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;CEUHRH87eip7ImA9WhBXF0g.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-8292934099326261070</id><published>2013-03-31T09:19:00.003-07:00</published><updated>2013-03-31T10:50:35.102-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-31T10:50:35.102-07:00</app:edited><title>03-31-13 - Index - Game Threading Architecture</title><content type="html">Gathering the series for an index post :

&lt;P&gt;

&lt;A HREF="http://cbloomrants.blogspot.com/2011/08/08-01-11-game-threading-model_01.html"&gt; cbloom rants 08-01-11 - A game threading model &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2011/12/12-03-11-worker-thread-system-with.html"&gt; cbloom rants 12-03-11 - Worker Thread system with reverse dependencies &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/03/03-05-12-oodle-handle-table.html"&gt; cbloom rants 03-05-12 - Oodle Handle Table &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/03/03-08-12-oodle-coroutines.html"&gt; cbloom rants 03-08-12 - Oodle Coroutines &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/06/06-21-12-two-alternative-oodles.html"&gt; cbloom rants 06-21-12 - Two Alternative Oodles &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/07/07-19-12-experimental-futures-in-oodle.html"&gt; cbloom rants 07-19-12 - Experimental Futures in Oodle &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/10/10-26-12-oodle-rewrite-thoughts.html"&gt; cbloom rants 10-26-12 - Oodle Rewrite Thoughts &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/12/12-18-12-async-await-microsoft.html"&gt; cbloom rants 12-18-12 - Async-Await ; Microsoft's Coroutines &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/12/12-21-12-coroutine-centric-architecture.html"&gt; cbloom rants 12-21-12 - Coroutine-centric Architecture &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/12/12-21-12-coroutines-from-lambdas.html"&gt; cbloom rants 12-21-12 - Coroutines From Lambdas &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2012/12/12-06-12-theoretical-oodle-rewrite.html"&gt; cbloom rants 12-06-12 - Theoretical Oodle Rewrite Continued &lt;/A&gt; &lt;BR&gt;
&lt;A HREF="http://cbloomrants.blogspot.com/2013/02/02-23-13-threading-reasoning-behind.html"&gt; cbloom rants 02-23-13 - Threading - Reasoning Behind Coroutine Centric Design &lt;/A&gt; &lt;BR&gt;

&lt;P&gt;

My contention is not that this is "the ultimate solution".  I believe it's a good architecture using the techniques
that we currently have available, without doing anything that I consider bananas like writing your own programming
language (*).  Of course if you are platform-specific or know you can use C++11 there are small ways to make things more
convenient, but the fundamental architecture would be about the same (and assuming that you will never need to port to
a broken platform is a mistake I know well).

&lt;P&gt;

(* = a lot of people that I consider usually smart seem to think that writing a custom language is a great solution
for lots of problems.  Whenever we're talking about "oh reflection in C is a mess" or "dependency analysis should
be automatic", they'll throw out "well if you had the time you would just write a custom language that does all
this better".  Would you?  I certainly wouldn't.  I like using tools that actually work, that new hires are familiar
with, etc. etc. I don't have to list the pros of sticking with standard languages.  In my experience every clever
custom language for games is a huge fucking disaster and I would never advocate that as a good solution for any
problem)

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/BTjmM4cZvK4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/8292934099326261070/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=8292934099326261070" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/8292934099326261070?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/8292934099326261070?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-31-13-index-game-threading.html" title="03-31-13 - Index - Game Threading Architecture" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;DEADRnk5eCp7ImA9WhBXF0k.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-7944659493900989438</id><published>2013-03-31T09:19:00.001-07:00</published><updated>2013-03-31T09:19:37.720-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-31T09:19:37.720-07:00</app:edited><title>03-31-13 - Endian-Independent Structs</title><content type="html">I dunno, maybe this is common practice, but I've never seen it before.

&lt;P&gt;

The easy way to load many file formats (I'll use a BMP here to be concrete) is just to point a
struct at it :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

struct BITMAPFILEHEADER
{
    U16 bfType; 
    U32 bfSize; 
    U16 bfReserved1; 
    U16 bfReserved2; 
    U32 bfOffBits; 
} __attribute__ ((__packed__));


BITMAPFILEHEADER * bmfh = (BITMAPFILEHEADER *)data;

if ( bmfh-&gt;bfType != 0x4D42 )
    ERROR_RETURN("not a BM",0);

etc..

&lt;/PRE&gt;&lt;/FONT&gt;

but of course this doesn't work cross platform.

&lt;P&gt;

So people do all kinds of convoluted things (which I have usually done), like changing to a
method like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

U16 bfType = Get16LE(&amp;ptr);
U32 bfSize = Get32LE(&amp;ptr);

&lt;/PRE&gt;&lt;/FONT&gt;

or they'll do some crazy struct-parse fixup thing which I've always found to be bananas.

&lt;P&gt;

But there's a super trivial and convenient solution :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

struct BITMAPFILEHEADER
{
    U16LE bfType; 
    U32LE bfSize; 
    U16LE bfReserved1; 
    U16LE bfReserved2; 
    U32LE bfOffBits; 
} __attribute__ ((__packed__));

&lt;/PRE&gt;&lt;/FONT&gt;

where U16LE is just U16 on little-endian platforms and is a class that does bswap on itself on big-endian
platforms.

&lt;P&gt;

Then you can still just use the old struct-pointing method and everything just works.  Duh, I can't believe
I didn't think of this earlier.

&lt;P&gt;

Similarly, here's a WAV header :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

struct WAV_header_LE
{
    U32LE FOURCC_RIFF; // RIFF Header 
    U32LE riffChunkSize; // RIFF Chunk Size 
    U32LE FOURCC_WAVE; // WAVE Header 
    U32LE FOURCC_FMT; // FMT header 
    U32LE fmtChunkSize; // Size of the fmt chunk 
    U16LE audioFormat; // Audio format 1=PCM,6=mulaw,7=alaw, 257=IBM Mu-Law, 258=IBM A-Law, 259=ADPCM 
    U16LE numChan; // Number of channels 1=Mono 2=Sterio 
    U32LE samplesPerSec; // Sampling Frequency in Hz 
    U32LE bytesPerSec; // bytes per second 
    U16LE blockAlign; // normall NumChan* bytes per sample
    U16LE bitsPerSample; // Number of bits per sample 
}  __attribute__ ((__packed__));;

&lt;/PRE&gt;&lt;/FONT&gt;

easy.

&lt;P&gt;

For file-input type structs, you just do this and there's no penalty.  For structs you keep in memory
you wouldn't want to eat the bswap all the time, but even in that case this provides a simple way to get
the swizzle into native structs by just copying all the members over.

&lt;P&gt;

Of course if you have the Reflection-Visitor system that I'm fond of, that's also a good way to go.
(cursed C, give me a "do this macro on all members").

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/bABGMOx3A4U" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/7944659493900989438/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=7944659493900989438" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/7944659493900989438?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/7944659493900989438?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-31-13-endian-independent-structs.html" title="03-31-13 - Endian-Independent Structs" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;C0YFQn08eip7ImA9WhBXFks.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-709837147673770195</id><published>2013-03-30T09:21:00.001-07:00</published><updated>2013-03-30T09:31:53.372-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-30T09:31:53.372-07:00</app:edited><title>03-30-13 - Error codes</title><content type="html">Some random rambling on the topic of returning error codes.

&lt;P&gt;

Recently I've been fixing up a bunch of code that does things like

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
void  MutexLock( Mutex * m )
{
    if ( ! m ) return;
    ...

&lt;/PRE&gt;&lt;/FONT&gt;

yikes.  Invalid argument and you just silently do nothing.  No thank you.  

&lt;P&gt;

We should all know that silently nopping in failure cases is pretty horrible.
But I'm also
dealing with a lot of error code returns, and it occurs to me that returning an error code in
that situation is not much better.

&lt;P&gt;

Personally I want unexpected or unhandleable errors to just blow up my app.  In my own code I would
just assert; unfortunately that's not viable in OS code or perhaps even in a library.

&lt;P&gt;

The classic example is malloc.  I hate mallocs that return null.  If I run out of memory, there's
no way I'm handling it cleanly and reducing my footprint and carrying on.  Just blow up my app.  Personally
whenever I implement an allocator if it can't get memory from the OS it just prints a message and exits (*).

&lt;P&gt;

(* = aside : even better is "functions that don't fail" which I might write more about later;
basically the idea is the function tries to handle the failure case itself and never returns it out to
the larger app.  So in the case of malloc it might print a message like "tried to alloc N bytes; (a)bort/(r)etry/return (n)ull?".
Another common case is when you try to open a file for write and it fails for whatever reason, it should just
handle that at the low level and say "couldn't open X for write; (a)bort/(r)etry/change (n)ame?" )

&lt;P&gt;

I think error code returns are okay for *expected* and *recoverable* errors.

&lt;P&gt;

On functions that you realistically expect to always succeed and will not check error codes for, they
shouldn't return error codes at all.  I wrote recently about 
&lt;A HREF="http://cbloomrants.blogspot.com/2013/03/03-16-13-writing-portable-code-rambles.html"&gt; wrapping system APIs for portable code &lt;/A&gt;
; an example of the style of level 2 wrapping that I like is to "fix" the error returns.

&lt;P&gt;

(obviously this is not something the OS should do, they just have to return every error; it requires app-specific knowledge about
what kind of errors your app can encounter and successfully recover from and continue, vs. ones that just mean you have a catastrophic
unexpected bug)

&lt;P&gt;

For example, functions like lock &amp; unlock a mutex shouldn't fail (in my code).  99% of the user code in the world that locks and
unlocks mutexes doesn't check the return value, they just call lock and then proceed assuming the lock succeeded - so don't return it :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

void mypthread_mutex_lock(mypthread_mutex_t *mutex)
{
    int ret = pthread_mutex_lock(mutex);
    if ( ret != 0 )
        CB_FAIL("pthread_mutex_lock",ret);
}

&lt;/PRE&gt;&lt;/FONT&gt;

When you get a crazy unexpected error like that, the app should just blow up right at the call site (rather
than silently failing and then blowing up somewhere weird later on because the mutex wasn't actually locked).

&lt;P&gt;

In other cases there are a mix of expected failures and unexpected ones, and the level-2 wrapper should differentiate
between them :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

bool mysem_trywait(mysem * sem)
{
    for(;;)
    {
        int res = sem_trywait( sem );
        if ( res == 0 ) return true; // got it

        int err = errno;
        if ( err == EINTR )
        {
            // UNIX is such balls
            continue;
        }
        else if ( err == EAGAIN )
        {
            // expected failure, no count in sem to dec :
            return false;
        }
        else
        {
            // crazy failure; blow up :
            CB_FAIL("sem_trywait",err);
        }
    }
}

&lt;/PRE&gt;&lt;/FONT&gt;

(BTW best practice these days is always to copy "errno" out to an int, because errno may actually be
#defined to a function call in the multithreaded world)

&lt;P&gt;

And since I just stumbled into it by accident, I may as well talk about EINTR.  Now I understand that
there may be legitimate reasons why you *want* an OS API that's interrupted by signals - we're going to ignore
that, because that's not what the EINTR debate is about.  So for purposes of discussion pretend that you
never have a use case where you want EINTR and it's just a question of whether the API should put that
trouble on the user or not.

&lt;P&gt;

I ranted about EINTR at RAD a while ago and was informed (reminded) this was an
&lt;A HREF="http://www.jwz.org/doc/worse-is-better.html"&gt; ancient argument &lt;/A&gt; that I was
on the wrong side of.

&lt;P&gt;

Mmm.  One thing certainly is true : if you want to write an operating system (or any piece of software)
such that it is easy to
port to lots of platforms and maintain for a long time, then it should be absolutely as
simple as possible (meaning simple to implement, not simple in the API or simple to use),
even at the cost of "rightness" and pain to the user.  That I certainly
agree with; UNIX has succeeded at being easy to port (and also succeeded at being a pain to
the user).

&lt;P&gt;

But most people who argue on the pro-EINTR side of the argument are just wrong; they are confused
about what the advantage of the pro-EINTR argument is  (for example
&lt;A HREF="http://www.codinghorror.com/blog/2004/08/worse-is-better.html"&gt; Jeff Atwood takes off on a general rant
against complexity &lt;/A&gt; ; I think we all should know by now that huge complex APIs are bad; that's not interesting,
and that's not what
"Worse is Better" is about; or Jeff's example of INI files vs the registry - INI files are just massively better in
every way, it's not related at all, there's no pro-con there).

&lt;P&gt;

(to be clear and simple : the pro-EINTR argument is entirely about simplicity of implementation and porting of the API; it's about requiring
the minimum from the system)

&lt;P&gt;

The EINTR-returning API is not simpler (than one that doesn't force you to loop).  Consider an API like this :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

U64 system( U64 code );

doc :

if the top 32 bits of code are 77 this is a file open and the bottom 32 bits specify a device; the
return values then are 0 = call the same function again with the first 8 chars of the file name ...
if it returns 7 then you must sleep at least 1 milli and then call again with code = 44 ...
etc.. docs for 100 pages ...

&lt;/PRE&gt;&lt;/FONT&gt;

what you should now realize is that *the docs are part of the API*.  (that is not a "simple" API)

&lt;P&gt;

An API that requires you to carefully read about the weird special cases and understand what is going
on inside the system is NOT a simple API.  It might look simple, but it's in disguise.  A simple API
does what you expect it to.  You should be able to just look at the function signature and guess what
it does and be right 99% of the time.

&lt;P&gt;

Aside from the issue of simplicity, any API that requires you to write the exact same boiler-plate every time
you use it is just a broken fucking API.

&lt;P&gt;

Also, I strongly believe that any API which returns error codes should be usable if you don't check the error code
at all.  Yeah yeah in real production code of course you check the error code, but for little test apps you
should be able to do :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

int fd = open("blah");

read(fd,buf);

close(fd);

&lt;/PRE&gt;&lt;/FONT&gt;

and that should work okay in my hack test app.  Nope, not in UNIX it doesn't.  Thanks to its wonderful "simplicity"
you have to call "read" in a loop because it might decide to return before the whole read is done.

&lt;P&gt;

Another example that occurs to me is the reuse of keywords and syntax in C.  Things like making
"static" mean something completely different depending on how you use it makes the number of special keywords
smaller.  But I believe it actually makes the "API" of the language much *more* complex.
Instead of having
intuitive and obvious separate clear keywords for each meaning that you could perhaps figure out just by looking at them,
you instead have to read a bunch of docs and have very technical knowledge of the internals of what the keywords mean in each
usage.
(there are legitimate advantages to minimizing the number of keywords, of course, like leaving as many names available to users as possible).
Knowledge required to use an API is part of the API.  Simplicity is determined by the amount of knowledge
required to do things correctly.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/8K59qG_Wvj4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/709837147673770195/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=709837147673770195" title="23 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/709837147673770195?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/709837147673770195?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-30-13-error-codes.html" title="03-30-13 - Error codes" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>23</thr:total></entry><entry gd:etag="W/&quot;C0cESHY6fyp7ImA9WhBXE0Q.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-2625132043148425648</id><published>2013-03-26T00:00:00.001-07:00</published><updated>2013-03-27T06:30:09.817-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-27T06:30:09.817-07:00</app:edited><title>03-26-13 - Simulating Deep Yield with a Wait</title><content type="html">I'm becoming increasingly annoyed at my lack of "deep yield" for coroutines.

&lt;P&gt;

Any time you are in a work item, if you decide that you can get some more parallelism by
doing a branch-merge inside that item, you need deep yield.

&lt;P&gt;

Remember you should never ever do an OS wait on a coroutine thread (with normal threads anyway;
on a WinRT threadpool thread you can).
The reason is the OS wait disables that worker thread, so you have one less.  In the worst
case, it leads to deadlock, because all your worker threads can be asleep waiting on work items,
and with no worker threads they will never get done.

&lt;P&gt;

Anyway, I've cooked up a temporary work-around, it looks like this :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

I'm in some function and I want to branch-merge

If I'm not on on a worker thread
  -&gt; just do a normal branch-merge, send the work off and use a Wait for completion

If I am on a worker thread :

inc target worker thread count
if # currently live worker threads is &lt; target count
  start a new worker thread (either create or wake from pool)

now do the branch-merge and use OS Wait
dec the target worker thread count


on each worker thread, after completing a work item and before popping more work :
if target worker thread count &lt; currently live count
  stop self (go back into a sleeping state in the pool)

&lt;/PRE&gt;&lt;/FONT&gt;

this is basically using OS threads to implement stack-saving deep yield.  It's not awesome,
but it is okay if deep yield is rare.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/t-W2tf3TMeU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/2625132043148425648/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=2625132043148425648" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/2625132043148425648?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/2625132043148425648?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-26-13-simulating-deep-yield-with-wait.html" title="03-26-13 - Simulating Deep Yield with a Wait" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>8</thr:total></entry><entry gd:etag="W/&quot;C0cESXc5fyp7ImA9WhBXE0Q.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-4263810722287813780</id><published>2013-03-26T00:00:00.000-07:00</published><updated>2013-03-27T06:30:08.927-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-27T06:30:08.927-07:00</app:edited><title>03-26-13 - Oodle 1.1 and GDC</title><content type="html">Hey it's GDC time again, so if you're here come on by the RAD booth and
say "hi" (or "fuck you", or whatever).

&lt;P&gt;

The &lt;A HREF="http://www.radgametools.com/oodle.htm"&gt; Oodle web site &lt;/A&gt; just went live a few
days ago.

&lt;P&gt;

Sometimes I feel embarassed (ashamed? humiliated?) that it's taken me five years to write a file IO
and data compression library.  Other times I think I've basically written an entire OS by myself
(and all the docs, and marketing materials, and a video compressor, and aborted paging engine, and
a bunch of other crap) and that doesn't sound so bad.  I suppose the truth is somewhere in the middle.
(perhaps with Oodle finally being officially released and selling, I might write a
little post-mortem about how it's gone, try to honestly look back at it a bit.  (because lord knows
what I need is more introspection in my life)).

&lt;P&gt;

Oodle 1.1 will be out any day now.  Main new features :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

Lots more platforms.  Almost everything except mobile platforms now.

LZNIB!  I think LZNIB is pretty great.  8X faster to decode than ZLIB and usually
makes smaller files.

Other junk :
All the compressors can run parallel encode &amp; decode now.
Long-range-matcher for LZ matching on huge files (still only in-memory though).
Incremental compressors for online transmission, and faster resets.

&lt;/PRE&gt;&lt;/FONT&gt;

Personally I'm excited the core architecture is finally settling down, and we have a more focused
direction to go forward, which is mainly the compressors.  I hope to be able to work on some new
compressors for 1.2 (like a very-high-compression option, which I currently don't have), and then
eventually move on to some image compression stuff.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/t9Wp4L_68fA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/4263810722287813780/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=4263810722287813780" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/4263810722287813780?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/4263810722287813780?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-26-13-oodle-11-and-gdc.html" title="03-26-13 - Oodle 1.1 and GDC" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>8</thr:total></entry><entry gd:etag="W/&quot;C0cESXwzfCp7ImA9WhBXE0Q.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-8512267573653397308</id><published>2013-03-23T00:00:00.000-07:00</published><updated>2013-03-27T06:30:08.284-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-27T06:30:08.284-07:00</app:edited><title>03-23-13 - AI Rambles</title><content type="html">With GDC coming up I've been thinking generally about the state of game technology in general.
First a bit of a rant on that.

&lt;P&gt;

I am so fucking bored of graphics.  Graphics are not the damn problem.  I'm completely appalled by
the derivative repetitive boring games you all keep making.  I don't want to play "Shoot People
in the Face 227" or "Space Marines 154" or "Slide Blocks to Make them Go Bling N" or "Cute Creatures
Jump Around on Blocks N".  Barf, boring.  And making them all
shiny with new graphics is just gilding the turd.  Stop working on graphics.

&lt;P&gt;

Games have huge tech problems that nobody seems to want to work on.  One that I have wanted to work on
for a long time is animation.  And by "animation" I don't really mean playing back clips, which fundamentally
looks like garbage, but making characters move naturally, able to transition movements the way their body
should, respond to surface variations and so on.  Game animation just looks so awful, and it's becoming more
uncanny as the graphics get better.

&lt;P&gt;

(in fact if we were smart we would have done it the other way around.  Every cartoonist for a hundred
years has known that it's actually ok for the visuals to look unrealistic if the animation and sound are really
good.  Human perception cares more about motion than the static appearance of things.)

&lt;P&gt;

Anyhoo, the other big one is AI.  And by "AI" I don't mean playing scripts, or moving to designer-placed cover
spots.  Even some of the more sophisticated game AI systems are really just fancy whack-a-mole.  You
can see the AI's run to one spot, do a pre-programmed routine, run to another spot, pop out of cover so the
player can shoot me, pop back in cover.  Now, certainly there are
merits to whack-a-mole AI.  If you're making a platformer you don't want the enemy to do surprising things, you
just want them to walk back and forth on a set pattern that the player can pick up easily.  They're not really AI
at all, they're rigid bodies with an animal painted on them.

&lt;P&gt;

These AI's never surprise you, they never make you laugh, they never make you want to play again because they
might do something new.  They feed off your energy and don't give anything back, like a bad conversation partner.

&lt;P&gt;

So it made me realize that game AIs are actually more interesting when the game is very simple.  It might naively
seem like a big complex sandbox 3d world has got a more complex AI, but really that complex world means that the AI
no longer understands what it's doing.  Your only hope is to give it simple rules to follow about what it can do in
that world.

&lt;P&gt;

In contrast, AI for simple game systems (chess, checkers, backgammon, poker) can do amazing things that the human
programmer never anticipated.  There's a funny thing that happens with computer algorithms where a cold rational
scientific brute-force search of a mathematical problem space actually leads to behavior that's more human than
the shitty heuristic decision-tree type programming that's explicitly trying to simulate human behavior.

&lt;P&gt;

For example, when I was writing poker AI, I was really amazed at the "creative" plays that a simulation-based bot
makes.  (for review : a standard UAlberta-style poker bot works by building a model of the opponent based on observation
of previous action; it then simulates all the possibilities for future cards and imagines what the opponent will do in
each situation; it sums the EV over all paths for each of its own actions, and chooses the action that maximizes EV).

&lt;P&gt;

At the simplest level, it figures out things like check-raising when you tend to bet checked flops too much.  But it
did even weirder things.  For example the bot would very quickly become hyper-aggressive against an opponent that folds
even slightly too much; it adjusted faster and way more severely than any human.  I would play against it sometimes
with our cards face up so that I could make sure it was doing sane things, and I would see it make a huge check-raise
bluff on the river with junk.  My first thought is "I have a bug" and I'd go looking into the stats of the model,
and found that there was no bug, it's just that the AI had learned that I thought a big river raise meant strength, so I
was folding to them a lot, and therefore the simulation will jam almost every hand.

&lt;P&gt;

This type of poker AI is not the game theoretic equilibrium solution.  It's assuming that the opponent plays
by some scheme which may not be optimal, and that its own strategy is not face up.  That can lead it to make
mistakes.  One I've long been aware of is that it doesn't hedge correctly.  Normal humans hedge all the time
in their poker play, perhaps too much; you will often suspect that someone is bluffing a huge percent of the
time, but you aren't sure.  A non-hedging AI would immediately start making very light call-downs, but a
cautious human will weight in some factor for the model being wrong and play with a blended strategy that's
not disastrous if the model is wrong (like only doing the light call-down in small pots, or waiting for a
call-down with a hand that has some chance of being best even if the model is wrong).

&lt;P&gt;

Continuing the random rambling train of thought, I just realized (re-realized?) that one of the flaws with this style
of poker AI is that it doesn't anticipate the reaction to its moves.  Of course it does anticipate the
reaction just in terms of "if I bet, what hands will he call with or raise with", but it is evaluating
based on the *past* model of the opponent.  After you make your bet, the opponent sees it and adjusts their
view of you, so you need to be anticipating how their play style changes.  For example in the case I
mentioned above - when someone is playing pretty weak/tight the bot rapidly becomes hyper-aggressive, which
is mostly good, but the bot never gets the idea that "hey he can see I'm raising every single street of
every hand, he's going to adjust and call me down more".

&lt;P&gt;

Anyway, bringing it back to games, it occurred to me that it would be interesting to try some really simple
2d games, and give them a mathematical solving AI, instead of the usual heuristic crap we do.  Like,
let's face facts - we can't actually make games in these big free form 3d worlds, it's too complex.  Our
ability to do the graphics has gotten way beyond every other aspect.  We need to back up and go to like
Ultima-style 2d tile-based games.  Now you have a space where the AI can just explore future actions,
and things like advancing on the player by moving from cover to cover just pops out of the behavior
automatically because it maximizes EV, not because it was explicitly coded.

&lt;P&gt;

(I'm not contending that this is the "right way" to make games or that it will necessarily make good games,
I just thought it was interesting)

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/bpbhFnVSB4c" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/8512267573653397308/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=8512267573653397308" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/8512267573653397308?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/8512267573653397308?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-23-13-ai-rambles.html" title="03-23-13 - AI Rambles" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;C0cERns5cSp7ImA9WhBXE0Q.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-7685139167430536241</id><published>2013-03-19T07:48:00.000-07:00</published><updated>2013-03-27T06:30:07.529-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-27T06:30:07.529-07:00</app:edited><title>03-19-13 - Windows Sleep Variation</title><content type="html">Hmm, I've discovered that Sleep(n) behaves very differently on my three Windows boxes.

&lt;P&gt;

(Also remember &lt;A HREF="http://cbloomrants.blogspot.com/2009/03/03-02-09-sleep-sucks-and-vsync-woes.html"&gt;
there are a lot of other issues with Sleep(n) &lt;/A&gt; ; the times are only reliable here because this is in a no-op
test app)

&lt;P&gt;

This actually started because I was looking into Linux thread sleep timing, so I wrote a little test
to just Sleep(n) a bunch of times and measure the observed duration of the sleep.

&lt;P&gt;

(Of course on Windows I do timeBeginPeriod(1) and bump my thread to very high priority
(and timeGetDevCaps says the minp is 1)).

&lt;P&gt;

Anyway, what I'm seeing is this :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

Win7 :
sleep(1) : average = 0.999 , sdev = 0.035 ,min = 0.175 , max = 1.568
sleep(2) : average = 2.000 , sdev = 0.041 ,min = 1.344 , max = 2.660
sleep(3) : average = 3.000 , sdev = 0.040 ,min = 2.200 , max = 3.774

Sleep(n) averages n
duration in [n-1,n+1]

WinXP :
sleep(1) : average = 1.952 , sdev = 0.001 ,min = 1.902 , max = 1.966
sleep(2) : average = 2.929 , sdev = 0.004 ,min = 2.665 , max = 2.961
sleep(3) : average = 3.905 , sdev = 0.004 ,min = 3.640 , max = 3.927

Sleep(n) averages (n+1)
duration very close to (n+1) every time (tiny sdev)

Win8 :
sleep(1) : average = 2.002 , sdev = 0.111 ,min = 1.015 , max = 2.101
sleep(2) : average = 2.703 , sdev = 0.439 ,min = 2.017 , max = 3.085
sleep(3) : average = 3.630 , sdev = 0.452 ,min = 3.003 , max = 4.130

average no good
Sleep(n) minimum very precisely n
duration in [n,n+1] (+ a little error)
rather larger sdev

&lt;/PRE&gt;&lt;/FONT&gt;

it's like completely different logic on each of my 3 machines.  XP is the most precise,
but it's sleeping for (n+1) millis instead of (n) !  Win8 has a very precise min of n, but
the average and max is quite sloppy (sdev of almost half a milli, very high variation even
with nothing happening on the system).  Win7 hits the average really nicely but has a large
range, and is the only one that will go well below the requested duration.

&lt;P&gt;

As noted before, I had a look at this because I'm running Linux in a VM and seeing very poor
performance from my threading code under Linux-VM.  So I ran this experiment :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

Sleep(1) on Linux :

native : average = 1.094 , sdev = 0.015 , min = 1.054 , max = 1.224
in VM  : average = 3.270 , sdev =14.748 , min = 1.058 , max = 656.297

(added)
in VM2 : average = 1.308 , sdev = 2.757 , min = 1.052 , max = 154.025

&lt;/PRE&gt;&lt;/FONT&gt;

obviously being inside a VM on Windows is not being very kind to Linux's threading system.
On the native box, Linux's sleep time is way more reliable than Windows (small min-max range)
(and this is just with default priority threads and SCHED_OTHER, not even using a high priority
trick like with the Windows tests above).

&lt;P&gt;

added "in VM2".  So the VM threading seems to be much better if you let it see many fewer cores
than you have.  I'm running on a 4 core (8 hypercore) machine; the base "in VM" numbers
are with the VM set to see 4 cores.  "in VM2" is with the VM set to 2 cores.  Still a really bad max
in there, but much better overall.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/GZN2wAZrwLA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/7685139167430536241/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=7685139167430536241" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/7685139167430536241?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/7685139167430536241?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-19-13-windows-sleep-variation.html" title="03-19-13 - Windows Sleep Variation" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>10</thr:total></entry><entry gd:etag="W/&quot;C0YFQ3s6fSp7ImA9WhBXFks.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-5824464718308299797</id><published>2013-03-16T00:00:00.000-07:00</published><updated>2013-03-30T09:31:52.515-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-30T09:31:52.515-07:00</app:edited><title>03-16-13 - Writing Portable Code Rambles</title><content type="html">Some thoughts after spending some time on this (still a newbie).
How I would do it differently if I started from scratch.

&lt;P&gt;

&lt;B&gt;1.&lt;/B&gt; Obviously you all know the best practice of using your own data types (S32 or whatever) and making macros
for any kind of common operation that the standards don't handle well (like use a SEL macro instead of ?: ,
make a macro for ROT, etc).  Never use bit-fields, make your own macros for manipulating bits within words.
You also have to make your own whole macro meta-language for things not quite
in the language, like data alignment, restrict/alias, etc. etc.  (god damn C standard people, spend some time
on the actual problems that real coders face every day.  Thanks mkay).  That's background and it's the way
to go.

&lt;P&gt;

Make your own defines for SIZEOF_POINTER since stupid C doesn't give you any way to check sizeof() in a macro.
You probably also want SIZEOF_REGISTER.  You need your own equivalent of ptrdiff_t and intptr_t.  Best practice
is to use pointer-sized ints for all indexing of arrays and buffer sizes.

&lt;P&gt;

(one annoying complication is that there are platforms with 64 bit pointers on which 64-bit int math is very
slow; for example they might not have a 64-bit multiply at all and have to emulate it.  In that case you will
want to use 32-bit ints for array access when possible; bleh)

&lt;P&gt;

Avoid using "wchar_t" because it is not always the same size.  Try to explicitly use UTF16 or UTF32 in your code.
You could make your own SIZEOF_WCHAR and select one or the other on the appropriate platform.  (really try to
avoid using wchar at all; just use U16 or U32 and do your own UTF encoding).

&lt;P&gt;

One thing I would add to the macro meta-language next time is to wrap every single function (and class) in my
code.  That is, instead of :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

int myfunc( int args );

do

FUNC1 int FUNC2 myfunc(int args );

or even better :

FUNC( int , myfunc , (int args) );

&lt;/PRE&gt;&lt;/FONT&gt;

this gives you lots of power to add attributes and other munging as may be needed later on some platforms.
If I was doing this again I would use the last style, and I would have two of them, a FUNC_PUBLIC and FUNC_PRIVATE
to control linkage.  Probably should have separate wrapper macros for the proto and the body.

&lt;P&gt;

While you're at it you may as well have a preamble in every func too :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

FUNC_PUBLIC_BODY( int , myfunc , (int args) )
{
    FUNC_PUBLIC_PRE

    ...
}

&lt;/PRE&gt;&lt;/FONT&gt;

which lets you add automatic func tracing, profiling, logging, and so on.

&lt;P&gt;

I wish I had made several different layers of platform Id #defines.  The first one you want is the lowest level, which
explicitly Id's the current platform.  These should be exclusive (no overlaps), something like OODLE_PLATFORM_X86X64_WIN32
or OODLE_PLATFORM_PS3_PPU.

&lt;P&gt;

Then I'd like another layer that's platform *groups*.  For me the groups would probably be OODLE_PLATFORM_GROUP_PC , GROUP_CONSOLE,
and GROUP_EMBEDDED.  Those let you make gross characterizations like on "GROUP_PC" you use more memory and have more debug systems
and such.  With these mutually exclusive platform checks, you should never use an #else.  That is, don't do :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
#if OODLE_PLATFORM_X86X64_WIN32
.. some code ..
#else
.. fallback ..
#endif
&lt;/PRE&gt;&lt;/FONT&gt;

it's much better to explicitly enumerate which platforms you want to go to which code block, and then have an

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
#else
#error new platform
#endif
&lt;/PRE&gt;&lt;/FONT&gt;

at the end of every check.  That way when you try building on new platforms that you haven't thought carefully about yet, you get
nice compiler notification about all the places where you need to think "should it use this code path or should I write a new one".
Fallbacks are evil!  I hate fallbacks, give me errors.

&lt;P&gt;

Aside from the explicit platforms and groups I would have platform flags or caps which are non-mutually exclusive.  Things like
PLATFORM_FLAG_STDIN_CONSOLE.

&lt;P&gt;

While you want the raw platform checks, in end code I wish I had avoided using them explicitly, and instead
converted them into logical queries about the platform.  What I mean is, when you just have an "#if some platform"
in the code, it doesn't make it clear why you care that's the platform, and it doesn't make it reusable.
For example I have things like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
#if PLATFORM_X86X64
// .. do string matching by U64 and xor/cntlz
#else
// unaligned U64 read may be slow
// do string match byte by byte
#endif
&lt;/PRE&gt;&lt;/FONT&gt;

what I should have done is to introduce an abstraction layer in the #if that makes it clear what I am checking
for, like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

#if PLATFORM_X86X64
#define PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS 1
#elif PLATFORM_PS3
#define PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS 0
#else
#error classify me
#endif

#if PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS
// .. do string matching by U64 and xor/cntlz
#else
// unaligned U64 read may be slow
// do string match byte by byte
#endif

&lt;/PRE&gt;&lt;/FONT&gt;

then it's really clear what you want to know and how to classify new platforms.  It also lets you reuse
that toggle in lots of places without code duping the fiddly bit, which is the platform classification.

&lt;P&gt;

Note that when doing this, it's best to make high level usage-specific switches.  You might be tempted to
try to use platform attributes there.  Like instead of "PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS" you
might want to use "PLATFORM_SWITCH_UNALIGNED_READ_PENALTY" .  But that's not actually what you want to know,
you want to know if on my particular application (LZ string match) it's better to use big words or not, and
that might not match the low level attribute of the CPU.

&lt;P&gt;

It's really tempting to skip all this and abuse the switches you can see (lord knows I do it); I see (and write)
lots of code that does evil things like using "#ifdef _MSC_VER" to mean something totally different like "is this x86 or x64" ?
Of course that screws you when you move to another x86 platform and you aren't detecting it correctly (or when you use MSVC to
make PPC or ARM compiles).

&lt;P&gt;

Okay, that's all pretty standard, now for the new bit :

&lt;P&gt;

&lt;B&gt;2.&lt;/B&gt; I would opaque out the system APIs in two levels.  I haven't actually ever done this, so grains of salt, but I'm
pretty convinced it's the right way to go after working with a more standard system.

&lt;P&gt;

(for the record : the standard way is to make a set of wrappers that tries to behave the same on all systems, eg. that
tries to hide what system you are on as much as possible.  Then if you need to do platform-specific stuff you would just
include the platform system headers and talk to them directly.  That's what I'm saying is not good.)

&lt;P&gt;

In the proposed alternative, the first level would just be a wrapper on the system APIs with minimal or no behavior change.  That is,
it's just passing them through and standardizing naming and behavior.

&lt;P&gt;

At this level you are doing a few things :

&lt;P&gt;

2.A. Hiding the system includes from the rest of your app.  System includes are often in different places,
and often turn on compiler flags in nasty ways.  You want to remove that variation from the rest of your code
so that your main codebase only sees your own wrapper header.

&lt;P&gt;

2.B. Standardizing naming.  For example the MSVC POSIX funcs are all named wrong; at this level you can
patch that all up.

&lt;P&gt;

2.C. Fixing things that are slightly different or don't work on various platforms where they really
should be the same.  For example things like
pthreads are not actually all the same on all the pthreads platforms, and that can catch you out in nasty ways.
(eg. things like sem_init always failing on Mac).

&lt;P&gt;

Note this is *not* trying to make non-POSIX platforms look like POSIX.  It's not hiding the system you're on,
just wrapping it in a standard way.

&lt;P&gt;

2.D. I would also go ahead and add my own asserts for args and returns in this layer, because I hate functions
that just return error codes when there's a catastrophic failure like a null arg or an EHEAPCORRUPT or whatever.

&lt;P&gt;

So once you have this wrapper you no longer call any system funcs directly from your main codebase, but you still would
be doing things like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

#if PLATFORM_WIN32

    HANDLE h = platform_CreateFile( ... )

#elif PLATFORM_POSIX

    int fd = platform_open( name , flags )

#else
    #error unknown platform
#endif

&lt;/PRE&gt;&lt;/FONT&gt;

that is, you're not hiding what platform you're on, you're still letting the larger codebase get to the low level calls,
it's just the mess of how fucked they are that's hidden a bit.

&lt;P&gt;

&lt;B&gt; 3. &lt;/B&gt; You then have a second level of wrapping which tries to make same-action interfaces that dont require you to know
what platform you're on.  Second level is written on the first level.

&lt;P&gt;

The second level wrappers should be as high level as necessary to opaque out the operation.  For example rather than having
"make temp file name" and "open file" you might have "open file with temp name", because on some platforms that can be more
efficient when you know it is a high-level combined op.  You don't just have "GetTime" you have "GetTimeMonotonic" , because
on some platforms they have an efficient monotonic clock for you, and on other platforms/hardwares you may have to do a lot
of complicated work to ensure a reliable clock (that you don't want to do in the low level timer).

&lt;P&gt;

When a platform can't provide a high-level function efficiently, rather than emulate it in a complex way I'd rather just not
have it - not a stub that fails, but no definition at all.  That way I get a compile error and in those spots I can do something
different, using the level 1 APIs.

&lt;P&gt;

The first level wrappers are very independent of the large code base's usage, but the second level wrappers are very much specifically
designed for their usage.

&lt;P&gt;

To be clear about the problem of making platform-hiding second layer wrappers, consider something like OpenFile().  What are the args
to that?  What can it do?  It's hopeless to make something that works on all platforms without greatly reducing the capabilities of
some platforms.  And the meaning of various options (like async, temporary, buffered, etc.) all changes with platform.

&lt;P&gt;

If you wanted to really make a general purpose multi-platform OpenFile you would have to use some kind of "caps" query system, where
you first do something like OpenFile_QueryCaps( OF_DOES_UNBUFFERED_MEAN_ALIGNMENT_IS_REQUIRED ) and it would be an ugly disaster.
(and it's retarded on the face of it, because really what you're doing there is saying "is this win32" ?).  The alternative to the
crazy caps system is to just make the high level wrappers very limited and specific to your usage.  So you could make a platform-agnostic
wrapper like OpenFile_ForReadShared_StandardFlagsAndPermissions().  Then the platforms can all do slightly different things and satisfy
the high level goal of the imperative in the best way for that platform.

&lt;P&gt;

A good second level has as few functions as possible, and they are as high level as possible.  Making them very high level allows you to
do different compound ops on the platform in a way that's hidden from the larger codebase.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/cYHaV3qGmSg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/5824464718308299797/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=5824464718308299797" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/5824464718308299797?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/5824464718308299797?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-16-13-writing-portable-code-rambles.html" title="03-16-13 - Writing Portable Code Rambles" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>5</thr:total></entry><entry gd:etag="W/&quot;D04CSX05fip7ImA9WhBQEEw.&quot;"><id>tag:blogger.com,1999:blog-5246987755651065286.post-1383192408509640356</id><published>2013-03-10T00:00:00.000-08:00</published><updated>2013-03-11T08:32:48.326-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-11T08:32:48.326-07:00</app:edited><title>03-10-13 - Two LZ Notes</title><content type="html">Note 1 : on rep matches.

&lt;P&gt;

"Rep matches" are a little weird.  They help a lot, but the reason why they help depends on the file you are compressing.
(rep match = repeat match, gap match, aka "last offset")

&lt;P&gt;

On text files, they work as interrupted matches, or "gap matches".  They let you generate something like :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

stand on the floor
stand in the door

stand in the door
[stand ][i][n the ][d][oor]

[off 19, len 6][1 lit][rep len 6][1 lit][off 18, len 3]

&lt;/PRE&gt;&lt;/FONT&gt;

that is, you have a long match of [stand on the ] but with a gap at the 'o'.

&lt;P&gt;

Now, something I observed was that more than one last offset continues to help.  On text the main benefit from having two last offsets is that it lets
you use a match for the gap.  When the gap is not just one character but a word, you might want to use a match to put that word in, in which case
the continuation after the gap is no longer the first last-offset, it's the second one.  eg.

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;

cope
how to work with animals
how to cope with animals

[how to ][cope][ with animals]
[off 25 ][off 32][off 25 (rep2)]

&lt;/PRE&gt;&lt;/FONT&gt;

&lt;P&gt;

You could imagine alternative coding structures that don't require keeping some number of "last offsets".  (oddly,
the last offset maintenance can be a large part of decode time, because maintaining an MTF list is something that
CPUs do incredibly poorly).  For example you could code with a scheme where you just send the entire long match,
and then any time you send a long match you send a flag for "are there any gaps", and if so you then code some gaps
inside the match.

&lt;P&gt;

The funny thing is, on binary files "last offsets" do something else which can be more important.  They become the most common offsets.  In particular,
on highly structured binary data, they will generally be some factor of the structure size.  eg. on a file that has a struct size of 36, and that struct
has dwords and such in it, the last offsets will generally be things like 4,8,16,36, or 72.  They provide a sort of dictionary of the most common offsets
so that you can code those smaller.  You are still getting the gap-match effect, but the common-offset benefit is much bigger on these files.

&lt;P&gt;

(aside : word-replacing transform on text really helps LZ (and everything) by removing the length variance of tokens.  In particular for LZ77,
word length variance breaks rep matches.  There are lots of common occurances of a single replaced word in a phrase, like :
"I want some stuff" -&gt; "I want the stuff".  You can't get a rep match here of [ stuff] because the offset changed because the substituted
word was different length.  If you do WRT first, then gap matches get these.)

&lt;P&gt;

Note 2 : on offset structure.

&lt;P&gt;

I've had it in the back of my head for quite some time now to do an LZ compressor specifically designed for structured data.

&lt;P&gt;

One idea I had was to use "2d" match offsets.  That is, send a {dx,dy} where dx is within the record and dy is different records.
Like imagine the data is in a table, dy is going back rows, dx is an offset on the row.  You probably want to mod dx around the
row so its range is always the same, and special case dy=0 (matches within your own record).

&lt;P&gt;

It occurred to me that the standard way of sending LZ offsets these days actually already does this.  The normal way that good LZ's send
offsets these days is to break it into low and high parts :

&lt;FONT COLOR=GREEN&gt;&lt;PRE&gt;
low = offset &amp; 7F;
high = offset &gt;&gt; 7;
&lt;/PRE&gt;&lt;/FONT&gt;

or similar, then you send "high" using some kind of "NoSB" scheme (Number of Significant Bits is entropy coded, and the bits themselves
are sent raw), and you send "low" with an order-0 entropy coder.

&lt;P&gt;

But this is just a 2d structured record offset for a particular power-of-2 record size.  It's why when I've experimented with 2d offsets
I haven't seen huge wins - because I'm already doing it.

&lt;P&gt;

There is some win to be had from custom 2d-offsets (vs. the standard low/high bits scheme) when the record size is not a power of two.

&lt;img src="http://feeds.feedburner.com/~r/CbloomRants/~4/5Ky7GuRuPCk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://cbloomrants.blogspot.com/feeds/1383192408509640356/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=5246987755651065286&amp;postID=1383192408509640356" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/1383192408509640356?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5246987755651065286/posts/default/1383192408509640356?v=2" /><link rel="alternate" type="text/html" href="http://cbloomrants.blogspot.com/2013/03/03-10-13-two-lz-notes.html" title="03-10-13 - Two LZ Notes" /><author><name>cbloom</name><uri>http://www.blogger.com/profile/10714564834899413045</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total></entry></feed>
