SMAZ - compression for very small strings

Thoughts?

Another, although more complicated, option is

Hmm - I think adding compression is a great idea (and using a bit in our existing flags header field to say 'this message is compressed'). But I think we'd need to use a general-purpose compression library, because a fair portion of our messages is framing, and other data (e.g. positions) is not text (each message is sent as a Protocol Buffer).

Also - as time moves on, there will be other payloads (for API consumers that aren't a text messaging app). So I think plopping in a standard compression lib and just compressing the entire packet we send is probably a) a pretty good win and b) fairly straightforward.
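A minimal sketch of the "flag bit plus standard library" idea, using Python's standard `zlib` module as a stand-in for whatever library the firmware would actually link. The bit value `FLAG_COMPRESSED = 0x80` and the `pack`/`unpack` names are hypothetical; the real header layout is not described in this thread.

```python
import zlib

# Hypothetical bit in the existing header flags field; the real bit
# position would have to be chosen from whatever bits are still unused.
FLAG_COMPRESSED = 0x80


def pack(flags: int, payload: bytes) -> tuple[int, bytes]:
    """Compress the whole serialized packet (e.g. protobuf bytes) and set the flag."""
    return flags | FLAG_COMPRESSED, zlib.compress(payload, 9)


def unpack(flags: int, body: bytes) -> bytes:
    """Reverse pack(); bodies without the flag are returned unchanged."""
    if flags & FLAG_COMPRESSED:
        return zlib.decompress(body)
    return body
```

Because `unpack` passes unflagged bodies through untouched, receivers that check the bit stay compatible with senders that never set it.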

If the framing and other data compress well, I'm guessing there is redundant data in them that should be minimized.

One of the advantages to SMAZ is the decompression dictionary is not included in each payload.
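To illustrate why a shared dictionary helps, here is a toy codebook coder in the spirit of SMAZ. It is not SMAZ's actual table or code: `CODEBOOK` is a hypothetical list, and the escape scheme (254 for one literal byte, 255 for a length-prefixed literal run) is an assumption chosen for this sketch. Both ends compile in the same codebook, so only indexes and literals go over the air.

```python
# Hypothetical shared codebook; a real one would be tuned on actual traffic.
CODEBOOK = [b"on my way", b"where are you", b" the ", b"the", b"ing",
            b"and", b"you", b"ok", b"e ", b" "]
assert len(CODEBOOK) <= 254  # codes 254 and 255 are reserved for literals


def encode(text: bytes) -> bytes:
    out = bytearray()
    pending = bytearray()  # literal bytes not covered by the codebook

    def flush_literals() -> None:
        while pending:
            chunk = pending[:255]
            del pending[:255]
            if len(chunk) == 1:
                out.extend((254, chunk[0]))  # single literal byte
            else:
                out.extend((255, len(chunk)))  # literal run: length, then bytes
                out.extend(chunk)

    i = 0
    while i < len(text):
        # Greedy longest match against the shared codebook.
        best = -1
        for idx, entry in enumerate(CODEBOOK):
            if text.startswith(entry, i) and (best < 0 or len(entry) > len(CODEBOOK[best])):
                best = idx
        if best >= 0:
            flush_literals()
            out.append(best)
            i += len(CODEBOOK[best])
        else:
            pending.append(text[i])
            i += 1
    flush_literals()
    return bytes(out)


def decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        code = data[i]
        if code == 254:
            out.append(data[i + 1])
            i += 2
        elif code == 255:
            n = data[i + 1]
            out.extend(data[i + 2:i + 2 + n])
            i += 2 + n
        else:
            out.extend(CODEBOOK[code])
            i += 1
    return bytes(out)
```

A message made entirely of codebook entries, such as `b"on my way"`, encodes to a single byte, while text the codebook does not cover costs one or two bytes of escape overhead per literal run.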

General compression programs can make small payloads larger because they must include the dictionary.
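The growth is easy to measure. This sketch uses Python's standard `zlib` and `gzip` modules (an assumption; the firmware would use a C library) to compress one short message in three framings: raw deflate, zlib, and gzip. For a two-byte message every variant comes out larger than the input, and most of that growth is fixed framing overhead.

```python
import gzip
import zlib


def container_sizes(msg: bytes) -> dict[str, int]:
    """Size of msg before and after compression in three common framings."""
    c = zlib.compressobj(level=9, wbits=-15)  # raw deflate: no header, no checksum
    raw = c.compress(msg) + c.flush()
    return {
        "original": len(msg),
        "raw_deflate": len(raw),
        "zlib": len(zlib.compress(msg, 9)),  # adds a 2-byte header and a 4-byte Adler-32
        "gzip": len(gzip.compress(msg, compresslevel=9)),  # adds a 10-byte header and an 8-byte trailer
    }


if __name__ == "__main__":
    for text in (b"ok", b"On my way"):
        print(text, container_sizes(text))
```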

There are lots of options for reducing the number of bits for locations as well.
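One such option, sketched under stated assumptions: quantize latitude and longitude to a chosen resolution and pack both into the minimum number of bits instead of two full 32-bit integers. This is not Meshtastic's actual position encoding; `pack_position`, `unpack_position`, and the default resolution of 1e-5 degrees (roughly 1.1 m of latitude) are hypothetical.

```python
LAT_MIN, LON_MIN = -90.0, -180.0


def bits_needed(span_deg: float, resolution_deg: float) -> int:
    """Bits required to index every step of size resolution_deg across span_deg."""
    return round(span_deg / resolution_deg).bit_length()


def pack_position(lat: float, lon: float, resolution_deg: float = 1e-5) -> bytes:
    lat_bits = bits_needed(180.0, resolution_deg)
    lon_bits = bits_needed(360.0, resolution_deg)
    # Shift to non-negative ranges, then quantize to integer steps.
    q_lat = round((lat - LAT_MIN) / resolution_deg)
    q_lon = round((lon - LON_MIN) / resolution_deg)
    packed = (q_lat << lon_bits) | q_lon
    return packed.to_bytes((lat_bits + lon_bits + 7) // 8, "big")


def unpack_position(data: bytes, resolution_deg: float = 1e-5) -> tuple[float, float]:
    lon_bits = bits_needed(360.0, resolution_deg)
    packed = int.from_bytes(data, "big")
    q_lat, q_lon = packed >> lon_bits, packed & ((1 << lon_bits) - 1)
    return LAT_MIN + q_lat * resolution_deg, LON_MIN + q_lon * resolution_deg
```

At 1e-5 degrees the pair needs 25 + 26 = 51 bits (7 bytes) instead of 8 bytes; relaxing to 1e-4 degrees (about 11 m) drops it to 21 + 22 = 43 bits (6 bytes).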

Maybe we could adapt this to work on the entire message with a customized compression model:

https://ed-von-schleck.github.io/shoco/

I think it might also be worth experimenting with something simple like Huffman/deflate, but with a static dictionary constructed at compile time from a corpus of a few hundred messages. That dictionary/tree would probably give a fairly optimal encoding of the messages we actually send. Protobufs do a good job of saving space (for small values), but I bet most of that savings is eaten up by their small amount of framing.
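A close relative of this idea already exists in zlib: a preset dictionary (`zdict`) that both sides share and that is never transmitted. Note that this is a swapped-in technique: it lets deflate emit back-references into a shared block of sample text rather than using a custom static Huffman tree. `SAMPLE_CORPUS` below is a hypothetical stand-in for the few hundred real messages; raw deflate (`wbits=-15`) is used so no header or checksum is added.

```python
import zlib

# Hypothetical sample messages; a real dictionary would be built offline
# from actual traffic and compiled into both firmware and apps.
SAMPLE_CORPUS = [
    b"Where are you?",
    b"At the trailhead, signal is weak",
    b"Battery low, turning off soon",
    b"OK, see you at camp",
    b"On my way, be there in 10 minutes",
]
# zlib works best when the most common strings sit near the end of the dictionary.
PRESET = b" ".join(SAMPLE_CORPUS)


def compress(msg: bytes, use_dict: bool = True) -> bytes:
    if use_dict:
        c = zlib.compressobj(level=9, wbits=-15, zdict=PRESET)
    else:
        c = zlib.compressobj(level=9, wbits=-15)
    return c.compress(msg) + c.flush()


def decompress(data: bytes, use_dict: bool = True) -> bytes:
    if use_dict:
        d = zlib.decompressobj(wbits=-15, zdict=PRESET)
    else:
        d = zlib.decompressobj(wbits=-15)
    return d.decompress(data) + d.flush()
```

Both ends must use a byte-identical dictionary, so changing it later would itself need a version flag.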

That is much more in line with what I was thinking. :slight_smile:

HTTP/2 supports compression of header fields (HPACK). This is a well-established precedent.
https://httpwg.org/specs/rfc7540.html#HeaderBlock

I am working on this. This is my rudimentary idea:
Fork the Meshtastic app, and before any message is sent, compress it and add a special character at the beginning of the message to indicate it's compressed. The only problem is that the Meshtastic OLED display will show the compressed version, since I don't know how to modify the firmware and I don't want to mess with it. I will just decompress it in the app itself.

This is a very simple thing to implement in practice, I will report back with results once I get my hands on 2 meshtastic boards.
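A sketch of the app-side marker approach, assuming the app can send the message payload as raw bytes. The marker value and the function names are hypothetical; 0xFF is chosen because that byte never appears in valid UTF-8, so a legacy uncompressed text message can never be mistaken for a compressed one. Like the proposal, this always compresses, so very short messages may come out slightly longer.

```python
import zlib

# Hypothetical marker byte: 0xFF never occurs in valid UTF-8 text.
MARKER = b"\xff"


def encode_message(text: str) -> bytes:
    c = zlib.compressobj(level=9, wbits=-15)  # raw deflate, no header or checksum
    return MARKER + c.compress(text.encode("utf-8")) + c.flush()


def decode_message(data: bytes) -> str:
    if data.startswith(MARKER):
        return zlib.decompress(data[1:], wbits=-15).decode("utf-8")
    return data.decode("utf-8")  # legacy sender: plain UTF-8 text
```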

Nice.

I'm very interested in what kind of results you can achieve. Doing this in the Android app isn't ideal, but maybe it will be the initial step to move it along.

I think there is a guide for setting up a dev environment for building the device firmware. From what I've seen, the code base is really well organized. I don't think you'd need to concern yourself with device specifics; just dig into the message payload functions.

I think Unishox shows more promise than shoco / smaz / etc. And it already runs on the ESP32!

It has these benefits:

  • Higher compression than shoco
  • "Unlike smaz and shoco, we assume no a priori knowledge about the input text. However we rely on a posteriori knowledge about the research carried out on the language and common patterns of sentence formation and come out with pre-assigned codes for each letter."
  • Unicode/UTF-8 compatible. (Will users send emojis / foreign character sets with their app keyboards?)

Unishox (aka Shox96) paper: https://vixra.org/pdf/1908.0403v1.pdf

Arduino implementation:

Hey, thanks for sharing this. Definitely looks interesting.

@geeksville Do the meshtastic boards have a minimum packet size? If we are going to compress strings and find out way later that there is a minimum payload size and padding was used, then it will prove quite futile to implement this :smiley:

I'm not @Geeksville, but I can play him on TV.

There's no minimum packet size and no padding. We transmit as little data as possible.

If compression is implemented, it should go lower in the stack than just messages. It can be applied to the entire payload (excluding the header) just before the payload is encrypted.

On compression, the size of the compressed result is compared against the uncompressed and then a decision is made which one to transmit.

If we compress the entire payload rather than just the contents of the text message, all future applications will be able to take advantage of this. Heck, even our IP tunneling will be able to use it.
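A sketch of the order and decision described above: compress the payload first, keep whichever version is smaller, and only then hand the bytes to encryption. Compressing after encryption gains nothing, because good ciphertext is indistinguishable from random bytes; `os.urandom` stands in for ciphertext here. The flag value and function names are hypothetical, and the encryption step itself is left out.

```python
import os
import zlib

FLAG_COMPRESSED = 0x80  # hypothetical header bit


def deflate_raw(data: bytes) -> bytes:
    c = zlib.compressobj(level=9, wbits=-15)  # raw deflate, no header or checksum
    return c.compress(data) + c.flush()


def prepare_payload(flags: int, payload: bytes) -> tuple[int, bytes]:
    """Try compression and keep the smaller result; encryption would run on the returned bytes."""
    compressed = deflate_raw(payload)
    if len(compressed) < len(payload):
        return flags | FLAG_COMPRESSED, compressed
    return flags & ~FLAG_COMPRESSED, payload


if __name__ == "__main__":
    fake_ciphertext = os.urandom(200)
    # Random (encrypted-looking) bytes do not shrink, so compression must come first.
    print(len(fake_ciphertext), len(deflate_raw(fake_ciphertext)))
```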

We will be able to reduce airtime as well as power consumption. If we use LongSlow, we keep its longer range but get somewhat shorter transmission times.

One thing to keep in mind (with Unishox, anyway): it says compressing binary data gives worse results than no compression at all. So low level, yes, but only at a level where text strings appear. I do like the idea of an option bit specifying whether the text is compressed or legacy uncompressed. (Legacy uncompressed would work well if other file-like things are ever pushed through the mesh.)

I think all packets should be compressed, but the compressed version should only be used if it ends up smaller. That increases the opportunity for compression and removes any assumptions we have.

FYI - the targz library is now being used in the device code. We can use that for compression.

What exactly is targz being used for?

Added this last night:

If you have a device that doesn't have the web files, or needs the web files updated, click a button and it'll be set up in about 10 seconds.

Not totally stable right now, I labeled it as experimental.