I have created a discrete-event simulator in Python to better understand what is going on under the hood of the mesh algorithm. For example, it shows which nodes are flooding a message over time, who receives it and when it leads to collisions.
Furthermore, it can be used to analyze the scalability of the routing protocol, which might be useful to optimize or extend it later on. See the README for an example.
The simulator is based on Meshtastic-device commit e7a825d. I only extracted the important parts of it, so, for example, each generated message is sent to everyone on the channel, as one-to-one messaging is not used for now. I hope I did not make any mistakes in mimicking the current algorithm.
Feel free to try it out if you think it is useful/fun and let me know if you have any questions.
Do you suggest any improvements to the routing algorithm as it stands today?
In general, I think it is quite good considering it’s made with low complexity in mind. However, I have seen some flaws while running these simulations and got some ideas on improving it, but these need to be tested.
For example, right now the zero-hop reliability retransmit timer depends on shortPacketMsec, i.e. the airtime of a packet without payload, so only the header. This means that if you send a large packet, it will be on the air for quite some time and you might try a retransmit too early. I think it would be good to make the retransmit timer dependent on the airtime of the packet you just transmitted.
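A rough sketch of that idea in Python, the simulator's language: compute the airtime with the standard LoRa time-on-air formula from the Semtech SX127x datasheet and scale the retransmit timeout with the size of the packet that was actually sent. The modem settings, the 16-byte header size, and the multiplier of 9 are illustrative assumptions, not values taken from the firmware, and `retransmit_timeout_ms` is a hypothetical helper.

```python
import math


def lora_airtime_ms(payload_bytes, sf=11, bw_hz=250_000, cr=1,
                    preamble_syms=8, explicit_header=True, crc=True):
    """Time on air in ms, following the Semtech SX127x datasheet formula."""
    t_sym = (2 ** sf) / bw_hz * 1000.0          # symbol duration in ms
    low_dr_opt = 1 if t_sym > 16.0 else 0       # low data rate optimization
    ih = 0 if explicit_header else 1
    numerator = 8 * payload_bytes - 4 * sf + 28 + 16 * int(crc) - 20 * ih
    denominator = 4 * (sf - 2 * low_dr_opt)
    payload_syms = 8 + max(math.ceil(numerator / denominator) * (cr + 4), 0)
    return (preamble_syms + 4.25 + payload_syms) * t_sym


HEADER_BYTES = 16        # assumed header size, for illustration only
TIMEOUT_MULTIPLIER = 9   # hypothetical factor, not the firmware's value


def retransmit_timeout_ms(packet_bytes):
    """Hypothetical timer that scales with the packet just transmitted."""
    return TIMEOUT_MULTIPLIER * lora_airtime_ms(packet_bytes)


header_only = lora_airtime_ms(HEADER_BYTES)
large_packet = lora_airtime_ms(HEADER_BYTES + 200)
print(f"header only: {header_only:.0f} ms, large packet: {large_packet:.0f} ms")
print(f"fixed timeout:  {TIMEOUT_MULTIPLIER * header_only:.0f} ms")
print(f"scaled timeout: {retransmit_timeout_ms(HEADER_BYTES + 200):.0f} ms")
```

With these assumed settings the large packet is on the air several times longer than a header-only packet, so a timer derived from the header alone leaves much less margin.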
Next to that, the new SNR-based flooding sounds good, but it no longer has any randomness built in. This means that two nodes at roughly the same distance from the original sender will try to flood at roughly the same time, which leads to collisions. At a lower layer it might still need some randomness.
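To illustrate what I mean, here is a minimal sketch and not the firmware's actual scheme: a deterministic SNR-to-delay mapping gives two nodes with nearly equal SNR the same rebroadcast delay, while adding a few random slots on top can separate them. The slot length, SNR range, slot counts, the choice that lower SNR waits less, and both function names are assumptions made up for this example.

```python
import random

SLOT_MS = 10.0                          # hypothetical contention slot length
SNR_MIN_DB, SNR_MAX_DB = -20.0, 15.0    # assumed SNR range
MAX_SNR_SLOTS = 8                       # assumed slots covered by the SNR mapping
JITTER_SLOTS = 3                        # assumed extra random slots


def snr_delay_ms(snr_db):
    """Deterministic part: assumed here that lower SNR means a shorter wait."""
    clamped = min(max(snr_db, SNR_MIN_DB), SNR_MAX_DB)
    fraction = (clamped - SNR_MIN_DB) / (SNR_MAX_DB - SNR_MIN_DB)
    return round(fraction * MAX_SNR_SLOTS) * SLOT_MS


def jittered_delay_ms(snr_db, rng):
    """SNR-based delay plus a random number of extra slots."""
    return snr_delay_ms(snr_db) + rng.randint(0, JITTER_SLOTS) * SLOT_MS


rng = random.Random(42)  # seeded only to make the demo repeatable
for snr in (-5.0, -5.2):
    print(snr, snr_delay_ms(snr), jittered_delay_ms(snr, rng))
```

The two SNR values land in the same deterministic slot, which is the collision case; the jitter only lowers the chance of that, it does not rule it out.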
I think you have created a very valuable tool to better understand the mesh algorithm by using simulation.
I have just noticed that you only ran the simulation up to a hop count of 4; however, it is my understanding that the hard-coded hop count is 7?
And you have already highlighted some areas of possible improvement.
I think the hard-coded maximum is indeed 7, since there are three bits reserved for it in the header. However, the default setting is 3, to limit flooding.
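As a small illustration of why three bits give a maximum of 7: the sketch below packs a hop limit into a flags byte. Placing the field in the lowest three bits of a one-byte flags value is an assumption for this example, and the helper names are hypothetical.

```python
HOP_LIMIT_BITS = 3
HOP_LIMIT_MASK = (1 << HOP_LIMIT_BITS) - 1   # 0b111, so the largest value is 7
DEFAULT_HOP_LIMIT = 3


def set_hop_limit(flags, hop_limit):
    """Store hop_limit in the (assumed) lowest three bits of a flags byte."""
    if not 0 <= hop_limit <= HOP_LIMIT_MASK:
        raise ValueError(f"hop limit must be between 0 and {HOP_LIMIT_MASK}")
    return (flags & ~HOP_LIMIT_MASK & 0xFF) | hop_limit


def get_hop_limit(flags):
    """Read the hop limit back out of the flags byte."""
    return flags & HOP_LIMIT_MASK


flags = set_hop_limit(0b1010_0000, DEFAULT_HOP_LIMIT)
print(bin(flags), get_hop_limit(flags))
```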
I could also run the simulations for a hop limit of up to 7. However, so far I have run each 200 s simulation one hundred times, which already took quite some time for larger numbers of nodes.
Update: I added the results for up to 7 hops for completeness.
@mc-hamster You might want to read this, before we finalise 1.3
If this is true, it won’t stop 1.3.
Please file a defect with full reproduction steps.
So this will wait for a few years too?
EDIT: “You might want to read this, before we finalize 1.3” I am referring to the main post here, talking about routing. I will make a new post about the bug related to delivery confirmation.
You are welcome to make your first pull request if you find a bug.
@kokroo Please be mindful of this kind of snark. We’re all volunteers and this project needs to stay fun.
If you have found a bug, please file it in our issue tracker on GitHub with the steps to reproduce it. If the details are only here in the forum, they will be lost. This forum is just-in-time and is not built for prioritization, nor does it provide a dashboard for a volunteer to step up and address defects/enhancements.
Like @garth said, we will grow together if you submit a pull request for this.
There is no snark. I don’t know in what tone you read my messages.
It was a serious question because the OP spoke about some important points, and since you worked on the new routing algorithm, I thought I should bring it to your attention.
I asked “So this will wait for a few years too?” because you said the changes for 1.3 are locked and the mesh protocol will not be changed for the next few years.
I don’t know why everybody loves bringing out pitchforks so quickly.
Please file a defect with full steps to reproduce the problem.
It read as really snarky to me as well. Honestly, I'm not sure why you think it sounded nice or genuine, as it does not read that way at all.
I think there’s some confusion. I was simply asking MC Hamster if the suggestions to routing by the original poster of the thread will also need to wait a few years before being included in the protocol, since he told me on discord that the protocol will be “locked” for the next few years and no changes will be allowed that break backwards compatibility.
Very cool tool. It is working on my Mac now, and the updated explanation made it more obvious what is going on. Great work @GUVWAF, this will be a huge help for debugging and visualizing networks and should help debug and test all the routing changes in 1.3.