Happy New Year everyone! Now that I have a number of meshtastic devices, it became apparent that having a better understanding of various nodes and networks as they scale up is important. Thankfully we have a bunch of great ways to interface with the Meshtastic devices and network. I happen to work for Splunk and I felt like this was a great opportunity to provide a mechanism for this awesome community to quickly and easily begin working with Meshtastic data at scale.
So with that, I’m happy to announce the first release of a Meshtastic app for Splunk. Using this app, you can quickly configure Splunk to collect mesh, node, radio, and message info from any meshtastic device accessible by TCP. Further, I have created a set of dashboards to help you quickly analyze various aspects of that data.
You can find the source, app packages, and step by step instructions here:
And I have created a short video that walks you through the process from a bare metal server to a fully functional environment in about 12 minutes.
Minutes 1-12 cover the technical overview and setup. Minutes 12+ cover the data and dashboard content.
I hope someone finds this useful. I think there’s a lot of room for improvement. I’m a novice when it comes to the types of insights one might care about on these networks. I approached this release looking at some of the more obvious data points, but I’m really hoping that the community has suggestions for changes and for additional helpful knowledge that can be derived from this data.
Further, I think there’s a great opportunity to gain insight through analysis of the serial output/debug but I haven’t tried to approach that yet. @geeksville I’d be interested to know if you already run the serial output through any mechanisms for pulling out/searching data other than just grepping. If not, or if there’s something about the debug logs that you would find helpful to know about I would gladly include that in the next release.
Btw, one of the features I’m hoping for in the (coming) MQTT gateway is to let users optionally opt into sharing this sort of data (and the default setting will be off).
Are there any repetitive tasks you find yourself doing when you debug via serial? I’m thinking about adding a dashboard that will help analyze that output. Even something like just pulling the commonly requested details from any given serial dump. It would be neat to be able to take a serial output and upload it and get a nice report of it. Metadata, error counts, event timelines… I dunno.
re: repetitive tasks over serial
hmm - not too much. mostly when looking to debug a particular bug I usually only need the last 50ish lines of serial output to get the clues I need.
Though the creation of your tool gives me extra reason to continue the cleanup of the device logging and do the “if there is an api client, send log messages via the debug log protobuf”. Also, finally implementing different log levels on the device would help. I was kinda amazed I couldn’t find an existing ‘provide a java/python style’ logging abstraction for Arduino that is small. So I’ll probably split the thing I made out as a separate platformio library for others eventually.
Oh sweet. This is helpful too. It’s pretty trivial to point Splunk at API endpoints (especially unauthenticated ones) to collect data. If I’m reading that right, each point in the XX_log field is an increment of seconds_per_period, with the total number of elements = periods_to_log.
I will think on this a bit. Since Splunk is data over time, the historical numbers are less important - we’ll already have those through routine collection. So, having a top level field that is something like “tx_log_last” which contains the last full period measurement is easier to work with than having an array of values…
However, it doesn’t really matter and probably not worth changing anything for - because we can also infer this value by looking at tx_log[1]. And… what’s nice is that we can also poll /json/report more frequently than seconds_per_period and examine tx_log[0] to see what the current value is and also track that over time.
So, if we look at the “current value” (tx_log[0]) over time we’d see a sawtooth pattern (since it will roll over to 0 at each interval), and we’d have an actual value (tx_log[1]) that only changes once every “seconds_per_period” no matter how frequently it’s collected.
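To make the two readings concrete, here is a minimal Python sketch of pulling both values out of one polled report. It assumes the report nests the arrays under data.airtime (as the Splunk field names suggest); the function name split_airtime and the sample values are made up for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def split_airtime(
    report: Dict[str, Any], key: str = "tx_log"
) -> Tuple[Optional[int], Optional[int]]:
    """Return (current, settled) from an airtime log array.

    Index 0 is the period still accumulating (the sawtooth);
    index 1 is the last full period (changes once per period).
    The data -> airtime nesting is an assumption about the report layout.
    """
    log = report.get("data", {}).get("airtime", {}).get(key, [])
    current = log[0] if len(log) > 0 else None
    settled = log[1] if len(log) > 1 else None
    return current, settled


# Hypothetical samples, polled more often than seconds_per_period.
samples = [
    {"data": {"airtime": {"tx_log": [5, 40, 38]}}},
    {"data": {"airtime": {"tx_log": [22, 40, 38]}}},
    {"data": {"airtime": {"tx_log": [3, 41, 40]}}},  # period rolled over
]

for sample in samples:
    print(split_airtime(sample))
```

In the made-up samples the first value climbs and then drops back after the rollover, while the second value only moves when a period completes.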
I think that makes sense. I will start collecting this data as part of v2 and see where it gets us. As for more content I don’t really know what all is possible but when I think of my other IoT use cases I see… Things like wifi drops/reconnects. Security events like # of hits to the api or web urls? If we can track source IPs seen, showing where connections to the API/web are coming from would be awesome. I don’t know if we have temperature/accelerometer sensing available or on for these devices, but knowing the total change in environment over time would be helpful too. ambient temperature, cumulative G’s/altitude changes could indicate theft/removal or other disruption.
Battery/power readings would also be really useful. Then I can use this telemetry to track and report on the health of the power source that the network might be dependent on. Doing things like prediction of battery charge and mean time between charge cycles can help us maintain hardware better.
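As a rough idea of what the charge prediction could look like, here is a small Python sketch that fits a straight line through battery readings and extrapolates to 0%. It assumes readings arrive as (hours, percent) pairs and that discharge is roughly linear, which real batteries are not; hours_until_empty is a hypothetical helper, not part of any Meshtastic or Splunk API.

```python
from typing import Optional, Sequence, Tuple


def hours_until_empty(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Estimate hours from the last sample until the battery reaches 0%.

    samples: (hours_since_start, battery_percent) pairs.
    Uses an ordinary least-squares line; returns None when there is
    too little data or the battery is not discharging.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None  # all readings share one timestamp
    slope = sum((t - mean_t) * (p - mean_p) for t, p in samples) / denom
    if slope >= 0:
        return None  # flat or charging, nothing to extrapolate
    intercept = mean_p - slope * mean_t
    t_empty = -intercept / slope  # time at which the fitted line hits 0%
    return max(0.0, t_empty - samples[-1][0])


# Hypothetical readings: losing 2% per hour.
readings = [(0.0, 100.0), (1.0, 98.0), (2.0, 96.0)]
print(hours_until_empty(readings))
```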
Of course, exposing this stuff by API is great but only useful when these devices aren’t remotely deployed. I think ideally the node_info messages could be configured to send along more telemetry than they already do. I would guess we’re pretty limited on how much data we try to transmit routinely. But having a configurable node health packet would be pretty sweet. Then remote nodes could have more robust telemetry reporting.
Oh, you know what might be useful? Memory usage statistics. If we can chart usage, we will know if we’re leaking memory or getting close to any of our limits.
Some of the information can also go into a metrics plugin with the new plugin architecture and that can be extended to get information from remote nodes.
Hey all. Just an update here. I pushed version 0.1.1 to the repo - which contains the mechanism for pulling /json/report from the devices.
Very quick search shows some nice airtime stats so far.
A search like this: index=meshtastic sourcetype="meshtastic:api:rest" | eval prevTX = mvindex('data.airtime.tx_log{}', 1), prevRX = mvindex('data.airtime.rx_log{}', 1) | timechart avg(prevTX) as avgtx, avg(prevRX) as avgrx by data.wifi.ip
This is looking at the second value in the airtime arrays so we see the “settled” amount of traffic.
An alternative would be to watch the “current” value by changing the mvindex parameter from 1 to 0. And, like I expected, we see the sawtooth pattern as the value constantly grows and resets.
Question for folks that might use this. If I added the airtime chart to the message tracking dashboard so that you could zoom into a specific point in time and see the messages flowing for just that time, would that help?
You can see that on the top chart I’ve got the airtime stats and I selected a window of about an hour towards the right. By selecting that time window, the panels below reflect just the message activity for that time. In fact you can see the gap in messages that correlates with the flat line in the selected portion of the graph.
I presume that the airtime totals are a reflection of the actual messages being sent and received, so I thought it would be useful to be able to see which messages were sent (or not sent) when anomalies are seen in the airtime stats.
Very nice chart! Unless everyone on a network provides consent, I think messages should be treated as private and not be displayed on a dashboard.
I closed your request for more data as done. You’ll get what power stats we have (thanks @crossan007!), memory usage for the heap, psram (this is currently unused) and spiffs.
The psram is not available on all devices, so you may get a 0 depending on your hardware.
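On the consuming side, a 0 for psram probably wants to be treated as "not present" rather than "0% used". Here is a minimal Python sketch of that; the free/total pairs and the key names in the example dict are hypothetical, since the actual field names in the report are not shown here.

```python
from typing import Dict, Optional, Tuple


def used_percent(free: int, total: int) -> Optional[float]:
    """Return percent used, or None when the region is absent (total of 0)."""
    if total <= 0:
        return None
    return round(100.0 * (total - free) / total, 1)


# Hypothetical (free, total) byte counts; key names are illustrative only.
memory: Dict[str, Tuple[int, int]] = {
    "heap": (50_000, 200_000),
    "psram": (0, 0),  # device without psram reports 0
}

for region, (free, total) in memory.items():
    print(region, used_percent(free, total))
```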
I’ve just tried to get this running on the latest Splunk, but it seems the Python script itself is broken by breaking changes to Meshtastic’s Python API.
For starters, meshtastic.TCPInterface was moved to meshtastic.tcp_interface.TCPInterface, meshtastic.RadioConfig is no more, etc.
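For the TCPInterface move specifically, one way a script could tolerate both layouts is to try the new location first and fall back to the old one. This is only a sketch: resolve_first and load_tcp_interface are hypothetical helpers, and it does nothing for removed pieces like meshtastic.RadioConfig.

```python
import importlib
from typing import Any, Iterable


def resolve_first(dotted_paths: Iterable[str]) -> Any:
    """Return the first attribute that can be imported from the given paths.

    Each path is "package.module.Name"; later paths are fallbacks.
    """
    paths = list(dotted_paths)
    last_error = None
    for path in paths:
        module_name, _, attr = path.rpartition(".")
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError, ValueError) as exc:
            last_error = exc  # remember why this candidate failed
    raise ImportError(f"none of {paths} could be resolved") from last_error


def load_tcp_interface() -> Any:
    """Look up TCPInterface in the newer location, then the older one."""
    return resolve_first(
        [
            "meshtastic.tcp_interface.TCPInterface",  # location after the move
            "meshtastic.TCPInterface",  # location before the move
        ]
    )
```

Calling load_tcp_interface() needs the meshtastic package installed; the lookup itself works for any dotted path.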
If I can find time, I could try rebasing it against the latest API and submit a PR, but right now, I don’t have the time.