First Release of Meshtastic App for Splunk!

Happy New Year everyone! Now that I have a number of Meshtastic devices, it has become apparent that understanding the various nodes and networks gets more important as they scale up. Thankfully we have a bunch of great ways to interface with Meshtastic devices and the network. I happen to work for Splunk, and I felt this was a great opportunity to give this awesome community a way to quickly and easily begin working with Meshtastic data at scale.

So with that, I’m happy to announce the first release of a Meshtastic app for Splunk. Using this app, you can quickly configure Splunk to collect mesh, node, radio, and message info from any Meshtastic device accessible by TCP. I have also created a set of dashboards to help you quickly analyze various aspects of that data.

Global overview of mesh devices

Message tracking and analysis

You can find the source, app packages, and step by step instructions here:

And I have created a short video that walks you through the process from a bare metal server to a fully functional environment in about 12 minutes.

Minutes 1-12 are the technical overview and setup. Minutes 12+ are an overview of the data and dashboard content.

I hope someone finds this useful. I think there’s a lot of room for improvement. I’m a novice when it comes to the types of insights one might care about on these networks. I approached this release by looking at some of the more obvious data points, but I’m really hoping that the community has suggestions for changes and for additional helpful knowledge that can be derived from this data.

Further, I think there’s a great opportunity to gain insight through analysis of the serial output/debug but I haven’t tried to approach that yet. @geeksville I’d be interested to know if you already run the serial output through any mechanisms for pulling out/searching data other than just grepping. If not, or if there’s something about the debug logs that you would find helpful to know about I would gladly include that in the next release.

Take care all!


Amazingly awesome - I’m going to use the hell out of this.

Btw, one of the features I’m hoping the (coming) MQTT gateway will have is letting users optionally opt into sharing this sort of data (and the default setting will be off).


Are there any repetitive tasks you find yourself doing when you debug via serial? I’m thinking about adding a dashboard that will help analyze that output. Even something like just pulling the commonly requested details from any given serial dump. It would be neat to be able to take a serial output and upload it and get a nice report of it. Metadata, error counts, event timelines… I dunno.

re: repetitive tasks over serial
hmm - not too much. mostly when looking to debug a particular bug I usually only need the last 50ish lines of serial output to get the clues I need.

Though the creation of your tool gives me extra reason to continue the cleanup of the device logging and do the “if there is an api client, send log messages via the debug log protobuf”. Also, finally implementing different log levels on the device would help. I was kinda amazed I couldn’t find an existing ‘provide a java/python style’ logging abstraction for Arduino that is small. So I’ll probably split the thing I made out as a separate platformio library for others eventually.

This is really awesome!

On your device with the latest firmware, point your browser to /json/report . Have fun with that.

The airtime report is currently in seconds but I’ll probably rescale it to centiseconds in the not too distant future.


Oh sweet. This is helpful too. It’s pretty trivial to point Splunk at API endpoints (especially unauthenticated ones) to collect data. If I’m reading that right, each point in the XX_log field covers one increment of seconds_per_period, with the total number of elements = periods_to_log.
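To make that reading concrete, here is a minimal Python sketch that builds the /json/report URL, fetches it, and checks that a log array has periods_to_log elements. The plain-HTTP scheme, the helper names, the example address, and the sample values are assumptions for illustration; only the endpoint path and the data.airtime field names come from this thread.

```python
import json
from urllib.request import urlopen


def report_url(host: str) -> str:
    # Assumption: the device serves the report over plain HTTP on the default port.
    return f"http://{host}/json/report"


def fetch_report(host: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON report from a device (not called in this sketch)."""
    with urlopen(report_url(host), timeout=timeout) as resp:
        return json.load(resp)


def log_matches_config(report: dict, log_name: str = "tx_log") -> bool:
    """True if the named log array has exactly periods_to_log elements."""
    airtime = report["data"]["airtime"]
    return len(airtime[log_name]) == airtime["periods_to_log"]


# Made-up sample shaped like the fields discussed in this thread.
sample = {
    "data": {
        "airtime": {
            "seconds_per_period": 3600,
            "periods_to_log": 4,
            "tx_log": [12, 40, 35, 0],
        }
    }
}

print(report_url("192.168.1.50"))  # hypothetical device address
print(log_matches_config(sample))
```

Against a real device you would call fetch_report with its address instead of using the hard-coded sample.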

Will you be adding more content here?

You’re good! Spot on!

Yes, I’ll be adding more counters there that are not exposed over the protobuf API. Is there anything you can think of? It’s trivial to add more.

If it’s helpful, do this math:

data.airtime.seconds_per_period - (data.airtime.seconds_since_boot % data.airtime.seconds_per_period)

That’ll give you the number of seconds remaining in the current period, after which everything will roll over.
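The same arithmetic as a small Python sketch. The function name and the example numbers are made up for illustration; the two inputs correspond to data.airtime.seconds_since_boot and data.airtime.seconds_per_period.

```python
def seconds_left_in_period(seconds_since_boot: int, seconds_per_period: int) -> int:
    """Seconds until the current airtime period rolls over."""
    if seconds_per_period <= 0:
        raise ValueError("seconds_per_period must be positive")
    return seconds_per_period - (seconds_since_boot % seconds_per_period)


# Illustrative values only: a 3600 s period on a device that has been up for 4000 s.
print(seconds_left_in_period(4000, 3600))  # 3200
```

Note that exactly on a period boundary the formula returns a full period rather than 0.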

It’ll slowly be documented here:

I will think on this a bit. Since Splunk is data over time, the historical numbers are less important - we’ll already have those through routine collection. So, having a top level field that is something like “tx_log_last” which contains the last full period measurement is easier to work with than having an array of values…

However, it doesn’t really matter and probably not worth changing anything for - because we can also infer this value by looking at tx_log[1]. And… what’s nice is that we can also poll /json/report more frequently than seconds_per_period and examine tx_log[0] to see what the current value is and also track that over time.

So, if we look at the “current value” (tx_log[0]) over time, we’d see a sawtooth pattern (since it will roll over to 0 at each interval), and we’d have an actual value (tx_log[1]) that only changes once every “seconds_per_period” no matter how frequently it’s collected.
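A rough Python sketch of that idea: pull the “current” and “settled” values out of successive polls and look at how each series behaves. The helper name and the poll values are invented for illustration; the assumption that index 0 is the in-progress period and index 1 is the last completed one follows the reading described above.

```python
def current_and_settled(report: dict, log_name: str = "tx_log") -> tuple:
    """Return (in-progress period value, last completed period value)."""
    log = report["data"]["airtime"][log_name]
    if len(log) < 2:
        raise ValueError(f"{log_name} needs at least two elements")
    return log[0], log[1]


# Made-up polls taken more often than seconds_per_period; the last one
# comes just after a rollover.
polls = [
    {"data": {"airtime": {"tx_log": [10, 40]}}},
    {"data": {"airtime": {"tx_log": [25, 40]}}},
    {"data": {"airtime": {"tx_log": [38, 40]}}},
    {"data": {"airtime": {"tx_log": [3, 38]}}},
]

pairs = [current_and_settled(p) for p in polls]
current = [c for c, _ in pairs]
settled = [s for _, s in pairs]

print(current)  # [10, 25, 38, 3]   climbs, then resets: the sawtooth
print(settled)  # [40, 40, 40, 38]  only changes at the rollover
```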

I think that makes sense. I will start collecting this data as part of v2 and see where it gets us. As for more content, I don’t really know what all is possible, but when I think of my other IoT use cases I see things like wifi drops/reconnects. Security events like # of hits to the API or web URLs? If we can track source IPs seen, showing where connections to the API/web are coming from would be awesome. I don’t know if we have temperature/accelerometer sensing available or on for these devices, but knowing the total change in environment over time would be helpful too. Ambient temperature, cumulative G’s, or altitude changes could indicate theft/removal or other disruption.

Battery/power readings would also be really useful. Then I can use this telemetry to track and report on the health of the power source that the network might be dependent on. Doing things like prediction of battery charge and mean time between charge cycles can help us maintain hardware better.

Of course, exposing this stuff by API is great but only useful when these devices aren’t remotely deployed. I think ideally the node_info messages could be configured to send along more telemetry than they already do. I would guess we’re pretty limited in how much data we can transmit routinely. But having a configurable node health packet would be pretty sweet. Then remote nodes could have more robust telemetry reporting.


Please open a feature request to add the battery/power readings to the endpoint. I’ll get to it when I’m done working on my short list.

If anything else comes to mind, go ahead and open tickets for those too.


Oh, you know what might be useful? Memory usage statistics. If we can chart usage, we will know if we’re leaking memory or getting close to any of our limits.

Some of the information can also go into a metrics plugin with the new plugin architecture and that can be extended to get information from remote nodes.


And we should probably add these to one of the cross platform protobufs (because that will also eventually allow remote access via the mesh)


Sounds good, I’ll add those in asap.

Hey all. Just an update here. I pushed version 0.1.1 to the repo - which contains the mechanism for pulling /json/report from the devices.

Very quick search shows some nice airtime stats so far.

A search like this:

index=meshtastic sourcetype="meshtastic:api:rest" | eval prevTX = mvindex('data.airtime.tx_log{}', 1), prevRX = mvindex('data.airtime.rx_log{}', 1) | timechart avg(prevTX) as avgtx, avg(prevRX) as avgrx by data.wifi.ip

gives us a nice set of graphs

This is looking at the second value in the airtime arrays so we see the “settled” amount of traffic.

An alternative would be to watch the “current” value by changing the mvindex parameter from 1 to 0. And, as I expected, we see the sawtooth pattern as the value constantly grows and resets.

Looking forward to tracking batteries next :slight_smile:

I have not added any content to the dashboards that incorporates this data yet.

Cheers!

Question for folks that might use this. If I added the airtime chart to the message tracking dashboard so that you could zoom into a specific point in time and see the messages flowing for just that time, would that help?

Like dis -

You can see that on the top chart I’ve got the airtime stats, and I selected a window of about an hour towards the right. By selecting that time window, the panels below reflect just the message activity for that time. In fact, you can see the gap in messages that correlates with the flat line in the selected portion of the graph.

I presume that the airtime totals are a reflection of the actual messages being sent and received, so I thought it would be useful to be able to see which messages were sent (or not sent) when anomalies are seen in the airtime stats.

Thoughts?


Very nice chart! Unless everyone on a network provides consent, I think messages should be treated as private and not be displayed on a dashboard. :slight_smile:

I closed your request for more data as done. You’ll get what power stats we have (thanks @crossan007!), memory usage for the heap, psram (this is currently unused) and spiffs.

The psram is not available on all devices, so you may get a 0 depending on your hardware.


Added another counter for you.

wifi.web_request_count

Simple counter of the number of requests sent to the web server.
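Since this is a running counter, a collector would usually want the number of new requests between polls rather than the raw total. A minimal Python sketch follows; the function name and sample values are made up, and treating a drop in the counter as a restart from zero (for example after a reboot) is an assumption, not documented behaviour.

```python
def counter_deltas(samples: list) -> list:
    """Per-interval increases for a monotonically increasing counter."""
    deltas = []
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            deltas.append(cur - prev)
        else:
            # Assumption: a lower value means the counter restarted from 0,
            # so everything counted so far happened in this interval.
            deltas.append(cur)
    return deltas


# Made-up successive readings of wifi.web_request_count.
print(counter_deltas([5, 9, 9, 2, 6]))  # [4, 0, 2, 4]
```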


I’ve just tried to get this running on the latest Splunk, but it seems the Python script itself is broken by breaking changes to Meshtastic’s Python API.

For starters, meshtastic.TCPInterface was moved to meshtastic.tcp_interface.TCPInterface, meshtastic.RadioConfig is no more, etc.
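One hedged way a script could tolerate both layouts is to look the class up in each location in turn. This is only a sketch of the import side, based on the two module paths mentioned above; the helper name is made up, and it does nothing about the removal of meshtastic.RadioConfig or any other API change.

```python
import importlib


def resolve_tcp_interface():
    """Return the TCPInterface class from whichever location exists, or None."""
    candidates = [
        ("meshtastic.tcp_interface", "TCPInterface"),  # newer layout, per the post above
        ("meshtastic", "TCPInterface"),                # older layout
    ]
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # package or submodule not present; try the next location
        cls = getattr(module, attr, None)
        if cls is not None:
            return cls
    return None


TCPInterface = resolve_tcp_interface()
if TCPInterface is None:
    print("meshtastic is not installed or TCPInterface was not found")
else:
    print(f"using {TCPInterface.__module__}.{TCPInterface.__name__}")
    # Connecting would then look something like this (hypothetical address):
    # iface = TCPInterface(hostname="192.168.1.50")
```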

If I can find time, I could try rebasing it against the latest API and submit a PR, but right now, I don’t have the time :frowning: