Trust-driven approach to enhance early forest fire detection using machine learning

In this research work, we assign a unique pair <ID, Location> to each GPS-enabled ISN and organize the ISNs into an optimal number of clusters. The ISNs monitor the forest and sense environmental parameters such as temperature, humidity, smoke, wind speed, toxic gas levels, and light intensity. The ISNs are deployed so that the sensors cover large forest areas without overlapping coverage. Forest density is also taken into account during deployment, since barriers such as trees block the sensor range. Because ISNs are resource-constrained, the base station (sink) is placed close to the group members. We incorporate a novel factor, the trust score, at the sensor level to assess the reliability of the monitored data, since ISNs can be damaged or attacked by adversaries in a hostile environment. Moreover, trust factors determine and diminish the effect of distance on data integrity during data exchange between two communicating ISNs. If an ISN is found to be malicious at any stage, it is removed from the network so that the results remain accurate. Within a group, each group member (GM) senses the environmental parameters, stores them in a logical sliding window¹, and sends the observed data along with its location to the group head (GH). The GH maintains a record of the data packets sent by GMs, processes them, and forwards them to the base station. If a packet reports an abnormal event such as sparks or smoke, it is processed and forwarded with high priority. At the base station, an ML algorithm tests the monitored data and generates an alert message if a fire is detected. Moreover, if the fire intensity and density are high, nearby ISNs generate an alert message themselves, provided that the ISN is trusted. Figure 2 shows the complete flowchart of the proposed FFD system.

The training data set covers day and night scenarios, peaceful fire, wildfire, low and high fire density levels, downpour scenarios, road presence, fire intensity and volume, small-width fire propagating over a large distance, fire hazard, etc., to minimize the false alarm rate. The distance between two ISNs is computed using Eq. (1), where SR denotes the sensing range of each ISN.

$$\text{Distance} = 2*\left[\text{SR}*\sin(60^\circ)\right]$$

(1)
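As a quick numerical check of Eq. (1), the sketch below evaluates the node spacing for a given sensing range. It assumes that the trigonometric factor in Eq. (1) is sin(60°) and uses SR = 5 m purely as an illustrative value; neither the function name `isn_spacing` nor that SR value comes from the paper. Under these assumptions the spacing evaluates to about 8.66 m.

```python
import math


def isn_spacing(sr_m: float, angle_deg: float = 60.0) -> float:
    """Spacing between neighbouring nodes per Eq. (1): 2 * SR * sin(angle)."""
    return 2.0 * sr_m * math.sin(math.radians(angle_deg))


# SR = 5 m is an assumed, illustrative sensing range (not stated in the text).
spacing = isn_spacing(5.0)
print(f"{spacing:.2f} m")
```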

Using Eq. (1), the distances between two ISNs and between two GHs are 8.66 m and 88.66 m, respectively. The sensor coverage distance is a function of the forest density (d) at the sensor's location $(f_{j}^{d})$, the forest density at the specific point $(f_{i}^{d})$, and the sensor's maximum coverage distance (SR). The coverage distance of the sensor is therefore $\left[SR*\left(1-\frac{f_{j}^{d}+f_{i}^{d}}{2*f_{max}^{d}}\right)\right]$. We compute the threshold ratio ($\text{R}_{\text{Th}}$) for each monitored feature using Eq. (2), based on trials performed in different climatic zones. In Eq. (2), X may be any feature, such as temperature, humidity, or light intensity.

$${\text{R}}_{{{\text{Th}}}} = ~\frac{{{\text{average}}\;{\text{value}}\;{\text{of}}\;{\text{the}}\;{\text{logical}}\;{\text{sliding}}\;{\text{window}}\;{\text{for}}\;{\text{feature}}\;~\left( X \right)~~}}{{{\text{latest}}\;{\text{value}}\;{\text{of}}\;{\text{the}}\;{\text{feature}}\left( X \right)~}}$$

(2)

If the threshold ratio of a feature exceeds the threshold value defined for that feature in our FFD system, a fire may have occurred.
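The density-adjusted coverage distance and the threshold ratio of Eq. (2) can be illustrated with a minimal sketch. The class name `FeatureWindow`, the window size, the choice to include the latest reading in the window average, and the example threshold of 1.5 are all assumptions made for illustration; the paper does not specify them.

```python
from collections import deque
from statistics import mean


def coverage_distance(sr: float, f_j: float, f_i: float, f_max: float) -> float:
    """Density-adjusted coverage: SR * (1 - (f_j + f_i) / (2 * f_max))."""
    return sr * (1.0 - (f_j + f_i) / (2.0 * f_max))


class FeatureWindow:
    """Logical sliding window for one feature X (the fixed size is an assumption)."""

    def __init__(self, size: int = 5):
        # A bounded deque drops the oldest reading automatically.
        self.values = deque(maxlen=size)

    def push(self, reading: float) -> float:
        """Store the latest reading and return R_Th = window average / latest value."""
        if reading == 0:
            raise ValueError("latest value must be non-zero to form the ratio")
        self.values.append(reading)
        return mean(self.values) / reading


def may_be_fire(ratio: float, feature_threshold: float) -> bool:
    """Rule from the text: a ratio above the feature threshold suggests a fire."""
    return ratio > feature_threshold


# Hypothetical humidity readings (%): a sudden drop raises the ratio.
window = FeatureWindow(size=4)
ratios = [window.push(v) for v in [60.0, 58.0, 59.0, 30.0]]
print(may_be_fire(ratios[-1], feature_threshold=1.5))
```

Because the ratio divides the window average by the latest value, it rises when a feature drops sharply (as humidity does near a fire), which is why the threshold has to be set per feature.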

Optimal number of clusters

The primary purpose of calculating the optimal number of clusters in a WSN is to minimize the overall energy required for data collection compared with alternative clustering patterns, which in turn prolongs the lifespan of the WSN. If the number of clusters is not optimized, the overall energy consumption of the network per round increases exponentially, and the optimal result cannot be achieved when clusters are generated randomly. Because the number of clusters and the network lifespan are closely linked, the optimal number of clusters must be determined for each network. In WSNs, increasing the number of clusters distributes the workload more evenly among GHs and shortens the communication distance between an ISN and its GH, which reduces the total energy usage. Conversely, when the number of clusters is increased, the communication path between an ISN and the BS involves additional GHs, which increases the overall energy usage. Determining the ideal number of clusters is therefore crucial for WSNs. We consider a network of $N_{s}$ ISNs that are randomly placed in an $M \times M\ \text{m}^{2}$ forest area. The network is partitioned into $k_{opt}$ clusters, where $k_{opt}$ is the optimal number of clusters. Each cluster consists of $\frac{N_{s}}{k_{opt}}$ ISNs, with one ISN designated as the GH and the remaining $\left(\frac{N_{s}}{k_{opt}}-1\right)$ ISNs serving as members of that cluster. Because the BS is at the boundary, both free-space (fs) and multipath (mp) losses are considered. The value of $k_{opt}$ is obtained using Eq. (3) as follows.

$$k_{{opt}} = \sqrt {\frac{{N_{s} *\epsilon _{{fs}} }}{{2\pi *E_{{avg}} }}} *\frac{M}{{\sqrt {\epsilon _{{mp}} *D^{2} + \epsilon _{{fs}} } }}$$

(3)

Where D represents the distance from the BS to the GH, and $N_{s}$ is the total number of ISNs divided among the clusters. Eq. (3) uses the average energy available per ISN ($E_{avg}$) to reflect a realistic energy constraint on the ISNs, balancing communication efficiency against ISN lifetime. It also models the trade-off between intra-cluster and inter-cluster communication by explicitly including both the free-space ($\epsilon_{fs}$) and multipath ($\epsilon_{mp}$) energy terms. This modification makes Eq. (3) scalable across a diverse range of network densities, forest coverages, and energy profiles. It benefits the network lifespan by decreasing energy consumption while still allowing efficient data transmission. It also generalizes the optimization of cluster design in WSNs, so that it can be adapted to complex and heterogeneous situations, such as those with limited energy or dynamic environments.
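Eq. (3) can be evaluated with a few lines of code. The following is a minimal sketch that transcribes the formula as written; the function name `optimal_clusters` is hypothetical, and the caller is assumed to supply all quantities in mutually consistent units. How the result is rounded to an integer cluster count is not stated in the text, so the sketch returns the raw value.

```python
import math


def optimal_clusters(n_s: int, eps_fs: float, eps_mp: float,
                     e_avg: float, m: float, d: float) -> float:
    """k_opt per Eq. (3).

    n_s    : total number of ISNs
    eps_fs : free-space energy coefficient
    eps_mp : multipath energy coefficient
    e_avg  : average energy available per ISN
    m      : side length of the M x M deployment area
    d      : distance from the BS to the GH
    """
    energy_term = math.sqrt((n_s * eps_fs) / (2.0 * math.pi * e_avg))
    distance_term = m / math.sqrt(eps_mp * d ** 2 + eps_fs)
    return energy_term * distance_term
```

A caller would typically round the returned value to the nearest positive integer before forming clusters; that rounding step is an assumption, not part of Eq. (3).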

Trust assessment model

In this subsection, we discuss the trust evaluation model used to assess the reliability of ISNs. The trust score of an ISN depends on its activities (packet forwarding behavior, processing, ISN location, and ISN energy) as monitored by its neighboring ISNs during a time interval $\Delta t$. The trust score is updated after every $\Delta t$ and stored in the forwarding table maintained by the ISNs. A malicious ISN always tries to drop or alter packets to damage the network. We compute communication trust (direct and indirect), data trust, and energy trust to isolate selfish ISNs from the network. Communication trust reflects how reliably ISNs take part in data transmission and message exchange. Data trust evaluates the accuracy and reliability of the data shared by ISNs. Energy trust is typically calculated by monitoring the energy levels of ISNs and adjusting their trust scores accordingly. If an ISN's energy drops below a certain threshold, its trust score can be significantly penalized, reflecting that low-energy ISNs are either unreliable or may not work as expected. Since the sink and the GHs are trustworthy and powerful ISNs, we compute only the GH-to-GM trust and ignore the base-station-to-GH trust evaluation, because it would place unnecessary load on the network and reduce the power level of the ISNs. A logical timing window¹ is employed to record the cooperative and non-cooperative interactions among ISNs. It keeps the recent interaction history and removes older information so that decisions about an ISN's reliability remain accurate. Each ISN takes $0.05\ \text{joule}$ to send and receive a data packet. Figure 3 shows the flowchart for trust evaluation.

Each GH stores configuration information of each GM present in its cluster in the form of a matrix < ID, location, Energy > as shown in Eq. (4).

$${\text{M}} = \left[ \begin{array}{ccc} \text{ID}_{1} & \left(\text{X}_{1}, \text{Y}_{1}\right) & \text{E}_{1} \\ \text{ID}_{2} & \left(\text{X}_{2}, \text{Y}_{2}\right) & \text{E}_{2} \\ \vdots & \vdots & \vdots \\ \text{ID}_{k-1} & \left(\text{X}_{k-1}, \text{Y}_{k-1}\right) & \text{E}_{k-1} \end{array} \right]$$

(4)

The direct communication trust ($T_{x,y}^{C}(\Delta t)$) and the data trust ($DT_{x,y}^{D}(\Delta t)$) from a GM (say x) to a GM (say y) are computed using Eqs. (5) and (6), respectively. The symbols S and U are the numbers of cooperative and non-cooperative interactions between ISN (x) and ISN (y) during $\Delta t$. The parameter U reflects how the suggested system handles data inconsistencies or false data injections. A higher value of U penalizes ISNs severely for data discrepancies, discouraging malicious behavior. The symbol q is the natural abnormal factor (NAF), which accounts for the non-cooperative interactions caused by natural calamities such as earthquakes. The symbol φ denotes the penalty component, whereas θ represents the threshold value for the trust score. The parameter q reflects the extent to which negative interactions affect the trust score, while φ describes the punishment for uncooperative behavior. Higher values of these parameters are intended for networks that are more susceptible to misbehavior such as packet loss or malicious attacks. For networks with high packet loss rates, increasing the value of q increases the penalty for poor communication. $T_{max}$ is the highest trust value used in this study; during the experimental investigation, we set $T_{max}$ to 10. Algorithm 1 calculates and updates the trust values inside a cluster. Algorithm 2 provides the pseudo-code for data transmission between the ISNs and the BS.

$$T_{{x,y}}^{C} \left( {\Delta t} \right) = \left[ {T_{{\max }} \times \left( {\frac{{S_{{x,y}} \left( {\Delta t} \right) + 1}}{{\left( {S_{{x,y}} \left( {\Delta t} \right) + q*U_{{x,y}} \left( {\Delta t} \right)} \right) + 2}}} \right)*~\frac{{S_{{x,y}} \left( {\Delta t} \right)}}{{S_{{x,y}} \left( {\Delta t} \right) + 1~}}*~\frac{1}{{\sqrt[2]{{\varphi *U_{{x,y}} \left( {\Delta t} \right)}}}}} \right]~$$

(5)

$$DT_{{x,y}}^{D} \left( {\Delta t} \right) = \left[ {T_{{\max }} \times \left( {~\frac{{~S_{{x,y}}^{D} \left( {\Delta t} \right) + 1}}{{\left( {S_{{x,y}}^{D} \left( {\Delta t} \right) + q*U_{{x,y}}^{D} \left( {\Delta t} \right) + 2} \right)}}} \right)^{{\left( {\frac{{U_{{x,y}}^{D} \left( {\Delta t} \right) + 1}}{{S_{{x,y}}^{D} \left( {\Delta t} \right) + q*U_{{x,y}}^{D} \left( {\Delta t} \right) + 2}}} \right)}} } \right]~$$

(6)
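Eqs. (5) and (6) can be evaluated directly once S, U, q, and φ are known. The sketch below transcribes both formulas as written; the function names and the example values q = 1 and φ = 1 are illustrative assumptions. Eq. (5) divides by the square root of φ·U, so the sketch only accepts S > 0 and U > 0; the text states that Algorithm 1 handles the remaining cases separately.

```python
import math


def communication_trust(s: int, u: int, q: float, phi: float,
                        t_max: float = 10.0) -> float:
    """Direct communication trust per Eq. (5); requires s > 0, u > 0, phi > 0."""
    if s <= 0 or u <= 0 or phi <= 0:
        raise ValueError("Eq. (5) is only evaluated for s > 0, u > 0 and phi > 0")
    beta_ratio = (s + 1) / ((s + q * u) + 2)   # first factor of Eq. (5)
    success_norm = s / (s + 1)                 # second factor of Eq. (5)
    severity = 1.0 / math.sqrt(phi * u)        # third factor of Eq. (5)
    return t_max * beta_ratio * success_norm * severity


def data_trust(s_d: int, u_d: int, q: float, t_max: float = 10.0) -> float:
    """Data trust per Eq. (6): T_max * base ** exponent."""
    denom = s_d + q * u_d + 2
    base = (s_d + 1) / denom
    exponent = (u_d + 1) / denom
    return t_max * base ** exponent


# Illustrative values only: 8 cooperative and 2 non-cooperative interactions.
print(round(communication_trust(8, 2, q=1.0, phi=1.0), 3))
print(round(data_trust(8, 2, q=1.0), 3))
```

With these example inputs the communication trust is noticeably lower than the data trust, because Eq. (5) multiplies three factors that are each below 1 while Eq. (6) raises a single ratio to a small exponent.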

The trust is constrained by a maximum value ($T_{max}$), preventing it from increasing without limit. The ratio $\left(\frac{S_{x,y}(\Delta t)+1}{\left(S_{x,y}(\Delta t)+q*U_{x,y}(\Delta t)\right)+2}\right)$ represents the Beta likelihood ratio, indicating the probability of success relative to the probability of failure, based on the recorded successes and failures for ISN (x) and ISN (y). When $S_{x,y}(\Delta t)$ significantly exceeds $U_{x,y}(\Delta t)$, the ratio approaches 1, demonstrating a high level of trust. If $U_{x,y}(\Delta t)$ significantly exceeds $S_{x,y}(\Delta t)$, the ratio diminishes, indicating a lack of trust. Thus, the Beta-distribution term in Eq. (5) shows how the trust score is affected by both successes and failures, so that the model adjusts the level of trust accordingly. The term is modified by additive constants (1 and 2), which are frequently employed to keep the calculation smooth and avert divide-by-zero errors. This association with the Beta distribution underscores that the trust computation adapts to the recorded successes and failures, similar to the way in which a Beta distribution revises its parameters as further communication exchanges are gathered.

Eqs. (5) and (6) reward positive interactions and penalize negative behavior according to the parameters q and φ, which determine how effectively the proposed model manages untrustworthy activity. Data trust is quantified using Eq. (6), which prioritizes consistency and accuracy in the data supplied by ISNs, while energy trust can be computed by a linear or non-linear function of energy thresholds. The equations balance positive interactions against untrustworthy ones: positive interactions raise the trust score, whereas the score falls as unsuccessful interactions increase, particularly if they are malevolent or erroneous. The penalty factor q and the severity factor φ allow precise control over the impact of failures on trust. Multiplying by $T_{max}$ in Eq. (5) normalizes the trust score so that it is confined within an acceptable range (limited by $T_{max}$) and prevents any single factor from disproportionately influencing the trust computation. The second term, $\frac{S_{x,y}(\Delta t)}{S_{x,y}(\Delta t)+1}$, is vital because successful interactions should raise the trust score, but a large number of successful interactions should not produce unreasonably high trust values. Normalizing the success rate in this way mitigates excessive inflation of the trust score. The third term, $\frac{1}{\sqrt[2]{\varphi * U_{x,y}(\Delta t)}}$, introduces a severity factor that penalizes untrustworthy interactions. As the number of unsuccessful interactions rises, the penalty intensifies. The factor φ scales the penalty according to the severity of the untrustworthy behavior.
The square root guarantees that although the penalty increases with $U_{x,y}(\Delta t)$, it does so in a manner that is not excessively severe. This part of Eq. (5) evaluates the seriousness of failures and determines their impact on the trust score. Increased failure rates, particularly those of a malevolent nature (such as blackhole attacks or selective forwarding), incur a more severe penalty. A practical case clarifies how the model works in real-world contexts. For instance, in an industrial WSN application, consider ISNs A, B, and C that participate in communication. ISN A has a record of trustworthy contact with ISN B but has low energy, whereas ISN C frequently transmits erroneous data but maintains high energy levels. Under the weighted trust considerations, ISN A may be deemed reliable for communication but penalized on energy trust, whereas ISN C may be penalized on data trust but rewarded on energy trust. The final trust score combines all of these factors, with the weighting adjusted according to the application's priorities (e.g., communication prioritized over energy or vice versa).

Algorithm 1 states that if the number of cooperative interactions between ISN(x) and ISN(y) is greater than zero and there are no unsuccessful interactions, the proposed system assigns the maximum trust value $T_{max}$. If there are no successful interactions between ISN(x) and ISN(y) and the number of unsuccessful interactions is greater than zero, the system assigns a trust value of zero to the ISN. If both the number of cooperative interactions and the number of non-cooperative interactions are greater than zero, the system assigns trust values calculated from Eqs. (5) and (6). If there are neither successful nor unsuccessful interactions, the system uses recommendations from peers to improve the accuracy of the proposed technique in a hostile environment. When evaluating trust from peer recommendations, we only take into account trusted neighbors that have a direct connection. This minimizes the communication burden and enhances the accuracy of the Unified Trust Model (UTM). Peer recommendation trust makes the trust value more resilient, because it mitigates the potential compromise of the trust value by malicious ISNs. The peer recommendation trust is determined using Eq. (7).

$$PR_{x,y}\left(\Delta t\right) = \left[\frac{\sum_{j=1}^{z} T_{x,j} \times T_{j,y}}{\left|z\right|}\right]$$

(7)

Where z represents the set of directly trusted ISNs. In this context, we exclude ISNs with $T_{x,y}(\Delta t)<\frac{T_{max}}{2}$, which means that we eliminate malicious ISNs in order to get an adaptable trust value. The final trust value $\left(f_{x,y}^{T}(\Delta t)\right)$ is computed by aggregating Eqs. (5), (6), and (7) using Eq. (8). The sum of the weights in Eq. (8) is 1. The status of an ISN is determined using Eq. (9) as follows.

$$f_{x,y}^{T}\left(\Delta t\right) = \frac{w_{1}*T_{x,y}^{C}\left(\Delta t\right) + w_{2}*DT_{x,y}^{D}\left(\Delta t\right) + w_{3}*PR_{x,y}\left(\Delta t\right)}{3}$$

(8)

$$S\left( {f_{{x,y}}^{T} \left( {\Delta t} \right)} \right) = ~\left\{ {\begin{array}{*{20}c} {\left[ {\left\lceil {\frac{{{\text{T}}_{{{\text{max}}}} + 1}}{2}} \right\rceil ;\frac{{{\text{T}}_{{{\text{max}}}} }}{2}} \right]} \\ {~\left( {0;\theta } \right)} \\ {\left[ {\theta ;\left\lceil {\frac{{{\text{T}}_{{{\text{max}}}} + 1}}{2}} \right\rceil } \right)} \\ \end{array} \left| {\begin{array}{*{20}l} {highly~trusted~ISN} \hfill \\ {malicious~ISN~} \hfill \\ {legitimate~ISN} \hfill \\ \end{array} } \right.} \right\}$$

(9)
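The peer recommendation of Eq. (7) and the aggregation of Eq. (8) can be sketched as follows. The function names and dictionary-based inputs are hypothetical. The sketch assumes that a recommender j is kept only when x's direct trust in j is at least $T_{max}/2$, and that an empty recommender set yields a recommendation trust of zero; the text does not spell out either detail. The product in Eq. (7) is left unscaled, exactly as the equation is written.

```python
def peer_recommendation(t_x_to: dict, t_to_y: dict, t_max: float = 10.0) -> float:
    """Eq. (7): mean of T[x,j] * T[j,y] over directly trusted recommenders j.

    t_x_to : trust that x holds in each neighbour j
    t_to_y : trust that each neighbour j holds in y
    """
    # Assumption: drop recommenders that x trusts less than T_max / 2.
    trusted = [j for j in t_x_to if j in t_to_y and t_x_to[j] >= t_max / 2]
    if not trusted:
        return 0.0  # assumption: no trusted recommender, no recommendation trust
    return sum(t_x_to[j] * t_to_y[j] for j in trusted) / len(trusted)


def final_trust(t_c: float, dt: float, pr: float, weights: tuple) -> float:
    """Eq. (8): weighted sum of the three trust values, divided by 3."""
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("the weights in Eq. (8) must sum to 1")
    return (w1 * t_c + w2 * dt + w3 * pr) / 3.0


# Hypothetical neighbours a, b, c; c is excluded because 3 < T_max / 2.
pr = peer_recommendation({"a": 8, "b": 6, "c": 3}, {"a": 7, "b": 5, "c": 9})
print(pr)
print(final_trust(6.0, 9.0, 3.0, weights=(0.5, 0.3, 0.2)))
```

The weights here are placeholders; the text leaves their choice to the application's priorities.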

The significance of the communication, data, and energy trust factors depends on the context and the function of the ISN within the IWSN. In an IWSN where ISNs are required to deliver vital and highly precise data, data trust may be prioritized over energy trust. In contrast, where the network's primary objective is to maximize the operational longevity of ISNs, such as in remote monitoring or sensor networks, energy trust may assume greater importance. The trade-off can be represented as a weighted sum, in which each trust element is multiplied by a weight that signifies its importance in the specific circumstance. The weights can be adjusted dynamically according to the real-time performance of ISNs or the changing requirements of the network. For example, if an ISN demonstrates persistently poor communication performance yet remains energy-efficient, the system may reduce the weight assigned to communication trust to diminish its impact on the overall score, while increasing the weight of energy trust. The trade-offs between trust factors can be optimized using empirical data or simulations that replicate various network situations. The weights allocated to each factor must be dictated by the network's specific objectives, such as emphasizing dependability over energy efficiency. An illustrative scenario is an emergency response network where the dependability of communication is crucial, so the communication trust component may prevail, whereas in a long-term monitoring application, energy trust could be the determining factor. The symbol $\theta$ is an application-dependent trust threshold whose value depends on the application requirements.

The energy trust of an ISN x at time $\Delta t$ $\left(ET^{x}(\Delta t)\right)$ is computed with the help of a predefined threshold value $E_{th}$, which is 20% of the total energy. If the current energy level (CEL) of an ISN (say x) is greater than 50% of the total energy, then ISN (x) is highly trusted. If $20\%\ \text{of total energy} < \text{CEL}^{x} \le 50\%\ \text{of total energy}$, then ISN (x) is trusted; otherwise it is non-trusted.

Let the number of ISNs in the WSN be $N_{s}$ and the number of clusters (groups/cluster heads) be g, so that the size $(n)$ of each cluster is $N_{s}/g$. We divide the total communication overhead into intra-cluster and inter-cluster communication overhead. In intra-cluster trust evaluation, ISN x sends and receives one GH feedback request to interact with ISN y, i.e., a total communication overhead of $2$ request packets. In the worst case, if ISN x wants to interact with all $(n-2)$ ISNs, the total communication overhead is $2*(n-2)$ request packets. If all GMs (except the GH) want to interact with each other, the maximum communication overhead is $2(n-2)(n-1)$. During intra-cluster feedback trust calculation, the GH sends r requests only to directly trusted members and receives r responses, where $(r \le n-1)$. The total communication overhead due to feedback by the GH at the GM level is $2*r$ request and response packets. Thus, the total communication overhead in intra-cluster trust computation is $C_{intra} = 2(n-2)(n-1) + 2r$. In inter-cluster trust evaluation, $GH(i)$ sends and receives one BS feedback request to interact with $GH(j)$, i.e., a total communication overhead of $2$ request packets. Thus, the total communication overhead for g groups in inter-cluster trust computation is $C_{inter} = 2g$. Hence, in the worst case, the maximum communication overhead $\left(C_{Max}\right)$ for the suggested scheme (UTM) can be expressed as $C_{\text{Max}} = g*C_{\text{intra}} + C_{\text{inter}} = g*\left(2(n-2)(n-1) + 2r\right) + 2g.$
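The energy-trust bands and the worst-case overhead count can be checked numerically. The following is a minimal sketch; the function names are hypothetical, clusters are assumed to be of equal size (n = N_s / g using integer division), and the worst-case total is taken as g times the intra-cluster overhead plus the inter-cluster overhead, which matches the expanded form g(2(n−2)(n−1) + 2r) + 2g.

```python
def energy_trust_level(cel: float, total_energy: float) -> str:
    """Classify an ISN by its current energy level (CEL) as a share of total energy."""
    share = cel / total_energy
    if share > 0.5:
        return "highly trusted"
    if share > 0.2:          # 20% < CEL <= 50%
        return "trusted"
    return "non-trusted"     # CEL at or below the 20% threshold E_th


def max_overhead(n_s: int, g: int, r: int) -> int:
    """Worst-case packet count: g * C_intra + C_inter."""
    n = n_s // g  # assumption: equal-sized clusters
    if not 0 <= r <= n - 1:
        raise ValueError("r must satisfy 0 <= r <= n - 1")
    c_intra = 2 * (n - 2) * (n - 1) + 2 * r
    c_inter = 2 * g
    return g * c_intra + c_inter


# Illustrative network: 100 ISNs, 5 clusters, 10 feedback requests per GH.
print(energy_trust_level(0.35, 1.0))
print(max_overhead(100, 5, 10))
```

For this illustrative network, n = 20, so the intra-cluster term is 2·18·19 + 20 = 704 packets per cluster and the total is 5·704 + 10 = 3530 packets.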

Fire detection algorithm (FDA)

This subsection discusses the fire detection algorithm (FDA) for forest areas. Algorithm 3 is an efficient FDA that takes various environmental features as input and reports whether a fire has ignited. According to Algorithm 3, if a sensed parameter value exceeds its predefined threshold value, this indicates the occurrence of a forest fire. In this scenario, an alarm sounds so that the officers learn about the forest fire at an early stage and can take the necessary action. Malicious ISNs sometimes intentionally provide incorrect information or withhold genuine feedback about neighboring ISNs; in such cases, the proposed algorithm computes the final trust score of the ISN and removes the malicious ISNs. The data sensed and processed by good ISNs are forwarded to the BS through the GHs, where an ML algorithm is applied to the data for testing the model. If any fire ignition is found during the testing process, an alert notification is initiated, indicating the occurrence of a fire. The detailed FDA (Algorithm 3) is given below.