Mantis Search Algorithm Integrated with Opposition-Based Learning and Simulated Annealing for Feature Selection

Feature selection (FS) plays a vital role in reducing the dimensionality of high-dimensional data as much as possible, which helps enhance classification accuracy and lower computational costs. The purpose of FS techniques is to extract the most effective subset of features.

Every sample has a unique set of characteristics. The problem with such datasets is not just their enormous dimensionality; they also include attributes that are redundant or unimportant. Furthermore, the collected data can contain a high level of noise, and the resulting model may be complex. These issues raise computational costs and reduce the accuracy of machine learning (ML) techniques. Therefore, feature selection (FS) is used as a preprocessing step to pick the best subset of valuable characteristics, reducing the computational cost and improving the classification accuracy of ML classifiers. The use of feature selection to lessen the effects of data dimensionality has proven quite effective. Locating the optimal selected features (OSF) is an NP-hard optimization problem because it requires examining a huge number of combinations to reach the one that simultaneously optimizes both the number of selected features and the classification accuracy. Several feature selection techniques have been presented in the literature, divided into three categories: filter, wrapper, and embedded techniques [1]. The filter method assesses the chosen subset of features based on the data's intrinsic properties, so it focuses on the general characteristics of the data. In contrast, wrapper methods employ an ML classifier to judge the quality of the selected features. These methods yield more accurate results than filters, but they are expensive in terms of computational cost. Embedded techniques are a combination of filter and wrapper methods: feature selection happens concurrently with the classifier during the training phase [2]. Although wrapper procedures are slower, they yield better results than filter methods. Due to their effectiveness, wrapper-based FS techniques are extensively employed in the literature to optimize the FS problem across several fields. Among these techniques, metaheuristic algorithms can achieve outstanding outcomes when applied to this problem thanks to their strong exploration and exploitation characteristics.

Metaheuristic techniques can achieve outstanding results for several optimization problems, both continuous and combinatorial, in a reasonable amount of time. The majority of these techniques are based on two phases: exploration and exploitation. In the first phase, exploration, the search area is examined extensively as the algorithm looks for the most promising regions. In the second phase, the algorithm scours the most promising regions in greater detail to identify even better solutions. According to [3], metaheuristic algorithms fall into five categories. The first category, evolutionary-based algorithms, includes optimization paradigms based on evolutionary mechanisms such as biological genetics and natural selection; genetic algorithms (GA) [4] and differential evolution (DE) [5] are examples. Human-based metaheuristics are the second category, founded on mathematical simulations of various human behaviors; Teaching-Learning-Based Optimization (TLBO) [6], Poor and Rich Optimization (PRO) [7], and Human Mental Search (HMS) [8] are examples. Swarm-based algorithms are the third category, designed to mimic the swarming habits of birds, mammals, and other natural organisms; well-known examples include Particle Swarm Optimization (PSO) [9], Ant Colony Optimization (ACO) [10], and the Firefly Algorithm (FA) [11]. The fourth category, physics-based metaheuristics, is built on mathematical representations of physical laws and phenomena; Simulated Annealing (SA) [12] and the Gravitational Search Algorithm (GSA) [13] are two widely recognized physics-based algorithms. Mathematics-based algorithms are the final category, built on mathematical mechanisms; the Arithmetic Optimization Algorithm (AOA) [14] and the Sine Cosine Algorithm [15] are examples.

Many metaheuristic algorithms have been presented to address feature selection challenges. However, we argue that previous research has some shortcomings, including poor convergence, local-optima trapping, and long computation times. These problems motivated our proposed model. The proposed opposition-based mantis search simulated annealing (OBMSASA) utilizes an improved version of the mantis search algorithm (MSA) based on opposition-based learning as its first phase; this mechanism improves the algorithm's exploration ability, enabling it to provide better-quality solutions. Secondly, the opposition-based mantis search algorithm (OBMSA) is hybridized with simulated annealing (SA): using SA as a local search reinforces the exploitation operator and increases the convergence speed. To assess the performance of the proposed OBMSASA, twenty-one datasets were employed. A comparative analysis was conducted against multiple recently published feature selection methods: the discrete equilibrium optimizer combined with simulated annealing (EOSA) [16], the two-phase mutation grey wolf optimizer (TMGWO) [17], the hybrid Harris hawks optimization simulated annealing algorithm (HHOSA) [18], the slime mould algorithm with marine predators algorithm (SMAMPA) [19], the sine cosine algorithm (SCA) [20], the opposition-based learning salp swarm algorithm (OBSA) [21], the crossover cooperative whale optimization algorithm (CCWOA) [22], and the standard mantis search algorithm (MSA) [23].

The main contribution of this work is finding the best subset of features using a hybrid approach between an enhanced mantis search algorithm and simulated annealing, addressing most of the constraints found in previous studies. Since SA can accept a subpar solution with a certain probability, it is hybridized with the mantis search algorithm to escape local optima and improve population diversity. Moreover, using opposition-based learning in MSA increases the diversity of the initial population. Datasets of different and large dimensionalities are utilized to assess the efficacy of the proposed method. The remainder of the paper is arranged as follows: Section 2 discusses some recently published FS techniques; Section 3 briefly describes the K-nearest neighbor approach, the MSA algorithm, the opposition-based learning method, and simulated annealing (SA) as the main components of the proposed algorithm; Section 4 presents the proposed algorithm; Section 5 provides numerical results and discussion; and conclusions and recommendations for future work are presented in Section 6.

|Related Work
Large datasets present a significant challenge to machine learning techniques because of their high dimensionality, which might hinder data mining. Applications that use datasets with many dimensions must therefore tune more classification parameters, and consequently the classifier's performance considerably deteriorates. Accordingly, there is an urgent need for dimensionality-reduction methods. Dimensionality reduction is a popular way to get rid of noise and unnecessary features; it is a useful technique for increasing model generalization, reducing computational complexity, increasing precision, and reducing the amount of storage needed. One of the most popular techniques for this issue is feature selection. Many well-known feature selection methods are used to address high dimensionality; metaheuristics are among those that have become widespread recently. The different feature selection methods are indicated in Figure 1. Feature selection methods aim to find the best subset of features, keeping classification efficiency as the priority. The problem of identifying the optimal subset of characteristics is classified as NP-hard. The primary tasks carried out by metaheuristic algorithms are illustrated in Figure 2.
To find the optimal selected features, scholars have developed a number of metaheuristic approaches; we review several of them here. Recently, a variant of the Vortex Search Algorithm (VSA) integrated with different chaotic maps was investigated to enhance the VSA operators and help balance exploitation and exploration for feature selection [25]. Its effectiveness was assessed on 24 UCI benchmark datasets. In [26], the authors suggested MetaSCA, a hybrid metaheuristic optimizer for feature selection based on an enhanced sine cosine algorithm combined with a golden sine strategy and a multilevel regulatory factor strategy. Seven UCI datasets were employed in the evaluation; the outcomes demonstrated superior performance in terms of accuracy and the ideal feature subset. This module's drawback was that extracting the optimal feature subset from a large number of features took a long time; hence, considerable work is still needed to noticeably speed up the feature selection process. Another module was introduced by the authors of [27]. The goal of that study was a new FS technique that enhances the Gorilla Troops Optimizer (GTO) with the Bird Swarm Algorithm (BSA), referred to as GTO-BSA. BSA's strong ability to identify the viable regions that contain the optimal solution improved the performance of GTO. The testing results demonstrated that the suggested GTO-BSA method outperformed several existing metaheuristic algorithms. One of that research's limitations is that it does not address multi-objective challenges. Furthermore, to identify the best feature subsets from the NSL-KDD dataset, the study in [28] suggests an innovative feature selection technique based on a genetic algorithm (GA). Moreover, decision trees (DT) and logistic regression (LR) were used in hybrid classification to improve accuracy (ACC) and detection rate (DR). That study optimized the chosen features by applying and contrasting the performance of multiple metaheuristic techniques. Despite providing good accuracy, the suggested approach has drawbacks: in addition to increased complexity, it takes longer to converge, which can be computationally expensive. Another method, SCMWOA [29], combines the sine-cosine hybrid optimization algorithm with a modified whale optimization approach to handle feature selection with high accuracy. SCMWOA was evaluated on 19 datasets, and the results demonstrate its accuracy.
Jun Li et al. [30] provide an improved hybridized salp swarm algorithm, TLSSA, based on the first two stages of the teaching-learning-based optimization technique. Although TLSSA produces better results, it was evaluated on only four feature selection datasets. Furthermore, a modified SSA method known as quantized SSA (QSSA) is recommended by [31] to increase performance. In the proposed method, the quantization operator, a mathematical operator, is integrated into the basic SSA to choose the best features from benchmark datasets while maintaining accuracy. To lessen the dimensionality of agricultural disease detection, Sonal Jain et al. [32] presented a binary version of the memetic salp swarm optimization algorithm (MSSOA), which finds the ideal number of characteristics for the best classification accuracy. The findings show that the suggested approach outperforms the other algorithms in achieving accurate classification while minimizing the feature-set size. Amel Ali Alhussan et al. [33] proposed an innovative feature selection technique that uses the KNN classifier together with a binary version of the waterwheel plant algorithm's prey-selection mechanism (bWWPA) to find the optimal feature combination. Thirty datasets from the UCI machine learning repository were used in experiments to test the robustness and stability of the suggested bWWPA approach. A nonlinear binary grasshopper whale optimization algorithm (NL-BGWOA) is an amalgamated algorithm put forth by the authors of [34]. The suggested method maximizes the breadth of exploration in the target region through a new position-update strategy that combines the position variations of the whale and grasshopper populations. Ten different high-dimensional UCI datasets were used for assessment. NL-BGWOA produces good results; however, on datasets with fewer features, its fitness and classification accuracy still need improvement. Mustafa Serter Uzer et al. [35] introduced a new binary hybrid optimization-based wrapper feature selection technique called BWPLFS, which combines Lévy flight, particle swarm optimization, and the whale optimization algorithm. To assess the suggested algorithm's performance, several common benchmark datasets were taken from the UCI repository.
Mahmoud Ragab [36] introduced a binary combination of two existing metaheuristic methods, the particle swarm optimization (PSO) algorithm and the firefly algorithm (FA), which combines the best aspects of each to offer an optimized and effective way of addressing the feature selection problem for high-dimensional datasets. Moving to [37], the authors address the drawbacks of the standard grasshopper optimization algorithm (GOA) by strengthening its global optimization capability and keeping it out of the local-optimum trap through elite opposition-based learning and Gaussian bare-bones mechanisms integrated into the GOA. The suggested module still has a lengthy computation time, even though it achieves good accuracy and obtains a good subset of features.
Many algorithms have been introduced to tackle feature selection problems. One of them, EOSA [16], suggests a binary adaptation of the recently proposed discrete equilibrium optimizer (EO) boosted with simulated annealing (SA), which is employed as a local search process to improve exploitation capability. The authors apply the proposed EOSA method to eighteen popular UCI datasets and compare it with many other algorithms; EOSA exhibits strong performance on several high-dimensional datasets. To tackle wrapper-based feature selection for classification, Abdel-Basset et al. suggested a new grey wolf optimizer incorporating a two-phase mutation, called TMGWO [17]. Another study introduced a hybrid variant of the Harris hawks optimization algorithm (HHOSA) that uses wrapper approaches to solve the FS problem for classification [18]; HHOSA relies on bit-wise operations and simulated annealing. Recently, an improved version of the slime mould algorithm called SMAMPA was introduced to address FS problems [19]. This version relies on the marine predators algorithm (MPA) operators, which play the role of a local search technique, helping SMA increase the convergence rate and avoid attraction to local optima. A sine-cosine algorithm was introduced to make an appropriate trade-off between selecting the optimal subset of features and maximizing classification accuracy [38]. Other authors introduced an enhanced version of the salp swarm algorithm to tackle feature selection in wrapper mode; the original SSA was modified in two key ways to address its shortcomings and make it suitable for feature selection [21]. The first improvement applies opposition-based learning (OBL) during the SSA's initialization phase to increase population diversity in the search space. The creation and application of a new local search algorithm within SSA to enhance its exploitation constitutes the second improvement. Another study presented a novel strategy called horizontal-crossover and cooperative-hunting-based WOA (CCWOA) to address these shortcomings [22]. This algorithm strengthens the WOA framework by adding a weight, a horizontal crossover approach, and cooperative learning methods.
With remarkable success, metaheuristic approaches have been developed to address a wide range of contemporary and emerging issues. As a result, numerous researchers have used metaheuristic algorithms to address FS problems. Nonetheless, we contend that prior research suffers from several flaws, such as:
- Poor convergence and getting trapped in local optima (LO).
- Lengthy computation times.
- Large data dimensionality may degrade the algorithm's performance.
- Tuning the algorithm's parameters to find the ideal configuration takes time.
A hybrid approach between an improved mantis search algorithm and simulated annealing seeks the optimal subset of features to address the majority of the constraints discovered in the earlier investigations. Since SA can accept an inferior solution with a certain probability, it is hybridized with the mantis search algorithm to escape local optima and enhance population diversity. Furthermore, applying opposition-based learning to MSA broadens the initial population's diversity. Datasets of various and large dimensionalities are employed to examine the effectiveness of the suggested approach.

|Mantis Search Algorithm
Abdel-Basset et al. introduced a nature-inspired algorithm that mimics the physical and behavioral methods used by mantises to capture their prey and evade predators [23]. There are about 2400 species of this insect around the globe, grouped into 434 genera. The insect is distinguished by its long body; a triangular face with two antennae and protruding compound eyes; and an elastic neck that allows certain species to rotate the head about 180 degrees. Ants, scorpions, and wasps are the usual food sources for mantises; small mantises can also be consumed by large mantises.

|MSA's Mathematical Model
The primary MSA stages are presented mathematically in this section and are briefly explained in the following order: (i) The initial positions of the mantises (population initialization) constitute the first stage, which randomly places the mantises within the optimization search space. (ii) The second stage is the exploration phase, or "searching for prey," which imitates the actions taken by mantises to locate their prey. (iii) The third stage is the exploitation phase, which imitates the mantises' attacking behavior. (iv) The fourth stage represents sexual cannibalism. (v) The final stage describes the method of returning solutions that land outside the search space. All these stages are mathematically formulated in the following subsections.

|Initialization
The suggested algorithm begins with an initial group of mantises, just like other population-based approaches. In the mantis optimization methodology, every mantis stands for a potential solution to the optimization problem. A two-dimensional matrix of size $N \times D$ can represent a population of $N$ mantises (solutions) in a $D$-dimensional search space. The position of mantis $i$ at function evaluation $t$ can be defined as a vector:

$$\vec{x}_i^t = [x_{i,1}^t, x_{i,2}^t, \ldots, x_{i,D}^t]$$

where $i$ indicates the current solution, belonging to the set $\{1, 2, \ldots, N\}$; $t$ stands for the current function evaluation; $D$ denotes the problem's dimension; and $\vec{x}_i^t$ represents the $i$th mantis's position. The $i$th mantis's initial vector in the search space can be randomly generated using the equation below:

$$x_{i,j}^0 = lb_j + r \cdot (ub_j - lb_j)$$

where $ub_j$ and $lb_j$ stand for the $j$th dimension's upper and lower bounds, respectively, and $r$ is a random number in $[0, 1]$. Every time a mantis moves to a new place, the quality of the solution there is assessed with the fitness function, and the current position is updated accordingly: the mantis shifts to the new place when the solution quality there is superior to its current one; otherwise, the MSA keeps the mantis at its current location.
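The initialization step above can be sketched in a few lines of Python. The population shape ($N$ solutions of dimension $D$) and the per-dimension bound handling follow the description in the text; the function name and seed are illustrative, not from [23].

```python
import random

def init_population(n, d, lb, ub, rng=random.Random(42)):
    """Randomly place n mantises (solutions) in a d-dimensional box:
    x[i][j] = lb[j] + r * (ub[j] - lb[j]), with r uniform in [0, 1]."""
    return [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(d)]
            for _ in range(n)]

# A population of 5 solutions in a 3-dimensional search space.
pop = init_population(n=5, d=3, lb=[0.0, -1.0, 2.0], ub=[1.0, 1.0, 4.0])
```

Each row of `pop` would then be evaluated with the fitness function before the exploration phase begins.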

|Exploration Phase (Searching for Prey)
Mantis shrimp are divided into two groups: smashers and spearers. The smashers hunt their prey far from their burrows in their natural habitat, on the ground and in the leaves and branches of trees, while the spearers wait for their prey to approach their burrows and then strike. The algorithm simulates this phase in two ways: first, it mimics the actions of the smashers, which search other areas for their victims; second, it mimics the behavior of the spearers, which wait for their prey in a concealed position before pouncing.

|Smasher's Exploration Behavior
These attackers use a variety of step sizes, including long, small, and surprise orientations, to hunt food outside their burrows. This behavior is modeled by incorporating the Lévy flight and the normal distribution to cover both short and large step sizes, while the surprise orientation is imitated by randomization.

The Lévy flight produces small step sizes that require a huge number of function evaluations to reach the desired solution, making it impractical when used alone. In contrast, the normal distribution produces large numbers that push the solution into distant positions, subsequently discarding an extensive number of solutions. Hence, to simulate the behavior of the smashers while they hunt for their victims, the authors recombine the two to produce distinct sequences of numbers supporting both small and comparatively large values. Thus, the steps produced by this hybridization lie between very large and very small ones. The mathematical framework for this behavior is given in Eq. (3), where $\vec{x}_i^t$ is the location of the $i$th mantis at function evaluation $t$, $|\tau_2|$ is a random number drawn from the normal distribution, and $r_1$, $r_2$, and $r_3$ are three randomly produced values between 0 and 1. $\vec{L}_1$ is a vector of values constructed using the Lévy-flight approach. $\vec{x}_a$, $\vec{x}_b$, and $\vec{x}_c$ are three mantises selected at random from the current population such that $\vec{x}_a \neq \vec{x}_b \neq \vec{x}_c \neq \vec{x}_i$. A binary vector $\vec{U}$ is produced according to Eq. (4), where $\vec{r}_4$ and $\vec{r}_5$ are two random vectors with values between 0 and 1. The first formula in Eq. (3) imitates the hybrid movements, whereas the second imitates the sudden orientation of the movements.
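The exact update rules are those of Eqs. (3)-(4) in [23]. As a hedged illustration of the Lévy component that such exploration steps rely on, Mantegna's algorithm is a common way to draw heavy-tailed step sizes; the function name, seed, and β value below are illustrative assumptions, not taken from the paper.

```python
import math
import random

def levy_step(beta=1.5, rng=random.Random(0)):
    """One Lévy-flight step via Mantegna's algorithm: mostly small
    moves with occasional large jumps (heavy-tailed distribution)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta
                  * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)  # numerator: scaled normal draw
    v = rng.gauss(0.0, 1.0)      # denominator: standard normal draw
    return u / abs(v) ** (1 / beta)

steps = [levy_step() for _ in range(1000)]
```

The mixture described in the text would combine such Lévy draws with normally distributed steps, yielding moves that are neither uniformly tiny nor uniformly huge.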

|Spearer's Exploration Behavior
These predators establish archives containing the locations of several burrows, where they wait for prey to approach before striking; this mechanism is used to imitate their exploratory habits (spearer behavior). Every mantis's local-best solution is assigned to this archive, and a solution is randomly selected from the archive to replace the old one. With the help of the 180-degree rotating eyes in their heads, these predators survey their surroundings. Eq. (5) is used to imitate this behavior, where α is a parameter that controls the mantis's position, allowing it to cross the ambush distance. This parameter is defined in Eq. (6), wherein $r_5$ is a number generated at random in the range $[0, 1]$ and $T$ denotes the maximum number of function evaluations. The movements of the prey as they search their surroundings are also mimicked; this behavior can draw the prey inside ambush range, employing Eq. (7), where $ub_j$ and $lb_j$ denote the upper and lower boundaries of the $j$th dimension, respectively, and $r$ is a number chosen at random in the range $[0, 1]$. $\vec{x}_i'$ represents a solution chosen at random from the archive to represent the $i$th mantis's burrow. μ, a distance parameter that regulates the prey's position, is calculated using Eq. (8). Furthermore, Eq. (9) represents the interplay between ambushing mantises and their prey; the two numbers $r_2$ and $r_3$ are chosen at random in the range $[0, 1]$ to alternate between the behaviors of ambush hunters and prey. Lastly, the two exploratory behaviors of smashers (Eq. (3)) and spearers (Eq. (9)) are balanced during the optimization process by the recycling control factor (RCF), an integer-based factor defined in Eq. (10).

|Attack the Target: Stage of Exploitation
The two phases of a mantis's prey-catching action are the approach and the sweeping motion [39]. A mantis raises and spreads its arms during the first stage, known as the approach phase. During the second phase, the sweeping phase, the mantis seizes its prey at high speed and drags it in to consume it. Interestingly, the mantis may gauge its distance from its target before choosing to sweep (strike) [40]. The mantis hovers at an appropriate angle before the strike and then strikes the prey quickly. If a mantis initially misjudges the velocity of its prey, it will frequently correct its error after a similar pause. As a result, two essential components of the hunting process's effectiveness are the assessment of the distance between the predator and the target (the striking distance) and the velocity of the assault (the strike speed). To model this behavior analytically, three steps are taken:
- Calculating the strike distance ($d_{st}$).
- Determining the striking speed ($v_s$).
- Accounting for a failed strike, in which case the mantis tries to strike again.

i). Calculating the strike distance ($d_{st}$).
The mantis's strike distance at function evaluation $t$ is determined by Eq. (11); the underlying geometric relations are illustrated in Figure 5 of [23]. After some computation, we obtain $d_{st}$, where $\vec{x}_i^t$ is the $i$th mantis's present position and $\vec{x}^*$ is the position of the prey, i.e., the best solution found so far.
The mantis attacks its prey using its front legs. By stabilizing its rear legs and extending its forelegs as far as it can toward the prey, it updates its location. The concept of the sigmoid function can be used to approximate the speed at which a mantis strikes its prey with its front legs. The striking velocity is determined by Eq. (12), where ρ, a constant, indicates the gravity-acceleration ratio of the mantis's strike and $v_s$ is the mantis's strike velocity; a number generated in the range $[-1, 1]$ regulates the gravity-driven acceleration rate. To capture the prey, every mantis is updated using Eq. (13): the mantis shifts its location between its current position $\vec{x}_i^t$ and the target's position to minimize the distance between them and expedite its assault. $\vec{x}_i^{t+1}$ indicates the new location of mantis $i$ at function evaluation $t+1$, and $v_s$ sets the mantis's striking velocity. A mantis's strike may occasionally miss, in which case it must alter its path before trying again. As a result, the mantis changes its course in response to the directions of two randomly chosen mantises from the entire population, as expressed in Eq. (14), where the two mantises $\vec{X}$ and $\vec{Y}$ are chosen at random from the existing population. A failed strike means the mantis has fallen into a trap of local optimality. To keep the algorithm from slipping into the local-optimum trap, Eq. (15) is suggested. This formula is applied in MSA with a probability of failure, for two reasons: first, it avoids getting stuck in local minima; second, it accelerates convergence toward the best solution. This probability of failure is expressed in Eq. (16), where A is a predetermined, fixed value in the range $[0, 1]$ that governs the balance between the exploration and exploitation operators.

|Sexual Cannibalism
Sexual cannibalism is the term for the act of female praying mantises eating the male during or after copulation. This behavior is formulated in Eq. (17), where one solution stands for the female and a randomly chosen solution symbolizes the male, which is attracted to the female and mates with her before being devoured. An imprisoned female initiates the process of attracting a partner with a probability whose value progressively decreases over the iterations.
The mating process and the creation of new progeny are expressed by the uniform crossover operator, derived from genetic-algorithm operators and given in Eq. (18), in which the randomly chosen male solution mates with the female. During or after the mating process, the female consumes the male according to Eq. (19), where μ denotes the male's consumed portion and a cosine term gives the female the freedom to spin the male around throughout the eating process.
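The uniform crossover that models mating (Eq. (18)) is a standard genetic operator: each component of the offspring is inherited from one of the two parents with equal probability. A minimal sketch, with illustrative names and seed:

```python
import random

def uniform_crossover(female, male, rng=random.Random(1)):
    """Uniform crossover: each gene of the offspring is copied from
    the female or the male parent with equal probability."""
    return [f if rng.random() < 0.5 else m for f, m in zip(female, male)]

# With all-ones vs. all-zeros parents, the offspring is a random 0/1 mix.
child = uniform_crossover([1, 1, 1, 1], [0, 0, 0, 0])
```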
All the preceding phases are brought together in the MSA framework (Algorithm 1).

|Proposed Algorithm
In this section, the suggested algorithm OBMSASA is thoroughly examined and clarified. A feature in FS is binary: if it is picked, it is set to one; if not, it is set to zero. The MSA, however, was designed to solve continuous problems, which conflicts with the binary character of the FS problem. Our suggested method consists of two primary steps: first, the integration of the MSA algorithm with the opposition-based learning technique (OBMSA) for FS; second, the integration of SA with OBMSA. MSA becomes trapped in local optima, just like many other metaheuristics; consequently, the SA component seeks to keep the MSA algorithm out of local optima. The framework of OBMSASA is displayed in Figure 3 and expressed in Algorithm 2.
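The role SA plays in this hybrid is its standard Metropolis acceptance rule: an improvement is always accepted, while a worse solution is accepted with probability exp(−Δ/T), which is what lets the search escape local optima. A hedged sketch of that rule (names, seed, and cooling choice are illustrative, not the paper's exact settings):

```python
import math
import random

def sa_accept(current_fit, candidate_fit, temperature, rng=random.Random(3)):
    """Metropolis rule for a minimization problem: always accept an
    improvement; accept a worse candidate with prob. exp(-delta/T)."""
    delta = candidate_fit - current_fit
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)
```

In the hybrid, such a rule would be applied to solutions produced by the OBMSA phase, with the temperature decreased over the iterations (e.g., geometric cooling T ← 0.95·T), so that worse moves become ever less likely to be accepted.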

|Mantis Search Algorithm for Solving FS Problem
The suggested OBMSASA handles the FS problem in two primary phases. Initialization, a transformation function, the K-nearest neighbor (KNN) classifier, and assessment are the steps that make up the first phase; additionally, the opposition-based learning method is used to raise the quality of the solutions. The hybridization of the first phase with simulated annealing occurs in the second phase.

|Initialization
In this stage, a random population of N search agents (mantises) is formed. Every mantis in the population stands for a potential solution, represented by a vector of dimension d, where d is the number of features in the dataset. Each of the vector's values is either 1 or 0, signifying whether or not the corresponding feature is chosen.
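A minimal sketch of this binary encoding (function names and seed are illustrative): each solution is a 0/1 vector of length d, and the chosen feature subset is recovered by indexing into it.

```python
import random

def init_binary_population(n, d, rng=random.Random(7)):
    """n candidate solutions; bit j == 1 means feature j is selected."""
    return [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]

def selected_features(solution):
    """Indices of the features a solution selects."""
    return [j for j, bit in enumerate(solution) if bit == 1]

pop = init_binary_population(n=4, d=6)
subset = selected_features(pop[0])
```

The classifier in the next step would then be trained only on the columns listed in `subset`.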

|KNN
The metaheuristic algorithm produces new candidate samples during dimensionality reduction, and these samples are not yet labeled. The primary goal of classification is to assign a class to newly observed samples that lack a class label. Numerous classifiers have been employed in this context; the KNN classifier [41] is among the most popular because it is simple to use and requires only one parameter, K, which specifies the number of neighbors (see Figure 4). To allow the classifier to recognize the characteristics of the data, the relationships between attribute values, and the class label, we must first train it. In practice, we cannot directly determine whether the classifier has been trained successfully, so it is standard procedure to reserve some labeled data as a training dataset and some as a testing dataset. The classifier is trained on the training dataset to achieve better performance, while the testing dataset is kept apart to verify that the classifier performs well on data it has not been trained on. Each sample in the testing dataset uses the Euclidean distance to find its K nearest neighbors in the training dataset, as shown in Figure 5. Classification accuracy, one measure of how well the classifier predicts class labels, is calculated by dividing the number of correctly classified instances by the overall number of instances in the testing dataset. Conversely, the classification error rate is the proportion of incorrectly classified instances in the testing dataset.
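A bare-bones KNN classifier following the description above (Euclidean distance, majority vote among the K nearest training samples); this is a sketch with illustrative names and toy data, not the paper's implementation:

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, sample, k=3):
    """Label a sample by majority vote among its k nearest training
    points under Euclidean distance."""
    dists = sorted((math.dist(sample, x), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: class 0 clusters near the origin, class 1 near (5, 5).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
label = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
```

In the wrapper setting, the distance would be computed only over the features selected by the candidate solution, and the resulting test accuracy feeds the fitness function.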

|Assessment
A solution's quality is evaluated using the classification accuracy obtained from the KNN classifier; the optimal solution is the one that maximizes the classification accuracy rate. The fitness function used to evaluate the mantis population combines two competing objectives: maximizing classification accuracy and minimizing the number of selected features. So that both goals are expressed as minimization, the fitness function uses the classification error rate (1 − Acc) instead of the accuracy, where Acc denotes the classification accuracy determined using KNN. The fitness function is therefore

Fitness = h1 · (1 − Acc) + h2 · (|SF| / |D|),

where |SF| is the number of selected features, |D| is the total number of features in the dataset, and h1 and h2 are the weight parameters of the two objectives. Decreasing the classification error (i.e., maximizing classification accuracy) takes precedence over decreasing the number of chosen features, so h1 is set larger than h2.
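A sketch of this fitness computation follows. The default weights h1 = 0.99 and h2 = 0.01 are a common choice in the wrapper-FS literature, not values stated in this section, so treat them as an assumption:

```python
def fitness(error_rate, n_selected, n_total, h1=0.99, h2=0.01):
    """Weighted sum of the classification error rate and the fraction of
    selected features; lower is better. h1 >> h2 gives accuracy priority."""
    return h1 * error_rate + h2 * (n_selected / n_total)

# a solution with 5% error that keeps 5 of 50 features
value = fitness(error_rate=0.05, n_selected=5, n_total=50)
```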

|Binary Representation (Transformation Functions)
The search agents, or solutions, of metaheuristic approaches are expressed by real values and are intended to solve continuous optimization problems. For the algorithm to adapt to the nature of FS, the search-agent values need to be converted into binary values; the transformation (transfer) functions are responsible for this. The S-shape and V-shape families are two of the most significant and commonly used transfer functions [42].
Table 1. S-shape and V-shape transfer functions (the standard family from [42]).

S1: T(x) = 1 / (1 + e^(−2x))     V1: T(x) = |erf((√π/2)·x)|
S2: T(x) = 1 / (1 + e^(−x))      V2: T(x) = |tanh(x)|
S3: T(x) = 1 / (1 + e^(−x/2))    V3: T(x) = |x / √(1 + x²)|
S4: T(x) = 1 / (1 + e^(−x/3))    V4: T(x) = |(2/π)·arctan((π/2)·x)|
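Two representative members of these families can be sketched as follows; the stochastic binarization rule (compare the transfer value against a uniform draw) is the usual one in binary metaheuristics, and the function names here are ours:

```python
import math
import random

def s1(x):
    # S-shape (sigmoid) transfer: maps a real position to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def v2(x):
    # V-shape transfer: |tanh(x)|, symmetric around zero
    return abs(math.tanh(x))

def binarize(position, tf, rng):
    # dimension j becomes 1 when a uniform draw falls below the transfer value
    return [1 if rng.random() < tf(x) else 0 for x in position]

rng = random.Random(0)
bits = binarize([-2.0, 0.0, 3.0], s1, rng)
```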

|Opposition Based Learning Method (OBL)
The initialization step of most metaheuristic algorithms starts with a population produced randomly within the boundaries, with no prior knowledge of the search space. If the starting population is produced by a better method, however, we can identify better solutions during initialization, potentially lessen the computational load, and improve global convergence. It is more beneficial to take both opposition and randomness into account than pure randomness alone. Opposition-based learning (OBL) is an effective method for finding solutions in the opposite direction of the existing positions, which helps to improve an algorithm's search capability. The primary concept of the OBL strategy is as follows: given an original value x_j with lower and upper bounds lb and ub, its opposite value is defined as

x'_j = lb + ub − x_j.

The OBL technique can be applied at several updating stages to improve search ability, as well as to improve the quality of the initial population.
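The OBL reflection above amounts to one line per coordinate; a minimal sketch (function name ours):

```python
def opposite(solution, lb, ub):
    """Opposition-based learning: reflect each coordinate to
    x'_j = lb_j + ub_j - x_j."""
    return [l + u - x for x, l, u in zip(solution, lb, ub)]

x  = [0.2, 0.9, 0.5]
lb = [0.0, 0.0, 0.0]
ub = [1.0, 1.0, 1.0]
x_opp = opposite(x, lb, ub)  # approximately [0.8, 0.1, 0.5]
```

In practice, both the random solution and its opposite are evaluated and the better of the two is kept.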

|MSA and SA Hybridization
Simulated annealing (SA) is a single-solution algorithm introduced to mimic the annealing process of metals: a physical process in which a metal is hardened by heating it to an elevated temperature and allowing it to cool gradually. The initial parameters of SA are the initial temperature (T0), the final temperature (T_end), and the cooling rate c. The initial temperature is the maximum temperature, which is progressively lowered to the final temperature by the cooling rate. The algorithm starts with a randomly generated solution and gradually improves it: each iteration selects a new solution adjacent to the current one. If the new neighboring solution proves superior, the current solution is replaced accordingly; additionally, if a neighboring solution is better than the best solution found so far, the best solution is updated. The algorithm terminates when the target temperature has been reached.
To overcome local optima (LO), SA is a probability-based algorithm that can accept a worse solution instead of the current neighboring one. The likelihood of accepting a worse alternative depends on how much worse it is and on the current temperature value:

P = e^(−ΔF / T),

where ΔF represents the difference between the fitness of the new neighboring solution and the fitness of the current solution, T is the current temperature, −ΔF/T is the exponent to which e is raised, and e^(·) is the exponential function.
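This Metropolis-style acceptance rule can be sketched directly (function name ours):

```python
import math
import random

def accept(delta_f, temperature, rng):
    """Metropolis rule: always accept an improvement (delta_f <= 0);
    accept a worse move with probability exp(-delta_f / T)."""
    if delta_f <= 0:
        return True
    return rng.random() < math.exp(-delta_f / temperature)
```

At a high temperature nearly every worse move is accepted; as the temperature cools, acceptance of worse moves vanishes and the search becomes greedy.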
SA is used to enhance the MSA algorithm's effectiveness and keep it from becoming trapped in local optima, to achieve even greater gains. Although the SA algorithm usually admits better solutions, it is capable of accepting worse ones, depending on how much worse they are and on the current temperature. The SA algorithm starts after each MSA iteration is complete: rather than a randomly generated solution, SA begins with a mantis position produced by the initial hybridization of OBL with MSA (OBMSA).
The overall procedure of the proposed method is expressed in Algorithm 2.
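The overall hybrid can be sketched structurally as below. This is only a scaffold under stated assumptions: the real MSA operators are replaced by a trivial greedy single-bit-flip stand-in, positions are kept binary so the OBL opposite is the bitwise complement, and all names and parameter defaults are ours, not the paper's:

```python
import math
import random

def obmsasa_sketch(objective, d, pop_size=5, iters=30,
                   t0=1.0, t_end=0.01, cooling=0.9, seed=0):
    rng = random.Random(seed)
    # opposition-based initialization: keep the better of each random
    # 0/1 vector and its bitwise complement (the binary "opposite")
    pop = []
    for _ in range(pop_size):
        x = [rng.randint(0, 1) for _ in range(d)]
        pop.append(min(x, [1 - b for b in x], key=objective))
    best = min(pop, key=objective)

    def flip_one(x):
        y = x[:]
        j = rng.randrange(d)
        y[j] = 1 - y[j]
        return y

    for _ in range(iters):
        # stand-in for the MSA update: accept a single-bit flip if it improves
        for i, x in enumerate(pop):
            y = flip_one(x)
            if objective(y) < objective(x):
                pop[i] = y
        cand = min(pop, key=objective)
        if objective(cand) < objective(best):
            best = cand[:]
        # SA refinement of the best mantis found so far
        cur, t = best[:], t0
        while t > t_end:
            y = flip_one(cur)
            delta = objective(y) - objective(cur)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur = y
            if objective(cur) < objective(best):
                best = cur[:]
            t *= cooling
    return best

# toy objective: minimize the number of selected features
best = obmsasa_sketch(objective=sum, d=8)
```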

|Results
This section examines the comparative outcomes of the proposed improved algorithm and several well-known metaheuristic (MH) algorithms. All algorithms were coded and run in the same way, in the MATLAB R2023(b) environment on a personal computer running Windows 7 (64-bit) with an Intel Core(TM) i7-3840QM CPU at 2.80 GHz and 16 GB of RAM.

|Dataset Description
The efficiency of the suggested OBMSASA algorithm was verified by experiments and assessments on twenty-one benchmark datasets, all taken from the UCI repository [43]. We concentrate on datasets that have a high dimension size (number of features), a high number of instances, or both: the number of instances ranges from 72 to 14,980, and the number of features ranges from 10 to 7,129. Table 2 describes the datasets.

|Influence of V-shaped and S-shaped Transfer Functions on a Number of Recently Developed Metaheuristic Algorithms
Many new metaheuristic algorithms have been introduced in the last few years, achieving great success in solving global optimization problems. Accordingly, many researchers have taken advantage of the power of these algorithms to solve FS problems, because they can obtain many appropriate solutions in a reasonable time. These algorithms are converted to binary versions to meet the nature of the feature selection problem. The transformation from continuous to binary is done using nine transformation methods: the eight functions introduced in Table 1, plus a threshold method. After applying these transformation functions, the results are examined to select the transfer function that achieves the best result. A sample of seven feature selection datasets is used to evaluate performance. Tables 3-9 report the performance of the aforementioned algorithms, in order. The algorithms' performance on the FS problem was then tracked in terms of fitness so that a winner could be chosen as the subject of study.
Table 3 indicates the influence of the different S-shape and V-shape functions; the threshold method is also used alongside both shapes, so the MSA algorithm is run with nine transformation methods. The first four columns represent the effect of the four S-shape functions, the following four columns represent the V-shape formulas, and the last column represents the effect of the threshold method on MSA over the seven selected datasets. The total average fitness (Total AVG) and total standard deviation (Total STD) of each transformation method are calculated. From the results, the threshold method is the best transformation method for MSA. Figure 6 shows the influence of the nine transformation methods on the performance of MSA in terms of total average: the threshold method outperforms all its peers. The total average STD is shown in Figure 7.
Looking closely at the results in Table 4, the threshold method is the best transformation method for NOA in terms of total average. Figure 8 shows the influence of the nine transformation methods on the performance of NOA in terms of total average fitness: the threshold method outperforms all its peers. The total average STD is shown in Figure 9.
Likewise, the results in Table 5 show that the threshold method is the best transformation method for YDSE in terms of total average. Figure 10 shows the influence of the nine transformation methods on the performance of YDSE in terms of total average fitness: the threshold method outperforms all its peers. The total average STD is shown in Figure 11.
The results in Table 6 indicate that the threshold method performs better than the other functions for the SWO algorithm. Figure 12 shows the influence of the nine transformation methods on the performance of SWO in terms of total average fitness: the threshold method outperforms all its peers. The total average STD is shown in Figure 13.
The results in Table 7 indicate that the second V-shape formula performs better than the other functions for the SCHO algorithm in terms of average fitness. Figure 14 shows the influence of the nine transformation methods on the performance of SCHO in terms of total average fitness: the second V-shape transformation method outperforms all its peers. The total average STD is shown in Figure 15.
The results in Table 8 indicate that the first V-shape formula performs better than the other functions for the EDO algorithm in terms of average fitness.

Figure 16 shows the influence of the nine transformation methods on the performance of EDO in terms of total average fitness: the first V-shape transformation method outperforms all its peers. The total average STD is shown in Figure 17.
The results in Table 9 indicate that the threshold formula performs better than the other functions for the ZOA algorithm in terms of average fitness. Figure 18 shows the influence of the nine transformation methods on the performance of ZOA in terms of total average fitness, where the fourth S-shape transformation method outperforms its peers. The total average STD is shown in Figure 19.
By tracking the above performance, the Mantis Search Algorithm was chosen as the subject of the study.

|Tuning of Parameters
Any algorithm's performance can be affected by how its parameter values are configured. In practice, many experiments are needed to investigate the impact of parameter adjustment on the suggested algorithm. As a result, the parameter values are determined by trial and error or by following the recommendations of earlier research.
The suggested algorithm's efficacy is compared with that of other existing algorithms. We evaluate each method over twenty separate runs, and for every experiment the maximum number of iterations is fixed at thirty. We observed that adding more search agents does not substantially change the findings, so we limit the number of mantises (search agents) to 5. In this case, 80% of every dataset is used for training while the remaining 20% is used for testing, as in [50][51][52][53]. To guarantee the same ordering of instances across all algorithms, the dataset's instances were randomly shuffled with a fixed seed before splitting. Compared to other classifiers, the KNN classifier using the Euclidean distance measure has just one parameter (k) that needs to be tuned, making it a popular wrapper approach. The optimal outcomes are attained when k = 5, a value also supported by earlier research [54][55][56]. The parameter configuration is given in Table 10.

|Accuracy
Accuracy is an efficiency measure that assesses how well the classifier performs with the chosen subset of attributes after executing the algorithm N times. The optimal classification accuracy is computed as

Acc* = max_{i=1,...,N} Acc_i,

where Acc_i is the classification accuracy rate at run i of the algorithm, i = 1, ..., N. The average classification accuracy is calculated as

AvgAcc = (1/N) Σ_{i=1}^{N} Acc_i.
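These two run statistics reduce to a max and a mean; a minimal sketch (function name ours):

```python
def accuracy_statistics(acc_runs):
    """Best and average classification accuracy over N independent runs."""
    return max(acc_runs), sum(acc_runs) / len(acc_runs)

best_acc, avg_acc = accuracy_statistics([0.90, 0.94, 0.92, 0.88])
```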

|The Selected Features Number
This criterion measures the size of the feature subset chosen by a solution. Two distinct measurements are considered. The first metric, Selected Features (SF), is the number of features chosen by the solution with the best fitness value. The second metric, the Average Selected Features (AvgSF), is computed as

AvgSF = (1/N) Σ_{i=1}^{N} |S_i|,
where S_i is the best subset of features obtained by the algorithm at run i.

|Fitness Function
Three fitness metrics are used: the best, average, and worst values. The best value (best fitness) is the lowest fitness value reached after executing the algorithm N times:

Fit* = min_{i=1,...,N} Fit_i,
where Fit* is the lowest fitness value reached over the N runs of the algorithm and Fit_i is the fitness value at run i.
The average fitness (AvgFit) is the sum of all fitness values obtained by executing the algorithm N times, divided by the number of runs N:

AvgFit = (1/N) Σ_{i=1}^{N} Fit_i.

The worst fitness (WorstFit) is the greatest fitness value attained over the N runs:

WorstFit = max_{i=1,...,N} Fit_i,

where Fit_i represents the final fitness value obtained at run i.
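Because fitness is minimized, best and worst swap roles relative to accuracy; a minimal sketch (function name ours):

```python
def fitness_statistics(fit_runs):
    """Fitness is minimized, so best = min and worst = max over the N runs."""
    return min(fit_runs), sum(fit_runs) / len(fit_runs), max(fit_runs)

best_fit, avg_fit, worst_fit = fitness_statistics([0.10, 0.08, 0.12])
```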

|Fitness Results
This subsection examines how well the proposed OBMSASA algorithm performs in comparison with a number of other metaheuristic algorithms: CSA, CCWOA, OBSSA, TPGWO, EOSA, SMAMPA, HHOSA, and MSA. The best, average, and worst fitness values, as well as the standard deviation attained by each method, are listed in Table 11. The convergence curves for the average fitness in Figures 20-22, 24-34, 36, and 39 show that the proposed OBMSASA outperforms its peers: it came in first place, winning on 16 of the 21 datasets and achieving the minimum total average fitness of 0.092476. EOSA comes in second place with a total average fitness of 0.097641, but it is superior on only one dataset (Spambase). The numerical results in Table 11 show that, although TPGWO reaches the best average fitness on three datasets (DNA, Waveform, Optdigits), it comes in third; CCWOA comes last with a value of 0.124347. Figure 24 shows that the TPGWO algorithm came in first place, followed by OBSSA, then HHOSA in third place, EOSA in fourth place, and the proposed OBMSASA in fifth place. In Figure 36, the superiority goes to EOSA, followed by TPGWO, then HHOSA in third place, MSA in fourth place, then SMAMPA, with the proposed OBMSASA in sixth place. Figure 38 shows that TPGWO converges to the best solution, followed by EOSA, then SMAMPA, with MSA in fourth place and OBMSASA in fifth place. In Figure 39, HHOSA converges to the best solution, with OBMSASA in second place, followed by CCWOA. Lastly, Figure
41 shows that TPGWO outperforms all algorithms, followed by EOSA, with OBMSASA in third place. The total average of the best values of all algorithms over all datasets, shown in Figure 41(a, b), goes to the proposed OBMSASA with a value of 0.063793, while EOSA comes in second place and CCWOA comes last with a value of 0.081452. Turning to Figure 42(a, b), the proposed OBMSASA outperforms all other algorithms, achieving the best total average fitness over all datasets with a value of 0.092476, while EOSA comes in second place and CCWOA comes last with a value of 0.124347. The superior results of OBMSASA stem from the integration of SA, which boosts the convergence of OBMSASA, while the opposition-based learning method helps the algorithm explore the search space more effectively and reach the best solution areas.

|Accuracy
Within this subsection, we compare the proposed method with all other algorithms in terms of classification accuracy. According to Table 12, four criteria are used for assessment: best classification accuracy, average classification accuracy, worst classification accuracy, and STD. The numerical results indicate that OBMSASA is the best model at achieving the best average classification rate. Figure 43 shows that the proposed OBMSASA achieves the best total classification accuracy rate with a value of 0.879874, followed by HHOSA, while EOSA comes last with a rate of 0.557036. Figure 44 shows that on sixteen of the twenty-one datasets OBMSASA records the maximum total average classification accuracy, with a value of 0.909478667, followed by TPGWO, while EOSA comes last with a value of 0.655656238.

|Selected Features
Minimizing the number of features is a crucial goal, pursued while maintaining classification accuracy. We have already observed that OBMSASA surpasses the alternative algorithms in fitness values and classification accuracy; here we evaluate the capacity of OBMSASA to reduce features compared with the other metaheuristics. Table 13 presents the quantitative outcomes of the feature criteria. Four distinct criteria are monitored: the best subset of selected features, the average subset of selected features (AVG), the worst subset of selected features, and the standard deviation (STD). The numerical findings show that HHOSA achieves the most favorable average number of selected features, with a value of 12.280952, followed by CSA, while the proposed OBMSASA ranks third with a value of 25.728573. The overall average of selected features across all datasets is illustrated in Figure 45. Consequently, OBMSASA exhibits promising results for the average number of selected features.

|Time Execution for OBMSASA
Table 14 reports how much time each algorithm takes to execute on all the datasets. From the numerical analysis, TPGWO comes in first place, followed by HHOSA, while OBMSASA comes in sixth place.

|Conclusions and Future Work
In the present investigation, a hybrid methodology (OBMSASA) integrating the Mantis Search Algorithm with the opposition-based learning technique and the SA algorithm is used to explore the optimal subset of features through a wrapper method. The proposed algorithm uses KNN because of its widespread application, simplicity of implementation, and single tuning parameter. The OBMSASA technique is applied to 21 standard datasets, whose dimensions extend into the thousands. In the feature selection (FS) problem, the decision must be made whether to include a particular feature, which makes it a binary problem; consequently, a transformation function is integrated into the original MSA algorithm. The impact of V-shaped functions, S-shaped functions, and the threshold method on the proposed algorithm was examined first. The Mantis Search Algorithm was selected based on its demonstrated efficacy compared with numerous recently developed algorithms that had not previously been applied to this research problem, justifying its inclusion as the focal point of the study. The threshold method demonstrates superior performance and rapid convergence towards the optimal solution over the iterations, in comparison with the S-shaped and V-shaped approaches. Secondly, to mitigate the risk of becoming trapped in local optima, the opposition-based learning (OBL) technique is incorporated into MSA during the initialization phase.
OBL is used to improve the spread of candidate solutions in the search space. Thirdly, incorporating the SA algorithm into MSA enhances MSA's capacity to reach optimal solutions, as SA acts as the local search component within the framework. The performance of OBMSASA is thoroughly scrutinized against several highly regarded metaheuristics. The findings demonstrate the excellence of the suggested algorithm and its capacity to address the problem effectively, thanks to its skill in navigating the trade-off between exploration and exploitation, evading local optima, and enhancing population diversity. This superiority is established by observing how well the algorithm performs on several criteria, including fitness, classification accuracy, and the number of selected features.
For each criterion there are four numerical results (best, worst, average, and STD). Future research should evaluate the suggested algorithm's performance with a variety of classifiers, such as support vector machines. Another noteworthy direction is the application of FS classification to financial data and the Internet of Things. One of the main drawbacks of the proposed algorithm is its computational time; to leverage computational resources and minimize processing time, we intend to create a parallel version of OBMSASA, which will improve the algorithm's performance when handling large data dimension sizes.

Figure 4. The representation of a new possible mantis labeling process.

Figure 6. MSA total average for the average fitness values.
Figure 7. MSA total average for STD values.

Figure 8. NOA total average for the average fitness values.

Figure 14. SCHO total average of fitness values.
Figure 15. SCHO total average of fitness values.

Figure 18. ZOA total average of fitness values for 7 datasets.
Figure 19. ZOA total average of fitness values for 7 datasets.

Figure 20. The average of fitness values for the Fri_c0_1000_10 dataset.
Figure 21. The average of fitness values for the Page blocks dataset.
Figure 22. The average of fitness values for the Clean1 dataset.
Figure 23. The average of fitness values for the DNA dataset.
Figure 24. The average of fitness values for the Wisconsin dataset.
Figure 25. The average of fitness values for the Segment dataset.
Figure 26. The average of fitness values for the Fri_c1_1000_10 dataset.
Figure 28. The average of fitness values for the Ionosphere dataset.
Figure 29. The average of fitness values for the SpectEW dataset.
Figure 30. The average of fitness values for the WDBC dataset.
Figure 31. The average of fitness values for the Glass dataset.
Figure 32. The average of fitness values for the Australian dataset.
Figure 33. The average of fitness values for the Fri_c1_1000_25 dataset.
Figure 34. The average of fitness values for the Fri_c2_1000_25 dataset.
Figure 35. The average of fitness values for the Spambase dataset.
Figure 36. The average of fitness values for the Eeg-eye-state dataset.
Figure 38. The average of fitness values for the Leukemia dataset.
Figure 39. The average of fitness values for the Pendigits dataset.

Figure 41 (a). The total of best values for all datasets.

Figure 42 (a). The total average of fitness values for all datasets.
Figure 42 (b). The total average of fitness values for all datasets.

Figure 43. The total average of best accuracy for all datasets.
Figure 44. The total average of accuracy values for all datasets.

Table 2. An explanation of the datasets.

Seven recently published algorithms that have not yet been used to solve feature selection problems were selected, in order to identify the fittest one to serve as the basic core of the proposed approach.

Table 3 .
Influence of V-shaped and S-shaped transfer functions on MSA.
Bold values indicate the best average fitness values.

Table 4 .
Influence of V-shaped and S-shaped transfer functions on NOA.

Table 5 .
Influence of V-shaped and S-shaped transfer functions on YDSE.

Table 6 .
Influence of V-shaped and S-shaped transfer functions on SWO.

Table 7 .
Influence of V-shaped and S-shaped transfer functions on SCHO.

Table 8 .
Influence of V-shaped and S-shaped transfer functions on EDO.
Figure 16. EDO total average of fitness values for 7 datasets.
Figure 17. EDO total average of fitness values for 7 datasets.

Table 9 .
Influence of V-shaped and S-shaped transfer functions on ZOA.
Bold values indicate the best average fitness values.
A comparison is made between the suggested algorithm OBMSASA and some well-known methods: the hybrid Harris Hawks Optimization with Simulated Annealing (HHOSA) [18], the hybrid Slime Mould Algorithm with Marine Predators Algorithm (SMAMPA) [19], the Two-Phase Mutation Grey Wolf Optimizer (TPGWO) [17], the hybrid Opposition-Based Salp Swarm Algorithm (OBSSA) [21], the hybrid Crossover and Cooperative Whale Optimization Algorithm (CCWOA) [22], the hybrid Equilibrium Optimizer with Simulated Annealing (EOSA) [16], the Sine Cosine Algorithm (SCA), and the standard Mantis Search Algorithm (MSA) [23]. All algorithms used in the comparison are binary versions. The parameter configuration is introduced in Table 10.

Table 10 .
The parameter configuration.

The average fitness value (AVG) obtained on each dataset quantifies the progress achieved by every method in Table 11. Based on the table's results, it is evident that OBMSASA outperforms all other methods in the majority of the datasets and can achieve more promising solutions than its competitors: in 16 of the 21 datasets, OBMSASA outperforms its counterparts. Figures 20-40 show the convergence curves of all algorithms on all 21 datasets.

Table 11 .
The results of best fitness, average fitness, worst fitness, and STD for all algorithms.
Bold values indicate the best average fitness values.

Table 12 .
The results of classification accuracy criteria for all algorithms.

Table 13 .
The results of best SF, average SF, worst SF, and STD for all algorithms.
Figure 45. The total average of selected features for all datasets.

Table 14 .
The time execution for all algorithms.