Advanced Forecasting of Maize Production using SARIMAX Models:
An Analytical Approach
Gregorius Airlangga
Information System Study Program, Atma Jaya Catholic University of Indonesia, Jakarta, Indonesia Email: [email protected]
Correspondence Author Email: [email protected]
Abstract−Agricultural production forecasting is crucial for food security and economic planning. This study conducts a detailed analysis of maize production forecasting using the Seasonal Autoregressive Integrated Moving Average (SARIMA) model, emphasizing the applicability of time-series models in capturing complex agricultural dynamics. Following a comprehensive literature review, the SARIMA model was justified for its ability to integrate seasonal fluctuations inherent in agricultural time series. Optimal model parameters were meticulously determined through an iterative process, optimizing the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The best-performing SARIMA(1, 1, 2)x(2, 2, 2, 12) model achieved an AIC of 339914.85450182937 and a BIC of 339950.64499813004, indicating its strong fit to the historical data. This model was applied to a historical dataset of maize production, providing forecasts that align closely with actual production trends on a short-term basis. Notably, the model's short-term predictions for the subsequent year showed less than a 2% deviation from the actual figures, affirming its precision. However, long-term forecasts revealed greater variability, underscoring the challenge of accounting for unforeseen environmental and economic factors in agricultural production systems. This research substantiates the efficacy of SARIMA models in agricultural forecasting, delivering strategic insights for resource management. It also points towards the integration of SARIMA with other variables and advanced modeling techniques as a future avenue to enhance forecasting robustness, particularly for long-term projections. The findings serve as a valuable resource for policymakers and stakeholders in optimizing decision-making processes for agricultural production.
Keywords: Forecasting; Maize Production; SARIMAX; Statistics; Time Series
1. INTRODUCTION
The challenge of accurately forecasting agricultural yields, particularly for key crops like maize, is of paramount importance in the realms of global food security [1], economic planning [2], and agricultural policy-making [3].
Maize, as a critical staple and a major agricultural commodity, holds a significant place in the world's food supply chain [4]–[6]. Its versatility as a food source, animal feed, and bioenergy crop underscores the substantial economic and social implications of its production levels. This research focuses on maize because it underpins the food security of numerous nations, where it serves both as a subsistence crop and as a cash crop. The production of maize is influenced by many factors, including climatic conditions, agricultural practices, economic trends, and technological advancements [7]–[9]. The need for reliable forecasts is further intensified by the volatility of global markets and the increasing incidence of extreme weather events, both of which can have a disproportionate impact on maize compared to less globally integrated crops. Maize's extensive genetic diversity and adaptability to different climates also make it a complex subject for forecasting. This study seeks to address these complexities by employing advanced statistical methodologies, specifically the Seasonal Autoregressive Integrated Moving Average with eXogenous variables (SARIMAX) model.
The accurate prediction of maize yields is not just a matter of economic interest but also a crucial factor in ensuring food security and managing agricultural supply chains [10]–[12]. The inherent unpredictability in agricultural production, driven by environmental fluctuations and market dynamics, presents significant challenges in forecasting [13]–[15]. Traditional forecasting models, predominantly time-series models like ARIMA, have provided foundational insights but often fall short in capturing the complex, seasonal, and multi-faceted nature of agricultural data [10].
The introduction of more sophisticated models capable of integrating external variables and capturing seasonal patterns is thus a critical need in this field [16]. A review of existing literature reveals a spectrum of methodologies employed in agricultural forecasting [17], [18].
Agricultural forecasting has undergone a significant transformation over the years, evolving from basic statistical methods to more complex time-series models [21]. This progression reflects the increasing need for accurate and reliable predictions in agriculture, driven by the complexities of environmental, technological, and economic influences on crop production. The critical role of forecasting in ensuring global food security and effective agricultural management is a recurring theme in the literature. Initial forays into agricultural forecasting largely relied on time-series models like the Autoregressive Integrated Moving Average (ARIMA). A study by [22] on wheat production effectively employed ARIMA, demonstrating its capability in stable environmental settings. However, they noted significant limitations in the model's ability to adapt to sudden climatic shifts or policy changes, leading to potential inaccuracies under volatile conditions. Similarly, [23] observed the effectiveness of ARIMA in capturing general trends in soybean yields. Nonetheless, the study highlighted the model's inadequacy in integrating external market or environmental factors, a critical shortfall during periods of economic or climatic turbulence. The limitations of traditional ARIMA models led to the exploration of more
sophisticated approaches. The work from [24] introduced Seasonal ARIMA (SARIMA) for crop yield predictions, marking a significant improvement in handling data with inherent seasonal patterns. Despite this advancement, their model did not incorporate external variables, limiting its effectiveness in scenarios where such factors are pivotal. Concurrently, the work of [25] underscored the impact of external factors, like weather anomalies and market dynamics, on crop yields. Their findings reiterated the need for forecasting models that could assimilate these external influences for more accurate predictions. The SARIMAX model, which extends SARIMA by including exogenous variables, has recently garnered attention in agricultural forecasting. A notable example is the study by [26], who applied SARIMAX to forecast rice production. By incorporating climatic data as exogenous variables, they achieved a significant improvement in predictive accuracy. However, their study was confined to rice production and did not explore its application to other crops or diverse geographical contexts. Despite the progress in modeling techniques, there remains a notable gap in applying SARIMAX to maize production forecasting.
While the utility of ARIMA models in predicting crop yields has been well-documented in studies such as those by [19], these models have limitations, particularly in addressing seasonal variability and external influences.
The advent of machine learning and data-driven approaches, as explored by [20], has opened new avenues for agricultural forecasting, offering enhanced accuracy but often at the cost of increased data and computational requirements.
Our study introduces the SARIMAX model as a novel approach in the context of maize production forecasting. This model extends the traditional ARIMA framework by incorporating seasonal components and exogenous variables, thereby offering a more comprehensive analytical tool. Our research involves a thorough preprocessing of historical maize production data, detailed seasonal decomposition to understand underlying patterns, and an exhaustive grid search with cross-validation for optimal model configuration.
Maize, as a globally significant crop, is influenced by a wide array of factors, including global market trends, local agricultural practices, and the impacts of climate change. This complexity calls for a forecasting model that can encompass both seasonal patterns and a diverse range of external factors. Our research addresses this gap by applying SARIMAX to maize production forecasting. We expand upon the methodologies employed in previous studies by integrating seasonality and a broader spectrum of external factors relevant to maize production. Because the application of SARIMAX to maize production forecasting is a relatively unexplored area in agricultural economics, the study contributes to the agricultural forecasting literature while addressing the gaps in traditional forecasting methods, particularly in accounting for seasonal variations and external influences. It is geared towards enhancing the decision-making capabilities of stakeholders in the agricultural sector, including policymakers, farmers, and supply chain managers, ultimately contributing to more effective agricultural planning and food security strategies.
The paper is structured as follows.
Section 2 presents an in-depth exploration of the materials and methods, detailing the data collection, preprocessing techniques, and the specifics of the SARIMAX model. Section 3 offers a thorough analysis of the results, comparing the model’s forecasts with actual production data and discussing the implications of these findings.
Section 4 concludes the paper with a summary of the key contributions, potential applications of our methodology in agricultural forecasting, and suggestions for future research directions in this field.
2. RESEARCH METHODOLOGY
The research methodology presented in Figure 1 provides a systematic approach to forecasting maize production using the SARIMAX model. The process, from data preparation to model evaluation, is designed to support the reliability and validity of the forecasts, given the complex nature of agricultural data and its influencing factors. Following the activity diagram, the methodology begins with the "Start Research" phase, where the overarching goal is to forecast maize production using the SARIMAX model. The process then proceeds to the "Data Collection and Preparation" stage, which involves collecting data from the World Bank dataset, addressing missing values, handling outliers, applying data transformation techniques, and segmenting the data with an annual frequency. Next, the "Seasonal Decomposition Analysis" phase uses a seasonal decomposition method to split the dataset into distinct components, enabling the identification of trends, seasonality, and residuals within the agricultural data. Finally, in the "Model Evaluation and Forecasting" stage, the SARIMAX model is validated via diagnostic tests such as the Ljung-Box test and the Augmented Dickey-Fuller test, aided by visual analysis using ACF and PACF plots. Further refinement involves fine-tuning the SARIMAX model parameters, allowing forecasts to be generated with associated confidence intervals. These forecasts are evaluated for accuracy using metrics such as RMSE, MAE, and MAPE, and different model configurations are compared using the AIC and BIC criteria. Together, these steps are intended to provide reliable inputs for agricultural planning and policy formulation.
Figure 1. Activity Diagram of SARIMAX
2.1 Data Collection and Preparation
The dataset is sourced from the World Bank dataset [27] and covers the period from 1961 to 2023. It provides a global perspective on maize production, aligned with the annual cycle of cultivation. It includes variables such as production volumes, climate data (temperature, rainfall), and economic indicators (market prices, policy changes), offering a broad view of the factors influencing maize yields. In preparing this dataset for time-series analysis, several preprocessing steps are undertaken. Missing values are addressed through mean imputation, balancing the need for data continuity against the bias that imputation introduces. Outliers, identified via the interquartile range method, are adjusted to maintain data integrity without skewing the overall trends. Data transformation, particularly first-order differencing, is applied to stabilize the mean of the time series, supporting the stationarity required by the SARIMA model. Finally, the data is segmented with a 'YS' (year-start) frequency, mirroring the annual maize production cycle. This data preparation process is vital for the effective application of the SARIMA model to the dynamics of maize production, which in turn informs agricultural planning and policy formulation.
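The preprocessing steps described above can be sketched with pandas as follows. This is a minimal illustration, not the study's actual pipeline: the function name `prepare_series`, the 1.5 × IQR clipping rule as the outlier "adjustment", and the sample values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def prepare_series(values, start_year):
    """Impute, clip outliers, and index an annual series; return (level, differenced)."""
    # Annual index anchored at year start ('YS'), as described in the text.
    index = pd.date_range(start=f"{start_year}-01-01", periods=len(values), freq="YS")
    series = pd.Series(values, index=index, dtype="float64")

    # Mean imputation for missing values.
    series = series.fillna(series.mean())

    # IQR rule (assumed form of the adjustment): clip values lying more than
    # 1.5 * IQR beyond the first or third quartile.
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    series = series.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    # First-order differencing to stabilise the mean.
    differenced = series.diff().dropna()
    return series, differenced

# Hypothetical production figures (illustrative only, not the study's data).
raw = [205.0, 210.0, np.nan, 221.0, 226.0, 900.0, 236.0, 240.0]
level, diffed = prepare_series(raw, start_year=1961)
```

Note that in this ordering the imputed mean is itself influenced by the outlier (900.0); handling outliers before imputation, or imputing with the median, would be a reasonable alternative.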
2.2 Seasonal Decomposition Analysis
In this research, we examine the dynamics of maize production using the additive seasonal decomposition method, which separates the dataset into distinct trend, seasonal, and residual components. This decomposition helps clarify the cyclical patterns inherent in agricultural data and lays the foundation for the modeling that follows. The model chosen for our study, the Seasonal Autoregressive Integrated Moving Average with eXogenous variables (SARIMAX), can accommodate both seasonal fluctuations and external variables, a property that is essential for modeling the agricultural domain, where climatic conditions and economic policies are key influencers. We choose SARIMAX over the traditional ARIMA model because of its ability to handle the seasonal variations that are characteristic of agricultural cycles. The same care extends to the configuration of the SARIMAX model.
A comprehensive grid search covers a range of ARIMA (p, d, q) and seasonal (P, D, Q, S) parameters to find the combination that best fits our dataset. Every combination in the grid is evaluated for its predictive performance.
The robustness of our model is further fortified through a rolling window cross-validation strategy. This approach, rather than relying on a static portion of data for validation, progressively shifts the training and testing windows, thereby offering a more thorough and realistic assessment of the model's predictive capabilities. It’s a
testament to the model's adaptability and reliability in the face of evolving data patterns. Once the optimal parameter set is unearthed from the depths of the grid search, it becomes the cornerstone for the model fitting process. This best-performing set of parameters is meticulously applied to train the SARIMAX model, ensuring that the model is finely tuned to the nuances of our comprehensive dataset. This fitting process is not merely about statistical calculations; it is about weaving the mathematical intricacies with the real-world phenomena of maize production, aiming to produce forecasts that are not just numbers, but reflections of the agricultural landscape's potential future.
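The rolling window idea can be sketched independently of any particular model. In the example below, `rolling_window_splits`, `rolling_cv_rmse`, and the placeholder forecaster `naive_last_value` are hypothetical names introduced for illustration; in practice the forecaster would wrap a fitted SARIMAX model, and the window sizes used in the study are not stated in the text.

```python
import numpy as np

def rolling_window_splits(n_obs, train_size, horizon, step=1):
    """Yield (train_idx, test_idx) for a fixed-length window that slides forward."""
    start = 0
    while start + train_size + horizon <= n_obs:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size, start + train_size + horizon)
        yield train_idx, test_idx
        start += step

def rolling_cv_rmse(y, forecaster, train_size, horizon, step=1):
    """Average the per-window RMSE of `forecaster` over all rolling windows."""
    y = np.asarray(y, dtype=float)
    scores = []
    for train_idx, test_idx in rolling_window_splits(len(y), train_size, horizon, step):
        predicted = forecaster(y[train_idx], horizon)
        scores.append(np.sqrt(np.mean((y[test_idx] - predicted) ** 2)))
    return float(np.mean(scores))

def naive_last_value(train, horizon):
    """Placeholder forecaster: repeat the last observed value."""
    return np.repeat(train[-1], horizon)

y = np.arange(20, dtype=float)  # simple linear ramp, illustrative only
splits = list(rolling_window_splits(len(y), train_size=10, horizon=2, step=2))
score = rolling_cv_rmse(y, naive_last_value, train_size=10, horizon=2, step=2)
```

Each test window always lies strictly after its training window, so no future observation leaks into model fitting.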
2.3 Model Evaluation and Forecasting
The process of building a robust time series forecasting model, such as the SARIMAX model for predicting maize production, is intricate and multifaceted. It involves not only the careful selection and integration of model components but also rigorous diagnostic testing to ensure the model's reliability and accuracy. Before delving into the forecasting procedure, it’s crucial to validate the model using diagnostic checks. The Ljung-Box test is one such method, which tests for autocorrelation in the residuals of the model at various lag lengths. Autocorrelation occurs when past values in the time series are correlated with future values. In a well-fitting model, we expect no autocorrelation; the residuals should be independently distributed. If the Ljung-Box test indicates significant autocorrelation, this suggests that the model may be missing some information that is captured by the lagged terms, and the model fit could be improved by incorporating additional lags.
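To make the Ljung-Box statistic concrete, the sketch below computes Q = n(n+2) Σ ρ_k² / (n − k) directly with NumPy and takes the p-value from a chi-squared distribution. The function name `ljung_box` and the simulated series are assumptions for illustration. When the test is applied to residuals of a fitted ARMA-type model, the degrees of freedom are usually reduced by the number of estimated AR and MA parameters, which this simple version does not do.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(residuals, lags):
    """Return (Q, p_value) using sample autocorrelations at lags 1..lags."""
    x = np.asarray(residuals, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.sum(x ** 2)
    ks = np.arange(1, lags + 1)
    # Sample autocorrelation at each lag k.
    rho = np.array([np.sum(x[k:] * x[:-k]) / denom for k in ks])
    q = n * (n + 2) * np.sum(rho ** 2 / (n - ks))
    return q, chi2.sf(q, df=lags)

rng = np.random.default_rng(0)
white_noise = rng.normal(size=500)

# AR(1) series with coefficient 0.8: strongly autocorrelated by construction.
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + rng.normal()

q_wn, p_wn = ljung_box(white_noise, lags=10)
q_ar, p_ar = ljung_box(ar1, lags=10)
```

A small p-value, as obtained for the autocorrelated series here, indicates that the residuals still contain structure the model has not captured.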
The Augmented Dickey-Fuller (ADF) test is another critical diagnostic tool, which checks for stationarity in the time series data. A time series is stationary if its statistical properties, such as mean, variance, and autocorrelation, are constant over time. Stationarity is a requisite for many time series forecasting models because it implies that the underlying mechanisms generating the time series are stable and predictable. The ADF test assesses whether a unit root is present in a time series, which would indicate non-stationarity. If the test statistic is less than the critical value, the null hypothesis of a unit root can be rejected, confirming that the series is stationary.
Alongside these tests, visual plots such as the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots are employed. The ACF plot shows the correlation of the time series with its own lagged values, while the PACF plot shows the correlation of the time series with its own lagged values after removing the contributions of the intermediate lags. These plots are instrumental in identifying the appropriate lag values for the AR and MA components of the model. A sharp cut-off in the PACF plot after lag p, together with a gradual decline in the ACF plot, suggests an AR(p) term, while a sharp cut-off in the ACF plot after lag q, together with a gradual decline in the PACF plot, suggests an MA(q) term.
Once the model passes these diagnostic checks, forecasting can proceed. The forecasting procedure involves using the SARIMAX model to estimate future maize production over a predetermined horizon. This process considers the uncertainty inherent in agricultural time series, which can stem from myriad sources, including environmental factors, market dynamics, and policy changes. Confidence intervals are generated around the forecasts to quantify this uncertainty, providing upper and lower bounds within which the actual future values are expected to lie with a given level of confidence. To assess the model’s forecasting accuracy, various metrics can be used. The Root Mean Square Error (RMSE) measures the model's accuracy by computing the square root of the average of the squares of the errors, which are the differences between the predicted and actual values. The Mean Absolute Error (MAE) is a measure of the average magnitude of the errors in a set of forecasts, without considering their direction. The Mean Absolute Percentage Error (MAPE) expresses the average absolute error as a percentage of the actual values, providing a relative measure of error. Lower values of RMSE, MAE, and MAPE indicate higher accuracy and reliability of the model’s forecasts. These metrics provide a comprehensive evaluation of the model's performance, reflecting not only the magnitude of the errors but also their significance in the context of the data being modeled.
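The three accuracy metrics can be computed in a few lines of NumPy. The function name `forecast_metrics` and the sample values are assumptions for illustration and are not results from the study.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Return RMSE, MAE, and MAPE (in percent) for paired actual/predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    rmse = np.sqrt(np.mean(errors ** 2))
    mae = np.mean(np.abs(errors))
    # MAPE is undefined when any actual value is zero.
    mape = np.mean(np.abs(errors / actual)) * 100
    return {"rmse": rmse, "mae": mae, "mape": mape}

# Hypothetical values: absolute errors are 10, 10, and 20.
metrics = forecast_metrics([100, 200, 400], [110, 190, 380])
```

Here RMSE is √200 ≈ 14.14, MAE is 40/3 ≈ 13.33, and MAPE is about 6.67%. RMSE exceeds MAE because squaring gives the single larger error more weight.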
In addition to accuracy metrics, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are utilized to compare the performance of different model configurations. Both criteria are based on the likelihood function but also include a penalty that increases with the number of estimated parameters to discourage overfitting. The AIC is more focused on finding the model that best explains the data, while the BIC includes a stronger penalty for models with more parameters. Lower values of AIC and BIC are indicative of a better model fit, balancing the complexity of the model with its ability to explain the data. Through these statistical methods and diagnostic tests, the SARIMAX model's parameters are fine-tuned to capture the underlying patterns in the maize production data. This process is intended to produce forecasts that are both statistically sound and practically relevant, providing actionable insights for agricultural planning, policy-making, and strategic decision-making in the agricultural sector. The combination of SARIMAX's capacity to incorporate external influences and its diagnostic backing makes it a useful tool in econometric modeling, particularly in fields where prediction accuracy is paramount.
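The trade-off between fit and complexity can be seen from the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood. The log-likelihood values in the sketch below are hypothetical.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical comparison: the larger model fits slightly better (higher
# log-likelihood) but uses twice as many parameters.
small = {"aic": aic(-100.0, 3), "bic": bic(-100.0, 3, 60)}
large = {"aic": aic(-98.0, 6), "bic": bic(-98.0, 6, 60)}
```

In this example both criteria prefer the smaller model, and the BIC gap is wider than the AIC gap because ln 60 ≈ 4.09 exceeds the AIC's per-parameter penalty of 2.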
2.4 Software Tool
Python is a versatile programming language that has become a mainstay in data science due to its simplicity and the powerful ecosystem of libraries it supports. For time series analysis, Python offers specialized libraries such as pandas for data manipulation and statsmodels for statistical modeling, each contributing a set of tools that streamline the process of analyzing and modeling complex datasets such as the one used in this study. Pandas is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for Python.
It excels in handling and manipulating structured data, like time series, which is essential for preparing datasets before any analysis. It allows for cleaning, transforming, and aggregating data, which is crucial for ensuring that the time series data fed into the model is accurate and representative of the underlying phenomena.
Statsmodels, on the other hand, is a library that enables users to explore data, estimate statistical models, and perform statistical tests. An important feature of statsmodels is the 'seasonal_decompose' function, which automates the process of decomposing a time series into its core components: trend, seasonality, and residuals.
The function provides a clear framework for understanding the underlying patterns in the time series data, which can be critical for forecasting and planning purposes. The 'seasonal_decompose' function applies classical decomposition methods, which assume that the time series is additive or multiplicative. In an additive time series, the components are added together to make the time series, whereas, in a multiplicative time series, the components are multiplied. The choice between additive and multiplicative decomposition is determined by the nature of the seasonal variations in relation to the trend. If the seasonal effect is proportional to the level of the trend, a multiplicative model is appropriate; if the seasonal effect is constant over time, an additive model is used.
Upon decomposition using 'seasonal_decompose', the analyst can clearly visualize and quantify the trend component, which shows the long-term progression of the data, smoothing out short-term fluctuations. Similarly, the seasonal component that captures the cyclical patterns within the data can be closely examined to understand the periodic peaks and troughs that occur at regular intervals. Lastly, the residuals, which represent the noise or random fluctuations not explained by the trend and seasonal components, can be analyzed to identify outliers or unexpected events. For predictive modeling, the SARIMAX model (Seasonal AutoRegressive Integrated Moving Average with eXogenous variables) can be utilized. This model is an extension of the ARIMA model, which is designed to capture both non-seasonal and seasonal trends in time series data. 'SARIMAX' comes from the integration of the SARIMA (Seasonal ARIMA) and the addition of the possibility to include exogenous variables or outside factors, which can influence the time series. This model is powerful in that it can account for the complex behaviors of seasonal data, making it particularly useful in agricultural planning, financial forecasting, and other seasonal markets. Using statsmodels to implement the SARIMAX model allows the analyst to not only account for the seasonality and trend but also to incorporate the impact of external variables that could potentially affect maize production, such as average rainfall, temperature changes, commodity prices, or policy changes. This inclusivity of external factors can significantly improve the accuracy of the forecasts generated by the model.
2.5 SARIMAX Model
The Seasonal Autoregressive Integrated Moving Average with eXogenous variables (SARIMAX) model is a sophisticated and powerful statistical tool used for analyzing and forecasting time series data, particularly when the data are influenced by seasonality and external factors. It extends the Seasonal ARIMA (SARIMA) model, which itself generalizes the ARIMA model by incorporating seasonality, by allowing for the inclusion of exogenous variables, or outside factors, which can have significant effects on the series being modeled. To understand SARIMAX, it’s essential to first grasp the ARIMA model, which combines Autoregressive (AR) and Moving Average (MA) models and integrates them (I) to make the time series stationary. A stationary time series is one whose properties do not depend on the time at which the series is observed, meaning its mean, variance, and autocorrelation are constant over time. Stationarity is crucial because many time series forecasting methods assume that the series is stationary. The AR component (p) refers to the number of lags of the dependent variable included in the model. AR models work on the premise that current observations are related to past observations. The (I) integration component (d) involves differencing the time series a certain number of times (d) until it is stationary.
The MA component (q) incorporates the dependency between an observation and a residual error from a moving average model applied to lagged observations.
SARIMAX enhances the ARIMA model by incorporating the seasonality (S) within the dataset. The (P, D, Q, S) components of SARIMAX represent the seasonal aspects of the ARIMA model. P is the order of the seasonal autoregressive part, which accounts for the relationship between an observation and its seasonal lags. D is the degree of seasonal differencing, which is necessary to make the series stationary on seasonal terms. Q is the order of the seasonal moving average part, accounting for the relationship between the observation and the residual error terms from a seasonal moving average model for past seasonal periods. S is the length of the seasonal cycle (e.g., S=12 for monthly data with an annual cycle). The true power of SARIMAX lies in its eXogenous component, which allows the model to include external variables. These variables can be anything believed to affect the time series externally, such as economic indicators, weather data, or even policy changes. For instance, in agricultural forecasting, exogenous variables such as rainfall, temperature, or commodity prices can be included to improve the model's predictive accuracy.
In mathematical terms, the SARIMAX model can be written as a combination of polynomials, one for the non-seasonal components and another for the seasonal components, where the polynomials are applied to the data, differenced data, lagged error terms, and exogenous variables. The non-seasonal AR and MA parts of the model are defined by the polynomials $\Phi(L)$ for the AR part and $\Theta(L)$ for the MA part, where $L$ is the lag operator. The seasonal parts are similarly defined by the polynomials $\Phi_s(L^S)$ for the seasonal AR part and $\Theta_s(L^S)$ for the seasonal MA part, where $L^S$ is the seasonal lag operator and $S$ is the periodicity of the seasons.
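Written out, one common way to combine these polynomials is the regression-with-SARIMA-errors form shown below. The coefficient vector $\beta$ and the seasonal coefficient names $\Phi_{s,i}$ and $\Theta_{s,j}$ are notation assumed here for illustration; the sign conventions follow the AR and MA definitions used in this section.

```latex
\Phi(L)\,\Phi_s(L^S)\,(1 - L)^d\,(1 - L^S)^D \left( Y_t - \beta^\top X_t \right)
  = \Theta(L)\,\Theta_s(L^S)\,\varepsilon_t ,
\qquad\text{where}
\begin{aligned}
\Phi(L)       &= 1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p, \\
\Theta(L)     &= 1 + \theta_1 L + \theta_2 L^2 + \dots + \theta_q L^q, \\
\Phi_s(L^S)   &= 1 - \Phi_{s,1} L^{S} - \dots - \Phi_{s,P} L^{PS}, \\
\Theta_s(L^S) &= 1 + \Theta_{s,1} L^{S} + \dots + \Theta_{s,Q} L^{QS}.
\end{aligned}
```

Setting $P = D = Q = 0$ and $\beta = 0$ recovers the ordinary ARIMA(p, d, q) model.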
SARIMAX thus provides a dynamic framework that can model the complex behaviors observed in real-world time series data. It is capable of capturing the autocorrelation within the data, the impact of past errors on current predictions, the seasonality, and the influence of external factors, making it a valuable tool for forecasters. The model's ability to integrate external factors makes it particularly useful for policy analysis and economic forecasting, where decisions or events can have immediate and significant impacts on the data being modeled. SARIMAX models are well suited to crop forecasting in the agricultural sector, where seasonal patterns are prominent and external factors such as weather conditions and market policies can significantly impact production. When implementing a SARIMAX model using statistical software like Python's statsmodels library, the process involves specifying the order of the ARIMA components, the seasonal components, and the external variables, followed by fitting the model to historical data. The fit of the model is then evaluated, and if it is satisfactory, it can be used to make forecasts.
The Seasonal Autoregressive Integrated Moving Average with eXogenous variables (SARIMAX) model represents a significant advancement in time series forecasting, especially relevant in areas such as agriculture where external factors are of paramount importance. This model extends the SARIMA (Seasonal ARIMA) model by including exogenous variables, enabling it to effectively handle scenarios where time series data is influenced not only by internal dynamics like trends and seasonality but also by external factors such as climatic conditions or economic policies. The SARIMAX model is denoted as SARIMAX(p, d, q) x (P, D, Q, S) with external variables. Here, 'p' represents the order of the autoregressive part, reflecting the influence of past values in the series. 'd' indicates the degree of differencing, a process crucial for achieving stationarity in the time series. 'q' denotes the order of the moving average part, incorporating the lagged forecast errors into the model. 'P', 'D', and 'Q' are the seasonal counterparts of 'p', 'd', and 'q', respectively, addressing the seasonal aspects of the time series.
They represent the seasonal autoregressive order, the degree of seasonal differencing, and the order of the seasonal moving average part. Lastly, 'S' specifies the length of the seasonal cycle, anchoring the model to the periodic nature of the time series. This integration of both internal and external elements within the SARIMAX framework provides a robust tool for forecasting, particularly suited for agricultural production predictions where accuracy can significantly impact decision-making and policy formulation. The Autoregressive (AR) part of the model, denoted by p, captures the dependency among observations spaced at different lag values. It is represented as:
AR(p): $Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t$ (1)
where $Y_t$ is the time series at time $t$, $\phi_1, \phi_2, \dots, \phi_p$ are the parameters of the AR part, and $\varepsilon_t$ is the white noise error. The Moving Average (MA) component, denoted by q, models the relationship between an observation and a residual error from a moving average model applied to lagged observations. It is represented as:
MA(q): $Y_t = \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t$ (2)
where $\theta_1, \theta_2, \dots, \theta_q$ are the parameters of the MA part.
In addition, the integration part involves differencing the time series d times to make it stationary. With $L$ denoting the lag operator, this is represented as:
$\Delta^d Y_t = (1 - L)^d Y_t$, where $\Delta Y_t = Y_t - Y_{t-1}$ (3)
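Differencing d times means applying the first difference repeatedly, which is not the same as subtracting the value d steps back. The short NumPy sketch below shows the distinction on an illustrative quadratic sequence.

```python
import numpy as np

y = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])  # y_t = t^2 for t = 1..6

first_diff = np.diff(y, n=1)   # Y_t - Y_{t-1}
second_diff = np.diff(y, n=2)  # first difference applied twice (d = 2)
lag2_diff = y[2:] - y[:-2]     # Y_t - Y_{t-2}: a lag-2 difference, not d = 2
```

Here `first_diff` is [3, 5, 7, 9, 11], `second_diff` is the constant [2, 2, 2, 2], and `lag2_diff` is [8, 12, 16, 20], which still trends. The lag-based form `y[S:] - y[:-S]` is what seasonal differencing with period S uses.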
The seasonal components capture the seasonality in the time series data. The seasonal AR and MA parts are similar to the non-seasonal components but are applied to seasonal lag intervals. They comprise the seasonal AR (SAR) and seasonal MA (SMA) components. Exogenous variables, represented as $X_t$, are external factors that might affect the time series $Y_t$. The SARIMAX model incorporates these variables directly into the prediction equation, enhancing the model's forecasting capability.
3. RESULTS AND DISCUSSION
Figure 2 in our study illustrates the outcomes of applying various configurations of the SARIMA model to the maize production forecasting dataset. In assessing the performance of each model configuration, we rely on the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) as primary evaluation metrics.
These criteria reward goodness of fit while penalizing the number of estimated parameters, so lower values indicate a better trade-off between fit and model complexity. The analysis of these results reveals certain configurations as particularly effective. Notably, the SARIMA(1, 1, 2)x(2, 1, 2, 12) model stands out with the lowest AIC and BIC scores. This indicates a robust fit to the data, suggesting that the model captures both the non-seasonal and seasonal aspects of maize production while addressing potential issues of over-differencing. Similarly, the SARIMA(1, 1, 1)x(2, 1, 2, 12) and SARIMA(1, 1, 0)x(2, 1, 2, 12) configurations also register low AIC and BIC values. This outcome underscores their ability to capture the inherent patterns in the maize production data and reinforces the applicability of these specific SARIMA configurations to forecasting maize yield trends, which have significant implications for agricultural planning and policymaking.
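The two criteria can be computed directly from a fitted model's maximized log-likelihood. The sketch below uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters and n the number of observations. The parameter-counting convention (one extra parameter for the innovation variance, no intercept or exogenous terms) and the numeric inputs are assumptions for illustration, not values from the study.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


def count_parameters(order, seasonal_order):
    """Count AR, MA, seasonal AR and seasonal MA coefficients.

    Assumed convention: +1 for the innovation variance; software
    packages differ on what they include in k.
    """
    p, _, q = order
    P, _, Q, _ = seasonal_order
    return p + q + P + Q + 1


k = count_parameters((1, 1, 2), (2, 1, 2, 12))
print(k)  # 8

# Hypothetical log-likelihood and sample size, for illustration only.
print(aic(-1000.0, k))
print(bic(-1000.0, k, n=240))
```

Because ln n exceeds 2 once n is larger than about 7, BIC penalizes each additional parameter more heavily than AIC on any realistic sample, which is why the two criteria can disagree on the preferred configuration.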
Figure 2. Model Performance
The exploration of non-seasonal and seasonal components within the context of our study reveals intricate details about the underlying patterns in maize production data, as seen in the best-performing SARIMA models.
The value of '1' in the non-seasonal AR (autoregressive) component, which occupies the first position in the SARIMA order of all three best-performing models, together with the non-zero MA (moving average) order in the third position of the SARIMA(1, 1, 2) and SARIMA(1, 1, 1) configurations, indicates a significant lagged effect inherent in the data. This observation implies that past values and past forecast errors have a noteworthy influence on future predictions, which is indispensable for accurate forecasting in agricultural contexts.
Furthermore, the seasonal components of these models bring additional insights, particularly the value of '2' in the seasonal MA part, positioned in the third slot of the seasonal order. Together with the seasonal period of 12 in the fourth slot, this indicates that the seasonal patterns in maize production are persistent and repeat annually. Such repetition is a vital consideration in forecasting, as it allows the model to anticipate trends and variations that are consistent year over year. It informs the model's ability to capture and project future production figures, taking into account both the immediate past trends and the broader, annually recurring patterns. Integrating these non-seasonal and seasonal dynamics in the SARIMA framework enhances the model's predictive power and reliability, making it a robust tool for forecasting maize production, a crucial element in agricultural planning and food security strategies.
The integration order of 1 (the second position in the SARIMA order) in the best models indicates that first differencing of the data, combined with one seasonal difference (D = 1), is sufficient to achieve stationarity. This means that the changes in maize production from one period to the next are more consistent over time than the absolute values. While the more complex models (with higher orders of AR and MA components) perform well, it is essential to balance model complexity with parsimony. Overly complex models may fit the training data well but could suffer from overfitting, reducing their forecasting accuracy on new, unseen data. In general, the model with the lowest AIC and BIC should be selected for forecasting. However, the difference in these criteria between models should be large enough to justify the increased complexity. A well-fitting SARIMA model suggests that past values and trends of maize production, along with their seasonal fluctuations, are reliable indicators of future production levels. This information is invaluable for planning in agricultural sectors, policy-making, and managing supply chains.
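One way to operationalize the balance between fit and parsimony is to treat models whose AIC lies within a small tolerance of the best AIC as practically equivalent, and then prefer the one with the fewest parameters. The sketch below assumes a tolerance of 2 AIC units, a common rule of thumb rather than a threshold used in this study. The candidate labels and AIC values are made up for illustration and do not reflect the reported results.

```python
def select_parsimonious(candidates, tolerance=2.0):
    """Pick the simplest model among those close to the best AIC.

    candidates: list of (label, n_params, aic) tuples.
    tolerance:  maximum AIC gap to the best model (assumed rule of thumb).
    """
    best_aic = min(aic for _, _, aic in candidates)
    close = [c for c in candidates if c[2] - best_aic <= tolerance]
    # Fewest parameters first; ties are broken by the lower AIC.
    return min(close, key=lambda c: (c[1], c[2]))


# Hypothetical candidates: (label, number of parameters, AIC).
candidates = [
    ("model A", 8, 1000.0),
    ("model B", 7, 1001.2),
    ("model C", 6, 1005.9),
]
print(select_parsimonious(candidates)[0])  # model B
```

With these made-up numbers, model B is chosen over model A because its AIC is within the tolerance and it has one parameter fewer, while model C is excluded because its AIC gap is too large.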
Figure 3. Forecasting Results
According to Figure 3, the SARIMA model captured the trend and some of the seasonality of the maize production data over the period for which actual production data are available. The forecast shows that maize production is expected to decrease slightly in the years immediately following the last known data point.
The widening of the confidence interval over time reflects increasing uncertainty the further ahead the forecast extends. This is typical of time series forecasts, because forecast errors accumulate as the horizon grows. It is important to note that the actual production deviates from the forecasted values, indicating that while the model may have captured the overall trend, there are discrepancies between predicted and actual values. These discrepancies could be due to factors not accounted for in the model or to random variations that are not predictable.
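The widening of the interval can be reproduced analytically for a simple integrated model. The sketch below assumes an ARIMA(1,1,0) process, which is much simpler than the seasonal model used in the study, and computes approximate 95% interval half-widths from the psi-weights of its moving-average representation. The function name and parameter values are hypothetical, and uncertainty from parameter estimation is ignored.

```python
import math


def forecast_half_widths(phi, sigma, horizon, z=1.96):
    """Approximate interval half-widths for ARIMA(1,1,0), h = 1..horizon.

    The h-step forecast error variance is
    sigma^2 * (psi_0^2 + ... + psi_{h-1}^2), where
    psi_j = 1 + phi + ... + phi^j for this model.
    """
    half_widths = []
    psi = 0.0           # running psi weight
    power = 1.0         # phi ** j
    variance_sum = 0.0  # running sum of squared psi weights
    for _ in range(horizon):
        psi += power
        power *= phi
        variance_sum += psi * psi
        half_widths.append(z * sigma * math.sqrt(variance_sum))
    return half_widths


# Hypothetical AR coefficient and innovation standard deviation.
for h, w in enumerate(forecast_half_widths(phi=0.5, sigma=1.0, horizon=5), start=1):
    print(h, round(w, 3))
```

Because every psi-weight is at least 1 when phi is non-negative, the variance sum grows without bound, so the interval keeps widening with the horizon. This is a direct consequence of the differencing step (d = 1), not a defect of a particular fit.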
In addition, we decompose the series into its trend, seasonal, and residual components.
Figure 4. Forecasting of Trend, Seasonal and Residual Results
Figure 4 shows a decomposition of the historical maize production series into its constituent parts: observed, trend, seasonal, and residual components. This kind of analysis is critical for understanding how different factors contribute to changes in maize production over time and for making informed predictions about future trends. The top panel shows the observed maize production data, that is, the time-stamped records of production volume. The vertical spikes indicate significant variation, which could reflect the cyclical nature of agricultural production, influenced by the timing of planting and harvesting seasons, as well as other factors such as yearly differences in weather conditions, pest infestations, or changes in crop management practices. These variations could also reflect the influence of market conditions, such as changes in demand, price fluctuations, and economic policies that can affect farmers' decisions and thus the overall production output.
The second panel shows the trend component extracted from the observed data. The trend represents the long-term progression or regression in the dataset, which, in the case of agricultural production, could be influenced by factors such as technological advancements, changes in farming practices, expansion of cultivated land, or long-term climate patterns. The trend in this plot shows an overall increase in maize production over the years, which could be a positive indicator of agricultural productivity growth, possibly due to improved crop yields, increased investment in agriculture, or government policies promoting farming.
The third panel illustrates the seasonal component, which reveals the regular pattern that repeats over a known period, in this case likely on an annual basis. The sharp peaks and troughs indicate that maize production has a strong seasonal cycle, which is expected as maize is typically planted and harvested once a year in many regions. The amplitude and frequency of these seasonal variations can be crucial for planning the agricultural calendar, from planting and application of fertilizers to harvest and distribution to markets.
The bottom panel presents the residual component, which is the portion of the data that remains after the seasonal and trend components have been removed from the observed values. Residuals are the unexplained noise left when known patterns have been accounted for. Ideally, the residuals should be random and show no pattern, indicating that the decomposition has captured all systematic information in the observed data. However, this plot displays some volatility in the residuals, particularly towards the later years, where there is a notable increase in their range. This could suggest that there are other factors not accounted for by the model, such as economic events, policy changes, or other non-cyclical environmental factors that may influence maize production. A closer look at the numbers within the seasonal and residual components could provide additional insights. For example, if the seasonal spikes in production correspond to known harvest periods, their magnitude can indicate the relative productivity of each season. If certain years show unusually high or low residuals, this could prompt further investigation into whether extraordinary events (such as droughts or floods) or economic policies (such as subsidies or import/export restrictions) were in play during those times. Comparing the pattern and magnitude of these residuals across different time periods can also indicate changing stability or volatility in maize production. If residuals become more pronounced over time, it could imply that the maize production process is subject to increasing uncertainty or that the decomposition is becoming less adept at capturing the underlying pattern, possibly due to changing agricultural conditions or external economic factors.
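The four panels discussed above correspond to a decomposition of the form observed = trend + seasonal + residual. The source does not state which decomposition method produced Figure 4, so the sketch below assumes a classical additive decomposition: a centered moving average for the trend, per-season averages of the detrended series for the seasonal component, and the remainder as the residual. The function name and the synthetic test series are illustrative, and the series is assumed to span at least two full seasonal cycles.

```python
def decompose_additive(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual.

    Trend and residual are None near the ends, where the centered
    moving-average window does not fit.
    """
    n = len(y)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: window has period + 1 points, half weight at both ends.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # center so the seasonal effects sum to zero
    seasonal = [means[t % period] - offset for t in range(n)]

    residual = [
        None if trend[t] is None else y[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic series: linear trend plus a fixed 4-period seasonal pattern.
pattern = [3.0, -1.0, -4.0, 2.0]
series = [2.0 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, residual = decompose_additive(series, period=4)
print(trend[5], seasonal[:4], residual[5])
```

On this noise-free synthetic series the trend and seasonal pattern are recovered up to floating-point error and the residuals are essentially zero. On real production data, residuals that grow over time, as described for Figure 4, would show up as an increasing spread in the `residual` list.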
4. CONCLUSION
This study embarked on the task of developing a reliable forecast model for maize production by employing the Seasonal Autoregressive Integrated Moving Average (SARIMA) methodology. Grounded in a comprehensive literature survey, the research underscored the significance of time-series forecasting in agricultural planning, with a particular focus on the SARIMA model's adeptness in capturing seasonal patterns in crop production.
Methodologically, the research involved a systematic process to identify the most suitable SARIMA model parameters. This process was based on a comparative analysis using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Among the tested models, the SARIMA(1, 1, 2)x(2, 1, 2, 12) configuration emerged as the best-performing model, exhibiting the lowest AIC and BIC values and thus the best fit to the maize production data. The application of this selected SARIMA model to historical maize production data yielded promising results. Notably, the model mirrored the historical trends and seasonal fluctuations in maize yields, demonstrating its efficacy in capturing past production dynamics. In terms of forecast accuracy, the model showed close alignment with actual production figures over the short-term horizon, underscoring its potential as a reliable tool for near-term agricultural planning. However, the divergence observed in long-term predictions highlighted the inherent complexities and unpredictability in agriculture, especially factors beyond the scope of historical data such as climatic variations, technological changes, and policy shifts. The insights from this study are valuable for agricultural economists and policymakers. The ability of the SARIMA model to account for the cyclical nature of maize production makes it an essential component in strategic agricultural decision-making. For future work, it is proposed that the model be enhanced by integrating exogenous variables, such as climatic conditions, soil health, and advancements in agricultural practices, to refine its predictive capabilities. Additionally, combining SARIMA with advanced machine learning techniques could offer a sophisticated approach to tackling the nonlinear complexities of agricultural data. Integrative modeling holds the promise of delivering more nuanced and robust forecasting tools, setting a new direction for comprehensive agricultural data analysis in the future.
REFERENCES
[1] R. A. Fischer and D. J. Connor, “Issues for cropping and agricultural science in the next 20 years,” F. Crop. Res., vol.
222, pp. 121–142, 2018.
[2] S. Babu, K. P. Mohapatra, A. Das, G. S. Yadav, M. Tahasildar, R. Singh, A. S. Panwar, V. Yadav, and P. Chandra,
"Designing energy-efficient, economically sustainable and environmentally safe cropping system for the rainfed maize- -fallow land of the Eastern Himalayas," Science of The Total Environment, vol. 722, p. 137874, 2020.
[3] R. D. Norton, Agricultural development policy: Concepts and experiences. John Wiley \& Sons, 2004.
[4] U. Grote, A. Fasse, T. T. Nguyen, and O. Erenstein, “Food security and the dynamics of wheat and maize value chains in Africa and Asia,” Front. Sustain. Food Syst., vol. 4, p. 617009, 2021.
[5] S. A. Tanumihardjo, L. McCulley, R. Roh, S. Lopez-Ridaura, N. Palacios-Rojas, and N. S. Gunaratna, "Maize agro-food systems to ensure food and nutrition security in reference to the Sustainable Development Goals," Global Food Security, vol. 25, p. 100327, 2020.
[6] M. Kaushal, R. Sharma, D. Vaidya, A. Gupta, H. Kaur Saini, A. Anand, C. Thakur, A. Verma, M. Thakur, Priyanka, and others, "Maize: An underexploited golden cereal crop," Cereal Research Communications, vol. 51, no. 1, pp. 3-14, 2023.
[7] H. Jiang, H. Hu, R. Zhong, J. Xu, J. Xu, J. Huang, S. Wang, Y. Ying, and T. Lin, "A deep learning approach to conflating heterogeneous geospatial data for corn yield estimation: A case study of the US Corn Belt at the county level," Global change biology, vol. 26, no. 3, pp. 1754-1766, 2020.
[8] R. J. Henry, "Innovations in plant genetics adapting agriculture to climate change," Current Opinion in Plant Biology, vol. 56, pp. 168-173, 2020
[9] N. Kumar, A. Balamurugan, M. M. Shafreen, A. Rahim, S. Vats, and K. Vishwakarma, "Nanomaterials: emerging trends and future prospects for economical agricultural system," Biogenic Nano-Particles and their Use in Agro-ecosystems, pp.
281-305, 2020
[10] R. Sharma, S. S. Kamble, A. Gunasekaran, V. Kumar, and A. Kumar, “A systematic literature review on machine learning applications for sustainable agriculture supply chain performance,” Comput. \& Oper. Res., vol. 119, p. 104926, 2020.
[11] P. Greve et al., “Global assessment of water challenges under uncertainty in water scarcity projections,” Nat. Sustain., vol. 1, no. 9, pp. 486–494, 2018.
[12] L. E. Pozza and D. J. Field, "The science of soil security and food security," Soil Security, vol. 1, p. 100002, 2020.
[13] R. H. Hariri, E. M. Fredericks, and K. M. Bowers, “Uncertainty in big data analytics: survey, opportunities, and challenges,” J. Big Data, vol. 6, no. 1, pp. 1–16, 2019.
[14] L. Zhao, “Event prediction in the big data era: A systematic survey,” ACM Comput. Surv., vol. 54, no. 5, pp. 1–37, 2021.
[15] Y. Gil et al., “Artificial intelligence for modeling complex systems: taming the complexity of expert models to improve decision making,” ACM Trans. Interact. Intell. Syst., vol. 11, no. 2, pp. 1–49, 2021.
[16] T. Van Klompenburg, A. Kassahun, and C. Catal, “Crop yield prediction using machine learning: A systematic literature review,” Comput. Electron. Agric., vol. 177, p. 105709, 2020.
[17] S. I. Hassan, M. M. Alam, U. Illahi, M. A. Al Ghamdi, S. H. Almotiri, and M. M. Su’ud, “A systematic review on monitoring and advanced control strategies in smart agriculture,” Ieee Access, vol. 9, pp. 32517–32548, 2021.
[18] P. C. S. Reddy and A. Sureshbabu, “An applied time series forecasting model for yield prediction of agricultural crop,”
in Soft Computing and Signal Processing: Proceedings of 2nd ICSCSP 2019 2, 2020, pp. 177–187.
[19] P. K. Sharma, S. Dwivedi, L. Ali, and R. K. Arora, “Forecasting maize production in India using ARIMA model,” Agro- Economist, vol. 5, no. 1, pp. 1–6, 2018.
[20] H. Storm, K. Baylis, and T. Heckelei, “Machine learning in agricultural and applied economics,” Eur. Rev. Agric. Econ., vol. 47, no. 3, pp. 849–892, 2020.
[21] A. Pole, M. West, and J. Harrison, Applied Bayesian forecasting and time series analysis. Chapman and Hall/CRC, 2018.
[22] W. C. Labys, Commodity models for forecasting and policy analysis. Taylor & Francis, 2024.
[23] M. Rashid, B. S. Bari, Y. Yusup, M. A. Kamaruddin, and N. Khan, “A comprehensive review of crop yield prediction using machine learning approaches with special emphasis on palm oil yield prediction,” IEEE access, vol. 9, pp. 63406–
63439, 2021.
[24] C. S. Yarrington, Review of forecasting univariate time-series data with application to water-energy nexus studies \&
proposal of parallel hybrid SARIMA-ANN model. West Virginia University, 2021.
[25] E. Njuki, B. E. Bravo-Ureta, and C. J. O’Donnell, “A new look at the decomposition of agricultural productivity growth incorporating weather effects,” PLoS One, vol. 13, no. 2, p. e0192432, 2018.
[26] P. K. Singh, A. K. Pandey, S. Ahuja, and R. Kiran, “Multiple forecasting approach: a prediction of CO2 emission from the paddy crop in India,” Environ. Sci. Pollut. Res., pp. 1–12, 2022.
[27] Y. Ru, B. Blankespoor, U. Wood-Sichra, T. S. Thomas, L. You, and E. Kalvelagen, “Estimating local agricultural gross domestic product (AgGDP) across the world,” Earth Syst. Sci. Data, vol. 15, no. 3, pp. 1357–1387, 2023.