26 Using Grammars for Pattern Recognition in Images

The purpose of this article is to present and discuss the results of a systematic review of the state of the art in computer vision and pattern recognition using grammars. Potential gaps in the state of the art that merit further investigation are also discussed.

Fig. 1. Number of papers published every year.

Object Recognition

Adding these edges creates an AND-OR graph that represents the grammar of the image. A syntax graph is a parse tree of this grammar augmented with relations between nodes. In an AND-OR graph, an AND node represents the decomposition of a pattern into its subconfigurations, while an OR node acts as a switch that selects one of the alternative AND nodes.
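The roles of the two node types can be illustrated with a minimal sketch. The toy clothing graph below, the `GRAPH` dictionary layout, and the `configurations` function are all hypothetical and are not taken from the surveyed papers; for brevity, OR nodes here may also point directly at terminals.

```python
from itertools import product

# Hypothetical toy grammar: an AND node decomposes into all of its children,
# an OR node selects exactly one child, and any unlisted name is a terminal.
GRAPH = {
    "outfit": ("AND", ["top", "bottom"]),
    "top": ("OR", ["shirt", "jacket"]),
    "bottom": ("OR", ["trousers", "skirt"]),
    "jacket": ("AND", ["collar", "sleeves"]),
}

def configurations(node):
    """Return every list of terminals that can be derived from `node`."""
    if node not in GRAPH:                      # terminal node
        return [[node]]
    kind, children = GRAPH[node]
    if kind == "OR":                           # switch: union of the alternatives
        return [cfg for child in children for cfg in configurations(child)]
    # AND: combine one configuration from each child, keeping child order
    combos = product(*(configurations(child) for child in children))
    return [[t for part in combo for t in part] for combo in combos]

if __name__ == "__main__":
    for cfg in configurations("outfit"):
        print(cfg)
```

Each list returned by `configurations` corresponds to one choice at every OR node, that is, to the terminals of one parse tree of this toy grammar.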

The results of using AND-OR graphs for clothing configurations can be seen in images in Chen et al. The terminal nodes of the generated grammar are planar rectangles projected onto the images. One of the techniques used creates a grammar based on the syntactic graphs of the segmented images (obtained by X-rays) of human bones.

The application of the proposed techniques achieved a recognition rate of about 93% when used to analyze and interpret lesions. The statistical part of the proposed framework is responsible for comparing the image features being analyzed with the information from the training examples. The shock graph of a query image is compared to the shock graphs of the database images.

Texture Recognition

The medial axis of a shape is the set of center points of the largest circles that fit inside the shape and touch its border. Generally, these points are internal parts of the skeleton and are robust to deformation. In the method proposed by Hingway and Bhurchandi [2011], each object in the database is converted into a binary image, from which its skeleton is obtained.

The idea was to develop an implementation that obtains accurate geometric estimates near discontinuities in order to locate shocks. The spacing of the graphs is characterized by the rules of shock graph grammar, which allows a shock graph to be reduced to a unique rooted shock tree. The fuzzy grammar generated in the learning phase is then used to evaluate this new feature vector and perform the classification.

The feasibility and effectiveness of the approach were demonstrated in experiments with more than 30 types of textures extracted from images of 640×480 pixels.

Object Construction

To test the proposed method, 184 images of Alternaria fungi from two datasets were used. The paper by Hemberg and O'Reilly [2004] presented an extension of the grammar used in the Genr8 system, which is used by architects to develop surface designs. A surface in Genr8 starts with a closed polygon and grows by simultaneous applications of production rules.

The developed system was used in the undergraduate Emergent Design Technologies course of the Architectural Design Association (AA) in London in 2004. The initial step used image processing and pattern recognition techniques to obtain the geometric and morphological structures of the plant. The final step is to create an L-system grammar from the BHA transformation.

One of the limitations of the technique is the high cost of capturing tree structure from multiple images.

Image Segmentation

The second feature is the ability to simulate plant growth, that is, to represent a plant at different ages. To reproduce these two functions, the proposed method used L-system grammars, as they have recursive production rules capable of simulating the evolution of plants. 2009], a method is presented that combines rule-based techniques and images to create lightweight 3D models of trees.
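The way recursive production rules simulate growth can be sketched as follows. The `RULES` table and the `grow` function are illustrative assumptions in the style of textbook L-system examples; they are not the grammars learned in the surveyed work.

```python
# Hypothetical rules: "A" is a growing tip, "B" is a mature segment,
# and the brackets delimit a side branch.
RULES = {"A": "B[A]A", "B": "BB"}

def grow(axiom, rules, age):
    """Rewrite every symbol in parallel, once per unit of `age`."""
    word = axiom
    for _ in range(age):
        # Symbols without a rule (here the brackets) are copied unchanged.
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word

if __name__ == "__main__":
    for age in range(4):
        print(age, grow("A", RULES, age))
```

Calling `grow` with increasing values of `age` yields the same plant at successive ages, which is the property the text attributes to L-system grammars.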

The proposed model can be divided into four steps: (i) restoration of a tree trunk structure from a 2D image, (ii) 3D reconstruction of the trunk skeleton using the binocular vision method, (iii) extraction of the axioms and production rules from the skeleton using a 3D L-system grammar, and (iv) use of an L-system interpretation algorithm to create models that can be sent via the web to be rendered on the client machine. 2008] presented a rule-based grammar learning method to create an L-system grammar to model plant development. To test the proposed method, two experiments were performed using 35 images of leafless trees with 360° coverage around the plant.
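An L-system interpretation algorithm, as in step (iv), is commonly realized as a turtle that walks through the generated word. The sketch below is a simplified 2D version under stated assumptions: the symbol alphabet (`F`, `+`, `-`, `[`, `]`), the `interpret` function, and its default step and angle are hypothetical and are not the 3D algorithm of the surveyed method.

```python
import math

def interpret(word, step=1.0, angle_deg=25.0):
    """Turn a bracketed L-system word into a list of 2D line segments."""
    x, y, heading = 0.0, 0.0, 90.0       # start at the origin, pointing up
    stack, segments = [], []
    for symbol in word:
        if symbol == "F":                # draw one segment forward
            nx = x + step * math.cos(math.radians(heading))
            ny = y + step * math.sin(math.radians(heading))
            segments.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif symbol == "+":              # turn counterclockwise
            heading += angle_deg
        elif symbol == "-":              # turn clockwise
            heading -= angle_deg
        elif symbol == "[":              # start a branch: remember the state
            stack.append((x, y, heading))
        elif symbol == "]":              # end a branch: return to the saved state
            x, y, heading = stack.pop()
    return segments

if __name__ == "__main__":
    for segment in interpret("F[+F][-F]F"):
        print(segment)
```

The returned segments are plain coordinate pairs, so they could be serialized and rendered elsewhere; a 3D variant would track an orientation frame instead of a single heading angle.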

Each block is then associated with a number that identifies the structure it displays, producing a numerical matrix that represents the image.

Change of Scales

Next, a top-down approach is used to describe how objects and region models (texture, shading, etc.) generate the image intensity. Finally, to perform the parameter estimates, bottom-up suggestions are made to guide the search through the parameter space. 2012], a method is presented to segment a given image and estimate the area using regular grammars.

An image is defined as a set of words based on an alphabet, and an object, similar to a word, is recognized by an automaton. The image is divided into small blocks, each with a predefined structure that represents the terminal node of a grammar. The automaton equivalent to this regular expression is used to analyze the matrix to find submatrices with these contours and perform the image segmentation.
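The matrix-scanning idea can be sketched with a regular expression applied row by row. Everything specific here is an assumption made for illustration: the single-digit block codes, the pattern `12*3`, and the helper names `segment` and `area_in_blocks` are hypothetical, and the surveyed method may encode contours differently.

```python
import re

# Hypothetical block codes: 0 = background, 1 = left contour,
# 2 = interior, 3 = right contour. One digit per block is assumed.
OBJECT_ROW = re.compile(r"12*3")

def segment(matrix):
    """Return (row, start, end) spans of each row accepted by the pattern."""
    spans = []
    for r, row in enumerate(matrix):
        word = "".join(str(code) for code in row)   # one row becomes one word
        for match in OBJECT_ROW.finditer(word):
            spans.append((r, match.start(), match.end()))
    return spans

def area_in_blocks(spans):
    """Estimate the segmented area as the number of blocks inside the spans."""
    return sum(end - start for _, start, end in spans)

IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 2, 2, 3, 0],
    [0, 1, 2, 3, 0, 0],
    [0, 0, 1, 3, 0, 0],
]

if __name__ == "__main__":
    spans = segment(IMAGE)
    print(spans, area_in_blocks(spans))
```

Python's `re` engine stands in for the automaton here; since the pattern is a plain regular expression, an equivalent finite automaton exists and would accept the same spans.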

The researchers performed tests with images of the heart and reported an accuracy of 93.22% for estimating the segmented areas.

Layout Recognition

The grammar performs semantic grouping and interpretation of the objects segmented from the screenshot. In terms of the number of generated nodes, the approach is less complex than analysis based on the HTML source code. The authors treat document analysis as a parsing problem, since the order of components on a document page and their relationships can be modeled as a grammar that represents a page at the component level in terms of blocks and regions.

The proposed model was divided into two parts: (i) a hidden semi-Markov model was used to describe the grouping of page areas into rectangular blocks; (ii) a K-d tree grammar was used to represent the hierarchical decomposition of pages. Initially, the document image is divided into parallel strips, and then the number of black pixels in each strip is counted. The boundaries between different strip groups are marked by changes in the state of the model that characterizes the strip groups.
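The strip-counting step can be sketched as follows. This is a simplified illustration under stated assumptions: the binary image is a list of rows with 1 for black pixels, and a fixed threshold replaces the hidden semi-Markov model, so `boundaries` only marks where strips switch between "ink" and "blank". The function names are hypothetical.

```python
def strip_counts(image, strip_height):
    """Count black pixels (value 1) in each horizontal strip of the image."""
    counts = []
    for top in range(0, len(image), strip_height):
        strip = image[top:top + strip_height]
        counts.append(sum(sum(row) for row in strip))
    return counts

def boundaries(counts, threshold=0):
    """Indices of strips whose ink/blank state differs from the previous strip."""
    states = [count > threshold for count in counts]
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]

PAGE = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]

if __name__ == "__main__":
    counts = strip_counts(PAGE, strip_height=1)
    print(counts, boundaries(counts))
```

A probabilistic model would replace the hard threshold with state durations and emission probabilities learned from training pages; the per-strip counts it consumes would be the same.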

The suggested vocabulary of grammar symbols is formed by these labels, which are used to find physical layouts of the page.

Others

A spatial graph is constructed in which each node represents a known object (text, buttons, etc.) and edges represent spatial relationships between objects. The approach was tested on many web pages, and object recognition was performed satisfactorily in less than 25 milliseconds.

DISCUSSION

Advantages and Applications of Grammars in Computer Vision

For example, Kanungo and Mao [2003] used a stochastic regular grammar to describe the layout structure of documents, Hamdi et al. For example, context-free grammars can represent different lesions in medical images with angular interval rules between bones, vessels, and so on [Ogiela et al. 2008, 2009]. Although conventional and context-free grammars can be used, additional representational power may be required to better characterize complex visual patterns.

Spatial random tree grammars augment stochastic context-free grammars by explicitly incorporating spatial information as part of the grammar [Wang et al. 2005b, 2006; Siskind et al. Alternatively, stochastic context-free grammars can be combined with a Markov random field, where the former represents the variability of the object configuration and the latter represents the spatial relationships between parts [Reddy et al. 2009; Zhu et al. 2009]. However, rules for learning the grammars of these proposed models are only addressed in Zhu et al.

Therefore, this model has been widely used to construct images such as plants, trees, and so on, favoring the definition of models consistent with those used in developmental morphology and physiology [Prusinkiewicz et al. 1988].

Disadvantages of Using Grammars in Computer Vision

In fact, most of the papers reporting success in complex object recognition have addressed this issue. Once the advantages and disadvantages of the general use of grammar are presented, it is important to outline a comparison with other nongrammatical methods. In general, the use of grammars allows a more flexible way of representing images, mainly when the images have a well-established hierarchical pattern, as mentioned before.

This happens because a direct mapping between image structure and syntactic rules can be performed. 2000] stated that the use of structural features of syntactically represented images provides greater flexibility in some real-world applications, such as the extraction of deformable image patterns from complex backgrounds, compared to statistical methods. In the context of three-dimensional object recognition, Lin and Fu [1986] stated that it is easier to identify visible primitive surface patches in a syntactic representation than to directly recognize an object.

In addition, stochastic grammars provide a probability that a given object should be considered in each class, unlike other discriminant-based methods, such as decision trees, neural networks, support vector machines or even deterministic grammars, which only provide the classification result [Stuckelberg and Doermann 1999].
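How a stochastic grammar yields per-class probabilities can be sketched with two small stochastic regular grammars, one per class. The grammars, the symbol alphabet, the equal class priors, and the function names below are all hypothetical choices made for illustration.

```python
def string_probability(grammar, word):
    """Probability that a deterministic stochastic regular grammar derives `word`."""
    state, prob = grammar["start"], 1.0
    for symbol in word:
        if (state, symbol) not in grammar["rules"]:
            return 0.0                        # no production applies
        state, p = grammar["rules"][(state, symbol)]
        prob *= p
    return prob * grammar["stop"].get(state, 0.0)

def class_posteriors(grammars, word):
    """Normalize the per-class likelihoods, assuming equal class priors."""
    likelihoods = {name: string_probability(g, word) for name, g in grammars.items()}
    total = sum(likelihoods.values())
    if total == 0.0:
        return {name: 0.0 for name in grammars}
    return {name: lk / total for name, lk in likelihoods.items()}

# Hypothetical classes over primitives "a" and "b"; in each state the
# production probabilities plus the stop probability sum to 1.
GRAMMARS = {
    "straight": {"start": "S", "stop": {"S": 0.1},
                 "rules": {("S", "a"): ("S", 0.8), ("S", "b"): ("S", 0.1)}},
    "zigzag": {"start": "S", "stop": {"S": 0.1},
               "rules": {("S", "a"): ("S", 0.3), ("S", "b"): ("S", 0.6)}},
}

if __name__ == "__main__":
    print(class_posteriors(GRAMMARS, "aab"))
```

A deterministic grammar would only report whether each class accepts the word; here both classes accept `"aab"`, and the normalized probabilities express how strongly each class is supported.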

Challenges and Possible Future Directions

Learning grammars at the higher levels of the Chomsky hierarchy presents more challenges than learning grammars at lower levels. For example, while there are known algorithms for learning finite automata from only positive or from positive and negative samples [Angluin 1992; Ron et al. Some approaches for learning this class of grammars include learning context-free subclasses [Mäkinen 1992], learning from structured samples [Sakakibara 1992], and using heuristics [Sakakibara and Muramatsu 2000].
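As a point of reference for the finite-automaton case, the sketch below builds a prefix tree acceptor from positive samples only. This is a common starting structure for automaton inference rather than any specific algorithm from the cited works; state-merging steps, which would generalize beyond the samples, are deliberately omitted. The function names are hypothetical.

```python
def build_pta(samples):
    """Build a prefix tree acceptor with one state per distinct prefix."""
    transitions = {0: {}}                 # state 0 is the empty prefix
    accepting = set()
    for word in samples:
        state = 0
        for symbol in word:
            if symbol not in transitions[state]:
                new_state = len(transitions)
                transitions[state][symbol] = new_state
                transitions[new_state] = {}
            state = transitions[state][symbol]
        accepting.add(state)              # the full sample ends in this state
    return transitions, accepting

def accepts(pta, word):
    """Check whether the acceptor recognizes `word`."""
    transitions, accepting = pta
    state = 0
    for symbol in word:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return state in accepting

if __name__ == "__main__":
    pta = build_pta(["ab", "abb", "b"])
    print(accepts(pta, "abb"), accepts(pta, "abbb"))
```

The acceptor recognizes exactly the training samples, so it does not generalize on its own; learning algorithms typically merge its states under some consistency criterion.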

Finally, there is the issue of the small number of studies in the three-dimensional domain. In fact, working in this context poses a number of additional problems, including occlusion problems and the need to understand the perspectives of the objects, among others. However, technological advances in recent years have led to an increase in three-dimensional interactive systems using computer graphics and virtual reality technologies in many application areas.

1988] indicated a dozen directions related to L-systems using graphical object modeling, such as texture addition, complex surface modeling, and simulation complexity analysis.

CONCLUSIONS

In Proceedings of the 4th International Conference on Emerging Trends in Engineering and Technology (ICETET’11). In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’12). In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops’09).

In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR. In Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry (VRCAI’09).