UDC 621.396 DOI 10.52167/1609-1817-2023-126-3-267-274 T.K. Baitokova1, M. Spabekova2
1Kazakh-British Technical University, Almaty, Kazakhstan
2Academy of Logistics and Transport, Almaty, Kazakhstan E-mail: [email protected]
CREATION OF A BIBLIOGRAPHIC AND ABSTRACT DATABASE OF REVIEWED SCIENTIFIC LITERATURE OF KAZAKHSTAN
Abstract. Access to reliable and comprehensive scientific literature is crucial for researchers and scientists to develop new ideas, explore innovative solutions to existing problems, and support research and development initiatives. However, the lack of a comprehensive bibliographic and abstract database of peer-reviewed scientific literature in Kazakhstan makes it difficult for researchers and scholars to access and analyze research conducted in the country. To address this issue, this article proposes a comprehensive bibliographic and abstract database of Kazakhstani scientific literature, built by using Python-based parsing to collect data and organize it in an SQL database. The database will give researchers, academics, and policy makers an efficient and effective way to access reliable and comprehensive information about scientific research in the country. The article discusses the methodology used to collect and organize the data and the tools that can be used to analyze and present it. The database will be a valuable resource for researchers, academics, and policy makers working on R&D initiatives in Kazakhstan, enabling them to search articles easily by author, journal, or keywords.
In addition, the database can be supplemented with various tools and programs to facilitate data analysis and presentation. Therefore, it is imperative to establish a comprehensive bibliographic and abstract database of scientific literature in Kazakhstan to support research and development initiatives in the country.
Keywords. Bibliographic database, scientific literature, Kazakhstan, peer-reviewed, Scopus, Web of Science, SQL.
Introduction.
Access to reliable and comprehensive scientific literature is critical for researchers and scientists to develop new ideas, explore innovative solutions to existing problems, and support research and development initiatives in various fields. However, despite the significant growth of scientific research in Kazakhstan in recent years, the lack of a comprehensive bibliographic and abstract database of Kazakhstan's peer-reviewed scientific literature makes it difficult for researchers and scholars to access and analyze research in the country [3].
Popular bibliographic databases such as Scopus and Web of Science provide access to a wide range of scientific literature from many countries, but they may not fully cover the scientific literature of Kazakhstan [1]. Several factors may contribute to this: language barriers, the limited visibility of smaller or more specialized publications, and limited interest among foreign database publishers in indexing Kazakhstani journals [4].
Therefore, the creation of a comprehensive bibliographic and abstract database of the scientific literature of Kazakhstan is necessary for the researchers and scientists of Kazakhstan to have access to reliable and comprehensive information about scientific research in the country.
Such a database will not only help researchers and scientists access relevant and timely information, but it will also promote collaboration and knowledge-sharing within the scientific community, as well as foster international recognition of Kazakhstan's scientific achievements [3].
In this work, we aim to create a comprehensive bibliographic and abstract database of the peer-reviewed scientific literature of Kazakhstan, using data collection techniques such as web scraping and data parsing. To achieve this goal, we use the Python programming language and the BeautifulSoup library for web scraping, and SQL for data storage and management [7]. By developing an efficient and effective system for data collection and storage, we hope to contribute to the development of Kazakhstan's scientific community and to provide a valuable resource for researchers and scholars interested in Kazakhstani science [1].
Materials and methods.
The database can be supplemented with various tools and programs that can help researchers and scientists analyze and present data. Interactive and informative data visualizations can be created using tools like Tableau and Power BI. Similarly, statistical analysis tools such as R and Python can be used to analyze data and extract insights and patterns that may not be immediately visible from the raw data [2, 3].
This database will be created using Python parsing to collect data from various sources and SQL to work with the data. The resulting database will allow scientists and professors in Kazakhstan to access all articles published by a particular researcher.
Python parsing is a technique used to extract data from various sources, including websites, databases, and APIs. In the case of creating a bibliographic and abstract database of peer-reviewed scientific literature of Kazakhstan, Python parsing will be used to extract data from various sources, including scientific journal websites, university libraries and government databases [7].
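When a source offers an API instead of HTML pages, "parsing" reduces to reading structured JSON. The sketch below assumes that the Crossref REST API (`https://api.crossref.org/works`) is one such source. It flattens an abridged response that follows the general shape of that endpoint's output. The function name `records_from_crossref` and the output field names are illustrative only and are not part of the project. The sample values are placeholders taken from this article's own header. In a live run, the payload would come from an HTTP GET request, for example one made with Requests.

```python
import json

# Abridged JSON following the general shape of a Crossref /works response;
# the values are placeholders based on this article's own metadata.
SAMPLE = """{
  "status": "ok",
  "message": {
    "items": [
      {"DOI": "10.52167/1609-1817-2023-126-3-267-274",
       "title": ["Creation of a bibliographic and abstract database"],
       "author": [{"given": "T.K.", "family": "Baitokova"},
                  {"given": "M.", "family": "Spabekova"}],
       "issued": {"date-parts": [[2023]]}}
    ]
  }
}"""


def records_from_crossref(payload: str):
    """Flatten Crossref-style work items into simple dicts (hypothetical layout)."""
    items = json.loads(payload)["message"]["items"]
    records = []
    for item in items:
        # "date-parts" is a list of [year, month, day] lists; month/day may be absent
        date_parts = item.get("issued", {}).get("date-parts", [[None]])
        records.append({
            "doi": item.get("DOI", "").lower(),  # DOIs are case-insensitive
            "title": (item.get("title") or [""])[0],
            "authors": [f'{a.get("family", "")} {a.get("given", "")}'.strip()
                        for a in item.get("author", [])],
            "year": date_parts[0][0],
        })
    return records


records = records_from_crossref(SAMPLE)
print(records)
```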
The Python parsing technique will involve creating a script that will pull data from these sources and store it in a database. The script will use various libraries and tools, including BeautifulSoup and Requests, to access and retrieve data from websites. Once the data is extracted, it will be cleaned and processed to ensure its accuracy and consistency [4].
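The text does not describe the cleaning step in detail. The following is a minimal sketch of what it might involve. It assumes that each scraped record is a dict with the hypothetical keys `title`, `authors`, `doi` and `keywords`. The sketch normalizes Unicode and whitespace, strips resolver prefixes from DOIs, splits keyword strings, and keeps only the first record for each DOI.

```python
import re
import unicodedata


def normalize_text(value: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    value = unicodedata.normalize("NFC", value)
    return re.sub(r"\s+", " ", value).strip()


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = normalize_text(doi).lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)


def clean_records(raw_records):
    """Return cleaned records, keeping only the first record seen for each DOI."""
    seen_dois = set()
    cleaned = []
    for rec in raw_records:
        doi = normalize_doi(rec.get("doi", ""))
        if doi and doi in seen_dois:
            continue  # duplicate of an article already collected
        if doi:
            seen_dois.add(doi)
        keywords = [normalize_text(k) for k in re.split(r"[;,]", rec.get("keywords", ""))]
        cleaned.append({
            "title": normalize_text(rec.get("title", "")),
            "authors": [normalize_text(a) for a in rec.get("authors", []) if a.strip()],
            "doi": doi,
            "keywords": [k.lower() for k in keywords if k],
        })
    return cleaned


# Two scraped copies of the same article (placeholder values)
raw = [
    {"title": "  Creation of a  database ", "authors": ["Baitokova T.K. ", "Spabekova M."],
     "doi": "https://doi.org/10.52167/1609-1817-2023-126-3-267-274",
     "keywords": "SQL; Scopus, Kazakhstan"},
    {"title": "Creation of a database", "authors": ["Baitokova T.K."],
     "doi": "10.52167/1609-1817-2023-126-3-267-274", "keywords": ""},
]
out = clean_records(raw)
print(out)
```

The DOI is used as the deduplication key because it stays stable across journal websites, whereas titles may differ in capitalization or spacing.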
SQL (Structured Query Language) is a language designed for managing and querying data in relational databases [5]. In creating a bibliographic and abstract database of Kazakhstan's peer-reviewed scientific literature, SQL will be used to work with the data extracted by Python parsing [7].
SQL will be used to create and manage the database, including creating tables, defining relationships between tables, and inserting, updating, and deleting data. SQL queries will also be used to retrieve data from the database and create reports [5].
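The sketch below illustrates these operations. It uses Python's built-in `sqlite3` module as a stand-in for whatever SQL server the project actually runs. The `journal` and `article` tables are hypothetical. The example declares a relationship with a foreign key, runs an UPDATE, and runs a DELETE that cascades to dependent rows.

```python
import sqlite3

# In-memory SQLite database standing in for the project's SQL server
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# CREATE TABLE with a relationship: each article belongs to one journal
conn.executescript("""
CREATE TABLE journal (
    journal_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    journal_id INTEGER NOT NULL REFERENCES journal(journal_id) ON DELETE CASCADE,
    title      TEXT NOT NULL,
    year       INTEGER
);
""")

# INSERT: a hypothetical journal and one article that references it
conn.execute("INSERT INTO journal (journal_id, name) VALUES (1, 'Example Journal')")
conn.execute("INSERT INTO article (journal_id, title, year) VALUES (1, 'Example article', 2022)")

# UPDATE: correct a value after re-checking the source page
conn.execute("UPDATE article SET year = 2023 WHERE title = 'Example article'")
updated_year = conn.execute("SELECT year FROM article").fetchone()[0]

# The foreign key rejects an article that points to a non-existent journal
try:
    conn.execute("INSERT INTO article (journal_id, title) VALUES (99, 'Orphan')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# DELETE: removing the journal also removes its articles (ON DELETE CASCADE)
conn.execute("DELETE FROM journal WHERE journal_id = 1")
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM article").fetchone()[0])  # 0
```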
Table 1 lists the Python libraries and SQL commands proposed for the project, with brief descriptions of their functions and purposes [2]:
Table 1 - Quick reference guide to Python parsing and SQL database management

Python library | Purpose | Description
BeautifulSoup | Extracting data from HTML and XML files | Provides easy-to-use functions to navigate and search a web page's HTML or XML document tree.
Requests | Sending HTTP requests and handling server responses in Python | Provides a simple way to access web pages and interact with web applications.
Pandas | Manipulating and analyzing data in Python | Provides data structures for efficiently storing and manipulating large datasets, together with functions for data cleaning, preparation, and analysis.
SQLAlchemy | Interacting with relational databases in Python | Provides high-level APIs for managing database connections, transactions, and data manipulation.

SQL command | Purpose | Description
CREATE TABLE | Creating a new table in a relational database | Specifies the table's columns and their data types, together with any constraints or indexes to apply to the table.
INSERT INTO | Adding new rows to a table | Specifies the values to insert into individual columns of the table.
SELECT | Retrieving data from a table | Specifies the columns to retrieve and any filtering conditions to apply.
WHERE | Filtering the rows retrieved by a SELECT statement | Specifies a condition that a row must meet to be included in the query results.
JOIN | Combining rows from two or more tables into a unified result | Allows related data from multiple tables to be queried in a single statement.
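The SQL commands in Table 1 can be seen working together in a short sketch. It uses Python's built-in `sqlite3` module instead of a production database server. The table and column names are assumptions for illustration, and the rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE: columns, data types and constraints
conn.executescript("""
CREATE TABLE journal (
    journal_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    journal_id INTEGER NOT NULL REFERENCES journal(journal_id),
    title      TEXT NOT NULL,
    year       INTEGER NOT NULL
);
""")

# INSERT INTO: placeholder rows (hypothetical journal names and titles)
conn.executemany("INSERT INTO journal (journal_id, name) VALUES (?, ?)",
                 [(1, "Journal A"), (2, "Journal B")])
conn.executemany("INSERT INTO article (journal_id, title, year) VALUES (?, ?, ?)",
                 [(1, "Article 1", 2019), (1, "Article 2", 2023), (2, "Article 3", 2023)])

# SELECT + JOIN + WHERE: articles from 2023 together with their journal name
rows = conn.execute("""
    SELECT a.title, j.name
    FROM article AS a
    JOIN journal AS j ON j.journal_id = a.journal_id
    WHERE a.year = ?
    ORDER BY a.title
""", (2023,)).fetchall()
print(rows)  # [('Article 2', 'Journal A'), ('Article 3', 'Journal B')]
```

The `?` placeholders pass values separately from the SQL text, so the driver handles quoting. This matters when the values come from scraped web pages.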
Results.
Web scraping and parsing techniques have become increasingly popular in recent years for collecting data on scholarly articles. In this study, we describe a methodology for collecting bibliographic and abstract data on scientific literature related to Kazakhstan, using the Python programming language and the Beautiful Soup library.
The methodology involves several steps: identifying relevant data sources, accessing each website with the Requests library, and parsing the page's HTML content with the Beautiful Soup library. Beautiful Soup's functions for navigating the HTML structure of the page are then used to extract specific information, such as author names, publication titles, ISSNs, DOIs, and publication dates [4, 5].
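How the extraction step might look is shown in a minimal sketch. It assumes that article pages carry Highwire Press style `<meta name="citation_...">` tags, which many journal platforms emit for Google Scholar. Actual page layouts vary by journal. To keep the sketch free of dependencies, it uses the standard library's `html.parser` instead of Beautiful Soup. With Beautiful Soup, the equivalent lookup would be along the lines of `soup.find_all("meta", attrs={"name": "citation_author"})`. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Assumed mapping from citation_* meta tag names to record fields
META_FIELDS = {
    "citation_title": "title",
    "citation_author": "authors",  # repeated once per author
    "citation_issn": "issn",
    "citation_doi": "doi",
    "citation_date": "date",
    "citation_publication_date": "date",
}


class CitationMetaParser(HTMLParser):
    """Collect citation_* <meta> tags from an article landing page."""

    def __init__(self):
        super().__init__()
        self.record = {"authors": []}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = META_FIELDS.get(attrs.get("name") or "")
        content = attrs.get("content")
        if key is None or content is None:
            return
        if key == "authors":
            self.record["authors"].append(content.strip())
        else:
            self.record.setdefault(key, content.strip())  # keep the first occurrence


def parse_article_page(html: str) -> dict:
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    return parser.record


# A hypothetical landing page built from this article's own metadata
page = """<html><head>
<meta name="citation_title" content="Creation of a bibliographic and abstract database">
<meta name="citation_author" content="Baitokova, T.K.">
<meta name="citation_author" content="Spabekova, M.">
<meta name="citation_doi" content="10.52167/1609-1817-2023-126-3-267-274">
</head><body></body></html>"""

record = parse_article_page(page)
print(record["authors"])  # ['Baitokova, T.K.', 'Spabekova, M.']
```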
To obtain additional information, such as abstracts, keywords, and references, the full text of the article is downloaded with the Requests library. The collected data is then cleaned and organized with Python data manipulation libraries such as Pandas to create a structured dataset [6].
The final step is to store and organize the data in a relational database management system queried with SQL, which makes the information easy to search and retrieve. This methodology provides a comprehensive and efficient way to collect data on scholarly articles related to Kazakhstan [5].
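The storage step can be sketched under a few assumptions. The sketch uses SQLite through Python's `sqlite3` module and a hypothetical single `article` table. A UNIQUE constraint on the DOI makes repeated scraping runs idempotent. The SQLite-specific `INSERT OR IGNORE` would be written `INSERT IGNORE` in MySQL or `ON CONFLICT DO NOTHING` in PostgreSQL.

```python
import sqlite3


def store_articles(conn, records):
    """Insert cleaned records and return the total number of stored articles."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS article (
            article_id INTEGER PRIMARY KEY,
            doi        TEXT UNIQUE,
            title      TEXT NOT NULL,
            year       INTEGER
        )""")
    with conn:  # commit on success, roll back on error
        conn.executemany(
            # Named placeholders let the driver escape scraped values safely
            "INSERT OR IGNORE INTO article (doi, title, year) VALUES (:doi, :title, :year)",
            records,
        )
    return conn.execute("SELECT COUNT(*) FROM article").fetchone()[0]


conn = sqlite3.connect(":memory:")
batch = [
    {"doi": "10.52167/1609-1817-2023-126-3-267-274",
     "title": "Creation of a bibliographic and abstract database", "year": 2023},
    {"doi": "10.0000/hypothetical-1", "title": "Hypothetical article", "year": 2022},
]
print(store_articles(conn, batch))  # 2
print(store_articles(conn, batch))  # 2: rows with an existing DOI are ignored
```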
Beyond building the bibliographic and abstract dataset itself, the data collected through web scraping and parsing can be used to analyze and visualize trends in scientific literature. For example, it can be used to generate co-authorship networks, keyword co-occurrence maps, and citation networks, which can provide valuable insights into the structure and evolution of scientific fields [7].
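Co-authorship networks and keyword co-occurrence maps both rest on the same computation: counting how often two items appear together in one article. The sketch below shows that counting step on hypothetical author lists. The resulting pair counts are the weighted edges that a graph or visualization tool would draw.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(groups):
    """Count how often each unordered pair appears together in the same group.

    For co-authorship, each group is one article's author list; for keyword
    co-occurrence, each group is one article's keyword list.
    """
    pairs = Counter()
    for group in groups:
        unique = sorted(set(group))  # drop repeats and make pair order canonical
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical author lists for three articles
articles = [
    ["Author A", "Author B"],
    ["Author A", "Author B", "Author C"],
    ["Author C"],
]
network = co_occurrence(articles)
print(network.most_common(1))  # [(('Author A', 'Author B'), 2)]
```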
Furthermore, the availability of large-scale bibliographic and abstract datasets can facilitate the development of machine learning and natural language processing models for automated literature review, topic modeling, and recommendation systems [8].
However, web scraping and parsing should be used ethically, in compliance with copyright law and the terms of service of the targeted websites.
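One concrete part of scraping responsibly is honoring a site's robots.txt file. The sketch below uses the standard library's `urllib.robotparser` on a hypothetical robots.txt. The user-agent name and the `journal.example` URLs are also hypothetical. In a real run, `RobotFileParser.set_url(...)` and `read()` would fetch the live file, and the scraper would pause between requests for at least the returned crawl delay.

```python
from urllib import robotparser


def robots_policy(robots_txt: str, user_agent: str):
    """Return (allowed, crawl_delay) for user_agent under the given robots.txt text."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return rp.can_fetch(user_agent, url)

    return allowed, rp.crawl_delay(user_agent)


# Hypothetical robots.txt of a journal website
ROBOTS = """
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""

allowed, delay = robots_policy(ROBOTS, "kz-biblio-bot")
print(allowed("https://journal.example/article/view/1"))  # True
print(allowed("https://journal.example/admin/users"))     # False
print(delay)                                              # 2
```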
Moreover, the quality and accuracy of the collected data depend on the reliability and consistency of the data sources and the parsing algorithms used.
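Some errors in scraped metadata can be caught automatically. The sketch below checks the ISSN mod-11 check digit and tests whether a string is a plausible DOI, using a pattern adapted from the regular expression Crossref publishes for modern DOIs. A DOI that passes this check may still fail to resolve, so the check only filters out malformed values. The example ISSN is the ISSN-like segment `1609-1817` embedded in this article's DOI.

```python
import re

# Adapted from Crossref's suggested pattern for matching modern DOIs
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def is_valid_issn(issn: str) -> bool:
    """Check the NNNN-NNNC format and the mod-11 check digit (X stands for 10)."""
    if not re.fullmatch(r"\d{4}-\d{3}[\dXx]", issn):
        return False
    digits = issn.replace("-", "")
    # Weights 8, 7, ..., 2 for the first seven digits
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7].upper() == expected


def is_plausible_doi(doi: str) -> bool:
    return bool(DOI_PATTERN.match(doi))


print(is_valid_issn("1609-1817"))  # True
print(is_valid_issn("1609-1818"))  # False
print(is_plausible_doi("10.52167/1609-1817-2023-126-3-267-274"))  # True
```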
Overall, the methodology described in this study can serve as a useful tool for researchers and practitioners in scholarly communication, offering a scalable and customizable approach to collecting, analyzing, and visualizing bibliographic and abstract data on scientific literature related to Kazakhstan.
In summary, this study demonstrates the effectiveness of combining web scraping and parsing techniques with Python and the Beautiful Soup library to collect bibliographic and abstract data on scientific literature related to Kazakhstan. The methodology can be applied in other contexts and provides a valuable tool for researchers and scholars in scholarly communication [6].
Figure 1 - Python parsing process
To manage and store the bibliographic and abstract data collected in this study, we use SQL (Structured Query Language), a language designed for managing and manipulating data stored in relational databases. It is widely used in many applications, including scientific research, to store, organize, and retrieve data efficiently [3, 5].
Using SQL, we created a relational database that stores data in tables with clearly defined relationships between them. For example, we created tables for authors, publications, and keywords, and defined the relationships between these tables.
This enabled us to easily retrieve specific information, such as all publications by a particular author, or all publications related to a certain keyword [2].
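A minimal sketch of such a schema follows, using SQLite through Python's `sqlite3` module. Figure 2 is not reproduced here, so the table and column names are assumptions and not the authors' actual ERD. Because an article can have several authors and several keywords, the sketch resolves these many-to-many relationships with junction tables. The two query functions are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE publication (
    publication_id INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    year           INTEGER
);
CREATE TABLE keyword (
    keyword_id INTEGER PRIMARY KEY,
    term       TEXT NOT NULL UNIQUE
);
-- Junction tables resolve the many-to-many relationships
CREATE TABLE publication_author (
    publication_id INTEGER REFERENCES publication(publication_id),
    author_id      INTEGER REFERENCES author(author_id),
    PRIMARY KEY (publication_id, author_id)
);
CREATE TABLE publication_keyword (
    publication_id INTEGER REFERENCES publication(publication_id),
    keyword_id     INTEGER REFERENCES keyword(keyword_id),
    PRIMARY KEY (publication_id, keyword_id)
);

-- Placeholder data
INSERT INTO author VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO publication VALUES (1, 'Paper 1', 2021), (2, 'Paper 2', 2023);
INSERT INTO keyword VALUES (1, 'sql'), (2, 'bibliometrics');
INSERT INTO publication_author VALUES (1, 1), (2, 1), (2, 2);
INSERT INTO publication_keyword VALUES (1, 1), (2, 2);
""")


def publications_by_author(name):
    """All publications by one author, oldest first."""
    return conn.execute("""
        SELECT p.title FROM publication p
        JOIN publication_author pa ON pa.publication_id = p.publication_id
        JOIN author a ON a.author_id = pa.author_id
        WHERE a.full_name = ? ORDER BY p.year""", (name,)).fetchall()


def publications_by_keyword(term):
    """All publications tagged with one keyword, oldest first."""
    return conn.execute("""
        SELECT p.title FROM publication p
        JOIN publication_keyword pk ON pk.publication_id = p.publication_id
        JOIN keyword k ON k.keyword_id = pk.keyword_id
        WHERE k.term = ? ORDER BY p.year""", (term,)).fetchall()


print(publications_by_author("Author A"))  # [('Paper 1',), ('Paper 2',)]
print(publications_by_keyword("sql"))      # [('Paper 1',)]
```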
We used SQL queries to extract summary information from the database, such as the number of publications per year or the most commonly used keywords. These queries allow us to analyze and interpret the data collected in this study more effectively [4].
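The two summaries mentioned above map onto GROUP BY queries. The sketch below runs them on SQLite with a reduced, hypothetical schema. The rows are placeholders, not results of this study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publication (publication_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE keyword (keyword_id INTEGER PRIMARY KEY, term TEXT UNIQUE);
CREATE TABLE publication_keyword (publication_id INTEGER, keyword_id INTEGER);

-- Placeholder data, not results of this study
INSERT INTO publication VALUES (1, 2021), (2, 2022), (3, 2022), (4, 2023);
INSERT INTO keyword VALUES (1, 'sql'), (2, 'python'), (3, 'kazakhstan');
INSERT INTO publication_keyword VALUES (1, 1), (2, 1), (2, 2), (3, 3), (4, 1), (4, 2);
""")

# Number of publications per year
per_year = conn.execute("""
    SELECT year, COUNT(*) AS n
    FROM publication
    GROUP BY year
    ORDER BY year""").fetchall()

# Most frequently used keywords (ties broken alphabetically)
top_keywords = conn.execute("""
    SELECT k.term, COUNT(*) AS uses
    FROM publication_keyword pk
    JOIN keyword k ON k.keyword_id = pk.keyword_id
    GROUP BY k.term
    ORDER BY uses DESC, k.term
    LIMIT 2""").fetchall()

print(per_year)      # [(2021, 1), (2022, 2), (2023, 1)]
print(top_keywords)  # [('sql', 3), ('python', 2)]
```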
SQL is also a largely standardized language for database management, which makes it easier for researchers to collaborate and share bibliographic and abstract data across different platforms and systems. The relational structure of the database also supports data integrity and consistency, as well as scalability and adaptability to future data needs and updates [2, 7].
In creating the database for this study, we followed standard database design principles to ensure its efficiency and effectiveness. This included an Entity-Relationship Diagram (ERD) that visualizes the relationships between the entities (tables) in the database, as shown in Figure 2. The ERD clarifies the relationships and dependencies between data elements and serves as a blueprint for the overall structure and functionality of the database [4].
Furthermore, SQL's aggregation and grouping features, combined with external statistical and data visualization tools, can be used to draw insights and conclusions from the collected data, for example by identifying trends, patterns, and relationships between variables. These insights can then inform further research in the field.
Overall, using SQL to manage and analyze bibliographic and abstract data gives researchers and scholars a powerful tool for storing and analyzing data in a structured and organized manner. The insights gained from such analysis can lead to a better understanding of the scholarly landscape and facilitate future research in the field [6, 7].
Figure 2 - Entity-Relationship Diagram (ERD)
Discussion.
A bibliographic and abstract database of peer-reviewed scientific literature in Kazakhstan offers a number of advantages for the country's scientific community. Web scraping and parsing with Python and the Beautiful Soup library allowed us to collect bibliographic and abstract data on scientific literature related to Kazakhstan, which was then stored and managed using SQL.
Firstly, the database will make it easier for researchers and scientists to find and access peer-reviewed scientific literature in Kazakhstan. With all the data in one place, researchers will no longer need to spend hours searching for relevant articles across multiple sources. The methodology we developed for collecting and organizing the data with Python and SQL can also be applied in other contexts and provides a valuable tool for researchers and scholars in the field of scholarly communication [5, 6].
Secondly, the database will facilitate cooperation between researchers and scientists in Kazakhstan. With easy access to each other's work, researchers can more easily collaborate on projects and share ideas. This will contribute to the development of an active scientific community in Kazakhstan, promoting innovation and research.
Thirdly, the database will make it easier for universities and research institutes in Kazakhstan to evaluate the research output of their faculty. Clicking on a researcher's name displays all of their articles in every journal in which they have published.
This will make it easier to assess the impact of the researcher's work and the quality of their research results.
Fourthly, the database will also be useful to students and beginning researchers in Kazakhstan. With easier access to peer-reviewed scientific literature, students will be able to conduct more in-depth research for their projects. This will help nurture a new generation of researchers and scientists in the country, contributing to a culture of innovation and research.
Finally, a bibliographic and abstract database of Kazakhstan's peer-reviewed scientific literature will increase the visibility and recognition of Kazakhstani scientific research on the world stage [3]. With all their research results in one place, Kazakhstani researchers and scientists will have more opportunities to showcase their work to the international scientific community, opening opportunities for collaboration, funding, and recognition.
In conclusion, building a bibliographic and abstract database of peer-reviewed scientific literature in Kazakhstan with web scraping and parsing techniques, the Python programming language, and an SQL database management system provides a comprehensive and efficient way to collect, organize, manage, and retrieve data on scholarly articles related to Kazakhstan [5, 6]. This methodology can be applied in other contexts and provides a valuable tool for researchers and scholars in the field of scholarly communication.
Conclusion.
In conclusion, the creation of a comprehensive bibliographic and abstract database of peer-reviewed scientific literature in Kazakhstan is a crucial step towards providing researchers and scientists with reliable and comprehensive information about scientific research in the country. The methodology proposed in this project, which involves using Python parsing to collect and organize data in an SQL database, provides an efficient and effective way to create such a database that can be used to support research and development initiatives in Kazakhstan.
As a result of this project, a database has been created that stores all scientific articles published in various peer-reviewed journals in Kazakhstan. This database enables researchers to easily access articles by author, journal, or keywords, facilitating the search for up-to-date and relevant information on specific topics. The user can click on an author's last name to access all of their articles in all the journals in which they have published their work. This database, which brings all the data together in one place, will be a valuable resource for researchers, academics, and policy makers working on R&D initiatives in Kazakhstan.
Moreover, the database can be complemented with various tools and programs that can help researchers and scientists analyze and present data effectively. Therefore, establishing a comprehensive bibliographic and abstract database of scientific literature in Kazakhstan should be given high priority to support research and development initiatives in the country.
In summary, the proposed methodology in this project provides an effective and efficient way to create a comprehensive database of peer-reviewed scientific literature in Kazakhstan, which can promote collaboration, innovation, and research in the country.
REFERENCES
[1] Adambekov S. Development of science and technology in Kazakhstan. Journal of Scientific Research. – No. 6(2). – 2019. – P. 10–23.
[2] Presnyakova G.V. Designing Integrated Relational Databases: Textbook. – Moscow: KDU, 2007. – 224 p.
[3] Bekbolatov T., Tleuova A. Analysis of bibliometric indicators of scientific research in Kazakhstan. Journal of Scientometric Research. – No. 7(3). – 2018. – P. 145–152.
[4] Karamanova Z., Kalieva M. Scientific publications in Kazakhstan: current trends and prospects. Journal of Library Science and Information Technology. – No. 5(2). – 2017. – P. 45–58.
[5] Martishin S.A., Simonov V.L. Design and implementation of databases in the MySQL DBMS using MySQL Workbench: Methods and tools for designing information systems and technologies. Tools of information systems: Textbook. – Moscow: ID FORUM, NIC Infra-M, 2012. – 160 p.
[6] Sweigart A. Automating routine tasks with Python: A practical guide for beginners. Translated from English. – Moscow: Williams, 2016. – 592 p.
[7] Parsing in Python with Beautiful Soup. – URL: https://pythonru.com/biblioteki/parsing-na-python-s-beautiful-soup
Tolganay Baitokova, MSc, Kazakh-British Technical University, Almaty, Kazakhstan, [email protected]
Marzhan Spabekova, senior lecturer, Academy of Logistics and Transport, Almaty, Kazakhstan, [email protected]
CREATION OF A BIBLIOGRAPHIC AND ABSTRACT DATABASE OF KAZAKHSTAN'S PEER-REVIEWED SCIENTIFIC LITERATURE
Abstract. Access to reliable and comprehensive scientific literature is very important for researchers and scientists in developing new ideas, exploring innovative solutions to existing problems, and supporting research and development initiatives. However, the absence of a complete bibliographic and abstract database of peer-reviewed scientific literature in Kazakhstan makes it difficult for researchers and scientists to access and analyze research in the country.
To solve this problem, this article proposes creating a comprehensive bibliographic and abstract database of scientific literature in Kazakhstan, using Python parsing to collect and organize data in an SQL database. The database will give researchers, academics, and policy makers an efficient and effective way to access reliable and comprehensive information about scientific research in the country. The article considers the methodology for collecting and organizing the data and the tools that can be used to analyze and present it.
This database will be a valuable resource for researchers, scientists, and policy makers working on R&D initiatives in Kazakhstan, enabling easy access to articles by author, journal, or keywords. In addition, the database can be supplemented with various tools and programs that facilitate data analysis and presentation. Therefore, creating a comprehensive bibliographic and abstract database of scientific literature in Kazakhstan is essential to support the country's research and development initiatives.
Keywords. Bibliographic database, scientific literature, Kazakhstan, peer-reviewed, Scopus, Web of Science, SQL.
Tolganay Baitokova, MSc, Kazakh-British Technical University, Almaty, Kazakhstan, [email protected]
Marzhan Spabekova, senior lecturer, Academy of Logistics and Transport, Almaty, Kazakhstan, [email protected]
CREATION OF A BIBLIOGRAPHIC AND ABSTRACT DATABASE OF PEER-REVIEWED SCIENTIFIC LITERATURE OF KAZAKHSTAN
Abstract. Access to reliable and comprehensive scientific literature is crucial for researchers and scientists in developing new ideas, exploring innovative solutions to existing problems, and supporting research and development initiatives. However, the lack of a comprehensive bibliographic and abstract database of peer-reviewed scientific literature in Kazakhstan makes it difficult for researchers and scientists to access and analyze research in the country. To address this problem, this article proposes creating a comprehensive bibliographic and abstract database of scientific literature in Kazakhstan, using Python parsing to collect and organize data in an SQL database. The database will give researchers, academics, and policy makers an efficient and effective way to access reliable and comprehensive information about scientific research in the country. The article discusses the methodology used to collect and organize the data, as well as the tools that can be used to analyze and present it. This database will be a valuable resource for researchers, academics, and policy makers working on R&D initiatives in Kazakhstan, allowing them to easily access articles by author, journal, or keywords. In addition, the database can be supplemented with various tools and programs that facilitate data analysis and presentation. Therefore, it is necessary to create a comprehensive bibliographic and abstract database of scientific literature in Kazakhstan to support research and development initiatives in the country.
Keywords. Bibliographic database, scientific literature, Kazakhstan, peer-reviewed, Scopus, Web of Science, SQL.