Technological advancements and the emergence of new communication media, especially social media and networks, have brought an era of unprecedented connectivity, which can be leveraged for better science communication. This paper explores social media activity around Indian research papers with the objective of evaluating whether the quantum of activity is sufficient to indicate that social media can be an effective medium of science communication in India. In the absence of any existing survey of social media usage by scientists in India, the paper uses altmetrics as a proxy measure to capture science communication activities around two major classes, namely, science-science connect and science-society connect. Results indicate that social media activity around Indian research papers is relatively low compared to the developed countries and also the world average. There is higher activity in science-science connect (Mendeley), whereas science-society connect is less pronounced (other social media and news). The paper argues that there is a need to expose the Indian research community to the opportunities that social media presents and that appropriate use can be helpful for improved science-science and science-society connects.
Over the years, social media has emerged as one of the most popular platforms where people express their views and share thoughts on a wide range of topics. Social media content now includes a variety of components such as text, images and videos. One content type of interest is memes, which often combine text and images. Social media, being an unregulated platform, sometimes also has instances of discriminatory, offensive and hateful content being posted. Such content adversely affects the online well-being of users. Therefore, it is very important to develop computational models to automatically detect such content so that appropriate corrective action can be taken. Accordingly, there have been research efforts on automatic detection of such content, focused mainly on text. However, the fusion of multimodal data (as in memes) creates various challenges in developing computational models that can handle such data, more so in the case of low-resource languages. Among such challenges, the lack of suitable datasets for developing computational models for handling memes in low-resource languages is a major problem. This work attempts to bridge the research gap by providing a large curated dataset comprising 5,054 memes in Hindi-English code-mixed language, which are manually annotated by three independent annotators. It comprises two subtasks: (i) Subtask-1 (binary classification involving tagging a meme as misogynous or non-misogynous), and (ii) Subtask-2 (multi-label classification of memes into different categories). The data quality is evaluated by computing Krippendorff's alpha. Different computational models are then applied on the data in three settings: text-only, image-only, and multimodal models using fusion techniques.
The results show that the proposed multimodal method using the fusion technique may be the preferred choice for the identification of misogyny in multimodal Internet content and that the dataset is sui…
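For readers unfamiliar with the agreement statistic named in the abstract above, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. It is not the authors' code: the function name `krippendorff_alpha_nominal` and the toy labels are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists; each inner list holds the labels that the
    annotators gave to one item (missing ratings are simply left out).
    """
    coincidences = Counter()  # o[(c, k)]: weighted count of ordered label pairs
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single rating cannot be paired
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    totals = Counter()  # n_c: marginal total per label
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n == 0:
        raise ValueError("need at least one item with two or more ratings")

    observed = sum(w for (c, k), w in coincidences.items() if c != k) / n
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n * (n - 1))
    if expected == 0:
        return 1.0  # only one label ever used; treated as full agreement here
    return 1.0 - observed / expected


# Hypothetical toy data: three annotators labelling four memes.
toy = [["mis", "mis", "mis"], ["non", "non", "mis"],
       ["non", "non", "non"], ["mis", "non", "mis"]]
print(round(krippendorff_alpha_nominal(toy), 3))
```

Values close to 1 indicate strong agreement, while values near 0 indicate agreement no better than chance.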
During the last decade, social media has gained significant popularity as a medium for individuals to express their views on various topics. However, some individuals also exploit social media platforms to spread hatred through their comments and posts, some of which target individuals, communities or religions. Given the deep emotional connections people have to their religious beliefs, this form of hate speech can be divisive and harmful, and may result in mental health issues and social disorder. Therefore, there is a need for algorithmic approaches for the automatic detection of instances of hate speech. Most of the existing studies in this area focus on social media content in English, and as a result several low-resource languages lack computational resources for the task. This study attempts to address this research gap by providing a high-quality annotated dataset designed specifically for identifying hate speech against religions in the Hindi-English code-mixed language. This dataset, “Targeted Hate Speech Against Religion” (THAR), consists of 11,549 comments and has been annotated by five independent annotators. It comprises two subtasks: (i) Subtask-1 (binary classification) and (ii) Subtask-2 (multi-class classification). To ensure the quality of annotation, the Fleiss' kappa measure has been employed. The suitability of the dataset is then further explored by applying different standard deep learning and transformer-based models. The transformer-based model, namely Multilingual Representations for Indian Languages (MuRIL), is found to outperform the other implemented models in both subtasks, achieving macro average and weighted average F1 scores of 0.78 and 0.78 for Subtask-1, and 0.65 and 0.72 for Subtask-2, respectively. The experimental results obtained not only confirm the suitability of the dataset but also advance the research towards automatic detection of hate speech, particularly in the low-resource Hindi-English code-mixed language.
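The THAR abstract reports both macro-average and weighted-average F1 scores, which can differ noticeably when classes are imbalanced. The sketch below shows how the two averages are typically computed from per-class F1 values. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: (f1, support)} for two equal-length label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per label
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true instances."""
    scores = f1_per_class(y_true, y_pred)
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Hypothetical toy labels: 0 = not hate, 1 = hate.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(macro_f1(y_true, y_pred), weighted_f1(y_true, y_pred))
```

When the majority class is easier to predict, the weighted average tends to exceed the macro average, which is consistent with the gap reported for Subtask-2.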
Artificial intelligence (AI) has emerged as a transformative technology with applications across multiple domains. The corpus of work related to the field of AI has grown significantly in volume as well as in terms of the application of AI in wider domains. However, given the wide application of AI in diverse areas, the measurement and characterization of the span of AI research is often a challenging task. Bibliometrics is a well-established method in the scientific community to measure the patterns and impact of research. It has, however, also received significant criticism for its overemphasis on the macroscopic picture and its inability to provide a deep understanding of the growth and thematic structure of knowledge-creation activities. Therefore, this study presents a framework comprising two techniques, namely, Bradford’s distribution and path analysis, to characterize the growth and thematic evolution of the discipline. While the Bradford distribution provides a macroscopic view of artificial intelligence research in terms of patterns of growth, the path analysis method presents a microscopic analysis of the thematic evolutionary trajectories, thereby completing the analytical framework. Detailed insights into the evolution of each subdomain are drawn, major techniques employed in various AI applications are identified, and some relevant implications are discussed to demonstrate the usefulness of the analyses.
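As background to the Bradford's distribution mentioned above: in its classical form, sources ranked by productivity are divided into zones holding roughly equal numbers of articles, and the number of sources per zone grows roughly geometrically (about 1 : n : n²). The sketch below illustrates only that generic zoning step. How the study itself applies the distribution is not stated in the abstract, and the function name and article counts are hypothetical.

```python
def bradford_zones(article_counts, zones=3):
    """Split sources into zones holding roughly equal shares of articles.

    `article_counts` lists the number of articles per source (any order).
    Returns the number of sources that fall in each zone, core zone first.
    """
    ordered = sorted(article_counts, reverse=True)
    total = sum(ordered)
    sizes = [0] * zones
    cumulative = 0
    zone = 0
    for count in ordered:
        # Move to the next zone once the current one has reached its share.
        while zone < zones - 1 and cumulative >= total * (zone + 1) / zones:
            zone += 1
        sizes[zone] += 1
        cumulative += count
    return sizes


# Hypothetical counts: 1 prolific source, 3 mid-level sources, 12 minor ones.
counts = [12, 4, 4, 4] + [1] * 12
print(bradford_zones(counts))
```

In this toy case each zone holds 12 articles, so the source counts per zone correspond to a Bradford multiplier of about 3 to 4.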
The recently launched large language models have the capability to generate text and engage in human-like conversations and question-answering. Owing to their capabilities, these models are now being widely used for a variety of purposes, ranging from question answering to writing scholarly articles. These models produce such good outputs that it is becoming very difficult to identify which texts are written by humans and which by these programs. This has also led to different kinds of problems such as out-of-context literature, lack of novelty in articles, plagiarism, and lack of proper attribution and citations to the original texts. Therefore, there is a need for suitable computational resources for developing algorithmic approaches that can discriminate between human and machine generated texts. This work contributes towards this research problem by providing a large, curated and annotated corpus comprising 44,162 text articles sourced from Wikipedia and ChatGPT. Some baseline models are also applied on the developed dataset and the results obtained are analyzed and discussed. The curated corpus offers a valuable resource that can be used to advance the research in this important area and thereby contribute to the responsible and ethical integration of AI language models into various fields.
G-20 refers to an organization of 20 member countries/units founded in 1999. Over the years, it has become an important political and economic platform to address various developmental concerns. The member countries collectively represent about 75% of the global population, 85% of the global gross domestic product and 75% of global trade. Given that the G-20 has 88.8% of the world’s researchers and accounts for 93.2% of research spending and 90.6% of scientific publications at the global level, it would be interesting to analyse the international research collaboration patterns among the G-20 countries, including an assessment of the benefits and impact of such collaboration. The present study utilizes the publication data of these countries to estimate their collaborative research levels. A positive growth is observed in research collaboration, along with a positive correlation with the national expenditure on R&D. Some countries (e.g. Saudi Arabia and South Africa) are found to have benefitted significantly from such collaborative research, as observed by a boost in productivity and citations. The results comprehensively account for international research collaboration among the G-20 countries.
The advancements in computer vision and image processing techniques have led to the emergence of new applications in the domains of visual surveillance, targeted advertisement, content-based searching, human–computer interaction, etc. Out of the various techniques in computer vision, face analysis, in particular, has gained much attention. Several previous studies have explored different applications of facial feature processing for a variety of tasks, including age and gender classification. However, the age and gender classification of in-the-wild human faces is still far from achieving the levels of accuracy required for real-world applications. This paper, therefore, attempts to bridge this gap by proposing a hybrid model that combines self-attention and BiLSTM approaches for age and gender classification problems. The proposed model’s performance is compared with several state-of-the-art models proposed so far. An improvement of approximately 10% and 6% over the state-of-the-art implementations for age and gender classification, respectively, is noted for the proposed model. The proposed model is thus found to achieve superior performance and to provide more generalized learning. The model can, therefore, be applied as a core classification component in various image processing and computer vision problems.
Scholarly databases are now being increasingly used for search and retrieval of research articles in different subject areas. Several previous studies have shown that different databases vary in their coverage of publication sources, and therefore, one may expect that for a given query, they may retrieve different results. However, how these databases compare in terms of relevance of the retrieved results is relatively unexplored. This study, therefore, attempts to bridge this research gap by carrying out a systematic study of retrieval relevance of three scholarly databases: Web of Science, Scopus and Dimensions. Five selected queries are used for this purpose. The retrieved results from the three databases for the given queries are first analysed in terms of volume of retrieved records, language of retrieved records, etc. Thereafter, a user-based annotation scheme is used to assess and compare the relevance of retrieved results. The standard measures of normalised discounted cumulative gain (NDCG) and Spearman rank correlation coefficient (SRCC) are computed for the purpose. Results indicate that although the number of retrieved results for the same query differs significantly in the three databases, the databases differ only marginally in retrieval relevance, with Web of Science having a slight edge over the other two.
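To make the NDCG measure named above concrete, here is a minimal sketch using the common formulation in which the gain at rank i is discounted by log2(i + 1). The abstract does not say which NDCG variant or relevance scale the study uses, so the function names and the graded relevance scores below are illustrative assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking.

    `relevances` are annotator-assigned scores in the order the database
    returned the records; `k` optionally truncates the list to the top k.
    """
    ranked = list(relevances)[:k]
    ideal = sorted(relevances, reverse=True)[: len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical relevance judgements (0 = irrelevant, 3 = highly relevant)
# for the first six records returned for one query.
print(round(ndcg([3, 2, 3, 0, 1, 2]), 4))
```

A value of 1.0 means the database returned the records in the best possible order according to the annotators' judgements.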
Institutional performance assessment has always been, and will remain, a key challenge to a multitude of stakeholders. Most of the existing indicators, including h-type indicators and many others, do not reflect the expertise of institutions that defines their research portfolio. Recently, a set of expertise measures such as the x and x(g) indices were introduced to reflect the expertise of institutions with respect to a specific discipline/field, considering strengths in different finer-level thematic areas. In this work, an adaptation of the x-index, namely the x_d-index, is proposed to reflect the overall scholarly expertise of an institution considering its publication pattern and strength in different coarse thematic areas. This indicator is intended to reflect the core expertise areas and also the diversity of the research portfolio of the institution. This indicator and the associated framework are demonstrated on a dataset of 135 institutions. The ability to reflect diversity is validated by determining the correlation with a prominent diversity indicator. The x_d-index is found to correlate with the major diversity indicator (Shannon’s diversity index) with better values as compared to the institutional h-index and g-index. Further, the possibilities for effective management of the research portfolio of an institution by expanding its diversity are discussed, which may aid concerned stakeholders.
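The validation step above relies on Shannon's diversity index, H = -Σ p_i ln p_i, where p_i is the share of an institution's publications in thematic area i. A minimal sketch follows. The function name and the publication counts are hypothetical, and the natural logarithm is assumed since the abstract does not state the base.

```python
import math


def shannon_diversity(counts):
    """Shannon's diversity index for a list of per-category counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Categories with zero publications contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical publication counts of two institutions across four areas.
focused = [90, 5, 3, 2]       # output concentrated in one area
balanced = [25, 25, 25, 25]   # output spread evenly
print(shannon_diversity(focused), shannon_diversity(balanced))
```

The index is 0 when all output falls in one area and reaches its maximum, ln(number of areas), when output is spread evenly.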
India is now one of the major knowledge producers in the world, ranking among the top five countries in total research output. The research output is contributed by various institutions located in different states and regions of the country. There are, however, no existing studies on the amount of research output contributed by each state. Therefore, in this study, we undertook a territorial mapping of research output from India at the level of different states. Research output data for the country for the last 20 years (2001–20) were obtained from the Web of Science database, and publications were tagged to different states based on the location of the affiliating institution of the publication. The results show that Tamil Nadu, Maharashtra, Delhi, West Bengal, Karnataka, and Uttar Pradesh are among the top contributors, in that order. Almost all major states showed an increase in absolute research output during 2001–20. However, in relative terms, Tamil Nadu, Bihar and Punjab showed interesting growth patterns. Chandigarh and Puducherry had high total publications/gross state domestic product values. The analytical results of this study present useful quantitative measures of the research contributions of different states in India. Some probable reasons for the observed patterns and certain policy suggestions are also discussed in this study.
Scientific collaboration at regional and international levels has increased manifold during the last two decades. The South Asian region, comprising Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan and Sri Lanka, is home to a significant part of the world population and is emerging as a major knowledge producer. These South Asian countries are connected not only through shared history, language and culture, but also through an intergovernmental organization called the South Asian Association for Regional Cooperation (SAARC). This article attempts to measure and characterize the research collaboration in the SAARC countries during 2001–2019. The research publication data for analysis is obtained from the Web of Science database. Different kinds of collaboration (inter, mixed, intra and domestic) among the SAARC countries are measured and analyzed through a computational analysis. Results indicate that SAARC countries collaborate more with countries outside the region than within the region. The within-region collaboration has grown in volume but is still less than 1% of the total research output from the region. The collaboration is also found to vary across subject areas, with Social Science & Mathematics having a higher proportion of international collaboration, Medical Science & Physics having higher mixed collaboration, and Social Science & Environment Science having higher intra collaboration. Major implications of the results are discussed.
Indian Institutes of Management (IIMs) are among the most prestigious business schools in India, mainly offering postgraduate, doctoral and executive education programmes in the fields of Management and Business Education. They also contribute significantly to research in the area. This article attempts to analyse the bibliometric patterns in the research output of IIMs. The data for research publications indexed in Scopus during 2010–19 is downloaded and analysed to identify important patterns and trends of research output, citations, international collaboration, open access, gender distribution and social media visibility. The results are also compared with three top internationally renowned business schools (Harvard Business School, MIT Sloan School of Management and NUS Business School). Results indicate that the older IIMs like Ahmedabad and Bangalore are placed at the top in terms of publication counts and citations. Newer IIMs like Rohtak and Raipur are found to be doing well in publications as compared to other IIMs of their generation. IIM Udaipur has more than 40% of its research output internationally collaborated and also the highest citations-per-paper value amongst all the IIMs. However, when the IIMs are compared with three well-known international schools (two of which have mentored the initial two IIMs), there appears a large gap in several indicators, such as the h-index. The paper, thus, indicates that IIMs need to improve their research output and quality to be on par with the top business schools of the world. Research themes like ‘sustainability’, ‘emerging markets’ and ‘supply chain management’ are the most prominent thematic areas observed in the research output from IIMs, which indicates that IIMs are working on research topics of contemporary relevance.
The shift from ‘trust-based funding’ to ‘performance-based funding’ is one of the factors that has forced institutions to strive for continuous improvement of performance. Several studies have established the importance of collaboration in enhancing the performance of paired institutions. However, identification of suitable institutions for collaboration is sometimes difficult, and therefore institutional collaboration recommendation systems can be vital. Currently, there are no well-developed institutional collaboration recommendation systems. In order to bridge this gap, we design a framework that recognizes the thematic strengths and core competencies of institutions, which can in turn be used for collaboration recommendations. The framework, based on NLP and network analysis techniques, is capable of determining the strengths of an institution in different thematic areas within a field and thereby determining the core competency and potential core competency areas of that institution. It makes use of recently proposed expertise indices such as the x and x(g) indices for determination of core and potential core competency areas and can generate two kinds of recommendations: (i) for enhancing the strength of strong areas or core competency areas of an institution and (ii) for complementing the potentially strong areas or potential core competency areas of an institution. A major advantage of the system is that it can help to determine and improve the research portfolio of an institution within a field through suitable collaboration, which may lead to the overall improvement of the performance of the institution in that field. The framework is demonstrated by analyzing the performance of 195 Indian institutions in the field of ‘Computer Science’. Upon validation using standard metrics for novelty, coverage and diversity of recommendation systems, the framework is found to be of sufficient coverage and capable of generating novel and diverse recommendations.
The article thus p…
Since the adoption of the Sustainable Development Goals (SDGs) in 2015, various countries across the world have started programmes to achieve the relevant targets under the SDGs. Advancements in research and development play a crucial role in achieving these targets. Motivated by this, a few studies have tried to map research publications to the specific SDGs they are relevant to. However, there are no existing detailed studies with reference to India. Therefore, this article attempts to measure the research activities on SDGs in India. It utilises a standard bibliometric approach and textual analysis of data collected from the Dimensions database for a five-year period (2016–2020). The results show a positive response from the Indian research community towards the SDGs. About 12 percent of the total research output from India is found to be directly related to SDGs. Three SDGs, namely SDG 3 (Good Health and Well-being), SDG 7 (Affordable and Clean Energy) and SDG 13 (Climate Change), have received the most attention from the Indian research community. Technical subjects such as Engineering, Medical and Health Sciences, and Chemical Sciences are the main contributors. The major contributing institutions, authors and journals are identified.
Access to knowledge is an important requirement for the advancement of scientific and technological research and development of a country. Availability of resources is a crucial bottleneck for universities and colleges in developing countries. This leads to frequent use of pirate access sites like Sci-Hub by researchers. For instance, India has more than 900 universities and 40,000 colleges, and Africa has more than 1200 universities. Only a few of these institutions would have access to most of the research journals their scholars require. It is in this context that we tried to find out whether there exist resources which can provide links to open and free-to-download versions of scientific papers. Google Scholar, a heavily used resource for research article searches, is explored to see how effective it is in providing links to open access, freely downloadable copies of scientific articles. The complete set of global scientific publications for the year 2016 is computationally analyzed through a web-mining approach, as an example, to see if Google Scholar is able to point to freely downloadable open text versions of scientific articles. Results show that Google Scholar points to full-text sources for about 69% of the articles queried, with about 43% of the articles having openly accessible full texts. The results, thus, indicate that Google Scholar can be a useful tool for locating open access full-text versions of close to half of the scientific articles of the world, which has special significance for under-developed and developing countries.
Collaboration in scientific research is believed to produce more useful and impactful research. The collaboration may involve multiple researchers from one institution or researchers from different institutions. Many times, such collaborations involve institutions belonging to different categories (say University, Industry or Government). This paper attempts to analyse University–Industry–Government collaboration in research to find out whether such collaborated research outputs attract higher bibliometric and altmetric impact. Research output data of Indian institutions for the period 2010–2018, obtained from the Web of Science database, is used to demonstrate the analysis. The institutions are programmatically and manually tagged into one of the three categories (University, Industry or Government) depending on their type, and the research output involving different kinds of collaboration is identified and analysed. The results indicate that research papers involving University–Industry–Government collaboration do not differ significantly in terms of citations as compared to non-collaborated papers. However, an advantage in terms of social media mentions is found for different types of University–Industry–Government collaborated papers. Collaboration between U and I category entities, G and I category entities, and the UIG collaboration are seen to gain an advantage in terms of mentions as compared to papers that do not involve such collaboration. Probable reasons for the observed patterns and implications of the results are discussed towards the end of the paper.
Power laws are a characteristic distribution found in both natural and man-made systems. Previous studies have shown that citations to scientific articles follow a power law, i.e., the number of papers having a certain level of citation x is proportional to x raised to some negative power. However, the distributional character of altmetrics (such as reads, likes, mentions, etc.) has not been studied in much detail, particularly with respect to the existence of power law behaviours. This article, therefore, attempts an empirical analysis of altmetric mention data of a large set of scholarly articles to see if they exhibit power law behaviour. The individual and the composite data series of ‘mentions’ on the various platforms are fit to a power law distribution, and the parameters and goodness of fit are determined, using both least squares regression and the Maximum Likelihood Estimate (MLE) approach. We also explore the fit of the mention data to other distribution families like the Log-normal and exponential distributions. Results obtained confirm the existence of power law behaviour in social media mentions to scholarly articles. The Log-normal distribution also looks plausible but is not found to be statistically significant, and the exponential distribution does not show a good fit. Major implications of power law in altmetrics are given and interesting research questions are posed in pursuit of enhancing the reliability of altmetrics for research evaluation purposes.
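The MLE approach mentioned above can be illustrated with the standard closed-form estimator for a continuous power law, alpha = 1 + n / Σ ln(x_i / x_min). This is a generic sketch rather than the article's procedure: mention counts are discrete, so the continuous formula is only an approximation, and the function names, the chosen x_min and the synthetic data are all assumptions.

```python
import math
import random


def powerlaw_alpha_mle(values, x_min):
    """Maximum likelihood estimate of the exponent of p(x) ~ x^(-alpha).

    Only observations with x >= x_min are used. Returns the estimate and
    its approximate standard error (alpha - 1) / sqrt(n).
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal x_min; alpha is undefined")
    alpha = 1.0 + len(tail) / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(len(tail))


def sample_powerlaw(alpha, x_min, size, seed=0):
    """Draw synthetic power-law data by inverse transform sampling."""
    rng = random.Random(seed)
    return [
        x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        for _ in range(size)
    ]


# Synthetic check: recover a known exponent from simulated data.
data = sample_powerlaw(alpha=2.5, x_min=1.0, size=50_000, seed=42)
alpha_hat, stderr = powerlaw_alpha_mle(data, x_min=1.0)
print(alpha_hat, stderr)
```

Least squares on a log-log plot is easier to visualise, but the likelihood-based estimate is generally regarded as less biased, which is one reason to report both.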
The article presents an introduction to a newly created scientometric portal called Indian Science Reports, available at www.indianscience.net. The portal is designed to fulfil the need for a single integrated resource for analytical data about the research competencies of India at an overall level as well as Indian institutions at an individual level. India’s research performance in terms of research output, citations, highly cited papers, international collaboration, open access levels, gender distribution, social media visibility, etc. is computationally analysed using publication metadata collected from the Dimensions database. The portal also provides a mechanism to look up the research performance of all major Indian higher education institutions, on various standard parameters, through an institutional search. Further, a concept-based search is integrated to identify top performing Indian institutions on a given research topic. The portal, thus, provides an invaluable resource of Indian scientific research data and information, which can be used for various purposes ranging from scientometric evaluation to thrust area-based funding decisions.
Scientific journals are currently the primary medium used by researchers to report their research findings. The transformation of print journals into e-journals during the last two decades has not only simplified the process of submissions to journals but has also increased their access across the world. It is well-known that there are significant differences in the total number of journals indexed from different countries. It is, however, not very concretely known whether the lack of an appropriate number of publication venues in a country (including in one or more subject areas) may inhibit its publication propensity in one way or another. This article, therefore, attempts to explore the relationship between the number of journals indexed from a country and its research output. The Scopus database is used as the reference database, and the master journal list of Scopus is analysed to identify the number of journals indexed from 50 selected countries that have a significant volume of research output. The publication data for the countries is obtained from Scopus. The following major relationships are observed: (a) number of journals from a country and its research output, (b) growth rate of journals and research output for different countries, (c) global share of journals and research output for different countries, and (d) subject area-wise number of journals and research output in that subject area for different countries. The results show that for the majority of the countries, the number of journals indexed is positively correlated with their research output volume. A similar relationship is also observed in the subject area-wise analysis, confirming the existence of positive correlations between the number of journals in a subject area and the research output in that subject area. However, several countries do not fully conform to the observed relationship, indicating that there are several other factors driving the research output of a country.
The study, at the end, presents a discussion…
The IEEE Access journal started in 2013, and in a short period, it has attained recognition for being a preferred multidisciplinary journal, with characteristics of rapid and continuous publishing. It is now ranked among the top journals in Engineering and Computer Science (General) by Scopus. Recognizing the distinctive nature of the journal and its contributions in the broader area of Engineering and Computer Science, this article attempts to present a detailed bibliometric analysis of the journal to identify publishing patterns, authorship and collaboration structure, citation impact, funding patterns of the published research, and the thematic structure of the publications. The gender distribution is also computed to identify papers published by male and female authors. The social media visibility of the articles and the Sustainable Development Goals (SDG) connections of articles are also identified. The results indicate that the IEEE Access journal can attract novel, high-quality multidisciplinary research, which aligns with the relevant and most pressing SDGs. Furthermore, the journal has experienced increased multi-authored multidisciplinary research, and it is publishing a more significant percentage of articles with female first authors.
ResearchGate has emerged as a popular professional network for scientists and researchers in a very short span. Similar to Google Scholar, ResearchGate indexing uses an automatic crawling algorithm that extracts bibliographic data, citations, and other information about scholarly articles from various sources. However, it has been observed that the two platforms often show different publication and citation data for the same institutions, journals, and authors. While several previous studies analysed different aspects of ResearchGate and Google Scholar, the quantum of differences in publications, citations, and metrics between the two and the probable reasons for the same have not been explored much. This article, therefore, attempts to bridge this research gap by analysing and measuring the differences in publications, citations, and different metrics of the two platforms for a large data set of highly cited authors. The results indicate that there are significantly high differences in publications and citations for the same authors captured by the two platforms, with Google Scholar having higher counts in a vast majority of the cases. The different metrics computed by the two platforms also differ in their values, showing different degrees of correlation. The coverage policy, indexing errors, author attribution mechanism, and strategy to deal with predatory publishing are found to be the main probable reasons for the differences between the two platforms.
The currently prevailing international ranking systems for institutions are limited in their assessment as they only provide assessments either at an overall level or at very broad subject levels such as Science, Engineering, Medicine, etc. While these rankings have their own usage, they cannot be used to identify best institutions in a specific subject (say Computer Science) by taking into account their performance in different thematic areas of research of the given subject (say Artificial Intelligence or Machine Learning or Computer Vision etc. for the subject Computer Science). This paper tries to bridge this gap by proposing a framework that uses the NLP and Network approach for identifying the core competency of institutions and their thematic research strengths. The core competency can be viewed as a measure of breadth of research capability of an institution in a given subject, whereas thematic research strength can be viewed as depth of research of the institution in a specific theme of a subject. The working of the framework is demonstrated in the area of Computer Science for 195 Indian institutions. The framework can be useful for institutions and the scientometrics research community as a system providing a detailed assessment of the core competency and the research strengths of institutions in different thematic areas. The framework and outcomes can also be useful for funding agencies in devising programs for ‘performance-based funding’ in ‘thrust areas’ or ‘national priority areas’. Full Version
Linguistic resources for commonly used languages such as English and Mandarin Chinese are available in abundance, hence the existing research in these languages. However, there are languages for which linguistic resources are scarcely available. One of these languages is the Hindi language. Hindi, being the fourth-most popular language, still lacks in richly populated linguistic resources, owing to the challenges involved in dealing with the Hindi language. This article first explores the machine learning-based approaches—Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression—to analyze the sentiment contained in Hindi language text derived from Twitter. Further, the article presents lexicon-based approaches (Hindi Senti-WordNet, NRC Emotion Lexicon) for sentiment analysis in Hindi while also proposing a Domain-specific Sentiment Dictionary. Finally, an integrated convolutional neural network (CNN)—Recurrent Neural Network and Long Short-term Memory—is proposed to analyze sentiment from Hindi language tweets, a total of 23,767 tweets classified into positive, negative, and neutral. The proposed CNN approach gives an accuracy of 85%. Full Version
Coronavirus disease has become a pandemic and a concern for the whole world. The disease has spread widely and is still expanding day by day, having caused more than 8 lakh deaths worldwide. The viruses chiefly responsible, SARS-CoV and SARS-CoV-2, are part of the coronavirus family. Timely and accurate prediction for such pandemic diseases would therefore be helpful. This paper mainly focuses on the prediction of SARS-CoV and SARS-CoV-2 using the B-cells dataset. The predictions are made using various machine learning models, including SVM, Naïve Bayes, K-nearest neighbors, AdaBoost, Gradient boosting, XGBoost, Random forest, ensembles, and neural networks. The paper also proposes different ensemble learning strategies that proved beneficial for making predictions. The most accurate result was obtained using the proposed algorithm, with a 0.919 AUC score and 87.248% validation accuracy for predicting SARS-CoV, and a 0.923 AUC score and 87.7934% validation accuracy for predicting SARS-CoV-2. Full Version
Indian classical dance (ICD) classification is an interesting subject because of its complex body posture. It provides a stage to experiment with various computer vision and deep learning concepts. With a change in learning styles, automated teaching solutions have become inevitable in every field, from traditional to online platforms. Additionally, ICD forms an essential part of a rich cultural and intangible heritage, which at all costs must be modernized and preserved. In this paper, we have attempted an exhaustive classification of dance forms into eight categories. For classification, we have proposed a deep convolutional neural network (DCNN) model using ResNet50, which outperforms various state-of-the-art approaches. Additionally, to our surprise, the proposed model also surpassed a few recently published works in terms of performance evaluation. The input to the proposed network is initially pre-processed using image thresholding and sampling. Next, a truncated DCNN based on ResNet50 is applied to the pre-processed samples. The proposed model gives an accuracy score of 0.911. Full Version
Network analysis is found to be effective for mining various aspects of technological and scientific literature. Path analysis, a major tool in the network analysis trio, consists of two crucial steps: weight assignment and search. An innovative search scheme, the key-route search method, is found to be capable of retrieving multiple paths. Recently, a framework that explored the power of combined networks, or NoNs (Network of Networks), for interdisciplinarity assessment was proposed. In this work, a framework that utilizes the strength of NoNs and the integrated approach to path analysis is developed for mining interdisciplinary trajectories in techno-scientific literature. Of the two major weight assignment schemes, SPC and FV gradient, the FV gradient is found to better leverage the key-route search scheme in mining multiple evolutionary trajectories with interdisciplinary interactions. This framework can serve as a handy tool for a multitude of beneficiaries, including policy makers. Full Version
Emotion is an instinctive or intuitive feeling as distinguished from reasoning or knowledge. It is a natural instinctive state of mind deriving from one’s circumstances, mood, or relationships with others, and it varies over time, so it is important to understand and analyze emotions appropriately. Existing works have mostly focused on recognizing basic emotions from human faces. However, emotion recognition from cartoon images has not been extensively covered. Therefore, in this paper, we present an integrated Deep Neural Network (DNN) approach that deals with recognizing emotions from cartoon images. Since existing works do not have a large amount of data, we collected a dataset of size 8K from two cartoon characters, ‘Tom’ and ‘Jerry’, with four different emotions, namely happy, sad, angry, and surprise. The proposed integrated DNN approach, trained on a large dataset consisting of animations for both the characters (Tom and Jerry), correctly identifies the character, segments their face masks, and recognizes the consequent emotions with an accuracy score of 0.96. The approach utilizes Mask R-CNN for character detection and state-of-the-art deep learning models, namely ResNet-50, MobileNetV2, InceptionV3, and VGG 16, for emotion classification. In our study, VGG 16 outperforms the other models in classifying emotions, with an accuracy of 96% and an F1 score of 0.85. The proposed integrated DNN outperforms the state-of-the-art approaches. Full Version
Institutions are known to have varying research strengths in different thematic areas. While some thematic areas are within the core competence of an institution, there may be other areas in which the institution is considered relatively weak. This work proposes an expertise-based recommendation framework that can determine the stronger and weaker thematic areas of an institution based on its expertise and make recommendations accordingly. The framework uses bibliometric and text data and applies methods from Network Science and Text Analytics. The recommendations provided can be useful for various purposes, ranging from suggestions for institutional collaborations for improving an institution’s research performance in a weaker thematic area (by pairing with an institution stronger in the corresponding thematic area) to research place recommendations for prospective researchers. This unique capability of the framework is demonstrated using 196 research institutions in India. Results are compared with available evidence from different international rankings and the ability of the framework to provide novel recommendations is established. Full Version
Precise and timely information about opportunities for potential collaborations is very vital for the collaboration-intense research environment prevailing in innovation ecosystems. As the identification of suitable inventors for collaboration will be decisive for inventors in different phases of their careers, inventor collaboration recommendation systems are of great importance. Existing recommendation system frameworks for collaboration recommendations for academic authors and inventors are slightly intensive on the usage of link semantics. Like academic collaboration through co-authorship, collaborations of inventors through co-inventorship of patents can be found in almost all industrial areas in various degrees. Network representation of co-inventorship can be used to retrieve many insights that can even be vital for policymaking. In this work, for inventor collaboration recommendations, a minimal link semantics (MLS) approach based framework is built to overcome these major drawbacks and to improve usability. The case of inventors in the area ‘Wireless power transmission’ is analyzed using patent data for the demonstration of the MLS framework and on evaluation, the framework is found to be capable of retrieving novel and diverse recommendations to and from inventors that belong to different phases of a career. Full Version
Research management in academia and industry is a field with many opportunities owing to advances in information and storage technologies. It also poses as many challenges as opportunities. Like academic collaborations through co-authorship, collaboration of inventors through co-inventorship of patents exists in almost all industrial areas in varying degrees. Understanding the nature of the inventor collaboration network is vital for many applications, including policy-related ones. Identification of suitable inventors for collaboration will be decisive for inventors in different phases of their careers, making collaboration management one of the concerns of research management. In this work, a preliminary design of an inventor collaboration recommendation system is proposed through a network-based predictive approach. As a case study, patent data from ‘Wireless Power Transmission’ is analysed and various implications are discussed. Full Version
The potential of complex network analysis to address complex systems such as stock markets is steadily gaining recognition. In this study, an approach for data mining of the stock market based on complex networks is developed as a preliminary step toward stock recommendation and/or portfolio management systems. The lobbying power of players is identified based on unweighted and weighted stock market networks that are created from United States stock data dynamics. A criterion is also devised to check whether the strength of correlation can significantly impact the assessment of local influence and lobbying power of players. Portfolio analysis based on lobbying power and weighted lobbying power is carried out to reveal crucial industrial sections of the market. Our study revealed the affordability of offering financial services for firms belonging to such industrial sections for systemic risk reduction. Weighted lobby analysis is found to reveal more in-depth structural insights for portfolio analysis than lobby centrality. Full Version
Purpose: The main purpose of this study is to explore and validate the question “whether altmetric mentions can predict citations to scholarly articles”. The paper attempts to explore the nature and degree of correlation between altmetrics (from ResearchGate and three social media platforms) and citations. Design/methodology/approach: A large data sample of scholarly articles published from India in 2016 is obtained from the Web of Science database, and the corresponding altmetric data are obtained from ResearchGate and three social media platforms (Twitter, Facebook and blogs, through the Altmetric.com aggregator). Correlations are computed between early altmetric mentions and later citation counts, for data grouped in different disciplinary groups. Findings: Results show that the correlations between altmetric mentions and citation counts are positive, but weak. Correlations are relatively higher in the case of data from ResearchGate as compared to the data from the three social media platforms. Further, significant disciplinary differences are observed in the degree of correlations between altmetrics and citations. Research limitations/implications: The results support the idea that altmetrics do not necessarily reflect the same kind of impact as citations. However, articles that get higher altmetric attention early may actually have a slight citation advantage. Further, altmetrics from academic social networks like ResearchGate are more correlated with citations, as compared to social media platforms. Originality/value: The paper has novelty in two respects. First, it takes altmetric data for a window of about 1–1.5 years after the article publication and citation counts for a longer citation window of about 3–4 years after the publication of the article. Second, it is one of the first studies to analyze data from the ResearchGate platform, a popular academic social network, to understand the type and degree of correlations. Full Version
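The correlation step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the correlation measure, so the use of Spearman rank correlation here is an assumption, and the function names and the small data sample are hypothetical, invented purely for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-article counts: early altmetric mentions, later citations.
early_mentions = [0, 2, 5, 1, 9, 0]
later_citations = [1, 3, 4, 0, 12, 2]
print(round(spearman(early_mentions, later_citations), 3))
```

Rank correlation is a common choice for count data of this kind because mention and citation counts are typically highly skewed; a per-discipline analysis would simply apply the same function to each disciplinary group.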
I propose an overall productivity index for actors in science and technology like authors, inventors, journals, assignees, etc., namely the ψ-index. An actor has a ψ-index value of ψ if ψ is the unique largest number such that the citations earned by his/her/its ψ top-cited papers/patents average at least (ψ + 1)/2. A major limitation of the h-index that led to the development of the g-index is the motivation for the ψ-index too. The h-index ignores the effect of massive citations yielded by papers/patents in the so-called h-core. Though the g-index captures the effect of massive citations by top-cited papers/patents, the ability of such massive citations to offset the less cited papers/patents down the list is not substantially reflected. The ψ-index, on the other hand, reflects such an ability to the greatest possible extent. The existence of three different kinds of actor profiles and important aspects related to each profile type is discussed. A framework for determination of the type of profile and the aspects associated with that profile is developed using the ψ and g indices. Application of the framework is demonstrated using the scholarly profiles of selected prominent contributors from two research fields: ‘Network science’ and ‘Scientometrics’. Full Version
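The definition in this abstract can be compared with the h-index and g-index in a short sketch. It assumes the threshold reads as (k + 1)/2 for the k top-cited items, and that the index cannot exceed the number of papers/patents in the profile; both are assumptions of this illustration, as are the function names and the sample profile.

```python
def h_index(citations):
    """Largest h such that the h top-cited items each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g such that the g top-cited items total at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total, best = 0, 0
    for k, c in enumerate(ranked, start=1):
        total += c
        if total >= k * k:
            best = k
    return best


def average_threshold_index(citations):
    """Largest k such that the k top-cited items average at least (k + 1) / 2.

    Equivalent to: cumulative citations of the top k >= k * (k + 1) / 2.
    Assumption: k is capped at the number of items in the profile.
    """
    ranked = sorted(citations, reverse=True)
    total, best = 0, 0
    for k, c in enumerate(ranked, start=1):
        total += c
        if 2 * total >= k * (k + 1):
            best = k
    return best


# Hypothetical profile: one massively cited item followed by 19 uncited ones.
profile = [100] + [0] * 19
print(h_index(profile), g_index(profile), average_threshold_index(profile))
# prints: 1 10 13
```

In this profile the single highly cited item offsets more of the uncited items under the averaged threshold than under the g-index condition, which is the behaviour the abstract describes.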
Social media platforms have now emerged as an important medium for wider dissemination of research articles; with authors, readers and publishers creating different kinds of social media activity about the article. Some research studies have even shown that articles that get more social media attention may get higher visibility and citations. These factors are now persuading journal publishers to integrate social media plugins in their webpages to facilitate sharing and dissemination of articles in social media platforms. Many past studies have analyzed several factors (like journal impact factor, open access, collaboration etc.) that may impact social media attention of scholarly articles. However, there are no studies to analyze whether the presence of social media plugin in a journal could result in higher social media attention of articles published in the journal. This paper aims to bridge this gap in knowledge by analyzing a sufficiently large-sized sample of 99,749 articles from 100 different journals. Results obtained show that journals that have social media plugins integrated in their webpages get significantly higher social media mentions and shares for their articles as compared to journals that do not provide such plugins. Authors and readers visiting journal webpages appear to be a major contributor to social media activity around articles published in such journals. The results suggest that publishing houses should actively provide social media plugin integration in their journal webpages to increase social media visibility (altmetric impact) of their articles. Full Version
Open Access has been emerging as an important movement worldwide over the last few years, triggered mainly by the high subscription cost of paywalled journals, which creates barriers to the universal dissemination of the knowledge reported in those journals. The paywall barriers to accessing knowledge have become so problematic that even institutions in developed countries are not only cancelling subscriptions but also mandating that their researchers either publish in open access journals or at least deposit their research papers in institutional repositories. The high subscription cost of journals is a more serious issue for developing countries, as it takes away institutional resources that could be used for other productive purposes. India has taken several steps to promote open access, including the release of an open access policy by the Ministry of Science and Technology; however, it is not very clear how effective these initiatives have been. This paper intends to address this issue. It examines the published output, indexed in Web of Science, of the 100 most productive institutions in India and analyzes how much of their research output is available in Open Access (OA). The paper further analyzes the availability of research papers from these institutions on the popular pirate site Sci-Hub. It is interesting to observe that legal OA percentages are significantly lower than Sci-Hub availability for all the institutions, an indication that the existing systems for promoting open access in India are not working efficiently. Finally, the paper also presents statistics on the number of papers deposited in three central institutional repositories in India. Full Version
Classification of research articles into different subject areas is an extremely important task in bibliometric analysis and information retrieval. There are primarily two kinds of subject classification approaches used in different academic databases: journal-based (aka source-level) and article-based (aka publication-level). The two popular academic databases- Web of Science and Scopus- use journal-based subject classification scheme for articles, which assigns articles into a subject based on the subject category assigned to the journal in which they are published. On the other hand, the recently introduced Dimensions database is the first large academic database that uses article-based subject classification scheme that assigns the article to a subject category based on its contents. Though the subject classification schemes of Web of Science have been compared in several studies, no research studies have been done on comparison of the article-based and journal-based subject classification systems in different academic databases. This paper aims to compare the accuracy of subject classification system of the three popular academic databases: Web of Science, Scopus and Dimensions through a large-scale user-based study. Results show that the commonly held belief of superiority of article-based subject classification over the journal-based subject classification scheme does not hold at least at the moment, as Web of Science appears to have the most accurate subject classification. Full Version
In recent times, sentiment analysis research on English textual data has gained tremendous impetus; however, very little research has focused on Nepali textual data, which is the focus of this work. We have explored machine learning approaches and proposed a lexicon-based approach using linguistic features and lexical resources to perform sentiment analysis of tweets written in the Nepali language. The lexicon-based approach first pre-processes the tweet, locates the opinion-oriented features and then computes the sentiment polarity of the tweet. We have investigated both conventional machine learning models (Multinomial Naïve Bayes (NB), Decision Tree, Support Vector Machine (SVM) and logistic regression) and deep learning models (Convolution Neural Network (CNN), Long Short-Term Memory (LSTM) and CNN-LSTM) for sentiment analysis of Nepali text. These machine learning models and the lexicon-based approach have been evaluated on tweet datasets related to the Nepal Earthquake 2015 and the Nepal blockade 2015. The lexicon-based approach outperformed the conventional machine learning models, while the deep learning models outperformed both the conventional machine learning models and the lexicon-based approach. We have also created Nepali SentiWordNet and Nepali SenticNet sentiment lexicons from existing English language resources as a by-product. Full Version
Open Access has emerged as an important movement worldwide during the last decade. There are several initiatives now that persuade researchers to publish in open access journals and to archive their pre- or post-print versions of papers in repositories. Institutions and funding agencies are also promoting ways to make research outputs available as open access. This paper looks at open access levels and patterns in research output from India by computationally analyzing research publication data obtained from Web of Science for India for the last 5 years (2014–2018). The corresponding data from other connected platforms—Unpaywall and Sci-Hub—are also obtained and analyzed. The results obtained show that about 24% of research output from India, during last 5 years, is available in legal forms of open access as compared to world average of about 30%. More articles are available in gold open access as compared to green and bronze. On the contrary, more than 90% of the research output from India is available for free download in Sci-Hub. We also found disciplinary differentiation in open access, but surprisingly these patterns are different for gold–green and black open access forms. Sci-Hub appears to be complementing the legal gold–green open access for less covered disciplines in them. The central institutional repositories in India are found to have low volume of research papers deposited. Full Version
Higher participation of women in higher education and research is a very important development goal in many countries across the world, with several countries creating special initiatives and schemes to increase participation of women in higher education and research. This article looks at a case study from India and aims to characterize the participation of women in research, by analysing the parameters of institution-type, discipline, citation impact and international collaboration. Research publication data from 50 most productive Indian institutions, along with data for 5 major institution systems, for a period of 10 years (2008 - 2017), as indexed in Web of Science, is obtained as sample data and analysed. Results obtained show that participation of women is found to vary in different disciplines, with biology (37%), agriculture science (32%), social science (31%) and medical science (32%) having relatively higher number of female 1st authored papers as compared to engineering (20%), information science (21%) and mathematics (22%). It is also observed that institutions specializing in medical sciences and social science have relatively better participation of women. Full Version
In the modern age of a connected world and social media, research outcomes that are of direct interest and relevance to society are increasingly being shared and disseminated in news sources and on social media platforms. Some studies have found that social media mentions of research papers can be an early indicator of their impact. India, which is now among the top 10 knowledge producers in the world, has more than 900 universities that contribute to its research output. This paper analyzes what proportion of research output from the 100 most productive Indian institutions gets social media coverage. It is found that, while the average social media coverage for India is around 28.5%, the coverage varies between 5% and 60% for different institutions. It is also observed that research output from institutions in some specific disciplines (such as Medical Science and Biological Science) attracts more social media coverage than others. The possible impact of the geographical location of an institution (in a metro city) on social media coverage of its research output is analyzed as well. The findings present useful insight into social media coverage of the research output of Indian institutions, which may be a proxy for the societal relevance of the research work, and also indicate that suitable mechanisms need to be designed to promote dissemination of research results from Indian institutions on popular social media platforms. Full Version
Productivity assessment of various actors is one of the major concerns of Scientometrics and is vital for many applications that include policymaking. Popular productivity indices are not suitable for the determination of productivity of actors within a research context. A framework for the generation of metrics for contextual productivity assessment based on network approach has been recently proposed. However, that framework used full counting or full credit allocation, which incurs inflationary and equalizing bias. Schemes such as fractional and harmonic counting could reduce inflationary bias and harmonic counting has a repute of minimizing equalizing bias. As the existing framework for contextual productivity assessment is prone to inflationary and equalizing bias, empowering it with the provision to determine the right credit allocation scheme might take us closer to the achievement of a bias-free framework. In this work, a method to quantify the biases and to decide the right credit allocation scheme is introduced and using this we revamp the existing framework. As a case study, the productivity of inventors in the field ‘Wireless Power Transmission’ is determined. Implications from the real-world case study signify the effectiveness of the framework. Full Version
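The three credit allocation schemes named in this abstract can be sketched as follows. The harmonic formula used here is the commonly cited one, in which the i-th of N contributors receives (1/i) divided by the sum of 1/j for j from 1 to N; treating that as the scheme meant by the abstract is an assumption, and the function names are hypothetical.

```python
def full_credit(n_contributors):
    """Full counting: every contributor receives one full credit."""
    return [1.0] * n_contributors


def fractional_credit(n_contributors):
    """Fractional counting: one credit is split equally among contributors."""
    return [1.0 / n_contributors] * n_contributors


def harmonic_credit(n_contributors):
    """Harmonic counting: the i-th listed contributor gets a share
    proportional to 1/i, normalised so that the shares sum to one."""
    denominator = sum(1.0 / j for j in range(1, n_contributors + 1))
    return [(1.0 / i) / denominator for i in range(1, n_contributors + 1)]


# Credits for a hypothetical patent with three listed inventors.
for scheme in (full_credit, fractional_credit, harmonic_credit):
    shares = scheme(3)
    print(scheme.__name__, [round(s, 3) for s in shares], round(sum(shares), 3))
```

Full counting hands out three credits for one patent, which is the inflationary bias the abstract mentions; fractional counting removes that inflation but still treats all contributors as equal, whereas harmonic counting also differentiates them by position in the byline.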
Open access (OA) has emerged as an important movement worldwide during the last decade. There are several calls now that not only persuade researchers to publish in OA journals, to archive their pre- or post-print versions of papers in repositories, but also institutions and funding agencies to promote OA of research publications. This article examines OA levels and patterns in research output by computationally analysing research publication data obtained from the Web of Science for India during the last five years (2014–2018). Results obtained show that about 24% of research output from India, during the last five years, is available in OA compared to world average of about 30%. More articles are available in gold OA compared to green and bronze OA. Furthermore, OA levels vary in different disciplines, with medical science, physics and biology having higher percentage of their articles available as OA as compared to those like arts and humanities, social science and (surprisingly) information science. Full Version
Scholarly articles are now increasingly being mentioned and discussed in social media platforms, sometimes even as pre- or post-print version uploads. Measures of social media mentions and coverage are now emerging as an alternative indicator of impact of scholarly articles. This article aims to explore how much scholarly research output from India is covered in different social media platforms, and how similar or different it is from the world average. It also analyses the discipline-wise variations in coverage and altmetric attention for Indian research output, including a comparison with the world average. Results obtained show interesting patterns. Only 28.5% of the total research output from India is covered in social media platforms, which is about 18% less than the world average. ResearchGate and Mendeley are the most popular social media platforms in India for scholarly article coverage. In terms of discipline-wise variation, medical sciences and biological sciences have relatively higher coverage across different platforms compared to disciplines like information science and engineering. Full Version
This paper tries to map the research work carried out in the field of Big Data through a detailed analysis of scholarly articles published on the theme during 2010–16, as indexed in Scopus. We have collected and analyzed all relevant publications on Big Data, as indexed in Scopus, through a quantitative as well as textual characterization. The analysis attempts to delve into parameters like research productivity, growth of research and citations, thematic trends, top publication sources and emerging topics in this field. The analytical study also investigates country-wise publication output and impact in terms of average citations per paper, country-level collaboration patterns, authorship and leading contributors (countries, institutions), etc. The scholarly publication data is also subjected to a detailed textual analysis to identify key themes in Big Data research, disciplinary variations and thematic trends and patterns. The results produce interesting inferences. Quantitative measures show that there has been a tremendous increase in the number of publications related to Big Data during the last few years. Research work in Big Data, though primarily considered a sub-discipline of Computer Science, is now carried out by researchers in many disciplines. Thematic analysis of publications in Big Data shows that it is a discipline involving research interest from fields as diverse as Medicine and Social Sciences. The paper also identifies major keywords now associated with Big Data research, such as Cloud Computing, Deep Learning, Social Media and Data Analytics. This helps in a thorough understanding and visualization of the Big Data research area. Full Version
E-commerce websites provide an easy platform for users to put forth their viewpoints on different topics, ranging from a news item to any product in the market. Such online content encourages authors to express opinions on various aspects of an entity. Aspect-based sentiment analysis deals with analyzing this textual content to look for the aspect in question. After locating the aspects, the corresponding sentiment-bearing words are looked for. This paper describes an integrated system that generates opinionated aspect-based graphical and extractive summaries from a large set of mobile reviews. The system focuses on three tasks: (a) identification of aspects in a given field, (b) computation of the sentiment polarity of each aspect, and (c) generation of opinionated aspect-based graphical and extractive summaries. The system has been evaluated on three mobile-review datasets and obtains better precision and recall than the baseline approach. The system generates summaries from reviews without any training. Full Version
During the last two decades the number of private universities in India has increased significantly. According to the AISHE report of 2016, out of 799 universities in India, 277 are private universities, i.e., roughly one out of every three universities in India is a private university. A significant proportion of colleges (about 78%) are also privately managed, but they do not contribute much to research activities and hence are not included in this analysis. Private universities are now becoming a major component of the Indian higher education system. Some of the private universities are exclusively positioning and projecting themselves as universities for high-quality research and innovation. A few of them are now well placed in the national-level NIRF ranking framework. It is in this context that this paper presents a comparative account of the research performance of the 25 most productive private universities with the set of Indian Institutes of Technology (IITs), Central Universities (CUs) and National Institutes of Technology (NITs), all of which have a well-established environment and culture of research. A set-based comparison methodology is followed. The results show good performance of private universities in research, especially in terms of output and rate of growth of output. However, on quality and on productivity per capita and per rupee spent, they have a long way to go to match the performance levels of the well-established centrally funded higher education institutions of India. This study presents a detailed scientometric assessment of some of the most productive private universities in India. Full Version
Nanotechnology is a research field that has the potential to drive the progress of mankind for the next few decades. Its applications are found in every discipline, ranging from material science to space communication. Owing to its potential for ubiquity and its capability of replacing many general-purpose technologies, the co-existence of several paradigms is expected in nanotechnology. The Flow Vergence (FV) gradient has recently been introduced as a metric to mine the network of scientific literature for detecting paradigm shifts. In this paper, we perform a citation network analysis of scientific publications in nanotechnology from the research area ‘engineering’ to identify the related paradigms. The Flow Vergence gradient revealed 18 subnetworks that deal with 25 likely pivots of paradigm shifts. Major paradigm shifts can be found in the field of targeted drug delivery. Nanonetworks, a crossover of IT, BT and nanotechnology, is another interesting paradigm shift identified. An extended subnetwork analysis has been conducted to identify the competing or complementary nature of the emerging paradigms in the subnetworks. A framework for this has also been introduced. This analysis revealed that most of the paradigms in targeted delivery are competing paradigms. Complementary paradigms are also identified in nanoelectronics and targeted drug delivery. Policy implications of this identification for various target groups are also discussed. Full Version
This article presents a bibliometric assessment of the research performance of the National Institutes of Technology (NITs) in India. While many of these institutes were originally established in the 1960s as Regional Engineering Colleges (RECs), they were upgraded to NITs around 2002 and later. Initially, NITs offered only undergraduate programmes in engineering. However, during the last decade, several NITs have started postgraduate teaching and are focusing more on research activities. It is in this context that this article assesses the research performance of NITs during 2005–2016. The performance assessment uses research publication data obtained from the Web of Science index. The data collected are computationally analysed to identify productivity, productivity per capita, rate of growth of research, international collaboration pattern, citation impact and discipline-wise distribution of the research output for the NITs. The performance of NITs is also viewed vis-à-vis two top-performing Indian institutions, namely Indian Institute of Science, Bengaluru and Indian Institute of Technology Bombay, Mumbai. A simple single-value composite ranking of the research performance of NITs is also presented by combining quantity and quality factors. The study presents an informative and useful assessment of research work in the NITs. Full Version
Amid the enormous volume of knowledge generated by the knowledge explosion, scientific literature mining can play a crucial role in research evaluation and in tracking important developments. Linked through citation relations, scientific literature forms a network of papers, or citation network. Citation network analysis is gaining recognition as an effective tool for research evaluation. Paradigm shift detection is of great importance to a multitude of beneficiaries, and prediction of such paradigm shifts is of even greater importance for policy making. Recently, a metric named the Flow Vergence (FV) gradient was proposed to detect paradigm shift pivots in scientific literature using a property called the FV effect. In this paper, its predictive power is investigated and tested statically and dynamically using publications in the field ‘Nanotechnology for Engineering’ for the period 1989–2014. Validation included a post-analysis validation of the field in 2017. As the predictive power of the FV gradient is confirmed, it can be regarded as an effective method for predicting likely paradigm shifts and added to the tool kit of research evaluators and policy makers. Full Version
The profuse growth of scientometrics as a research field owes much to the introduction of citation networks and other scientograms. Centrality analysis, path analysis and cluster analysis are three major network analysis tools. Hummon and Doreian’s introduction of path retrieval methods based on (1) traversal counts as a weight assignment method for arcs and (2) search methods such as local (forward) search and global search marked the commencement of path analysis. The original Hummon–Doreian traversal count based weight assignment methods, such as Search Path Link Count and Search Path Node Pair, were computationally complex. Along with the computational improvement of these weights, Batagelj added another computationally efficient traversal count method to the path analysis literature, known as Search Path Count. A major development in search methods was seen recently with the introduction of innovative search methods such as backward (local) search and key-route (local and global) search by Liu and Lu. They also equipped the available and new local search methods with a parameter to control the search. A major advantage of the Liu–Lu methods lies in the fact that they can reveal more paths or more papers that are usually missed by classical methods. All these contributions considered unweighted citation networks as the object of analysis. Despite being a tool of tremendous potential, path analysis is much underexplored relative to other network analysis tools. Inspired by these, we generalise the Liu–Lu integrated approach, the present state of the art in path analysis, to an integrated approach for weighted networks. We demonstrate a manifold improvement in analysis opportunities with the generalised integrated approach, using FV gradient weights for weight assignment, on a case study of the field ‘IT for engineering’. 
The integrated approach for weighted networks does not need additional implementation effort in PAJEK and this will be beneficial for a multitude of ana Full Version
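As an illustration of the traversal count idea mentioned above, the following is a minimal sketch of the classical (unweighted) Search Path Count on an acyclic citation network. It is not the weighted generalisation proposed in the paper and does not use PAJEK. The function name `search_path_count` and the toy arc lists are hypothetical, and arcs are assumed to be unique (source, target) pairs.

```python
from collections import defaultdict


def search_path_count(arcs):
    """Return {arc: number of source-to-sink paths passing through that arc}.

    `arcs` is a list of unique (u, v) pairs describing a directed acyclic graph.
    Hypothetical helper for illustration only.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    nodes = set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    indegree = {n: len(pred[n]) for n in nodes}
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("the network contains a cycle")

    # Number of paths reaching each node from any source (node without predecessors).
    from_sources = {}
    for n in order:
        from_sources[n] = 1 if not pred[n] else sum(from_sources[p] for p in pred[n])

    # Number of paths leaving each node towards any sink (node without successors).
    to_sinks = {}
    for n in reversed(order):
        to_sinks[n] = 1 if not succ[n] else sum(to_sinks[s] for s in succ[n])

    # An arc (u, v) lies on from_sources[u] * to_sinks[v] source-to-sink paths.
    return {(u, v): from_sources[u] * to_sinks[v] for u, v in arcs}


if __name__ == "__main__":
    toy_arcs = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E")]
    for arc, weight in sorted(search_path_count(toy_arcs).items()):
        print(arc, weight)
```

In this toy network every arc lies on two of the four source-to-sink paths, so all weights are equal; on real citation networks the weights differ and are what the search methods traverse.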
Wireless Power Transmission (WPT) is an emerging technology that enables the transmission of electricity without the use of artificial conducting mediums. A lack of significant technological breakthroughs has widely affected its progress. In this paper, we analyze the early progress of WPT and attempt to forecast its growth using the Pearl curve model. Fisher-Pry substitution model analysis indicates that technology substitution of wired transmission has just begun and half the substitution will be over by 2028. We conducted patent landscaping to identify hot domains and specific technology areas. Implications for technology developers and manufacturers, electric utility providers and national policymakers are also identified. Full Version
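The two growth models named above can be sketched in their standard textbook forms. This is a minimal sketch, not the fitted model from the paper: the half-substitution year 2028 is taken from the abstract, while the rate parameter `ALPHA` and the function names are hypothetical values chosen only for illustration.

```python
import math


def fisher_pry_fraction(year, half_year, alpha):
    """Fisher-Pry substitution: f / (1 - f) = exp(2 * alpha * (year - half_year))."""
    return 1.0 / (1.0 + math.exp(-2.0 * alpha * (year - half_year)))


def pearl_curve(t, upper_limit, a, b):
    """Pearl (logistic) growth curve: y = L / (1 + a * exp(-b * t))."""
    return upper_limit / (1.0 + a * math.exp(-b * t))


HALF_YEAR = 2028  # year of 50% substitution, as stated in the abstract
ALPHA = 0.1       # hypothetical rate parameter, not a value from the paper

if __name__ == "__main__":
    for year in (2018, 2028, 2038):
        share = fisher_pry_fraction(year, HALF_YEAR, ALPHA)
        print(year, round(share, 3))
```

By construction the substituted fraction is exactly 0.5 in the half-substitution year and is symmetric around it; the speed of the transition depends entirely on the assumed rate parameter.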
Books are an important means of disseminating knowledge. Researchers and academicians write books to propagate their innovative research or teachings amongst academic as well as non-academic audiences. The number of books written every year is increasing rapidly. According to the International Publisher Association (IPA) annual report 2015–2016, around 150 million different books were published worldwide in 2014–2015. Many e-commerce websites are also involved in selling books. A recent addition to the book publishing world is e-books, which have made it very simple to publish. While the availability of a large number of books is good for readers, it also makes it challenging to find a good book, particularly in scholarly settings. Researchers in the area of Scientometrics have attempted to assess the goodness of a scholarly book by measuring the citations that the book receives. However, citations alone are not a true measure of a book’s impact. Many a time, people use the knowledge in a book without actually citing it. Also, the use of books in classroom settings or for general reading is often not reflected in citations. Therefore, it is important to obtain users’ opinions about a book from other forms of data. Fortunately, some data of this sort are now available in the form of reviews, downloads, social media mentions, etc. Amazon and Goodreads, both of which provide readers’ views about a book, are two good examples. This paper presents exploratory research on using these non-traditional data about books to assess the impact of a book. A set of Scopus-indexed computer science books with good citations, as well as some other popular books in the computer science domain, are used for analysis. The reviews of the books have been crawled in an automated fashion from Amazon and Goodreads. Thereafter, sentiment analysis is carried out on the text of the reviews. 
Results of sentiment analysis are compared and correlated with traditional impact assessment me Full Version
Scholarly articles are considered one of the primary media for the dissemination of inventions and discoveries. Traditionally, the usefulness and popularity of a scholarly article has been measured in terms of the citations it receives. However, in the changed research publishing landscape, where most publications are now available in digital form and accessible through various digital libraries, new ways of measuring the usefulness of scholarly articles have emerged. Nowadays, scholarly articles are easily available for access and download from various digital access portals. The use and popularity of these digital access portals has also made it possible to integrate various social media platforms with journal access and use. Most journals now maintain statistics about reads, number of downloads, social profile shares, etc. Several newer platforms like ResearchGate, Academia and Mendeley have also become popular. Researchers now often share their articles on such platforms and also use social media channels to disseminate their articles to a wider audience. This transformed environment has made it possible to track and measure the usefulness and popularity of scholarly articles through alternative metrics (now popularly known as Altmetrics) in addition to traditional citation impact measures. Altmetrics attempts to derive the impact of a scholarly article by using data of different kinds, such as social network shares, mentions, tweets, etc. The use of Altmetrics varies widely from country to country and discipline to discipline. This paper presents the findings of an exploratory analysis of the relevance of Altmetrics data through a case study of scholarly articles from India published during 2016, indexed in Web of Science and also updated on ResearchGate. The results obtained provide an interesting insight into the relatedness and correlation of the presence of scholarly articles in Web of Science and ResearchGate. 
It is observed that about 61% papers indexed in Web of Science Full Version
This paper presents an integrated framework to generate an extractive aspect-based opinion summary from a large volume of free-form text reviews. The framework has three major components: (a) an aspect identifier to determine the aspects in a given domain; (b) a sentiment polarity detector for computing the sentiment polarity of the opinion about an aspect; and (c) a summary generator to generate the opinion summary. The framework is evaluated on the SemEval-2014 dataset and obtains better results than several other approaches. Full Version
As more and more interdisciplinary areas are emerging at a quick pace, analysis of interdisciplinary interactions among disciplines is of great importance. Citation network analysis is advancing as a tool to extract policy implications from scientific as well as patent literature. Most studies of interdisciplinarity under the lens of network analysis have concentrated mainly on journal–journal citation networks. Citation networks of articles reflect the accumulation as well as the flow of knowledge. Therefore, specific developments that might cause interdisciplinary evolution can be identified. Given this underexplored potential, we attempt to investigate interdisciplinarity at the level of published articles. As the interdisciplinary interactions between two disciplines are reflected by common/boundary papers and the arcs of cross-disciplinary citations, a quantitative methodology for assessing the strength of interdisciplinary interactions, the dominant mode of interaction, mutual contribution, etc., is developed based on these. The interdisciplinary interaction between the fields ‘biotechnology for energy’ and ‘nanotechnology for energy’ is chosen as a case study. The existence of ‘mutual contribution’ between these fields is identified. ‘Biotechnology for energy’ is found to contribute more to the development of ‘nanotechnology for energy’ than vice versa. Important specific developments associated with interdisciplinary interactions are also explored using qualitative decision rules. The quantitative as well as qualitative methods devised in this paper form a framework for interdisciplinarity assessment that can be used by various decision makers. Full Version
This article presents the research performance of the 39 central universities in India. The research publication data, indexed in the Web of Science, for the 39 central universities for a 25-year period (1990–2014) are used for analysis. The data are computationally analysed to identify productivity, productivity per capita, productivity per crore rupees grant, rate of growth of research output, authorship and collaboration pattern, citation impact and discipline-wise research strength of these institutions. The research performance of the central universities is measured and compared with two top-ranking world universities, namely University of Cambridge and Stanford University. While older, well-established big universities such as University of Delhi and Banaras Hindu University perform better than newer universities, some relatively smaller universities, such as the University of Hyderabad, have impressive research performance. What is disturbing is that the combined research output of all central universities taken together is less than that of either University of Cambridge or Stanford University alone. The results also provide discipline-wise research strengths of all the universities. Full Version
Aspect-level sentiment analysis refers to sentiment polarity detection from unstructured text at a fine-grained feature or aspect level. This paper presents our experimental work on aspect-level sentiment analysis of movie reviews. Movie reviews generally contain user opinions about different aspects such as acting, direction, choreography, cinematography, etc. We have devised a linguistic rule-based approach which identifies the aspects from movie reviews, locates the opinion about each aspect and computes the sentiment polarity of that opinion using linguistic approaches. The system generates an aspect-level opinion summary. The experimental design is evaluated on datasets of two movies. The results show good accuracy and promise for deployment in an integrated opinion profiling system. Full Version
This paper describes an integrated aspect-level opinion summary generation system for movie reviews. The system, named Movie Prism, analyses each movie review, locates the aspect terms in it, identifies the opinion about those aspects and then generates a visual aspect-based opinion summary of the movie in question. At present, the movie reviews and other related information are automatically fetched from IMDb for all the movies released during the years 2010 to 2014. The system has an integrated crawler for this purpose. Further, an ontology for the movie domain is created for better aspect identification. We have evaluated the system on three annotated movie review datasets. The system obtains good accuracy. Overall, the designed system is capable of producing visual aspect-level opinion summaries from unstructured textual reviews without any need for training, and the results have a reasonable degree of accuracy. Full Version
This article presents a computational analysis of the research performance of 16 relatively older Indian Institutes of Technology (IITs) in India. The research publication data indexed in Web of Science for all the 16 IITs is used for the analysis. The data is computationally analysed to identify productivity, productivity per capita, rate of growth of research output, authorship and collaboration pattern, citation impact and discipline-wise research strengths of the different IITs. The research performance of the IITs has been compared with that of two top-ranking engineering and technology institutions of the world (MIT-USA and NTU-Singapore), and the most cited papers from these IITs have also been identified. The analytical results are expected to provide an informative, up-to-date and useful account of research performance assessment of the IITs. Full Version
The new, transformed read-write Web has resulted in a rapid growth of user-generated content on the Web, producing a huge volume of unstructured data. A substantial part of this data is unstructured text such as reviews and blogs. Opinion mining and sentiment analysis (OMSA) has emerged as a research discipline during the last 15 years and provides a methodology to computationally process unstructured data, mainly to extract opinions and identify their sentiments. This relatively new but fast-growing research discipline has changed a lot during these years. This paper presents a scientometric analysis of research work done on OMSA during 2000–2016. For the scientometric mapping, research publications indexed in the Web of Science (WoS) database are used as input data. The publication data is analyzed computationally to identify the year-wise publication pattern, rate of growth of publications, types of authorship of papers on OMSA, collaboration patterns in publications on OMSA, most productive countries, institutions, journals and authors, citation patterns and a year-wise citation reference network, and theme density plots and keyword bursts in OMSA publications during the period. A somewhat detailed manual analysis of the data is also performed to identify popular approaches (machine learning and lexicon-based) used in these publications, levels (document, sentence or aspect-level) of sentiment analysis work done and major application areas of OMSA. The paper presents a detailed analytical mapping of OMSA research work and charts the progress of the discipline on various useful parameters. Full Version
Information networks, especially citation networks, have many proven and potential applications in scientometrics. Identifying the productivity of authors and journals is one of the prime concerns of analysts. While there are many indices to measure the productivity of an author or journal, there is no known index to determine productivity with respect to a particular research context. A network scientometric approach is devised to address the identification of contextual productivity. Work-author and work-journal affiliations modelled as 2-mode networks provide an effective means to assess the productivity of authors and journals in a particular research context. In this work, weighted 2-mode networks are created for the analysis of affiliation networks such that the weights reflect some citation characteristics of the works in their original citation network. A set of network indices is proposed for the assessment of the contextual importance of authors and journals, and these are illustrated in a case study of Biotechnology for Engineering. Online databases and digital libraries can use these indices to present insights about the most productive authors and journals along with the search results. Full Version
It is now generally accepted that institutions of higher education and research, largely publicly funded, need to be subjected to some benchmarking process or performance evaluation. Currently there are several international ranking exercises that rank institutions at the global level, using a variety of performance criteria such as research publication data, citations, awards and reputation surveys etc. In these ranking exercises, the data are combined in specified ways to create an index which is then used to rank the institutions. These lists are generally limited to the top 500–1000 institutions in the world. Further, some criteria (e.g., the Nobel Prize), used in some of the ranking exercises, are not relevant for the large number of institutions that are in the medium range. In this paper we propose a multidimensional ‘Quality–Quantity’ Composite Index for a group of institutions using bibliometric data, that can be used for ranking and for decision making or policy purposes at the national or regional level. The index is applied here to rank Central Universities in India. The ranks obtained compare well with those obtained with the h-index and partially with the size-dependent Leiden ranking and University Ranking by Academic Performance. A generalized model for the index using other variables and variable weights is proposed. Full Version
This article describes our effort to measure the research competitiveness of Indian Institutes of Science Education and Research (IISERs) through a scientometric analysis of their research output during the last five years (2010–14). The research output indexed in Web of Science of the five recently established IISERs has been obtained and analysed computationally to identify growth trends, per capita output, authorship and collaboration patterns, citation impact, average citation per paper, etc. The research performance of IISERs is also compared with the Indian Institute of Science and the Indian Institute of Technology system to obtain an assessment of their research potential. Thus the article presents a useful and detailed analytical account of research potential and competitiveness of IISERs. Full Version
This paper presents a scientometric analysis of research work done in the emerging area of ‘Big Data’ during recent years. Research on ‘Big Data’ started during the last few years and has gained tremendous momentum within a short span of time. It is now considered one of the most important emerging areas of research in computational sciences and related disciplines. We have analyzed the research output data on ‘Big Data’ during 2010–2014 indexed in both the Web of Knowledge and Scopus. The analysis comprehensively maps the parameters of total output, growth of output, authorship and country-level collaboration patterns, major contributors (countries, institutions and individuals), top publication sources, thematic trends and emerging themes in the field. The paper presents an elaborate and one-of-its-kind scientometric mapping of research on ‘Big Data’. Full Version
This paper presents our experimental work towards detecting the sentiment polarity of free-form texts: first by using an ensemble of sentiment lexicons and then through a lexicon pooled machine learning classifier. In the ensemble design, we combined four different sentiment lexicons in different ways to determine the sentiment polarities of different text data. The ensemble approach, however, did not achieve the superior performance initially expected. Therefore, in the second design, we pooled the sentiment lexicon knowledge into the classification process of a multinomial naive Bayes classifier itself. The experimental designs are evaluated on three document and two sentence datasets. The lexicon pooled approach obtains superior accuracy levels compared to the standard naive Bayes classifier as well as lexicon-based methods. Furthermore, as the amount of training data decreases, the accuracy of the lexicon pooled machine learning classifier decays more slowly than that of the standalone naive Bayes classifier. The framework presented proves useful and robust and can be extended to any classification task. Full Version
Biotechnology, ever since its inception, has had a huge impact on society, and its various applications have been intricately woven into the human web of life. Its evolution amidst all the other research realms vital to mankind is remarkable. In this paper, we intend to identify the radical innovations in Biotechnology for Engineering using network analyses. Centrality analysis and path analysis are used for identifying important works. The existence of the Flow Vergence effect in the scientific literature is revealed. The Flow Vergence gradient, an arc metric derived from the FV model, is utilised for path analysis, which detects pivotal papers of paradigm shifts more accurately. A major paradigm shift has been identified in the business models of Biotechnology for Engineering, namely the ‘Capability to Connectivity’ model. Evidence of the adoption of the business practices of BT firms by nanotechnology start-ups is also identified. The notion of critical divergence is introduced and the exhibition of interdisciplinary interaction in emerging fields due to critical divergence is discussed. Implications of the above analyses for (i) science and technology policy makers, (ii) industrialists and investors, and (iii) researchers in academia as well as industry are also discussed. Full Version
This paper presents our algorithmic design for a lexicon pooled approach for opinion mining from course feedback. The proposed method tries to incorporate lexicon knowledge into the machine learning classification process through a multinomial process. The algorithmic formulations have been evaluated on three datasets obtained from ratemyprofessor.com. The results have also been compared with standalone machine learning and lexicon-based approaches. The experimental results show that the lexicon pooled approach obtains higher accuracy than both the standalone implementations. The paper thus proposes and demonstrates how a lexicon pooled hybrid approach may be a preferred technique for opinion mining from course feedback and hence suitable for development in a practical course feedback mining system. Full Version
Detection of emerging fields in any industry is of great importance to the industrialists, engineers and policy makers of business as well as state administration. Exact awareness of the paradigm that governs current research activities, and of the chances of likely paradigm shifts that could redefine research approaches, is crucial for the actors of the scientific community and for policy makers. Excellent technologies in IT have even accelerated the scientific and applied ontological pursuit in both academia and industry. In this work, a network approach is advocated for the identification of innovations, new paradigms and emerging fields in the IT industry in the research area ‘engineering’. The network is a scientific network of research publications which reflects the volume and flow of scientific activities. Centrality analysis, path analysis, cluster analysis, etc. are used to identify the key papers of paradigm shifts, emerging fields, relatively important clusters and works respectively. A new metric, the flow vergence index, is devised for cluster analysis. The paradigm shift identified from this network is RFID technology, related to supply chain management. With proper economic and policy support, there are good reasons to look forward to more wonders from the industry. Full Version
This paper presents our experimental work on two aspects of sentiment analysis. First, we evaluate the performance of different machine learning as well as lexicon-based methods for sentiment analysis of texts obtained from a variety of sources. Our performance evaluation results are on six different datasets of different kinds, including movie reviews, blog posts and Twitter feeds. To the best of our knowledge, no such comprehensive evaluative account involving different techniques on a variety of datasets has been reported earlier. The second major work that we report here is the heuristic-based scheme that we devised for aspect-level sentiment profile generation of movies. Our algorithmic formulation parses the user reviews for a movie and generates a sentiment polarity profile of the movie based on the opinions expressed on various aspects in the user reviews. The results obtained for the aspect-level computation are also compared with the corresponding results obtained from the document-level approach. In summary, the paper makes two important contributions: (a) it presents a detailed evaluative account of both supervised and unsupervised algorithmic formulations on six datasets of different varieties, and (b) it proposes a new heuristic-based aspect-level sentiment computation approach for movie reviews, which results in a more focused and useful sentiment profile for the movies. Full Version
This paper presents our algorithmic approach for information and relation extraction from unstructured texts (such as eBook sections or webpages), performing other useful analytics on the text, and automatically generating a semantically meaningful structure (RDF schema). Our algorithmic formulation parses the unstructured text from eBooks and identifies the key concepts described in the eBook along with the relationships between the concepts. The extracted information is then used for four purposes: (a) generating some computed metadata about the text source (such as the readability of an eBook), (b) generating a concept profile for each distinct part of the text, (c) identifying and plotting relationships between the key concepts described in the text, and (d) generating an RDF representation for the text source. We have done our experiments on eBook texts from the Computer Science domain; however, the approach can be applied to different forms of text in other domains as well. The results are not only useful for concept-based tagging and navigation of unstructured text documents (such as eBooks) but can also be used to design a comprehensive and sophisticated learning recommendation system. Full Version
This paper presents our experimental work to design a content-based recommendation system for eBook readers. The system automatically identifies a set of relevant eResources for a reader reading a particular eBook and presents them to the user through an integrated interface. The system involves two different phases. In the first phase, we parse the textual content of the eBook currently read by the user to identify the learning concepts being pursued. This requires analysing the text of the relevant part(s) of the eBook to extract concepts and subsequently filtering them to identify learning concepts of interest to the Computer Science domain. In the second phase, we identify a set of relevant eResources from the World Wide Web. This involves invoking publicly available APIs from Slideshare, LinkedIn, YouTube, etc. to retrieve relevant eResources for the learning concepts identified in the first phase. The system is evaluated through a multi-faceted process involving tasks like sentiment analysis of user reviews of the retrieved set of eResources for recommendations. We strive to obtain an additional wisdom-of-crowd kind of evaluation of our system by hosting it on a public Web platform. Full Version
In this paper, we present an algorithmic formulation to automatically extract learning concepts and their relationships from eBook texts and to generate RDF data that can be used for a number of purposes. Our algorithmic approach first extracts the various parts of an eBook (such as chapters and sections) and then, through a sentence-level parsing scheme, identifies the learning concepts described in the eBook text. We have also programmed the identification and extraction of relationships between different learning concepts occurring in a section. We have also been able to extract some general data about the eBooks such as author, price, and reviews (through eBook content mining and web crawling). The learning concepts, their relationships and other useful information extracted from the eBooks are then programmatically transformed into machine-readable RDF data. The automated process of concept and relation extraction and their subsequent storage as RDF data makes our effort important and useful for tasks like Information Extraction, Concept-based Search and Machine Reading. Full Version
Scholarly databases are now being increasingly used for search and retrieval of research articles in different subject areas. Several previous studies have shown that different databases vary in their coverage of publication sources, and therefore one may expect that, for a given query, they may retrieve different results. However, how these databases compare in terms of the relevance of the retrieved results is relatively unexplored. This study, therefore, attempts to bridge this research gap by carrying out a systematic study of the retrieval relevance of three scholarly databases: Web of Science, Scopus and Dimensions. Five selected queries are used for this purpose. The retrieved results from the three databases for the given queries are first analysed in terms of volume of retrieved records, language of retrieved records, etc. Thereafter, a user-based annotation scheme is used to assess and compare the relevance of the retrieved results. The standard measures of normalised discounted cumulative gain (NDCG) and Spearman rank correlation coefficient (SRCC) are computed for this purpose. Results indicate that although the number of retrieved results for the same query differs significantly across the three databases, the databases differ only marginally in retrieval relevance, with Web of Science having a slight edge over the other two. Full Version
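The two relevance measures named above can be sketched in their common textbook forms. This is a minimal sketch under stated assumptions: the abstract does not say which gain function, cut-off or tie handling the study uses, so linear gain with a log2 discount and a tie-free Spearman formula are assumed here, and the function names and relevance scores are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def spearman_no_ties(xs, ys):
    """Spearman rank correlation, 1 - 6 * sum(d^2) / (n * (n^2 - 1)).

    Valid only when neither list contains tied values and n >= 2.
    """
    n = len(xs)

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        result = [0] * n
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    rank_x, rank_y = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical annotator relevance scores for the top five results of one query.
    annotated = [3, 2, 3, 0, 1]
    print("NDCG:", round(ndcg(annotated), 3))
    print("SRCC:", spearman_no_ties([1, 2, 3, 4], [2, 1, 4, 3]))
```

An NDCG of 1.0 means the database returned results in the ideal relevance order for that query, while SRCC compares two orderings of the same items; with tied scores a tie-aware implementation would be needed instead of the closed-form formula.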