Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeEvolution of ESG-focused DLT Research: An NLP Analysis of the Literature
As Distributed Ledger Technologies (DLTs) rapidly evolve, their impacts extend beyond technology, influencing environmental and societal aspects. This evolution has increased publications, making manual literature analysis increasingly challenging. We address this with a Natural Language Processing (NLP)-based systematic literature review method to explore the intersection of Distributed Ledger Technology (DLT) with its Environmental, Social, and Governance (ESG) aspects. Our approach involves building and refining a directed citation network from 107 seed papers to a corpus of 24,539 publications and fine-tuning a transformer-based language model for Named Entity Recognition (NER) on DLT and ESG domains. Applying this model, we distilled the corpus to 505 key publications, enabling an inaugural literature review and temporal graph analysis of DLT's evolution in ESG contexts. Our contributions include an adaptable and scalable NLP-driven systematic literature review methodology and a unique NER dataset of 54,808 entities, tailored for DLT and ESG research. Our inaugural literature review demonstrates their applicability and effectiveness in analyzing DLT's evolution and impacts, proving invaluable for stakeholders in the DLT domain.
Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced Analysis
In the evolving field of corporate sustainability, analyzing unstructured Environmental, Social, and Governance (ESG) reports is a complex challenge due to their varied formats and intricate content. This study introduces an innovative methodology utilizing the "Unstructured Core Library", specifically tailored to address these challenges by transforming ESG reports into structured, analyzable formats. Our approach significantly advances the existing research by offering high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports. Emphasizing its capability to handle diverse data types, including text, images, and tables, the method adeptly manages the nuances of differing page layouts and report styles across industries. This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment, paving the way for the application of advanced NLP technologies and large language models in the analysis of corporate governance and sustainability. Our code is available at https://github.com/linancn/TianGong-AI-Unstructure.git.
Deep Reinforcement Learning for ESG financial portfolio management
This paper investigates the application of Deep Reinforcement Learning (DRL) for Environment, Social, and Governance (ESG) financial portfolio management, with a specific focus on the potential benefits of ESG score-based market regulation. We leveraged an Advantage Actor-Critic (A2C) agent and conducted our experiments using environments encoded within the OpenAI Gym, adapted from the FinRL platform. The study includes a comparative analysis of DRL agent performance under standard Dow Jones Industrial Average (DJIA) market conditions and a scenario where returns are regulated in line with company ESG scores. In the ESG-regulated market, grants were proportionally allotted to portfolios based on their returns and ESG scores, while taxes were assigned to portfolios below the mean ESG score of the index. The results intriguingly reveal that the DRL agent within the ESG-regulated market outperforms the standard DJIA market setup. Furthermore, we considered the inclusion of ESG variables in the agent state space, and compared this with scenarios where such data were excluded. This comparison adds to the understanding of the role of ESG factors in portfolio management decision-making. We also analyze the behaviour of the DRL agent in IBEX 35 and NASDAQ-100 indexes. Both the A2C and Proximal Policy Optimization (PPO) algorithms were applied to these additional markets, providing a broader perspective on the generalization of our findings. This work contributes to the evolving field of ESG investing, suggesting that market regulation based on ESG scoring can potentially improve DRL-based portfolio management, with significant implications for sustainable investing strategies.
Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs
Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuable quantitative information through tables. Unfortunately, extracting this information is difficult due to high variability in the table structure as well as content. We propose Statements, a novel domain agnostic data structure for extracting quantitative facts and related information. We propose translating tables to statements as a new supervised deep-learning universal information extraction task. We introduce SemTabNet - a dataset of over 100K annotated tables. Investigating a family of T5-based Statement Extraction Models, our best model generates statements which are 82% similar to the ground-truth (compared to baseline of 21%). We demonstrate the advantages of statements by applying our model to over 2700 tables from ESG reports. The homogeneous nature of statements permits exploratory data analysis on expansive information found in large collections of ESG reports.
EaSyGuide : ESG Issue Identification Framework leveraging Abilities of Generative Large Language Models
This paper presents our participation in the FinNLP-2023 shared task on multi-lingual environmental, social, and corporate governance issue identification (ML-ESG). The task's objective is to classify news articles based on the 35 ESG key issues defined by the MSCI ESG rating guidelines. Our approach focuses on the English and French subtasks, employing the CerebrasGPT, OPT, and Pythia models, along with the zero-shot and GPT3Mix Augmentation techniques. We utilize various encoder models, such as RoBERTa, DeBERTa, and FinBERT, subjecting them to knowledge distillation and additional training. Our approach yielded exceptional results, securing the first position in the English text subtask with F1-score 0.69 and the second position in the French text subtask with F1-score 0.78. These outcomes underscore the effectiveness of our methodology in identifying ESG issues in news articles across different languages. Our findings contribute to the exploration of ESG topics and highlight the potential of leveraging advanced language models for ESG issue identification.
Enhancing Retrieval for ESGLLM via ESG-CID -- A Disclosure Content Index Finetuning Dataset for Mapping GRI and ESRS
Climate change has intensified the need for transparency and accountability in organizational practices, making Environmental, Social, and Governance (ESG) reporting increasingly crucial. Frameworks like the Global Reporting Initiative (GRI) and the new European Sustainability Reporting Standards (ESRS) aim to standardize ESG reporting, yet generating comprehensive reports remains challenging due to the considerable length of ESG documents and variability in company reporting styles. To facilitate ESG report automation, Retrieval-Augmented Generation (RAG) systems can be employed, but their development is hindered by a lack of labeled data suitable for training retrieval models. In this paper, we leverage an underutilized source of weak supervision -- the disclosure content index found in past ESG reports -- to create a comprehensive dataset, ESG-CID, for both GRI and ESRS standards. By extracting mappings between specific disclosure requirements and corresponding report sections, and refining them using a Large Language Model as a judge, we generate a robust training and evaluation set. We benchmark popular embedding models on this dataset and show that fine-tuning BERT-based models can outperform commercial embeddings and leading public models, even under temporal data splits for cross-report style transfer from GRI to ESRS
ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge
We introduce ESGenius, a comprehensive benchmark for evaluating and enhancing the proficiency of Large Language Models (LLMs) in Environmental, Social and Governance (ESG) and sustainability-focused question answering. ESGenius comprises two key components: (i) ESGenius-QA, a collection of 1 136 multiple-choice questions generated by LLMs and rigorously validated by domain experts, covering a broad range of ESG pillars and sustainability topics. Each question is systematically linked to its corresponding source text, enabling transparent evaluation and supporting retrieval-augmented generation (RAG) methods; and (ii) ESGenius-Corpus, a meticulously curated repository of 231 foundational frameworks, standards, reports and recommendation documents from seven authoritative sources. Moreover, to fully assess the capabilities and adaptation potential of the model, we implement a rigorous two-stage evaluation protocol -- Zero-Shot and RAG. Extensive experiments across 50 LLMs (ranging from 0.5 B to 671 B parameters) demonstrate that state-of-the-art models achieve only moderate performance in zero-shot settings, with accuracies typically around 55--70\%, highlighting ESGenius's challenging nature for LLMs in interdisciplinary contexts. However, models employing RAG show significant performance improvements, particularly for smaller models. For example, "DeepSeek-R1-Distill-Qwen-14B" improves from 63.82\% (zero-shot) to 80.46\% with RAG. These results underscore the necessity of grounding responses in authoritative sources for enhanced ESG understanding. To the best of our knowledge, ESGenius is the first benchmark curated for LLMs and the relevant enhancement technologies that focuses on ESG and sustainability topics.
Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
Diagnosing deep neural networks (DNNs) through the eigenspectrum of weight matrices has been an active area of research in recent years. At a high level, eigenspectrum analysis of DNNs involves measuring the heavytailness of the empirical spectral densities (ESD) of weight matrices. It provides insight into how well a model is trained and can guide decisions on assigning better layer-wise training hyperparameters. In this paper, we address a challenge associated with such eigenspectrum methods: the impact of the aspect ratio of weight matrices on estimated heavytailness metrics. We demonstrate that matrices of varying sizes (and aspect ratios) introduce a non-negligible bias in estimating heavytailness metrics, leading to inaccurate model diagnosis and layer-wise hyperparameter assignment. To overcome this challenge, we propose FARMS (Fixed-Aspect-Ratio Matrix Subsampling), a method that normalizes the weight matrices by subsampling submatrices with a fixed aspect ratio. Instead of measuring the heavytailness of the original ESD, we measure the average ESD of these subsampled submatrices. We show that measuring the heavytailness of these submatrices with the fixed aspect ratio can effectively mitigate the aspect ratio bias. We validate our approach across various optimization techniques and application domains that involve eigenspectrum analysis of weights, including image classification in computer vision (CV) models, scientific machine learning (SciML) model training, and large language model (LLM) pruning. Our results show that despite its simplicity, FARMS uniformly improves the accuracy of eigenspectrum analysis while enabling more effective layer-wise hyperparameter assignment in these application domains. In one of the LLM pruning experiments, FARMS reduces the perplexity of the LLaMA-7B model by 17.3% when compared with the state-of-the-art method.
An Updated Line List for Spectroscopic Investigation of G Stars II: Refined Solar Abundances via Extended Wavelength Coverage to 10 000 Å
This study introduces a line list for the abundance analysis of F and G type stars across the 4080-9675 A wavelength range. A systematic search employing lower excitation potentials, accurate log gf values, and an updated multiplet table led to the identification of 592 lines across 33 species (25 elements), including C, O, Mg (ionized), Al, P, S, Cu, Zr (neutral), and La. To determine the uncertainties in log gf values, we assessed solar abundance using a very high-resolution (R=1000000) disk-integrated solar spectrum. These lines were confirmed to be blend-free in the solar spectrum. The line list was further validated by analyzing the metal-poor star HD 218209 (G6V), which is notable for its well-documented and reliable abundance in literature. The abundances were obtained using the equivalent width (EW) method and further refined by applying the spectrum synthesis method. A comparative analysis with the Gaia ESO line list v.6, provided by the Gaia ESO collaboration, revealed additional neutral and ionized Fe lines. This extensively refined line list will facilitate precise stellar parameter determinations and accurate abundance analyses of spectra within the PolarBASE spectral library.
Development of Bayesian Component Failure Models in E1 HEMP Grid Analysis
Combined electric power system and High-Altitude Electromagnetic Pulse (HEMP) models are being developed to determine the effect of a HEMP on the US power grid. The work relies primarily on deterministic methods; however, it is computationally untenable to evaluate the E1 HEMP response of large numbers of grid components distributed across a large interconnection. Further, the deterministic assessment of these components' failures are largely unachievable. E1 HEMP laboratory testing of the components is accomplished, but is expensive, leaving few data points to construct failure models of grid components exposed to E1 HEMP. The use of Bayesian priors, developed using the subject matter expertise, combined with the minimal test data in a Bayesian inference process, provides the basis for the development of more robust and cost-effective statistical component failure models. These can be used with minimal computational burden in a simulation environment such as sampling of Cumulative Distribution Functions (CDFs).
ML-EcoLyzer: Quantifying the Environmental Cost of Machine Learning Inference Across Frameworks and Hardware
Machine learning inference occurs at a massive scale, yet its environmental impact remains poorly quantified, especially on low-resource hardware. We present ML-EcoLyzer, a cross-framework tool for measuring the carbon, energy, thermal, and water costs of inference across CPUs, consumer GPUs, and datacenter accelerators. The tool supports both classical and modern models, applying adaptive monitoring and hardware-aware evaluation. We introduce the Environmental Sustainability Score (ESS), which quantifies the number of effective parameters served per gram of CO_2 emitted. Our evaluation covers over 1,900 inference configurations, spanning diverse model architectures, task modalities (text, vision, audio, tabular), hardware types, and precision levels. These rigorous and reliable measurements demonstrate that quantization enhances ESS, huge accelerators can be inefficient for lightweight applications, and even small models may incur significant costs when implemented suboptimally. ML-EcoLyzer sets a standard for sustainability-conscious model selection and offers an extensive empirical evaluation of environmental costs during inference.
CLaC at SemEval-2025 Task 6: A Multi-Architecture Approach for Corporate Environmental Promise Verification
This paper presents our approach to the SemEval-2025 Task~6 (PromiseEval), which focuses on verifying promises in corporate ESG (Environmental, Social, and Governance) reports. We explore three model architectures to address the four subtasks of promise identification, supporting evidence assessment, clarity evaluation, and verification timing. Our first model utilizes ESG-BERT with task-specific classifier heads, while our second model enhances this architecture with linguistic features tailored for each subtask. Our third approach implements a combined subtask model with attention-based sequence pooling, transformer representations augmented with document metadata, and multi-objective learning. Experiments on the English portion of the ML-Promise dataset demonstrate progressive improvement across our models, with our combined subtask approach achieving a leaderboard score of 0.5268, outperforming the provided baseline of 0.5227. Our work highlights the effectiveness of linguistic feature extraction, attention pooling, and multi-objective learning in promise verification tasks, despite challenges posed by class imbalance and limited training data.
The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report
This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. A robust participation saw 244 registered entrants, with 43 teams submitting valid entries. This report meticulously analyzes these methods and results, emphasizing groundbreaking advancements in state-of-the-art single-image ESR techniques. The analysis highlights innovative approaches and establishes benchmarks for future research in the field.
Tree-based Forecasting of Day-ahead Solar Power Generation from Granular Meteorological Features
Accurate forecasts for day-ahead photovoltaic (PV) power generation are crucial to support a high PV penetration rate in the local electricity grid and to assure stability in the grid. We use state-of-the-art tree-based machine learning methods to produce such forecasts and, unlike previous studies, we hereby account for (i) the effects various meteorological as well as astronomical features have on PV power production, and this (ii) at coarse as well as granular spatial locations. To this end, we use data from Belgium and forecast day-ahead PV power production at an hourly resolution. The insights from our study can assist utilities, decision-makers, and other stakeholders in optimizing grid operations, economic dispatch, and in facilitating the integration of distributed PV power into the electricity grid.
SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation
The rapid growth of the financial sector and the rising focus on Environmental, Social, and Governance (ESG) considerations highlight the need for advanced NLP tools. However, open-source LLMs proficient in both finance and ESG domains remain scarce. To address this gap, we introduce SusGen-30K, a category-balanced dataset comprising seven financial NLP tasks and ESG report generation, and propose TCFD-Bench, a benchmark for evaluating sustainability report generation. Leveraging this dataset, we developed SusGen-GPT, a suite of models achieving state-of-the-art performance across six adapted and two off-the-shelf tasks, trailing GPT-4 by only 2% despite using 7-8B parameters compared to GPT-4's 1,700B. Based on this, we propose the SusGen system, integrated with Retrieval-Augmented Generation (RAG), to assist in sustainability report generation. This work demonstrates the efficiency of our approach, advancing research in finance and ESG.
PCB-Vision: A Multiscene RGB-Hyperspectral Benchmark Dataset of Printed Circuit Boards
Addressing the critical theme of recycling electronic waste (E-waste), this contribution is dedicated to developing advanced automated data processing pipelines as a basis for decision-making and process control. Aligning with the broader goals of the circular economy and the United Nations (UN) Sustainable Development Goals (SDG), our work leverages non-invasive analysis methods utilizing RGB and hyperspectral imaging data to provide both quantitative and qualitative insights into the E-waste stream composition for optimizing recycling efficiency. In this paper, we introduce 'PCB-Vision'; a pioneering RGB-hyperspectral printed circuit board (PCB) benchmark dataset, comprising 53 RGB images of high spatial resolution paired with their corresponding high spectral resolution hyperspectral data cubes in the visible and near-infrared (VNIR) range. Grounded in open science principles, our dataset provides a comprehensive resource for researchers through high-quality ground truths, focusing on three primary PCB components: integrated circuits (IC), capacitors, and connectors. We provide extensive statistical investigations on the proposed dataset together with the performance of several state-of-the-art (SOTA) models, including U-Net, Attention U-Net, Residual U-Net, LinkNet, and DeepLabv3+. By openly sharing this multi-scene benchmark dataset along with the baseline codes, we hope to foster transparent, traceable, and comparable developments of advanced data processing across various scientific communities, including, but not limited to, computer vision and remote sensing. Emphasizing our commitment to supporting a collaborative and inclusive scientific community, all materials, including code, data, ground truth, and masks, will be accessible at https://github.com/hifexplo/PCBVision.
The Application of Artificial Neural Network Model to Predicting the Acid Mine Drainage from Long-Term Lab Scale Kinetic Test
Acid mine drainage (AMD) is one of the common environmental problems in the coal mining industry that was formed by the oxidation of sulfide minerals in the overburden or waste rock. The prediction of acid generation through AMD is important to do in overburden management and planning the post-mining land use. One of the methods used to predict AMD is a lab-scale kinetic test to determine the rate of acid formation over time using representative samples in the field. However, this test requires a long-time procedure and large amount of chemical reagents lead to inefficient cost. On the other hand, there is potential for machine learning to learn the pattern behind the lab-scale kinetic test data. This study describes an approach to use artificial neural network (ANN) modeling to predict the result from lab-scale kinetic tests. Various ANN model is used based on 83 weeks experiments of lab-scale kinetic tests with 100\% potential acid-forming rock. The model approaches the monitoring of pH, ORP, conductivity, TDS, sulfate, and heavy metals (Fe and Mn). The overall Nash-Sutcliffe Efficiency (NSE) obtained in this study was 0.99 on training and validation data, indicating a strong correlation and accurate prediction compared to the actual lab-scale kinetic tests data. This show the ANN ability to learn patterns, trends, and seasonality from past data for accurate forecasting, thereby highlighting its significant contribution to solving AMD problems. This research is also expected to establish the foundation for a new approach to predict AMD, with time efficient, accurate, and cost-effectiveness in future applications.
Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
Electrocardiograms (ECGs) are non-invasive diagnostic tools crucial for detecting cardiac arrhythmic diseases in clinical practice. While ECG Self-supervised Learning (eSSL) methods show promise in representation learning from unannotated ECG data, they often overlook the clinical knowledge that can be found in reports. This oversight and the requirement for annotated samples for downstream tasks limit eSSL's versatility. In this work, we address these issues with the Multimodal ECG Representation Learning (MERL}) framework. Through multimodal learning on ECG records and associated reports, MERL is capable of performing zero-shot ECG classification with text prompts, eliminating the need for training data in downstream tasks. At test time, we propose the Clinical Knowledge Enhanced Prompt Engineering (CKEPE) approach, which uses Large Language Models (LLMs) to exploit external expert-verified clinical knowledge databases, generating more descriptive prompts and reducing hallucinations in LLM-generated content to boost zero-shot classification. Based on MERL, we perform the first benchmark across six public ECG datasets, showing the superior performance of MERL compared against eSSL methods. Notably, MERL achieves an average AUC score of 75.2% in zero-shot classification (without training data), 3.2% higher than linear probed eSSL methods with 10\% annotated training data, averaged across all six datasets. Code and models are available at https://github.com/cheliu-computation/MERL
More than Carbon: Cradle-to-Grave environmental impacts of GenAI training on the Nvidia A100 GPU
The rapid expansion of AI has intensified concerns about its environmental sustainability. Yet, current assessments predominantly focus on operational carbon emissions using secondary data or estimated values, overlooking environmental impacts in other life cycle stages. This study presents the first comprehensive multi-criteria life cycle assessment (LCA) of AI training, examining 16 environmental impact categories based on detailed primary data collection of the Nvidia A100 SXM 40GB GPU. The LCA results for training BLOOM reveal that the use phase dominates 11 of 16 impact categories including climate change (96\%), while manufacturing dominates the remaining 5 impact categories including human toxicity, cancer (99\%) and mineral and metal depletion (85\%). For training GPT-4, the use phase dominates 10 of 16 impact categories, contributing about 96\% to both the climate change and resource use, fossils category. The manufacturing stage dominates 6 of 16 impact categories including human toxicity, cancer (94\%) and eutrophication, freshwater (81\%). Assessing the cradle-to-gate environmental impact distribution across the GPU components reveals that the GPU chip is the largest contributor across 10 of 16 of impact categories and shows particularly pronounced contributions to climate change (81\%) and resource use, fossils (80\%). While primary data collection results in modest changes in carbon estimates compared to database-derived estimates, substantial variations emerge in other categories. Most notably, minerals and metals depletion increases by 33\%, demonstrating the critical importance of primary data for non-carbon accounting. This multi-criteria analysis expands the Sustainable AI discourse beyond operational carbon emissions, challenging current sustainability narratives and highlighting the need for policy frameworks addressing the full spectrum of AI's environmental impact.
Planetary Causal Inference: Implications for the Geography of Poverty
Earth observation data such as satellite imagery can, when combined with machine learning, have profound impacts on our understanding of the geography of poverty through the prediction of living conditions, especially where government-derived economic indicators are either unavailable or potentially untrustworthy. Recent work has progressed in using EO data not only to predict spatial economic outcomes, but also to explore cause and effect, an understanding which is critical for downstream policy analysis. In this review, we first document the growth of interest in EO-ML analyses in the causal space. We then trace the relationship between spatial statistics and EO-ML methods before discussing the four ways in which EO data has been used in causal ML pipelines -- (1.) poverty outcome imputation for downstream causal analysis, (2.) EO image deconfounding, (3.) EO-based treatment effect heterogeneity, and (4.) EO-based transportability analysis. We conclude by providing a workflow for how researchers can incorporate EO data in causal ML analysis going forward.
Signatures of the Shock Interaction as an Additional Power Source in the Nebular Spectra of SN 2023ixf
Red supergiants may lose significant mass through steady winds and episodic eruptions in the final 100-1000 years before the core collapses, shaping their circumstellar environment. Interaction between supernova (SN) ejecta and distant circumstellar material (CSM) can generate shocks, which can energize the ejecta and serve as a key power source during the nebular phase of the SN. In the present work, we investigate the nebular spectrum of SN 2023ixf, observed one year post-explosion (at +363 d) with the recently commissioned WEAVE instrument on the 4.2m William Herschel Telescope. This marks the first supernova spectrum captured with WEAVE. In this spectrum, Halpha exhibits a peculiar evolution, flanked by blueward and redward broad components centred at simpm 5650,km,s^{-1} from the rest velocity of Halpha, which are seen for only a few SNe to date. These features indicate energy deposition from shocks generated by the interaction of ejecta with a CSM expelled nearly 350 - 640 years pre-explosion. Comparisons of the +363 d spectrum with model spectra from the literature, that include varying shock powers, suggest a shock power of at least sim 5 times 10 ^{40},erg,s^{-1} at this epoch. Additionally, analysis of the [O I] doublet, along with other prominent emission lines, provides evidence for clumpiness, dust formation, and asymmetry within the ejecta and/or the surrounding CSM. These emission lines also helped to constrain the oxygen mass (approx0.19^{scriptscriptstyle +0.08}_{scriptscriptstyle -0.04} M_odot), He-core mass (<3 M_odot) and the zero-age main sequence mass (lesssim 12 M_odot) of the progenitor of SN 2023ixf. The comparison with other Type II SNe highlights SN 2023ixf's unique shock interaction signatures and evidence of dust formation, setting it apart in terms of evolution and dynamics.
GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images
While recent multimodal large language models (MLLMs) have advanced automated ECG interpretation, they still face two key limitations: (1) insufficient multimodal synergy between time series signals and visual ECG representations, and (2) limited explainability in linking diagnoses to granular waveform evidence. We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations: a dual-encoder framework extracting complementary time series and image features, cross-modal alignment for effective multimodal understanding, and knowledge-guided instruction generation for generating high-granularity grounding data (ECG-Grounding) linking diagnoses to measurable parameters (e.g., QRS/PR Intervals). Additionally, we propose the Grounded ECG Understanding task, a clinically motivated benchmark designed to comprehensively assess the MLLM's capability in grounded ECG understanding. Experimental results on both existing and our proposed benchmarks show GEM significantly improves predictive performance (CSN 7.4% uparrow), explainability (22.7% uparrow), and grounding (24.8% uparrow), making it more suitable for real-world clinical applications. GitHub repository: https://github.com/lanxiang1017/GEM.git
ESPORT: Electronic Sports Professionals Observations and Reflections on Training
Esports and high performance human-computer interaction are on the forefront of applying new hardware and software technologies in practice. Despite that, there is a paucity of research on how semi-professional and professional championship level players approach aspects of their preparation. To address that, we have performed, transcribed, and analyzed interviews with top-tournament players, coaches, and managers across multiple game titles. The interviews range from competitive events occuring between 2015-2020. Initial processing included transcription and manual verification. The pre-processed interview data were then organized and structured into relevant categories, touching on psychological, physical, and nutritional aspects of esports preparation. Further, where applicable, interview responses where rated and quantified via consensus judgement by a panel of experts. The results indicate that physical training was most often mentioned as a relevant or consistent activity, while nutrition was indicated as relatively unimportant. Qualitative analysis also indicated that consistency and resiliency were noted as the most key factors recommended for upcoming esports competitors. It is also clear that many players put emphasis on balancing their gameplay time and with activities. Lastly, we identified important areas of inquiry towards a deeper understanding of the mental and physical demands of professional esports players.
The Carbon Emissions of Writing and Illustrating Are Lower for AI than for Humans
As AI systems proliferate, their greenhouse gas emissions are an increasingly important concern for human societies. We analyze the emissions of several AI systems (ChatGPT, BLOOM, DALL-E2, Midjourney) relative to those of humans completing the same tasks. We find that an AI writing a page of text emits 130 to 1500 times less CO2e than a human doing so. Similarly, an AI creating an image emits 310 to 2900 times less. Emissions analysis do not account for social impacts such as professional displacement, legality, and rebound effects. In addition, AI is not a substitute for all human tasks. Nevertheless, at present, the use of AI holds the potential to carry out several major activities at much lower emission levels than can humans.
Solar Irradiation Forecasting using Genetic Algorithms
Renewable energy forecasting is attaining greater importance due to its constant increase in contribution to the electrical power grids. Solar energy is one of the most significant contributors to renewable energy and is dependent on solar irradiation. For the effective management of electrical power grids, forecasting models that predict solar irradiation, with high accuracy, are needed. In the current study, Machine Learning techniques such as Linear Regression, Extreme Gradient Boosting and Genetic Algorithm Optimization are used to forecast solar irradiation. The data used for training and validation is recorded from across three different geographical stations in the United States that are part of the SURFRAD network. A Global Horizontal Index (GHI) is predicted for the models built and compared. Genetic Algorithm Optimization is applied to XGB to further improve the accuracy of solar irradiation prediction.
Pattern Recognition of Illicit E-Waste Misclassification in Global Trade Data
The global trade in electronic and electrical goods is complicated by the challenge of identifying e-waste, which is often misclassified to evade regulations. Traditional analysis methods struggle to discern the underlying patterns of this illicit trade within vast datasets. This research proposes and validates a robust, data-driven framework to segment products and identify goods exhibiting an anomalous "waste signature" a trade pattern defined by a clear 'inverse price-volume'. The core of the framework is an Outlier-Aware Segmentation method, an iterative K-Means approach that first isolates extreme outliers to prevent data skewing and then re-clusters the remaining products to reveal subtle market segments. To quantify risk, a "Waste Score" is developed using a Logistic Regression model that identifies products whose trade signatures are statistically similar to scrap. The findings reveal a consistent four-tier market hierarchy in both Malaysian and global datasets. A key pattern emerged from a comparative analysis: Malaysia's market structure is defined by high-volume bulk commodities, whereas the global market is shaped by high-value capital goods, indicating a unique national specialization. The framework successfully flags finished goods, such as electric generators (HS 8502), that are traded like scrap, providing a targeted list for regulatory scrutiny.
Extracting SASI signatures from Gravitational Waves of Core-Collapse Supernovae using the Hilbert-Huang Transform
Core collapse supernovae are among the most energetic astrophysical events in the Universe. Despite huge efforts on understanding the main ingredients triggering such explosions, we still lack of compelling evidences for the precise mechanism driving those phenomena. They are expected to produce gravitational waves due to asymmetric mass motions in the collapsing core, and emit in the meanwhile neutrinos as a result of the interactions in their high-density environment. The combination of these two cosmic messengers can provide a unique probe to study the inner engine of these processes and unveil the explosion mechanism. Among the possible detectable signature, standing accretion shock instabilities (SASI) are particularly relevant in this context as they establish a direct connection between gravitational wave emission and the outcoming neutrino flux. In this work, Hilbert-Huang transform is applied to a selected sample of 3D numerical simulations, with the aim of identifying SASI contribution and extract its instantaneous frequency. The performance of the method is evaluated in the context of Einstein Telescope.
Beyond Good Intentions: Reporting the Research Landscape of NLP for Social Good
With the recent advances in natural language processing (NLP), a vast number of applications have emerged across various use cases. Among the plethora of NLP applications, many academic researchers are motivated to do work that has a positive social impact, in line with the recent initiatives of NLP for Social Good (NLP4SG). However, it is not always obvious to researchers how their research efforts are tackling today's big social problems. Thus, in this paper, we introduce NLP4SGPAPERS, a scientific dataset with three associated tasks that can help identify NLP4SG papers and characterize the NLP4SG landscape by: (1) identifying the papers that address a social problem, (2) mapping them to the corresponding UN Sustainable Development Goals (SDGs), and (3) identifying the task they are solving and the methods they are using. Using state-of-the-art NLP models, we address each of these tasks and use them on the entire ACL Anthology, resulting in a visualization workspace that gives researchers a comprehensive overview of the field of NLP4SG. Our website is available at https://nlp4sg.vercel.app . We released our data at https://huggingface.co/datasets/feradauto/NLP4SGPapers and code at https://github.com/feradauto/nlp4sg .
CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam
This paper introduces a novel benchmark, EGE-Math Solutions Assessment Benchmark, for evaluating Vision-Language Models (VLMs) on their ability to assess hand-written mathematical solutions. Unlike existing benchmarks that focus on problem solving, our approach centres on understanding student solutions, identifying mistakes, and assigning grades according to fixed criteria. We compile 122 scanned solutions from the Russian Unified State Exam (EGE) together with official expert grades, and evaluate seven modern VLMs from Google, OpenAI, Arcee AI, and Alibaba Cloud in three inference modes. The results reveal current limitations in mathematical reasoning and human-rubric alignment, opening new research avenues in AI-assisted assessment. You can find code in https://github.com/Karifannaa/Auto-check-EGE-math
Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts
The rapid growth of artificial intelligence (AI), particularly Large Language Models (LLMs), has raised concerns regarding its global environmental impact that extends beyond greenhouse gas emissions to include consideration of hardware fabrication and end-of-life processes. The opacity from major providers hinders companies' abilities to evaluate their AI-related environmental impacts and achieve net-zero targets. In this paper, we propose a methodology to estimate the environmental impact of a company's AI portfolio, providing actionable insights without necessitating extensive AI and Life-Cycle Assessment (LCA) expertise. Results confirm that large generative AI models consume up to 4600x more energy than traditional models. Our modelling approach, which accounts for increased AI usage, hardware computing efficiency, and changes in electricity mix in line with IPCC scenarios, forecasts AI electricity use up to 2030. Under a high adoption scenario, driven by widespread Generative AI and agents adoption associated to increasingly complex models and frameworks, AI electricity use is projected to rise by a factor of 24.4. Mitigating the environmental impact of Generative AI by 2030 requires coordinated efforts across the AI value chain. Isolated measures in hardware efficiency, model efficiency, or grid improvements alone are insufficient. We advocate for standardized environmental assessment frameworks, greater transparency from the all actors of the value chain and the introduction of a "Return on Environment" metric to align AI development with net-zero goals.
AGBD: A Global-scale Biomass Dataset
Accurate estimates of Above Ground Biomass (AGB) are essential in addressing two of humanity's biggest challenges, climate change and biodiversity loss. Existing datasets for AGB estimation from satellite imagery are limited. Either they focus on specific, local regions at high resolution, or they offer global coverage at low resolution. There is a need for a machine learning-ready, globally representative, high-resolution benchmark. Our findings indicate significant variability in biomass estimates across different vegetation types, emphasizing the necessity for a dataset that accurately captures global diversity. To address these gaps, we introduce a comprehensive new dataset that is globally distributed, covers a range of vegetation types, and spans several years. This dataset combines AGB reference data from the GEDI mission with data from Sentinel-2 and PALSAR-2 imagery. Additionally, it includes pre-processed high-level features such as a dense canopy height map, an elevation map, and a land-cover classification map. We also produce a dense, high-resolution (10m) map of AGB predictions for the entire area covered by the dataset. Rigorously tested, our dataset is accompanied by several benchmark models and is publicly available. It can be easily accessed using a single line of code, offering a solid basis for efforts towards global AGB estimation. The GitHub repository github.com/ghjuliasialelli/AGBD serves as a one-stop shop for all code and data.
NESTLE: a No-Code Tool for Statistical Analysis of Legal Corpus
The statistical analysis of large scale legal corpus can provide valuable legal insights. For such analysis one needs to (1) select a subset of the corpus using document retrieval tools, (2) structuralize text using information extraction (IE) systems, and (3) visualize the data for the statistical analysis. Each process demands either specialized tools or programming skills whereas no comprehensive unified "no-code" tools have been available. Especially for IE, if the target information is not predefined in the ontology of the IE system, one needs to build their own system. Here we provide NESTLE, a no code tool for large-scale statistical analysis of legal corpus. With NESTLE, users can search target documents, extract information, and visualize the structured data all via the chat interface with accompanying auxiliary GUI for the fine-level control. NESTLE consists of three main components: a search engine, an end-to-end IE system, and a Large Language Model (LLM) that glues the whole components together and provides the chat interface. Powered by LLM and the end-to-end IE system, NESTLE can extract any type of information that has not been predefined in the IE system opening up the possibility of unlimited customizable statistical analysis of the corpus without writing a single line of code. The use of the custom end-to-end IE system also enables faster and low-cost IE on large scale corpus. We validate our system on 15 Korean precedent IE tasks and 3 legal text classification tasks from LEXGLUE. The comprehensive experiments reveal NESTLE can achieve GPT-4 comparable performance by training the internal IE module with 4 human-labeled, and 192 LLM-labeled examples. The detailed analysis provides the insight on the trade-off between accuracy, time, and cost in building such system.
Spectrophotometry in the integrated light of multiple populations in globular clusters
There is vast evidence from observations of multiple stellar populations (MPs) in globular clusters (GCs). To explore the issue theoretically, this work considers two subsolar metallicities, two ages, and two initial abundance patterns: a first population of standard alpha-enhanced metal mixture stars and a second stellar population displaying C-N and Na-O anticorrelations chemical abundance patterns, along with an enhanced helium fraction. Analysing the predictions for these extreme compositions, we provide insights into the observability of not-resolved MPs into individual stars of GCs. We use colours and spectrophotometric indices measurable with modern facilities (e.g. Euclid, LSST, DES, JWST).
Designing a sector-coupled European energy system robust to 60 years of historical weather data
As energy systems transform to rely on renewable energy and electrification, they encounter stronger year-to-year variability in energy supply and demand. However, most infrastructure planning is based on a single weather year, resulting in a lack of robustness. In this paper, we optimize energy infrastructure for a European energy system designed for net-zero CO_2 emissions in 62 different weather years. Subsequently, we fix the capacity layouts and simulate their operation in every weather year, to evaluate resource adequacy and CO_2 emissions abatement. We show that interannual weather variability causes variation of pm10\% in total system cost. The most expensive capacity layout obtains the lowest net CO_2 emissions but not the highest resource adequacy. Instead, capacity layouts designed with years including compound weather events result in a more robust and cost-effective design. Deploying CO_2-emitting backup generation is a cost-effective robustness measure, which only increase CO_2 emissions marginally as the average CO_2 emissions remain less than 1\% of 1990 levels. Our findings highlight how extreme weather years drive investments in robustness measures, making them compatible with all weather conditions within six decades of historical weather data.
Inteligencia Artificial jurídica y el desafío de la veracidad: análisis de alucinaciones, optimización de RAG y principios para una integración responsable
This technical report analyzes the challenge of "hallucinations" (false information) in LLMs applied to law. It examines their causes, manifestations, and the effectiveness of the RAG mitigation strategy, highlighting its limitations and proposing holistic optimizations. The paper explores the ethical and regulatory implications, emphasizing human oversight as an irreplaceable role. It concludes that the solution lies not in incrementally improving generative models, but in adopting a "consultative" AI paradigm that prioritizes veracity and traceability, acting as a tool to amplify, not replace, professional judgment. -- Este informe t\'ecnico analiza el desaf\'io de las "alucinaciones" (informaci\'on falsa) en los LLMs aplicados al derecho. Se examinan sus causas, manifestaciones y la efectividad de la estrategia de mitigaci\'on RAG, exponiendo sus limitaciones y proponiendo optimizaciones hol\'isticas. Se exploran las implicaciones \'eticas y regulatorias, enfatizando la supervisi\'on humana como un rol insustituible. El documento concluye que la soluci\'on no reside en mejorar incrementalmente los modelos generativos, sino en adoptar un paradigma de IA "consultiva" que priorice la veracidad y la trazabilidad, actuando como una herramienta para amplificar, y no sustituir, el juicio profesional.
A Scalable Framework for Table of Contents Extraction from Complex ESG Annual Reports
Table of contents (ToC) extraction centres on structuring documents in a hierarchical manner. In this paper, we propose a new dataset, ESGDoc, comprising 1,093 ESG annual reports from 563 companies spanning from 2001 to 2022. These reports pose significant challenges due to their diverse structures and extensive length. To address these challenges, we propose a new framework for Toc extraction, consisting of three steps: (1) Constructing an initial tree of text blocks based on reading order and font sizes; (2) Modelling each tree node (or text block) independently by considering its contextual information captured in node-centric subtree; (3) Modifying the original tree by taking appropriate action on each tree node (Keep, Delete, or Move). This construction-modelling-modification (CMM) process offers several benefits. It eliminates the need for pairwise modelling of section headings as in previous approaches, making document segmentation practically feasible. By incorporating structured information, each section heading can leverage both local and long-distance context relevant to itself. Experimental results show that our approach outperforms the previous state-of-the-art baseline with a fraction of running time. Our framework proves its scalability by effectively handling documents of any length.
From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
Interpretability and analysis (IA) research is a growing subfield within NLP with the goal of developing a deeper understanding of the behavior or inner workings of NLP systems and methods. Despite growing interest in the subfield, a commonly voiced criticism is that it lacks actionable insights and therefore has little impact on NLP. In this paper, we seek to quantify the impact of IA research on the broader field of NLP. We approach this with a mixed-methods analysis of: (1) a citation graph of 185K+ papers built from all papers published at ACL and EMNLP conferences from 2018 to 2023, and (2) a survey of 138 members of the NLP community. Our quantitative results show that IA work is well-cited outside of IA, and central in the NLP citation graph. Through qualitative analysis of survey responses and manual annotation of 556 papers, we find that NLP researchers build on findings from IA work and perceive it is important for progress in NLP, multiple subfields, and rely on its findings and terminology for their own work. Many novel methods are proposed based on IA findings and highly influenced by them, but highly influential non-IA work cites IA findings without being driven by them. We end by summarizing what is missing in IA work today and provide a call to action, to pave the way for a more impactful future of IA research.
Esports Training, Periodization, and Software -- a Scoping Review
Electronic sports (esports) and research on this emerging field are interdisciplinary in nature. By extension, it is essential to understand how to standardize and structure training with the help of existing tools developed by years of research in sports sciences and informatics. Our goal in this article was to verify if the current body of research contains substantial evidence of the training systems applied to training esports players. To verify the existing sources, we have applied a framework of scoping review to address the search from multiple scientific databases with further local processing. We conclude that the current research on esports dealt mainly with describing and modeling performance metrics spanned over multiple fragmented research areas (psychology, nutrition, informatics), and yet these building blocks were not assembled into an existing well-functioning theory of performance in esports by providing exercise regimes, and ways of periodization for esports.
Statistics of X-Ray Polarization Measurements
The polarization of an X-ray beam that produces electrons with velocity components perpendicular to the beam generates an azimuthal distribution of the ejected electrons. We present methods for simulating and for analyzing the angular dependence of electron detections which enable us to derive simple analytical expressions for useful statistical properties of observable data. The derivations are verified by simulations. While we confirm the results of previous work on this topic, we provide an extension needed for analytical treatment of the full range of possible polarization amplitudes.
Group Reasoning Emission Estimation Networks
Accurate greenhouse gas (GHG) emission reporting is critical for governments, businesses, and investors. However, adoption remains limited particularly among small and medium enterprises due to high implementation costs, fragmented emission factor databases, and a lack of robust sector classification methods. To address these challenges, we introduce Group Reasoning Emission Estimation Networks (GREEN), an AI-driven carbon accounting framework that standardizes enterprise-level emission estimation, constructs a large-scale benchmark dataset, and leverages a novel reasoning approach with large language models (LLMs). Specifically, we compile textual descriptions for 20,850 companies with validated North American Industry Classification System (NAICS) labels and align these with an economic model of carbon intensity factors. By reframing sector classification as an information retrieval task, we fine-tune Sentence-BERT models using a contrastive learning loss. To overcome the limitations of single-stage models in handling thousands of hierarchical categories, we propose a Group Reasoning method that ensembles LLM classifiers based on the natural NAICS ontology, decomposing the task into multiple sub-classification steps. We theoretically prove that this approach reduces classification uncertainty and computational complexity. Experiments on 1,114 NAICS categories yield state-of-the-art performance (83.68% Top-1, 91.47% Top-10 accuracy), and case studies on 20 companies report a mean absolute percentage error (MAPE) of 45.88%. The project is available at: https://huggingface.co/datasets/Yvnminc/ExioNAICS.
Solar System Elemental Abundances from the Solar Photosphere and CI-Chondrites
Solar photospheric abundances and CI-chondrite compositions are reviewed and updated to obtain representative solar system abundances of the elements and their isotopes. The new photospheric abundances obtained here lead to higher solar metallicity. Full 3D NLTE photospheric analyses are only available for 11 elements. A quality index for analyses is introduced. For several elements, uncertainties remain large. Protosolar mass fractions are H (X = 0.7060), He (Y = 0.2753), and for metals Li to U (Z = 0.0187). The protosolar (C+N)/H agrees within 13% with the ratio for the solar core from the Borexino experiment. Elemental abundances in CI-chondrites were screened by analytical methods, sample sizes, and evaluated using concentration frequency distributions. Aqueously mobile elements (e.g., alkalis, alkaline earths, etc.) often deviate from normal distributions indicating mobilization and/or sequestration into carbonates, phosphates, and sulfates. Revised CI-chondrite abundances of non-volatile elements are similar to earlier estimates. The moderately volatile elements F and Sb are higher than before, as are C, Br and I, whereas the CI-abundances of Hg and N are now significantly lower. The solar system nuclide distribution curves of s-process elements agree within 4% with s-process predictions of Galactic chemical evolution models. P-process nuclide distributions are assessed. No obvious correlation of CI-chondritic to solar elemental abundance ratios with condensation temperatures is observed, nor is there one for ratios of CI-chondrites/solar wind abundances.
Rethinking Evaluation Metric for Probability Estimation Models Using Esports Data
Probability estimation models play an important role in various fields, such as weather forecasting, recommendation systems, and sports analysis. Among several models estimating probabilities, it is difficult to evaluate which model gives reliable probabilities since the ground-truth probabilities are not available. The win probability estimation model for esports, which calculates the win probability under a certain game state, is also one of the fields being actively studied in probability estimation. However, most of the previous works evaluated their models using accuracy, a metric that only can measure the performance of discrimination. In this work, we firstly investigate the Brier score and the Expected Calibration Error (ECE) as a replacement of accuracy used as a performance evaluation metric for win probability estimation models in esports field. Based on the analysis, we propose a novel metric called Balance score which is a simple yet effective metric in terms of six good properties that probability estimation metric should have. Under the general condition, we also found that the Balance score can be an effective approximation of the true expected calibration error which has been imperfectly approximated by ECE using the binning technique. Extensive evaluations using simulation studies and real game snapshot data demonstrate the promising potential to adopt the proposed metric not only for the win probability estimation model for esports but also for evaluating general probability estimation models.
Analytical sensitivity curves of the second-generation time-delay interferometry
Forthcoming space-based gravitational-wave (GW) detectors will employ second-generation time-delay interferometry (TDI) to suppress laser frequency noise and achieve the sensitivity required for GW detection. We introduce an inverse light-path operator P_{i_{1}i_{2}i_{3}ldots i_{n-1}i_{n}}, which enables simple representation of second-generation TDI combinations and a concise description of light propagation. Analytical expressions and high-accuracy approximate formulas are derived for the sky- and polarization-averaged response functions, noise power spectral densities (PSDs), and sensitivity curves of TDI Michelson, (alpha,beta,gamma), Monitor, Beacon, Relay, and Sagnac combinations, as well as their orthogonal A, E, T channels. Our results show that: (i) second-generation TDIs have the same sensitivities as their first-generation counterparts; (ii) the A, E, T sensitivities and the optimal sensitivity are independent of the TDI generation and specific combination; (iii) the A and E channels have equal averaged responses, noise PSDs, and sensitivities, while the T channel has much weaker response and sensitivity at low frequencies (2pi fL/clesssim3); (iv) except for the (alpha,beta,gamma) and zeta combinations and the T channel, all sensitivity curves exhibit a flat section in the range f_{n}<flesssim 1.5/(2pi L/c), where the noise-balance frequency f_{n} separates the proof-mass- and optical-path-dominated regimes, while the response-transition frequency sim 1.5/(2pi L/c) separates the response function's low- and high-frequency behaviors; (v) the averaged response, noise PSD, and sensitivity of zeta scales with those of the T channel. These analytical and approximate formulations provide useful benchmarks for instrument optimization and data-analysis studies for future space-based GW detectors.
Nuclear Explosions for Large Scale Carbon Sequestration
Confronting the escalating threat of climate change requires innovative and large-scale interventions. This paper presents a bold proposal to employ a buried nuclear explosion in a remote basaltic seabed for pulverizing basalt, thereby accelerating carbon sequestration through Enhanced Rock Weathering (ERW). By precisely locating the explosion beneath the seabed, we aim to confine debris, radiation, and energy while ensuring rapid rock weathering at a scale substantial enough to make a meaningful dent in atmospheric carbon levels. Our analysis outlines the parameters essential for efficient carbon capture and minimal collateral effects, emphasizing that a yield on the order of gigatons is critical for global climate impact. Although this approach may appear radical, we illustrate its feasibility by examining safety factors, preservation of local ecosystems, political considerations, and financial viability. This work argues for reimagining nuclear technology not merely as a destructive force but as a potential catalyst for decarbonization, thereby inviting further exploration of pioneering solutions in the fight against climate change.
The Binary Fraction of Red Supergiants in the Magellanic Clouds
Red supergiants (RSGs), as the descendants of OB-type stars and the progenitors of supernovae, provide crucial insights into the evolution of massive stars, particularly in binary systems. Previous studies show that the binary fraction of RSGs (approx 15% - 40%) is significantly lower than that of their predecessors (approx 50% - 70%). In this work, we investigate the binary fraction of RSGs with the recently selected largest samples of 4695 and 2097 RSGs in the Large Magellanic Cloud (LMC) and Small Magellanic Cloud (SMC), respectively. The binary system with a hot companion (O-, B- and A-type star) is identified by detecting the ultraviolet (UV) excess in the observed spectral energy distribution (SED) ranging from ultraviolet to mid-infrared after subtracting the model SED of RSG since RSGs are very weak in the UV band. It is found that the lower limit of binarity is 30.2% pm 0.7% and 32.2% pm 1% in the LMC and SMC, respectively. If the sample is limited to luminous RSGs with log L/L_{odot} > 4.0, the binary fraction becomes 26.6% pm 1.1% and 26.4% pm 1.7% in the LMC and SMC, respectively. The derived binary fraction is valid in the range of sim 2.3 < log P / [d] < sim 8. Our study suggests that roughly one-third of massive stars host a third companion within sim 30,000 AU. In addition, 15 RSGs are also identified as binary via HST/STIS spectra, and a handful of the binaries identified by the SED fitting are confirmed by their light curve and radial velocity dispersion. The stellar parameters of the companions, i.e. T_{eff}, R, L and log g, are calculated by model fitting.
Toward Open Earth Science as Fast and Accessible as Natural Language
Is natural-language-driven earth observation data analysis now feasible with the assistance of Large Language Models (LLMs)? For open science in service of public interest, feasibility requires reliably high accuracy, interactive latencies, low (sustainable) costs, open LLMs, and openly maintainable software -- hence, the challenge. What are the techniques and programming system requirements necessary for satisfying these constraints, and what is the corresponding development and maintenance burden in practice? This study lays the groundwork for exploring these questions, introducing an impactful earth science use-case, and providing a software framework with evaluation data and metrics, along with initial results from employing model scaling, prompt-optimization, and inference-time scaling optimization techniques. While we attain high accuracy (near 100%) across 10 of 11 metrics, the analysis further considers cost (token-spend), latency, and maintainability across this space of techniques. Finally, we enumerate opportunities for further research, general programming and evaluation framework development, and ongoing work for a comprehensive, deployable solution. This is a call for collaboration and contribution.
Stellar evolution and axion-like particles: new constraints and hints from globular clusters in the GAIA DR3 data
Axion-like particles (ALPs) are hypothetical pseudoscalar bosons, natural in extensions of the Standard Model. Their interactions with ordinary matter and radiation are suppressed, making it challenging to detect them in laboratory experiments. However, these particles, produced within stellar interiors, can provide an additional mechanism for energy loss, potentially influencing stellar evolution. Prominent methods for searching for such effects involve measuring the properties of red giants and helium-burning stars in globular clusters (GCs). Here we use published catalogs of stars selected as members of seven GCs on the basis of parallaxes and proper motions measured by Gaia (Data Realease 3). Making use of previously derived theoretical relations and the new data, we find the upper limit on the ALP-electron coupling, g_{ae}<5.2*10^{-14} (95% CL), and an indication (3.3 sigma) to nonzero ALP-photon coupling, g_{a\gamma}=(6.5+1.1-1.3)*10^{-11} GeV^{-1}. Given the precision of contemporary observational data, it is imperative to refine ALP constraints through more sophisticated analyses, which will be explored in detail elsewhere.
S2SNet: A Pretrained Neural Network for Superconductivity Discovery
Superconductivity allows electrical current to flow without any energy loss, and thus making solids superconducting is a grand goal of physics, material science, and electrical engineering. More than 16 Nobel Laureates have been awarded for their contribution to superconductivity research. Superconductors are valuable for sustainable development goals (SDGs), such as climate change mitigation, affordable and clean energy, industry, innovation and infrastructure, and so on. However, a unified physics theory explaining all superconductivity mechanism is still unknown. It is believed that superconductivity is microscopically due to not only molecular compositions but also the geometric crystal structure. Hence a new dataset, S2S, containing both crystal structures and superconducting critical temperature, is built upon SuperCon and Material Project. Based on this new dataset, we propose a novel model, S2SNet, which utilizes the attention mechanism for superconductivity prediction. To overcome the shortage of data, S2SNet is pre-trained on the whole Material Project dataset with Masked-Language Modeling (MLM). S2SNet makes a new state-of-the-art, with out-of-sample accuracy of 92% and Area Under Curve (AUC) of 0.92. To the best of our knowledge, S2SNet is the first work to predict superconductivity with only information of crystal structures. This work is beneficial to superconductivity discovery and further SDGs. Code and datasets are available in https://github.com/zjuKeLiu/S2SNet
An End-to-End Structure with Novel Position Mechanism and Improved EMD for Stock Forecasting
As a branch of time series forecasting, stock movement forecasting is one of the challenging problems for investors and researchers. Since Transformer was introduced to analyze financial data, many researchers have dedicated themselves to forecasting stock movement using Transformer or attention mechanisms. However, existing research mostly focuses on individual stock information but ignores stock market information and high noise in stock data. In this paper, we propose a novel method using the attention mechanism in which both stock market information and individual stock information are considered. Meanwhile, we propose a novel EMD-based algorithm for reducing short-term noise in stock data. Two randomly selected exchange-traded funds (ETFs) spanning over ten years from US stock markets are used to demonstrate the superior performance of the proposed attention-based method. The experimental analysis demonstrates that the proposed attention-based method significantly outperforms other state-of-the-art baselines. Code is available at https://github.com/DurandalLee/ACEFormer.
Classification-based detection and quantification of cross-domain data bias in materials discovery
It stands to reason that the amount and the quality of data is of key importance for setting up accurate AI-driven models. Among others, a fundamental aspect to consider is the bias introduced during sample selection in database generation. This is particularly relevant when a model is trained on a specialized dataset to predict a property of interest, and then applied to forecast the same property over samples having a completely different genesis. Indeed, the resulting biased model will likely produce unreliable predictions for many of those out-of-the-box samples. Neglecting such an aspect may hinder the AI-based discovery process, even when high quality, sufficiently large and highly reputable data sources are available. In this regard, with superconducting and thermoelectric materials as two prototypical case studies in the field of energy material discovery, we present and validate a new method (based on a classification strategy) capable of detecting, quantifying and circumventing the presence of cross-domain data bias.
AI for operational methane emitter monitoring from space
Mitigating methane emissions is the fastest way to stop global warming in the short-term and buy humanity time to decarbonise. Despite the demonstrated ability of remote sensing instruments to detect methane plumes, no system has been available to routinely monitor and act on these events. We present MARS-S2L, an automated AI-driven methane emitter monitoring system for Sentinel-2 and Landsat satellite imagery deployed operationally at the United Nations Environment Programme's International Methane Emissions Observatory. We compile a global dataset of thousands of super-emission events for training and evaluation, demonstrating that MARS-S2L can skillfully monitor emissions in a diverse range of regions globally, providing a 216% improvement in mean average precision over a current state-of-the-art detection method. Running this system operationally for six months has yielded 457 near-real-time detections in 22 different countries of which 62 have already been used to provide formal notifications to governments and stakeholders.
Ionospheric activity prediction using convolutional recurrent neural networks
The ionosphere electromagnetic activity is a major factor of the quality of satellite telecommunications, Global Navigation Satellite Systems (GNSS) and other vital space applications. Being able to forecast globally the Total Electron Content (TEC) would enable a better anticipation of potential performance degradations. A few studies have proposed models able to predict the TEC locally, but not worldwide for most of them. Thanks to a large record of past TEC maps publicly available, we propose a method based on Deep Neural Networks (DNN) to forecast a sequence of global TEC maps consecutive to an input sequence of TEC maps, without introducing any prior knowledge other than Earth rotation periodicity. By combining several state-of-the-art architectures, the proposed approach is competitive with previous works on TEC forecasting while predicting the TEC globally.
EGC: Image Generation and Classification via a Diffusion Energy-Based Model
Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches perform well in one task often exhibit poor performance in the other. This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution p(y|x)), the forward pass in EGC is a classifier that outputs a joint distribution p(x,y), enabling an image generator in its backward pass by marginalizing out the label y. This is done by estimating the energy and classification probability given a noisy image in the forward pass, while denoising it using the score function estimated in the backward pass. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.
A low-cost ultraviolet-to-infrared absolute quantum efficiency characterization system of detectors
We present a low-cost ultraviolet to infrared absolute quantum efficiency detector characterization system developed using commercial off-the-shelf components. The key components of the experiment include a light source,a regulated power supply, a monochromator, an integrating sphere, and a calibrated photodiode. We provide a step-by-step procedure to construct the photon and quantum efficiency transfer curves of imaging sensors. We present results for the GSENSE 2020 BSI CMOS sensor and the Sony IMX 455 BSI CMOS sensor. As a reference for similar characterizations, we provide a list of parts and associated costs along with images of our setup.
Solar System Experiments in the Search for Dark Energy and Dark Matter
We reassess the realistic discovery reach of Solar-System experiments for dark energy (DE) and dark matter (DM), making explicit the bridge from cosmology-level linear responses to local, screened residuals. In scalar-tensor frameworks with a universal conformal coupling A(phi) and chameleon/Vainshtein screening, we map cosmological responses {mu(z,k),Sigma(z,k)} inferred by DESI and Euclid to thin-shell or Vainshtein residuals in deep Solar potentials Phi_N. We emphasize a two-branch strategy. In a detection-first branch, a verified local anomaly -- an Einstein equivalence principle (EEP) violation, a Shapiro-delay signal with |gamma-1|simfewtimes 10^{-6}, an AU-scale Yukawa tail, or a ultralight DM (ULDM) line in clocks/atom interferometers in space (AIS) -- triggers a joint refit of cosmology and Solar-System data under a common microphysical parameterization {V(phi),A(phi)}. In a guardrail branch, Solar-System tests enforce constraints (EEP; PPN parameters gamma,beta; and dot G/G) and close unscreened or weakly screened corners indicated by cosmology. We forecast, per conjunction, |gamma-1|lesssim (2-5)times 10^{-6} (Ka-/X-band or optical Shapiro), eta_{EEP}sim (1--10)times 10^{-17} (drag-free AIS), |dot G/G|sim(3-5)times10^{-15},yr^{-1} (sub-mm-class LLR), a uniform ~2x tightening of AU-scale Yukawa/DM-density bounds, and (3-10)times improved ULDM-coupling reach from clocks. For a conformal benchmark, mu_{ lin,0}=0.10 implies chisimeq mu_{lin,0/2} and a Sun thin shell Delta R/Rlesssim (1/3chi)|gamma-1|/2=2.4times 10^{-3} at |gamma-1|=5times 10^{-6}; Vainshtein screening at 1 AU yields |gamma-1|lesssim 10^{-11}, naturally below near-term reach. We recommend a cost-effective guardrail+discovery portfolio with explicit triggers for escalation to dedicated missions.
Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning
Machine learning (ML) requires using energy to carry out computations during the model training process. The generation of this energy comes with an environmental cost in terms of greenhouse gas emissions, depending on quantity used and the energy source. Existing research on the environmental impacts of ML has been limited to analyses covering a small number of models and does not adequately represent the diversity of ML models and tasks. In the current study, we present a survey of the carbon emissions of 95 ML models across time and different tasks in natural language processing and computer vision. We analyze them in terms of the energy sources used, the amount of CO2 emissions produced, how these emissions evolve across time and how they relate to model performance. We conclude with a discussion regarding the carbon footprint of our field and propose the creation of a centralized repository for reporting and tracking these emissions.
JARVIS-Leaderboard: A Large Scale Benchmark of Materials Design Methods
Lack of rigorous reproducibility and validation are major hurdles for scientific development across many fields. Materials science in particular encompasses a variety of experimental and theoretical approaches that require careful benchmarking. Leaderboard efforts have been developed previously to mitigate these issues. However, a comprehensive comparison and benchmarking on an integrated platform with multiple data modalities with both perfect and defect materials data is still lacking. This work introduces JARVIS-Leaderboard, an open-source and community-driven platform that facilitates benchmarking and enhances reproducibility. The platform allows users to set up benchmarks with custom tasks and enables contributions in the form of dataset, code, and meta-data submissions. We cover the following materials design categories: Artificial Intelligence (AI), Electronic Structure (ES), Force-fields (FF), Quantum Computation (QC) and Experiments (EXP). For AI, we cover several types of input data, including atomic structures, atomistic images, spectra, and text. For ES, we consider multiple ES approaches, software packages, pseudopotentials, materials, and properties, comparing results to experiment. For FF, we compare multiple approaches for material property predictions. For QC, we benchmark Hamiltonian simulations using various quantum algorithms and circuits. Finally, for experiments, we use the inter-laboratory approach to establish benchmarks. There are 1281 contributions to 274 benchmarks using 152 methods with more than 8 million data-points, and the leaderboard is continuously expanding. The JARVIS-Leaderboard is available at the website: https://pages.nist.gov/jarvis_leaderboard
Interpretation of Intracardiac Electrograms Through Textual Representations
Understanding the irregular electrical activity of atrial fibrillation (AFib) has been a key challenge in electrocardiography. For serious cases of AFib, catheter ablations are performed to collect intracardiac electrograms (EGMs). EGMs offer intricately detailed and localized electrical activity of the heart and are an ideal modality for interpretable cardiac studies. Recent advancements in artificial intelligence (AI) has allowed some works to utilize deep learning frameworks to interpret EGMs during AFib. Additionally, language models (LMs) have shown exceptional performance in being able to generalize to unseen domains, especially in healthcare. In this study, we are the first to leverage pretrained LMs for finetuning of EGM interpolation and AFib classification via masked language modeling. We formulate the EGM as a textual sequence and present competitive performances on AFib classification compared against other representations. Lastly, we provide a comprehensive interpretability study to provide a multi-perspective intuition of the model's behavior, which could greatly benefit the clinical use.
Evidence for Widespread Hydrogen Sequestration within the Moon's South Polar Cold Traps
The measured neutron flux from the Moons south polar region shows evidence of locally enhanced hydrogen concentrations, likely in the form of water ice, within most permanently shadowed regions (PSR), poleward of 77 deg S latitude. Results are consistent with the original findings of Watson et al, 1961, which found that the PSRs cryogenic surfaces create exclusive conditions for the sequestration of water ice, due to their extremely low sublimation rates. Widespread PSR hydrogenation is demonstrated in several studies by showing that the contrasting PSR area distribution is being instrumentally blurred. The PSRs expected hydrogen observations are correlated by their area fraction of the fixed 30 km diameter footprint area of the Collimated Sensor for Epithermal Neutrons (CSETN), which is part of the Lunar Exploration Neutron Detector (LEND) onboard the Lunar Reconnaissance Orbiter (LRO). The correlation indicates that the PSRs are similarly hydrogenated, with an expected concentration = 0.27 wt%, relative to that of the anhydrous reference terrain (lower bounds). Hydrogen concentrations are demonstrated to be correlated to maximum temperature distributions within the basins of Haworth, Shoemaker and Faustini PSRs. Cabeus-1 PSR shows an anomalously enhanced hydrogen concentration indicating a second process contributes to its hydrogen budget. Results are consistent with ongoing processes that introduce volatiles to the surface including outgassing, solar wind production with regolith silicates, and mixing from small scale meteor impacts and diurnal temperature variation. We validate the bandpass filter used to subtract CSETNs detection of uncollimated neutrons with profiles of several PSRs neutron suppression before and after processing. Keywords: Moon, Epithermal Neutron, Hydrogen, Water, Ice, Volatiles, LRO, LEND, Diviner, LOLA
Quantifying the Carbon Emissions of Machine Learning
From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate these emissions, we present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. We accompany this tool with an explanation of the factors cited above, as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.
Prediction of superconducting properties of materials based on machine learning models
The application of superconducting materials is becoming more and more widespread. Traditionally, the discovery of new superconducting materials relies on the experience of experts and a large number of "trial and error" experiments, which not only increases the cost of experiments but also prolongs the period of discovering new superconducting materials. In recent years, machine learning has been increasingly applied to materials science. Based on this, this manuscript proposes the use of XGBoost model to identify superconductors; the first application of deep forest model to predict the critical temperature of superconductors; the first application of deep forest to predict the band gap of materials; and application of a new sub-network model to predict the Fermi energy level of materials. Compared with our known similar literature, all the above algorithms reach state-of-the-art. Finally, this manuscript uses the above models to search the COD public dataset and identify 50 candidate superconducting materials with possible critical temperature greater than 90 K.
SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning
Progress toward the United Nations Sustainable Development Goals (SDGs) has been hindered by a lack of data on key environmental and socioeconomic indicators, which historically have come from ground surveys with sparse temporal and spatial coverage. Recent advances in machine learning have made it possible to utilize abundant, frequently-updated, and globally available data, such as from satellites or social media, to provide insights into progress toward SDGs. Despite promising early results, approaches to using such data for SDG measurement thus far have largely evaluated on different datasets or used inconsistent evaluation metrics, making it hard to understand whether performance is improving and where additional research would be most fruitful. Furthermore, processing satellite and ground survey data requires domain knowledge that many in the machine learning community lack. In this paper, we introduce SustainBench, a collection of 15 benchmark tasks across 7 SDGs, including tasks related to economic development, agriculture, health, education, water and sanitation, climate action, and life on land. Datasets for 11 of the 15 tasks are released publicly for the first time. Our goals for SustainBench are to (1) lower the barriers to entry for the machine learning community to contribute to measuring and achieving the SDGs; (2) provide standard benchmarks for evaluating machine learning models on tasks across a variety of SDGs; and (3) encourage the development of novel machine learning methods where improved model performance facilitates progress towards the SDGs.
