Large language models (LLMs) have revolutionized language processing, but they face critical challenges with security, privacy, and hallucinations, i.e., coherent but factually inaccurate outputs. A major issue is fact-conflicting hallucination (FCH), where LLMs produce content that contradicts ground-truth facts. Addressing FCH is difficult due to two key challenges: 1) automatically constructing and updating benchmark datasets is hard, because existing methods rely on manually curated static benchmarks that cannot cover the broad, evolving spectrum of FCH cases; and 2) validating the reasoning behind LLM outputs is inherently difficult, especially for complex logical relations. To tackle these challenges, we introduce a novel logic-programming-aided metamorphic testing technique for FCH detection, implemented in our tool Drowzee. We develop an extensive and extensible framework that constructs a comprehensive factual knowledge base by crawling sources such as Wikipedia, seamlessly integrated into Drowzee. Using logical reasoning rules, we transform and augment this knowledge into a large set of test cases with ground-truth answers. We test LLMs on these cases through template-based prompts that require them to provide reasoned answers. To validate their reasoning, we propose two semantic-aware oracles that assess the similarity between the semantic structures of the LLM answers and the ground truth. Our approach automatically generates useful test cases and identifies hallucinations across six LLMs in nine domains, with hallucination rates ranging from 24.7% to 59.8%. Key findings are that LLMs struggle with temporal concepts and out-of-distribution knowledge, and that they lack strong logical reasoning capabilities. The results show that the logic-based test cases generated by Drowzee effectively trigger and detect hallucinations. To mitigate the identified FCHs, we further explored model editing techniques, which proved effective at a small scale (edits to fewer than 1,000 knowledge pieces). Our findings emphasize the need for continued community efforts to detect and mitigate model hallucinations.
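As a rough illustration of the pipeline the abstract describes (factual knowledge base, logic-rule augmentation, template-based prompts, and an answer oracle), the following Python sketch uses a toy transitivity rule and hypothetical names (Fact, derive_transitive, semantic_oracle). It is not Drowzee's actual implementation; in particular, the real oracles compare the semantic structures of answers and ground truth rather than checking for a substring.

```python
# Illustrative sketch only: a toy logic-rule-based test case generator and
# a crude answer oracle. All names here are hypothetical, not Drowzee's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str


# Seed facts, e.g., crawled from a source such as Wikipedia.
knowledge_base = {
    Fact("Marie Curie", "born_in", "Warsaw"),
    Fact("Warsaw", "located_in", "Poland"),
}


def derive_transitive(kb):
    """Apply a simple logical reasoning rule (transitivity over locations)
    to augment the knowledge base with derived ground-truth facts."""
    derived = set()
    for f1 in kb:
        for f2 in kb:
            if (f1.relation == "born_in" and f2.relation == "located_in"
                    and f1.obj == f2.subject):
                derived.add(Fact(f1.subject, "born_in_country", f2.obj))
    return derived


def to_prompt(fact):
    """Turn a derived fact into a template-based test prompt paired with
    its known ground-truth answer."""
    question = f"In which country was {fact.subject} born? Explain your reasoning."
    return question, fact.obj


def semantic_oracle(llm_answer, ground_truth):
    """Stand-in for a semantic-aware oracle: here we only check that the
    ground-truth entity appears in the answer."""
    return ground_truth.lower() in llm_answer.lower()


if __name__ == "__main__":
    for fact in derive_transitive(knowledge_base):
        question, answer = to_prompt(fact)
        print(question, "->", answer)
        # A fact-conflicting hallucination would be flagged when
        # semantic_oracle(model_output, answer) returns False.
```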