Connection Table Research Articles

This commentary describes two simple procedures, using commercially available software packages, that greatly facilitate the creation and replication of data sets intended for quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) studies. Used properly, the procedures allow individual chemical structures to be captured from the Chemical Abstracts Service (CAS) SciFinder software in a computer-readable format that is recognized by most chemical database and computational software packages. The researcher need not draw a chemical structure to create a Molecular Design Limited (MDL) mol file, the 2D connection table format most commonly used to create the chemical depiction of a compound or drug. The MDL mol format is needed so that properties can be calculated from the chemical structure alone. All that is required is that the compound or drug be located in SciFinder. The procedures are described in considerable detail because the key steps for capturing structures from CAS SciFinder through Accelrys' Accord for Excel software are not documented in either software package. Also described is a batch procedure that allows CAS SciFinder to be searched for the exact chemical structures of up to 25 compounds. Without this procedure, SciFinder can be searched for an exact chemical structure only one compound at a time, using a query consisting of a drawn structure. Both the single-mode structure retrieval and the batch-mode compound search procedures result in very significant time savings for the researcher creating or replicating QSAR/QSPR data sets, and may enable structure searches that previously might not have been attempted because of researcher time constraints. These procedures do not affect, positively or negatively, the cost to the user of searches against the SciFinder software. These costs are determined by CAS policy and depend on the number of structures or compounds searched.

Locating a compound or drug in SciFinder is most accurately done using the CAS Registry Number. The CAS Registry Number uniquely identifies a specific compound and salt form. Older, deleted CAS Registry Numbers for a specific compound may be encountered, but a search on the current (or older) CAS Registry Number will always bring up the correct compound. If different salt forms of the same compound exist in the CAS databases, they will have different CAS Registry Numbers. By contrast, a search by compound or drug name may fail. IUPAC names for compounds are of no value in searches against SciFinder because IUPAC names are not listed in the records. Commonly used drug names work fairly well. However, variant or misspelled drug names are common in the scientific literature. A deviation of even a single letter or digit between a search input name and the names stored in the CAS databases will result in a search failure. Searches using drug trade names (as for very recent drugs) or company code numbers (as for early discovery stage compounds) fail more frequently than searches using common names.
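To make the "2D connection table" in the abstract above concrete, here is a minimal sketch in Python of how the atom and bond blocks of a V2000 MDL mol file can be read using its fixed-width columns. The function name `parse_molfile_v2000`, the `Molecule` tuple, and the ethanol record are illustrative assumptions: the record is hand-written for this example and is not SciFinder or Accord output, and the sketch ignores charges, stereo flags, and property lines.

```python
from typing import List, NamedTuple, Tuple


class Molecule(NamedTuple):
    name: str
    atoms: List[Tuple[str, float, float, float]]  # (symbol, x, y, z)
    bonds: List[Tuple[int, int, int]]  # (atom1, atom2, bond type), 1-based indices


def parse_molfile_v2000(text: str) -> Molecule:
    """Read the header, counts line, atom block and bond block of a V2000 molfile."""
    lines = text.splitlines()
    name = lines[0].strip()          # line 1: molecule name
    counts = lines[3]                # line 4: counts line (lines 2-3 are metadata)
    if "V2000" not in counts:
        raise ValueError("only V2000 molfiles are handled in this sketch")
    n_atoms = int(counts[0:3])       # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])       # columns 4-6: number of bonds

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z occupy three 10-character fields; the symbol follows a space.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Three 3-character fields: first atom, second atom, bond type.
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))

    return Molecule(name, atoms, bonds)


# Hand-written heavy-atom record for ethanol (hypothetical example data).
ETHANOL = "\n".join([
    "ethanol (hand-written example)",
    "  sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

mol = parse_molfile_v2000(ETHANOL)
print(mol.name, [atom[0] for atom in mol.atoms], mol.bonds)
```

The bond block is the connection table proper: each row names two atoms by their position in the atom block and gives the bond type, which is why property calculations can start from the file alone.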


Dictionary-based and rule-based methods for grapheme-to-phoneme conversion each have their own advantages and limitations. For example, the dictionary-based method requires a large phonetic dictionary and complex morphophonemic rules, while the LTS (letter-to-sound) rule-based method cannot by itself model the complete morphophonemic constraints. This paper describes a grapheme-to-phoneme conversion method for Korean that is a hybrid of the dictionary-based and rule-based approaches, using a phonetic pattern dictionary and CCV (consonant consonant vowel) LTS rules. The phonetic pattern dictionary, which represents the dictionary-based method, contains entries in the form of a morpheme pattern and its phonetic pattern. The patterns represent candidate phonological changes at the left and right boundaries of morphemes. The CCV LTS rules represent the rule-based method and handle grapheme-to-phoneme conversion within morphemes. The conversion method consists of two main steps, morpheme-to-phoneme conversion and a morphophonemic connectivity check, and two preprocessing steps, phrase break prediction and morpheme normalization. Phrase break prediction predicts phrase breaks using a stochastic method on part-of-speech (POS) information. Morpheme normalization replaces non-Korean symbols with their corresponding standard Korean graphemes. In the morpheme-phoneticizing module, each morpheme in the phrase is converted into phonetic patterns by looking it up in the phonetic pattern dictionary. Graphemes within a morpheme are grouped into CCV units and converted into phonemes by the CCV LTS rules. The morphophonemic connectivity table supports grammaticality checking of two adjacent phonetic morphemes. In experiments with a corpus of 4,973 sentences free of non-Korean symbols, we achieved a 99.98% grapheme-to-phoneme conversion performance rate and a 99.0% sentence conversion performance rate. With a broadcast news corpus of 621 sentences, 99.7% of the graphemes and 86.6% of the sentences are correctly converted. The full Korean TTS (Text-to-Speech) system is now being implemented using this conversion method.
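The abstract above does not spell out how graphemes are grouped into CCV units, so the following Python sketch rests on a loudly stated assumption: each unit is taken to be (final consonant of the previous syllable, initial consonant, vowel). Only the decomposition of a precomposed Hangul syllable into its jamo follows the standard Unicode arithmetic; the function names `decompose` and `ccv_units` are hypothetical, and no LTS rules or phonetic pattern dictionary are modeled.

```python
import unicodedata
from typing import List, Tuple

HANGUL_BASE = 0xAC00            # first precomposed Hangul syllable
N_INITIALS, N_VOWELS, N_FINALS = 19, 21, 28


def decompose(syllable: str) -> Tuple[str, str, str]:
    """Split one precomposed Hangul syllable into (initial, vowel, final) jamo."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < N_INITIALS * N_VOWELS * N_FINALS:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, N_VOWELS * N_FINALS)
    vowel, final = divmod(rest, N_FINALS)
    return (
        chr(0x1100 + initial),
        chr(0x1161 + vowel),
        chr(0x11A7 + final) if final else "",  # final index 0 means no final consonant
    )


def ccv_units(word: str) -> Tuple[List[Tuple[str, str, str]], str]:
    """Group jamo into assumed CCV units: (previous final, initial, vowel).

    Returns the units and the final consonant left over after the last syllable.
    This grouping is an illustrative assumption, not the paper's definition.
    """
    units = []
    previous_final = ""
    for syllable in word:
        initial, vowel, final = decompose(syllable)
        units.append((previous_final, initial, vowel))
        previous_final = final
    return units, previous_final


# Sanity check against Unicode canonical decomposition (NFD).
assert "".join(decompose("한")) == unicodedata.normalize("NFD", "한")

units, trailing = ccv_units("국물")
print(units, repr(trailing))
```

Under this assumption the second unit of a word such as 국물 carries both the preceding final consonant and the following initial consonant, which is the kind of consonant-to-consonant context a letter-to-sound rule would need to see.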


Articles published on Connection Table

Permuting input for more effective sampling of 3D conformer space

Depicting combinatorial complexity with the molecular interaction map notation

Conformational Boosting

Communication and re-use of chemical information in bioscience.

A Simple Algorithm for Unique Representation of Chemical Structures: Cyclic/Acyclic Functionalized Achiral Molecules

QSAR analyses of conformationally restricted 1,5-diaryl pyrazoles as selective COX-2 inhibitors: application of connection table representation of ligands

QSAR Modeling Based on Structure-Information for Properties of Interest in Human Health

Development and Application of XyM2Mol System for Converting Structural Data by XyM Notation into Connection Tables

Selecting contact particles in dynamics granular mechanics systems

A connectivity table for cluster similarity checking in the evolutionary optimization method

A Structure‐Information Approach to the Prediction of Biological Activities and Properties

Enabling the exploration of biochemical pathways

The Challenges with Substance Databases and Structure Search Engines

Derivation and applications of molecular descriptors based on approximate surface area.

Chemical machine vision: automated extraction of chemical metadata from raster images.

Conformational sampling by self-organization.

Single-Mode Compound Retrieval for QSAR, QSPR Data Sets, and Batch Mode Exact Structure Searching

Software for automating analysis of encoded combinatorial libraries.

Morpheme-based grapheme to phoneme conversion using phonetic patterns and morphophonemic connectivity information
