This track is powered by the MSR BioNLP Lab as part of the Literome Project at Microsoft Research. UCSC collaborators at Microsoft Research (Hoifung Poon, Chris Quirk) implemented an end-to-end natural-language processing (NLP) system to extract pathway interactions and ran it on all 20 million PubMed abstracts. The results were mapped to the genome using HGNC gene symbols.
HGNC symbols are highlighted and clickable.
PubMed abstracts were retrieved from the NLM website. They were tokenized and syntactically parsed using the SPLAT toolkit. Proteins were identified and normalized, and potential interactions were extracted using the MSR Protein and Pathway Extractors.
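
To give a rough feel for the stages described above (tokenize, identify and normalize gene mentions, extract candidate interactions), here is a toy sketch in Python. It is not SPLAT and not the MSR extractors. The alias table, the trigger-word list, and all function names are hypothetical, and the "extraction" step is a naive trigger-word heuristic that stands in for the syntactic parsing the real system uses.

```python
import re
from itertools import combinations

# Hypothetical alias table: lowercase surface form -> HGNC symbol (illustrative only).
ALIASES = {"p53": "TP53", "tp53": "TP53", "mdm2": "MDM2", "akt": "AKT1", "akt1": "AKT1"}

# Hypothetical trigger words that hint at an interaction between two genes.
TRIGGERS = {"binds", "inhibits", "activates", "phosphorylates", "regulates"}


def tokenize(text):
    """Split text into word-like tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[A-Za-z0-9-]+", text)


def normalize(tokens):
    """Return (position, HGNC symbol) for every token found in the alias table."""
    return [(i, ALIASES[t.lower()]) for i, t in enumerate(tokens) if t.lower() in ALIASES]


def extract_interactions(sentence):
    """Return (symbol_a, trigger, symbol_b) for gene pairs with a trigger word between them."""
    tokens = tokenize(sentence)
    mentions = normalize(tokens)
    found = []
    for (i, a), (j, b) in combinations(mentions, 2):
        if a == b:
            continue  # two mentions of the same gene are not an interaction
        for tok in tokens[i + 1:j]:
            if tok.lower() in TRIGGERS:
                found.append((a, tok.lower(), b))
                break  # keep only the first trigger between this pair
    return found


if __name__ == "__main__":
    print(extract_interactions("MDM2 binds p53 and inhibits its activity."))
```

Normalizing to HGNC symbols before extraction means that different surface forms of the same gene (for example "p53" and "TP53") collapse to one identifier, which is what allows results to be attached to a single gene locus.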