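Watson is a suite of artificial intelligence services from IBM with powerful capabilities. This series has two parts: the first translates the documentation of the Watson services, and the second is a demo built on several of those services. The language used throughout is Python.
Watson Services: AlchemyLanguage
The AlchemyLanguage service is a collection of text analysis functions that extract semantic information from content. You can input text, HTML, or a public URL and, through sophisticated natural language processing, quickly get an understanding of the content along with more detailed information such as sentiment, entities, and keywords.
Before analyzing HTML or a web page, AlchemyLanguage automatically removes ads, headers, and other unwanted content, keeping only the most important source text. HTML input can be at most 600 KB, and the cleaned text extracted from it can be at most 50 KB.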
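To avoid sending input that exceeds the HTML limit, you can check the size locally before calling the service. This is a minimal sketch that assumes the limit is measured in bytes of UTF-8 text and that 1 KB = 1024 bytes; neither detail is stated above, and `within_alchemy_limits` is a hypothetical helper name.

```python
# Limit stated above; treating 1 KB as 1024 bytes of UTF-8 is an assumption.
MAX_HTML_BYTES = 600 * 1024


def byte_size(content: str) -> int:
    """Size of a string once encoded as UTF-8 (assumed to be how the limit is measured)."""
    return len(content.encode("utf-8"))


def within_alchemy_limits(html: str) -> bool:
    """Hypothetical pre-check: True if the raw HTML fits under the 600 KB limit."""
    return byte_size(html) <= MAX_HTML_BYTES


print(within_alchemy_limits("<html><body>Hello</body></html>"))  # True
print(within_alchemy_limits("x" * (600 * 1024 + 1)))             # False
```

The 50 KB limit applies to the text left after the service has cleaned the page, so it cannot be checked reliably on the client side.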
```shell
pip install --upgrade watson-developer-cloud
```
(Node, Java, and iOS SDKs are also available for download.)
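Authentication
Before using AlchemyLanguage, you need an API key:
Sign up for an IBM Bluemix account.
Log in to Bluemix and open the AlchemyLanguage service page.
Click the Create button.
On the AlchemyLanguage page, click "Service credentials" to view your API key.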
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

# Replace 'API_KEY' with the key shown under "Service credentials"
alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
```
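Hard-coding the key is fine for a quick test, but it is easy to leak when the script is shared. A minimal sketch of reading it from an environment variable instead; the variable name `ALCHEMY_API_KEY` and the helper `load_api_key` are hypothetical choices, not part of the SDK. The returned string would be passed as `api_key=` to `AlchemyLanguageV1`.

```python
import os


def load_api_key(var_name: str = "ALCHEMY_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```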
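Methods
Combined Call
combined(parameter=value,...)
Analyzes text, HTML, or web content with several text analysis operations at once.
If you want both entity and keyword information from the same source, you can make a single Combined Call. Specify the operations to run with the `extract` parameter; the usage charges that apply to each individual method are reflected in the combined request.
You can use any parameter of the individual extract methods; see the corresponding method section for which options are available. For example, setting sentiment=1 enables sentiment for both entities and keywords. Because this parameter carries an additional usage charge and applies to both parts (entities and keywords), it is charged twice in the request.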
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

# Run entity and keyword extraction in one request, with sentiment for both
print(json.dumps(
    alchemy_language.combined(
        url='https://www.ibm.com/us-en/',
        extract='entities,keywords',
        sentiment=1,
        max_items=1),
    indent=2))
```
The response to this call:
```json
{
  "status": "OK", // whether the request succeeded
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.ibm.com/us-en/",
  "totalTransactions": "4",
  "language": "english",
  "keywords": [
    {
      "text": "NoSQL cloud database",
      "relevance": "0.940807",
      "sentiment": {
        "type": "positive",
        "score": "0.46058"
      }
    }
  ], // analysis results
  "entities": [
    {
      "type": "Company",
      "relevance": "0.805754",
      "sentiment": {
        "type": "positive",
        "score": "0.526551"
      },
      "count": "4",
      "text": "IBM",
      "disambiguated": {
        "subType": [
          "SoftwareLicense",
          "OperatingSystemDeveloper",
          "ProcessorManufacturer",
          "SoftwareDeveloper",
          "CompanyFounder",
          "ProgrammingLanguageDesigner",
          "ProgrammingLanguageDeveloper"
        ],
        "name": "IBM",
        "website": "http://www.ibm.com/",
        "dbpedia": "http://dbpedia.org/resource/IBM",
        "freebase": "http://rdf.freebase.com/ns/m.03sc8",
        "opencyc": "http://sw.opencyc.org/concept/Mx4rvViMoJwpEbGdrcN5Y29ycA",
        "yago": "http://yago-knowledge.org/resource/IBM",
        "crunchbase": "http://www.crunchbase.com/company/ibm"
      }
    }
  ]
}
```
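The `totalTransactions` value of 4 in this response is consistent with the billing rule described for the Combined Call: two extract operations, plus sentiment charged once for entities and once for keywords. A rough estimator under that reading; the helper name and the per-operation rule are assumptions inferred from this single example, not the official pricing formula.

```python
# Assumption: each extract operation costs one transaction, and sentiment=1 adds one
# more for each operation it applies to. Only entities and keywords are named above.
SENTIMENT_ENABLED_OPS = {"entities", "keywords"}


def estimate_transactions(extract: str, sentiment: bool = False) -> int:
    """Hypothetical estimate of totalTransactions for a combined() call."""
    ops = [op.strip() for op in extract.split(",") if op.strip()]
    total = len(ops)
    if sentiment:
        total += sum(1 for op in ops if op in SENTIMENT_ENABLED_OPS)
    return total


print(estimate_transactions("entities,keywords", sentiment=True))  # 4
```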
Authors
Extracts author names from a web page or HTML.
authors(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.authors(
        url='https://www.ibm.com/us-en/'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://techcrunch.com/2016/01/29/ibm-watson-weather-company-sale/",
  "authors": {
    "names": [
      "Author Name 1",
      "Author Name 2"
    ]
  }
}
```
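The names sit one level down, under `authors.names`. A small sketch for pulling them out safely when the request failed or the field is missing; `author_names` is a hypothetical helper, and `response` stands for the dict returned by `alchemy_language.authors(...)`.

```python
def author_names(response: dict) -> list:
    """Return the list of author names, or [] if the request failed or none were found."""
    if response.get("status") != "OK":
        return []
    return response.get("authors", {}).get("names", [])


sample = {"status": "OK", "authors": {"names": ["Author Name 1", "Author Name 2"]}}
print(author_names(sample))  # ['Author Name 1', 'Author Name 2']
```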
Concepts
Extracts concepts from web pages, HTML, or plain text. Currently only English and Spanish are supported.
concepts(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

# knowledgeGraph=1 adds related knowledge-graph information; it is a billable add-on parameter
print(json.dumps(
    alchemy_language.concepts(
        url='https://www.ibm.com/watson/',
        knowledgeGraph=1),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.ibm.com/watson/",
  "totalTransactions": "2",
  "language": "english",
  "concepts": [
    {
      "text": "Thomas J. Watson",
      "relevance": "0.926128",
      "knowledgeGraph": {
        "typeHierarchy": "/people/thomas j. watson"
      },
      "dbpedia": "http://dbpedia.org/resource/Thomas_J._Watson",
      "freebase": "http://rdf.freebase.com/ns/m.07qkt",
      "yago": "http://yago-knowledge.org/resource/Thomas_J._Watson"
    },
    {
      "text": "Science",
      "relevance": "0.902652",
      "knowledgeGraph": {
        "typeHierarchy": "/fields/subjects/science"
      },
      "dbpedia": "http://dbpedia.org/resource/Science",
      "freebase": "http://rdf.freebase.com/ns/m.06mq7",
      "opencyc": "http://sw.opencyc.org/concept/Mx4rwKQK2JwpEbGdrcN5Y29ycA"
    },
    ...
  ]
}
```
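Numeric fields such as `relevance` come back as strings, so they must be converted before comparing. A sketch that keeps concepts above a relevance threshold and sorts them; the 0.9 threshold and the helper name `top_concepts` are arbitrary choices for illustration.

```python
def top_concepts(response: dict, min_relevance: float = 0.9) -> list:
    """Return (text, relevance) pairs at or above min_relevance, most relevant first."""
    pairs = [
        (c["text"], float(c["relevance"]))  # relevance is a string in the JSON
        for c in response.get("concepts", [])
    ]
    kept = [p for p in pairs if p[1] >= min_relevance]
    return sorted(kept, key=lambda p: p[1], reverse=True)


sample = {"concepts": [
    {"text": "Science", "relevance": "0.902652"},
    {"text": "Thomas J. Watson", "relevance": "0.926128"},
]}
print(top_concepts(sample))  # [('Thomas J. Watson', 0.926128), ('Science', 0.902652)]
```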
Date Extraction
Extracts dates from web pages, HTML, or plain text. Currently only English is supported.
dates(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

# anchor_date is the reference point for relative expressions such as "next Tuesday"
print(json.dumps(
    alchemy_language.dates(
        text='Set a reminder for my appointment next Tuesday',
        anchor_date='2016-03-22 00:00:00'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "totalTransactions": "1",
  "language": "english",
  "dates": [
    {
      "date": "20160329T000000",
      "text": "next tuesday"
    }
  ]
}
```
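The `date` field uses a compact `yyyyMMddTHHmmss` form. Parsing it with Python's standard `datetime` module shows that "next tuesday", relative to the anchor date 2016-03-22 (itself a Tuesday), resolved to 2016-03-29, exactly one week later. The format string below is inferred from this single sample value.

```python
from datetime import datetime

# Format inferred from the sample value "20160329T000000".
ALCHEMY_DATE_FORMAT = "%Y%m%dT%H%M%S"


def parse_alchemy_date(value: str) -> datetime:
    """Convert a date string from the dates() response into a datetime."""
    return datetime.strptime(value, ALCHEMY_DATE_FORMAT)


anchor = datetime.strptime("2016-03-22 00:00:00", "%Y-%m-%d %H:%M:%S")
resolved = parse_alchemy_date("20160329T000000")
print(resolved.date(), (resolved - anchor).days)  # 2016-03-29 7
```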
Emotion Analysis
Extracts emotions from web pages, HTML, or plain text. Currently only English is supported.
emotion(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.emotion(
        url='charliechaplin.com/en/synopsis/articles/29-The-Great-Dictator-s-Speech'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.charliechaplin.com/en/synopsis/articles/29-The-Great-Dictator-s-Speech",
  "totalTransactions": "0",
  "language": "english",
  "docEmotions": {
    "anger": "0.639028",
    "disgust": "0.009711",
    "fear": "0.037295",
    "joy": "4e-05",
    "sadness": "0.002552"
  }
}
```
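Each emotion gets its own score, returned as a string and sometimes in scientific notation (`"4e-05"` for joy); Python's `float` handles both forms. A sketch that picks the strongest emotion; `dominant_emotion` is a hypothetical helper name.

```python
def dominant_emotion(response: dict) -> tuple:
    """Return (emotion, score) with the highest score in docEmotions."""
    scores = {name: float(value) for name, value in response["docEmotions"].items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


sample = {"docEmotions": {"anger": "0.639028", "disgust": "0.009711",
                          "fear": "0.037295", "joy": "4e-05", "sadness": "0.002552"}}
print(dominant_emotion(sample))  # ('anger', 0.639028)
```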
Entities
Extracts entities from web pages, HTML, or plain text using a standard list of entity types. You can also create a custom model to extract the entities you need.
Supported languages: English, German, French, Italian, Portuguese, Russian, Spanish, and Swedish.
entities(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.entities(
        url='http://www-03.ibm.com/press/us/en/pressrelease/49384.wss'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www-03.ibm.com/press/us/en/pressrelease/49384.wss",
  "language": "english",
  "entities": [
    {
      "type": "Company",
      "relevance": "0.89792",
      "count": "12",
      "text": "IBM Cloud"
    },
    {
      "type": "Company",
      "relevance": "0.590382",
      "count": "13",
      "text": "IBM",
      "disambiguated": {
        "subType": [
          "SoftwareLicense",
          "OperatingSystemDeveloper",
          "ProcessorManufacturer",
          "SoftwareDeveloper",
          "CompanyFounder",
          "ProgrammingLanguageDesigner",
          "ProgrammingLanguageDeveloper"
        ],
        "name": "IBM",
        "website": "http://www.ibm.com/",
        "dbpedia": "http://dbpedia.org/resource/IBM",
        "freebase": "http://rdf.freebase.com/ns/m.03sc8",
        "opencyc": "http://sw.opencyc.org/concept/Mx4rvViMoJwpEbGdrcN5Y29ycA",
        "yago": "http://yago-knowledge.org/resource/IBM",
        "crunchbase": "http://www.crunchbase.com/company/ibm"
      }
    },
    {
      "type": "Facility",
      "relevance": "0.252495",
      "count": "1",
      "text": "London Bluemix Garage"
    }
  ]
}
```
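Only some entities carry a `disambiguated` block (here "IBM" does and "IBM Cloud" does not), so code that reads it should not assume it exists. A sketch that groups entities by type and collects DBpedia links where present; both helper names are hypothetical.

```python
from collections import defaultdict


def entities_by_type(response: dict) -> dict:
    """Group entity texts by type, e.g. {'Company': ['IBM Cloud', 'IBM']}."""
    groups = defaultdict(list)
    for entity in response.get("entities", []):
        groups[entity["type"]].append(entity["text"])
    return dict(groups)


def dbpedia_links(response: dict) -> dict:
    """Map entity text to its DBpedia URI, only for entities that were disambiguated."""
    return {
        e["text"]: e["disambiguated"]["dbpedia"]
        for e in response.get("entities", [])
        if "dbpedia" in e.get("disambiguated", {})
    }


sample = {"entities": [
    {"type": "Company", "text": "IBM Cloud"},
    {"type": "Company", "text": "IBM",
     "disambiguated": {"dbpedia": "http://dbpedia.org/resource/IBM"}},
    {"type": "Facility", "text": "London Bluemix Garage"},
]}
print(entities_by_type(sample))
# {'Company': ['IBM Cloud', 'IBM'], 'Facility': ['London Bluemix Garage']}
print(dbpedia_links(sample))
# {'IBM': 'http://dbpedia.org/resource/IBM'}
```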
Feed Detection
Extracts RSS/Atom feed links from a web page or HTML.
feeds(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.feeds(
        url='news.ycombinator.com'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "https://news.ycombinator.com/",
  "feeds": [
    {
      "feed": "https://news.ycombinator.com/rss"
    }
  ]
}
```
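Each detected feed is wrapped in its own object under `feeds`. A sketch that flattens them into a plain list of URLs, dropping any repeats while keeping the original order; `feed_urls` is a hypothetical helper name.

```python
def feed_urls(response: dict) -> list:
    """Return the unique feed URLs found by feeds(), in the order returned."""
    urls = []
    for item in response.get("feeds", []):
        url = item.get("feed")
        if url and url not in urls:
            urls.append(url)
    return urls


sample = {"feeds": [{"feed": "https://news.ycombinator.com/rss"}]}
print(feed_urls(sample))  # ['https://news.ycombinator.com/rss']
```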
Keywords
Extracts keywords from text, web pages, or HTML.
keywords(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.keywords(
        url='twitter.com/ibmwatson'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "https://mobile.twitter.com/ibmwatson",
  "totalTransactions": "1",
  "language": "english",
  "keywords": [
    {
      "relevance": "0.936546",
      "text": "Watson"
    },
    {
      "relevance": "0.823589",
      "text": "Watson Developer Cloud"
    },
    ...
  ]
}
```
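A common use of this output is turning the most relevant keywords into tags. A sketch that keeps keywords above a relevance cut-off; the default of 0.8 and the helper name `keyword_tags` are arbitrary choices for illustration.

```python
def keyword_tags(response: dict, min_relevance: float = 0.8) -> list:
    """Return keyword texts whose relevance (a string in the JSON) is at least min_relevance."""
    return [
        kw["text"]
        for kw in response.get("keywords", [])
        if float(kw["relevance"]) >= min_relevance
    ]


sample = {"keywords": [
    {"relevance": "0.936546", "text": "Watson"},
    {"relevance": "0.823589", "text": "Watson Developer Cloud"},
]}
print(keyword_tags(sample))       # ['Watson', 'Watson Developer Cloud']
print(keyword_tags(sample, 0.9))  # ['Watson']
```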
Language Detection
Detects the language of web pages, HTML, or plain text. The API already detects the language automatically, but this method returns more detailed information about it. For reliable results, the text should preferably be longer than 100 words. For the full list of detectable languages, see
language(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.language(
        url='ibm.com/us-en'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.ibm.com/us-en/",
  "language": "english",
  "iso-639-1": "en",
  "iso-639-2": "eng",
  "iso-639-3": "eng",
  "ethnologue": "http://www.ethnologue.com/show_language.asp?code=eng",
  "native-speakers": "309-400 million",
  "wikipedia": "http://en.wikipedia.org/wiki/English_language"
}
```
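Because detection works best on text longer than 100 words, a quick local word count can flag input that is probably too short before you spend a transaction on it. A sketch; counting whitespace-separated tokens is only a rough approximation and does not suit languages written without spaces, such as Chinese or Japanese. The helper name is hypothetical.

```python
def long_enough_for_detection(text: str, min_words: int = 100) -> bool:
    """Rough check: does the text have more than min_words whitespace-separated words?"""
    return len(text.split()) > min_words


print(long_enough_for_detection("word " * 150))  # True
print(long_enough_for_detection("Hello world"))  # False
```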
Microformats
Extracts microformats from web pages or HTML. For more about microformats, see
microformats(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.microformats(
        url='http://microformats.org/wiki/hcard'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://microformats.org/wiki/hcard",
  "microformats": [
    { "field": "RelTagLink", "data": "/wiki/Category:Specifications" },
    { "field": "RelTag", "data": "Category:Specifications" },
    { "field": "NameGivenName", "data": "Tantek" },
    { "field": "NameFamilyName", "data": "Çelik" },
    { "field": "FormattedName", "data": "Tantek Çelik" },
    { "field": "Role", "data": "Editor" },
    { "field": "Role", "data": "Author" }
  ]
}
```
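The same field can appear more than once (here `Role` is both "Editor" and "Author"), so a plain dict built from the list would silently keep only the last value. A sketch that groups every value by field name; `microformat_fields` is a hypothetical helper.

```python
from collections import defaultdict


def microformat_fields(response: dict) -> dict:
    """Collect microformat data by field name; repeated fields keep every value."""
    fields = defaultdict(list)
    for item in response.get("microformats", []):
        fields[item["field"]].append(item["data"])
    return dict(fields)


sample = {"microformats": [
    {"field": "FormattedName", "data": "Tantek Çelik"},
    {"field": "Role", "data": "Editor"},
    {"field": "Role", "data": "Author"},
]}
fields = microformat_fields(sample)
print(fields["Role"])  # ['Editor', 'Author']
```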
Relations
Extracts subject-action-object relations from web pages, text, or HTML.
relations(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.relations(
        url='https://www.whitehouse.gov/the-press-office/2016/03/19/weekly-address-president-obamas-supreme-court-nomination',
        max_items=1),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "https://www.whitehouse.gov/the-press-office/2016/03/19/weekly-address-president-obamas-supreme-court-nomination",
  "language": "english",
  "relations": [
    {
      "sentence": " WASHINGTON, DC — In this week's address, the President discussed his decision to nominate Chief Judge Merrick Garland to the Supreme Court of the United States.",
      "subject": {
        "text": "the President"
      },
      "action": {
        "text": "to nominate",
        "lemmatized": "to nominate",
        "verb": {
          "text": "nominate",
          "tense": "future"
        }
      },
      "object": {
        "text": "Merrick Garland"
      }
    }
  ]
}
```
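Each relation nests its subject, action, and object in separate objects. A sketch that flattens them into (subject, verb, object) tuples, preferring the bare verb and falling back to the full action text when no verb is present; `relation_triples` is a hypothetical helper.

```python
def relation_triples(response: dict) -> list:
    """Turn each relation into a (subject, verb, object) tuple of plain strings."""
    triples = []
    for rel in response.get("relations", []):
        subject = rel.get("subject", {}).get("text", "")
        action = rel.get("action", {})
        verb = action.get("verb", {}).get("text", action.get("text", ""))
        obj = rel.get("object", {}).get("text", "")
        triples.append((subject, verb, obj))
    return triples


sample = {"relations": [{
    "subject": {"text": "the President"},
    "action": {"text": "to nominate", "verb": {"text": "nominate", "tense": "future"}},
    "object": {"text": "Merrick Garland"},
}]}
print(relation_triples(sample))  # [('the President', 'nominate', 'Merrick Garland')]
```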
Sentiment Analysis
Unlike Emotion Analysis, this method analyzes the overall sentiment of the entire text or page and returns a single document-level sentiment type and score.
sentiment(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.sentiment(
        url='http://www.huffingtonpost.com/2010/06/22/iphone-4-review-the-worst_n_620714.html'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.huffingtonpost.com/2010/06/22/iphone-4-review-the-worst_n_620714.html",
  "totalTransactions": "1",
  "language": "english",
  "docSentiment": {
    "mixed": "1",
    "score": "-0.24582",
    "type": "negative"
  }
}
```
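The `docSentiment` block combines a type, a signed score, and a `mixed` flag. A sketch that turns it into a one-line summary, assuming (from this sample only) that `mixed` is `"1"` when the page contains both positive and negative sentiment, and defaulting the score to 0.0 if the field is missing; `summarize_sentiment` is a hypothetical helper.

```python
def summarize_sentiment(response: dict) -> str:
    """Build a one-line summary from docSentiment."""
    doc = response["docSentiment"]
    score = float(doc.get("score", 0.0))  # default to 0.0 if no score is returned
    mixed = doc.get("mixed") == "1"       # assumption: "1" flags mixed sentiment
    label = doc["type"] + (" (mixed)" if mixed else "")
    return f"{label}, score {score:+.2f}"


sample = {"docSentiment": {"mixed": "1", "score": "-0.24582", "type": "negative"}}
print(summarize_sentiment(sample))  # negative (mixed), score -0.25
```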
Targeted Sentiment
Analyzes the sentiment toward specific target phrases in web pages, HTML, or plain text.
targeted_sentiment(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.targeted_sentiment(
        url='http://www.zacks.com/stock/news/207968/stock-market-news-for-february-19-2016',
        targets=['NASDAQ', 'Dow']),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.zacks.com/stock/news/207968/stock-market-news-for-february-19-2016",
  "totalTransactions": "1",
  "language": "english",
  "results": [
    {
      "sentiment": {
        "score": "-0.387744",
        "type": "negative"
      },
      "text": "NASDAQ"
    },
    {
      "sentiment": {
        "score": "-0.416076",
        "type": "negative"
      },
      "text": "Dow"
    }
  ]
}
```
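With several targets in one request, it is often handy to compare them directly. A sketch that maps each target to a numeric score and picks the most negative one; `target_scores` is a hypothetical helper name.

```python
def target_scores(response: dict) -> dict:
    """Map each target phrase to its sentiment score as a float."""
    return {r["text"]: float(r["sentiment"]["score"]) for r in response.get("results", [])}


sample = {"results": [
    {"sentiment": {"score": "-0.387744", "type": "negative"}, "text": "NASDAQ"},
    {"sentiment": {"score": "-0.416076", "type": "negative"}, "text": "Dow"},
]}
scores = target_scores(sample)
print(scores)                           # {'NASDAQ': -0.387744, 'Dow': -0.416076}
print(min(scores, key=scores.get))      # Dow
```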
Taxonomy
Classifies web pages, HTML, or plain text into a hierarchical taxonomy up to five levels deep.
taxonomy(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.taxonomy(
        url='cnn.com'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.cnn.com/",
  "totalTransactions": "1",
  "language": "english",
  "taxonomy": [
    {
      "label": "/news",
      "score": "0.994385" // detected category
    },
    {
      "label": "/art and entertainment/movies and tv/television",
      "score": "0.706355"
    },
    {
      "confident": "no",
      "label": "/sports/football",
      "score": "0.471388"
    }
  ]
}
```
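Labels are slash-separated paths through the hierarchy, and low-confidence guesses are marked with `"confident": "no"`. A sketch that drops those guesses and splits each label into its levels; `confident_categories` is a hypothetical helper.

```python
def confident_categories(response: dict) -> list:
    """Return taxonomy labels split into levels, skipping entries marked confident: "no"."""
    return [
        entry["label"].strip("/").split("/")
        for entry in response.get("taxonomy", [])
        if entry.get("confident") != "no"
    ]


sample = {"taxonomy": [
    {"label": "/news", "score": "0.994385"},
    {"label": "/art and entertainment/movies and tv/television", "score": "0.706355"},
    {"confident": "no", "label": "/sports/football", "score": "0.471388"},
]}
print(confident_categories(sample))
# [['news'], ['art and entertainment', 'movies and tv', 'television']]
```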
Text (cleaned)
Extracts the main text from a web page, with ads and other unwanted content removed.
text(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.text(
        url='techcrunch.com/2016/01/29/ibm-watson-weather-company-sale'),
    indent=2))
```
Response:
```json
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://techcrunch.com/2016/01/29/ibm-watson-weather-company-sale/",
  "language": "english",
  "text": " IBM is taking another step to expand its Watson AI business and build its presence in areas like IoT: today the company announced ... "
}
```
Text (raw)
Extracts the raw text from a web page.
raw_text(parameter=value,...)
```python
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')

print(json.dumps(
    alchemy_language.raw_text(
        url='techcrunch.com/2016/01/29/ibm-watson-weather-company-sale'),
    indent=2))
```
Original documentation: