Convert numbered pinyin to pinyin with tone marks

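Is there any script, library, or program, using Python or a BASH tool (such as awk, perl, or sed), that can correctly convert numbered pinyin (e.g. dian4 nao3) to UTF-8 pinyin with tone marks (e.g. diànnǎo)?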

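I found the following examples, but they require PHP or C#:

  • PHP: Convert numbered to accentuated Pinyin?
  • C#: Any libraries to convert number Pinyin to Pinyin with tone markings?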

I also found various online tools, but they cannot handle a large number of conversions.

I have some Python 3 code that does this, and it is small enough to put directly in the answer.

    import re

    # Row 0 lists the plain vowels; rows 1-4 hold the same vowels with tone marks.
    # Both 'v' and 'ü' map to the marked ü, hence the doubled last character.
    PinyinToneMark = {
        0: "aoeiuv\u00fc",
        1: "\u0101\u014d\u0113\u012b\u016b\u01d6\u01d6",
        2: "\u00e1\u00f3\u00e9\u00ed\u00fa\u01d8\u01d8",
        3: "\u01ce\u01d2\u011b\u01d0\u01d4\u01da\u01da",
        4: "\u00e0\u00f2\u00e8\u00ec\u00f9\u01dc\u01dc",
    }

    def decode_pinyin(s):
        s = s.lower()
        r = ""  # result so far
        t = ""  # syllable currently being collected
        for c in s:
            if c >= 'a' and c <= 'z':
                t += c
            elif c == ':':
                # "u:" is an ASCII spelling of ü
                assert t[-1] == 'u'
                t = t[:-1] + "\u00fc"
            else:
                if c >= '0' and c <= '5':
                    tone = int(c) % 5  # 0 and 5 both mean neutral tone
                    if tone != 0:
                        m = re.search("[aoeiuv\u00fc]+", t)
                        if m is None:
                            t += c
                        elif len(m.group(0)) == 1:
                            # a single vowel always carries the mark
                            t = t[:m.start(0)] + PinyinToneMark[tone][PinyinToneMark[0].index(m.group(0))] + t[m.end(0):]
                        else:
                            # several vowels: a, then o, then e take priority;
                            # "ui" and "iu" mark the final vowel
                            if 'a' in t:
                                t = t.replace("a", PinyinToneMark[tone][0])
                            elif 'o' in t:
                                t = t.replace("o", PinyinToneMark[tone][1])
                            elif 'e' in t:
                                t = t.replace("e", PinyinToneMark[tone][2])
                            elif t.endswith("ui"):
                                t = t.replace("i", PinyinToneMark[tone][3])
                            elif t.endswith("iu"):
                                t = t.replace("u", PinyinToneMark[tone][4])
                            else:
                                t += "!"
                # any other character (digit, space, punctuation) ends the syllable
                r += t
                t = ""
        r += t
        return r

This handles ü, u:, and v, all of which I have encountered. Some modifications would be needed for Python 2 compatibility.
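
The three spellings of ü mentioned here (`ü`, `u:`, and `v`) could also be unified in a separate pass before any tone marks are placed. The following is only a minimal sketch under that assumption; `normalize_u_umlaut` is a hypothetical helper name and is not part of the answer's code.

```python
def normalize_u_umlaut(text):
    """Rewrite the ASCII spellings 'u:' and 'v' as 'ü' (hypothetical helper)."""
    # Handle "u:" first so the colon is consumed together with the u.
    text = text.replace("u:", "ü").replace("U:", "Ü")
    # Native pinyin has no letter v, so v is assumed to always stand for ü.
    return text.replace("v", "ü").replace("V", "Ü")


print(normalize_u_umlaut("nv3 lu:4"))
```

Because every `v` is rewritten, this is only safe on input that is known to be pinyin; mixed text containing English words would produce false positives.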

The cjklib library does cover your needs:

Using the Python shell:

    >>> from cjklib.reading import ReadingFactory
    >>> f = ReadingFactory()
    >>> print f.convert('Bei3jing1', 'Pinyin', 'Pinyin', sourceOptions={'toneMarkType': 'numbers'})
    Běijīng

Or just the command line:

    $ cjknife -m Bei3jing1
    Běijīng

Disclaimer: I developed this library.

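I wrote another Python function that is case-insensitive and preserves spaces, punctuation, and other text (unless there are false positives, of course):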

    # -*- coding: utf-8 -*-
    # Python 2 code (uses ur'' literals and the print statement)
    import re

    pinyinToneMarks = {
        u'a': u'āáǎà', u'e': u'ēéěè', u'i': u'īíǐì',
        u'o': u'ōóǒò', u'u': u'ūúǔù', u'ü': u'ǖǘǚǜ',
        u'A': u'ĀÁǍÀ', u'E': u'ĒÉĚÈ', u'I': u'ĪÍǏÌ',
        u'O': u'ŌÓǑÒ', u'U': u'ŪÚǓÙ', u'Ü': u'ǕǗǙǛ'
    }

    def convertPinyinCallback(m):
        tone=int(m.group(3))%5
        r=m.group(1).replace(u'v', u'ü').replace(u'V', u'Ü')
        # for multiple vowels, use first one if it is a/e/o, otherwise use second one
        pos=0
        if len(r)>1 and not r[0] in 'aeoAEO':
            pos=1
        if tone != 0:
            r=r[0:pos]+pinyinToneMarks[r[pos]][tone-1]+r[pos+1:]
        return r+m.group(2)

    def convertPinyin(s):
        # group 1: vowels, group 2: optional n/ng/r ending, group 3: tone digit
        return re.sub(ur'([aeiouüvÜ]{1,3})(n?g?r?)([012345])', convertPinyinCallback, s, flags=re.IGNORECASE)

    print convertPinyin(u'Ni3 hao3 ma0?')
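
The function above is written for Python 2. For Python 3, the same regex-and-callback idea might be adapted roughly as follows. This is a sketch, not the answer author's code: the names `TONE_MARKS`, `SYLLABLE_RE`, `_mark`, and `convert_pinyin` are chosen here for illustration, and it relies on Python 3 applying Unicode case folding to `str` patterns, so `Ü` does not need to be listed separately in the character class.

```python
import re

# Tone-marked forms of each vowel, indexed by tone number minus one.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
    "A": "ĀÁǍÀ", "E": "ĒÉĚÈ", "I": "ĪÍǏÌ",
    "O": "ŌÓǑÒ", "U": "ŪÚǓÙ", "Ü": "ǕǗǙǛ",
}

# One to three vowels, an optional n/ng/r ending, then a tone digit 0-5.
SYLLABLE_RE = re.compile(r"([aeiouüv]{1,3})(n?g?r?)([0-5])", re.IGNORECASE)


def _mark(match):
    vowels = match.group(1).replace("v", "ü").replace("V", "Ü")
    tone = int(match.group(3)) % 5  # 0 and 5 both mean neutral tone
    if tone:
        # Mark the first vowel if it is a/e/o, otherwise the second one.
        pos = 1 if len(vowels) > 1 and vowels[0] not in "aeoAEO" else 0
        marked = TONE_MARKS[vowels[pos]][tone - 1]
        vowels = vowels[:pos] + marked + vowels[pos + 1:]
    return vowels + match.group(2)


def convert_pinyin(text):
    """Replace numbered syllables in text, leaving everything else untouched."""
    return SYLLABLE_RE.sub(_mark, text)


print(convert_pinyin("Ni3 hao3 ma0?"))  # Nǐ hǎo ma?
```

Unlike the first answer, this keeps the space in `dian4 nao3`, giving `diàn nǎo` rather than `diànnǎo`. As with the original, any vowel group followed by a digit is treated as pinyin, so false positives remain possible in mixed text.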

I ported the code from dani_l to Kotlin (the code in Java should be very similar). Here it is:

    import java.util.regex.Pattern

    val pinyinToneMarks = mapOf(
        'a' to "āáǎà", 'e' to "ēéěè", 'i' to "īíǐì",
        'o' to "ōóǒò", 'u' to "ūúǔù", 'ü' to "ǖǘǚǜ",
        'A' to "ĀÁǍÀ", 'E' to "ĒÉĚÈ", 'I' to "ĪÍǏÌ",
        'O' to "ŌÓǑÒ", 'U' to "ŪÚǓÙ", 'Ü' to "ǕǗǙǛ"
    )

    fun toPinyin(asciiPinyin: String): String {
        // Case-insensitive, like re.IGNORECASE in the Python original,
        // so that upper-case vowels are matched as well.
        val pattern = Pattern.compile(
            "([aeiouüvÜ]{1,3})(n?g?r?)([012345])",
            Pattern.CASE_INSENSITIVE or Pattern.UNICODE_CASE
        )!!
        val matcher = pattern.matcher(asciiPinyin)
        val s = StringBuilder()
        var start = 0
        while (matcher.find(start)) {
            // copy the text between the previous match and this one unchanged
            s.append(asciiPinyin, start, matcher.start(1))
            val tone = Integer.parseInt(matcher.group(3)!!) % 5
            val r = matcher.group(1)!!.replace("v", "ü").replace("V", "Ü")
            // for multiple vowels, use first one if it is a/e/o, otherwise use second one
            val pos = if (r.length > 1 && r[0].toString() !in "aeoAEO") 1 else 0
            if (tone != 0) s.append(r, 0, pos).append(pinyinToneMarks[r[pos]]!![tone - 1]).append(r, pos + 1, r.length)
            else s.append(r)
            s.append(matcher.group(2))
            start = matcher.end(3)
        }
        // copy whatever follows the last match
        if (start != asciiPinyin.length) s.append(asciiPinyin, start, asciiPinyin.length)
        return s.toString()
    }

    fun test() = print(toPinyin("Ni3 hao3 ma0?"))

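I came across a VBA macro that does this in Microsoft Word, at pinyinjoe.com.

I reported a minor flaw, and the author replied that he would incorporate my suggestion as soon as he could. That was in early January 2014; I have had no motivation to check since then, because the fix is already in my copy.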
