Friday, February 17, 2012

Moving from Google Translate API to Microsoft Translate API in Scala

Google Translate used to be God's gift to developers who want to verify their internationalization works. It made localizing to some random locale for test purposes downright trivial. And then the bastards went and deprecated it (http://googlecode.blogspot.com/2011/05/spring-cleaning-for-some-of-our-apis.html), ultimately making it pay to play.

The rates for Google Translate are so low that it is unlikely we can justify switching to Microsoft Translate based on cost, but we love "free" so we'll spend a few expensive hours of developer time on it anyway. M$ Translate is free for your first 2M characters and you can pay for more.

The first thing that one notices upon trying to call a Microsoft web service is that it isn't as easy as you'd like. From reading the http://api.microsofttranslator.com/V2/Http.svc/Translate API to getting a successful translation call through from code took WAY longer than Google Translate (or other APIs) and had more bumps in the road. I'm sure others have differing experiences but for me I had programmatic access to Google Translate working maybe 30-60 minutes (it was a while ago) after I decided to do it versus 2-3 hours to get Microsoft Translate working.

Trying to get Microsoft Translate working I ran into the following complications:
  1. A multi-step registration process yielding numerous constants you send in to their APIs
    1. https://code.google.com/apis/console beats the hell out of https://datamarket.azure.com/account (the Microsoft equivalent as far as I can tell)
  2. An extra http call to get a token that you have to modify before you send it back to them (addRequestHeader("Authorization", "Bearer " + tok.access_token))
    1. Bonus points for inconsistent description of how this worked, although I believe this is now fixed
  3. Inaccurate documentation of arguments
    1. I believe this is now fixed
  4. Unhelpful error messages
  5. Inconsistent documentation of how to send in authorization data 
    1. I believe this is now fixed
  6. ISO-639 2-char language codes for all languages except Chinese.
    1. Chinese requires use of zh-CHS or zh-CHT to distinguish traditional vs simplified. Apparently having "zh" default to one or the other (probably simplified) is less trouble than having this be the exception case to how everything else works.
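
The Chinese exception in item 6 can be isolated in one small helper so that it applies to both the source and the target language. This is only a minimal sketch: the object name MsLanguageCodes is hypothetical, and treating plain "zh" as Simplified Chinese is an assumption, not something the Microsoft API does for you.

```scala
// Hypothetical helper, not part of the original code.
object MsLanguageCodes {
  // Assumption: a bare "zh" means Simplified Chinese (zh-CHS).
  // Every other code, including zh-CHT, is passed through untouched.
  def normalize(lang: String): String =
    if (lang != null && lang.equalsIgnoreCase("zh")) "zh-CHS" else lang
}
```

Calling MsLanguageCodes.normalize on both the from and to arguments would keep the special case in one place.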
In case this is useful to someone else, here is the code (Scala 2.9.1.final). First up, the API our clients will call into:

class Translator(val client: HttpClient) extends Log {
  //We kept Google around just in case we decide to pay for the service one day
  private val translationServices = List(new Google(client), new Microsoft(client))

  def this() = this(new HttpClient())

  def apply(text: String, fromLang: String, toLang: String): String = {
    if (fromLang != toLang && StringUtils.isNotBlank(text))
      translate(text, fromLang, toLang)
    else
      text
  }

  private def translate(text: String, fromLang: String, toLang: String): String = {  
    for (svc <- translationServices) {
      try {
        val tr = svc(text, fromLang, toLang)
        if (StringUtils.isNotBlank(tr)) {
          return tr
        }
      } catch {
        case e: Exception =>
          logger.warn("Translation failed using " + svc.getClass().getSimpleName() + ": " + e.getMessage() + ", moving on...")
      }
    }
    
    return ""
  }
}
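
The same try-each-service-in-order idea can also be written without the early return. The sketch below is a variation under stated assumptions, not the code we run: SimpleService is a hypothetical stand-in for TranslationService, Fallback is a made-up name, and the logging from the real version is left out to keep it short.

```scala
// Hypothetical stand-in for the TranslationService parent class.
trait SimpleService {
  def apply(text: String, fromLang: String, toLang: String): String
}

object Fallback {
  // Returns the first non-blank translation, or "" if every service fails.
  // The view keeps the map lazy, so later services are only called when
  // the earlier ones threw or returned nothing useful.
  def firstSuccessful(services: List[SimpleService], text: String,
                      fromLang: String, toLang: String): String =
    services.view
      .map(svc => try { svc(text, fromLang, toLang) } catch { case e: Exception => "" })
      .find(tr => tr != null && tr.trim.nonEmpty)
      .getOrElse("")
}
```

Whether this reads better than the loop is a matter of taste; the behavior for callers is the same.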

Consumers will call into the Translator using code similar to:

  //translate Hello, World from English to Chinese
  val tr = new Translator()
  tr("Hello, World", "en", "zh") 

The interesting part is of course the actual Microsoft implementation:

/**
 * The parent TranslationService just defines def apply(text: String, fromLang: String, toLang: String): String
 */
class Microsoft(client: HttpClient) extends TranslationService(client) with Log {

  private val tokenUri = "https://datamarket.accesscontrol.windows.net/v2/OAuth2-13"
  private val translateUri = "http://api.microsofttranslator.com/V2/Http.svc/Translate"
  private val encoding = "ASCII"

  private val appKey = Map("client_id" -> "THE NAME OF YOUR APP", "client_secret" -> "YOUR CLIENT SECRET")

  private var token = new MsAccessToken

  def this() = this(new HttpClient())

  /**
   * Ref http://msdn.microsoft.com/en-us/library/ff512421.aspx
   */
  override def apply(text: String, fromLang: String, toLang: String): String = {
    
    /**
     * Always try to re-use an existing token
     */
    val firstTry:Option[String] = try {
      Some(callTranslate(token, text, fromLang, toLang))
    } catch {
      case e: Exception =>
        logger.info("Failed to re-use token, will retry with a new one. " + e.getMessage())
        None
    }
    
    /**
     * If we didn't get it using our old token try try again.
     * 99% of the time we do a bunch in a row and it works first time; occasionally we end up
     * needing a new key.
     * Code in block won't run unless firstTry is None.
     */
    val response = firstTry getOrElse {  
      this.token = requestAccessToken()
      callTranslate(token, text, fromLang, toLang)
    }
    

    //response is similar to: <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">Hallo Welt</string>
    val translation = StringUtils.substringAfter(StringUtils.substringBeforeLast(response, "</string>"), ">")
    translation
  }

  private def callTranslate(tok: MsAccessToken, text: String, fromLang: String, toLang: String) = {
    val get = new GetMethod(translateUri)

    //Thanks MSFT, it's awesome that the language codes are *almost* ISO 639...
    //We need to specify our type of Chinese, http://www.emreakkas.com/internationalization/microsoft-translator-api-languages-list-language-codes-and-names
    val adjustedToLang = if (toLang.equalsIgnoreCase("zh")) "zh-CHS" else toLang

    val queryPairs = Array(
      new NameValuePair("appId", ""),
      new NameValuePair("text", text),
      new NameValuePair("from", fromLang),
      new NameValuePair("to", adjustedToLang))
    get.setQueryString(queryPairs)

    /**
     * http://msdn.microsoft.com/en-us/library/hh454950.aspx
     */
    get.addRequestHeader("Authorization", "Bearer " + tok.access_token)

    val rawResponse = try {
      val sc = client executeMethod get
      val response = get getResponseBodyAsString ()
      if (sc != HttpStatus.SC_OK) {
        throw new IllegalArgumentException("Error translating; Microsoft translate request '"
          + translateUri + "?" + get.getQueryString()
          + "' failed with unexpected code " + sc + ", response: " + response)
      }
      response
    } finally {
      //Always hand the connection back, same as requestAccessToken does
      get.releaseConnection()
    }
    rawResponse
  }

  /**
   * Ref http://msdn.microsoft.com/en-us/library/hh454950.aspx
   */
  def requestAccessToken(): MsAccessToken = {

    val post = new PostMethod(tokenUri)
    post.setParameter("grant_type", "client_credentials")
    post.setParameter("client_id", appKey("client_id"))
    post.setParameter("client_secret", appKey("client_secret"))
    post.setParameter("scope", "http://api.microsofttranslator.com")

    val rawResponse = try {
      val sc = client executeMethod post
      val response = post getResponseBodyAsString ()
      if (sc != HttpStatus.SC_OK) {
        throw new IllegalArgumentException("Error translating; Microsoft access token request failed with unexpected code " + sc + ", response: " + response)
      }
      response
    } finally {
      post.releaseConnection()
    }

    val tok = Json.fromJson[MsAccessToken](rawResponse, classOf[MsAccessToken])

    tok
  }
}
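
The substring trick in apply() works for plain text, but it leaves XML entities such as &amp;amp; undecoded. If that matters, the response can be run through the XML parser that ships with the JDK instead. A minimal sketch, assuming the response is always a single string element like the sample in the comment above; MsResponse and extractTranslation are hypothetical names.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper, not part of the original code.
object MsResponse {
  // Parses the <string ...>...</string> envelope and returns its text,
  // with XML entities decoded by the parser.
  def extractTranslation(xml: String): String = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
    doc.getDocumentElement.getTextContent
  }
}
```

This costs a parser instance per call, which is probably fine at test-localization volumes.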

The MsAccessToken is a rich and exciting class:
  /**
   * Ref http://msdn.microsoft.com/en-us/library/hh454950.aspx
   */
  class MsAccessToken(var access_token: String, var token_type: String, var expires_in: String, var scope: String) {
    def this() = this(null, null, null, null)
  }
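
Since the token response carries expires_in, another option is to check for expiry up front rather than waiting for a failed call. This is a sketch only: ExpiringToken is a hypothetical class, it assumes expires_in is a lifetime in seconds, and the 30 second safety margin is an arbitrary choice.

```scala
// Hypothetical helper, not part of the original code.
class ExpiringToken(val accessToken: String, expiresInSeconds: Long, issuedAtMillis: Long) {
  // Renew a little early so a request in flight does not outlive the token.
  private val safetyMarginMillis = 30 * 1000L
  private val expiresAtMillis = issuedAtMillis + expiresInSeconds * 1000L - safetyMarginMillis

  // True while the token exists and has not yet reached its (padded) expiry.
  def isUsable(nowMillis: Long): Boolean =
    accessToken != null && nowMillis < expiresAtMillis
}
```

It could be built from an MsAccessToken with something like new ExpiringToken(tok.access_token, tok.expires_in.toLong, System.currentTimeMillis()), and apply() would then request a new token whenever isUsable returns false.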


A few third-party libraries are in play here. For HTTP we are using Apache Commons HttpClient (the 3.x API, hence GetMethod and PostMethod). For JSON we are using Google's excellent Gson library, with a simple implementation of 'using' to make working with java.io cleaner:

object Json {
 def writeJson(something: Any): String = new Gson().toJson(something) 
 
 def writeJson(something: Any, os: OutputStream):Unit 
  = using(new OutputStreamWriter(os)) { osw => new Gson().toJson(something, osw) }
 
 def fromJson[T](json: String, graphRoot: Type):T 
  = new Gson().fromJson(json, graphRoot) 
}

The using() function is in another class; it looks like this (ref http://whileonefork.blogspot.com/2011/03/c-using-is-loan-pattern-in-scala.html):
 def using[T <: {def close(): Unit}, R](c: T)(action: T => R): R = {
  try {
   action(c)
  } finally {
   if (null != c) c.close()
  }
 }

And with that our Scala calling of M$ Translate is complete!
