Friday, February 17, 2012

Moving from Google Translate API to Microsoft Translate API in Scala

Google Translate used to be God's gift to developers who want to verify that their internationalization works. It made localizing to some random locale for test purposes downright trivial. And then the bastards went and deprecated it (http://googlecode.blogspot.com/2011/05/spring-cleaning-for-some-of-our-apis.html), ultimately making it pay to play.

The rates for Google Translate are so low that we are unlikely to be able to justify switching to Microsoft Translate on cost alone, but we love "free", so we'll spend a few expensive hours of developer time on it anyway. M$ Translate is free for your first 2M characters and you can pay for more.

The first thing one notices upon trying to call a Microsoft web service is that it isn't as easy as you'd like. Getting from reading the http://api.microsofttranslator.com/V2/Http.svc/Translate API docs to a successful translation call from code took WAY longer than it did with Google Translate (or other APIs) and had more bumps in the road. I'm sure others have had different experiences, but I had programmatic access to Google Translate working maybe 30-60 minutes (it was a while ago) after I decided to do it, versus 2-3 hours to get Microsoft Translate working.

Trying to get Microsoft Translate working I ran into the following complications:
  1. A multi-step registration process yielding numerous constants you send in to their APIs
    1. https://code.google.com/apis/console beats the hell out of https://datamarket.azure.com/account (the Microsoft equivalent as far as I can tell)
  2. An extra http call to get a token that you have to modify before you send it back to them (addRequestHeader("Authorization", "Bearer " + tok.access_token))
    1. Bonus points for inconsistent description of how this worked, although I believe this is now fixed
  3. Inaccurate documentation of arguments
    1. I believe this is now fixed
  4. Unhelpful error messages
  5. Inconsistent documentation of how to send in authorization data 
    1. I believe this is now fixed
  6. ISO-639 2-char language codes for all languages except Chinese.
    1. Chinese requires use of zh-CHS or zh-CHT to distinguish simplified vs traditional. Apparently having "zh" default to one or the other (probably simplified) would have been more trouble than making this the one exception to how everything else works.
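
Complications 2 and 6 boil down to two small string fix-ups. Here is a minimal sketch of them, assuming hypothetical helper names (`MsQuirks`, `bearerHeader`, `toMicrosoftLang`) that are not part of any Microsoft library, and assuming a bare "zh" should mean simplified Chinese:

```scala
object MsQuirks {
  /** The token comes back bare; the Translate call wants it prefixed with "Bearer ". */
  def bearerHeader(accessToken: String): (String, String) =
    ("Authorization", "Bearer " + accessToken)

  /** Map plain "zh" to zh-CHS (simplified, an assumption); pass everything else through. */
  def toMicrosoftLang(code: String): String =
    if (code.equalsIgnoreCase("zh")) "zh-CHS" else code
}
```

The same mapping could be applied to the source language as well as the target, should you ever translate from Chinese.
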
In case this is useful to someone else, here is the code (Scala 2.9.1.final). First up, the API our clients will call into:

class Translator(val client: HttpClient) extends Log {
  //We kept Google around just in case we decide to pay for the service one day
  //Services are tried in order; the list is never reassigned so a val will do
  private val translationServices = List(new Google(client), new Microsoft(client))

  def this() = this(new HttpClient())

  def apply(text: String, fromLang: String, toLang: String): String = {
    if (fromLang != toLang && StringUtils.isNotBlank(text))
      translate(text, fromLang, toLang)
    else
      text
  }

  private def translate(text: String, fromLang: String, toLang: String): String = {  
    for (svc <- translationServices) {
      try {
        val tr = svc(text, fromLang, toLang)
        if (StringUtils.isNotBlank(tr)) {
          return tr
        }
      } catch {
        case e: Exception =>
          logger.warn("Translation failed using " + svc.getClass().getSimpleName() + ": " + e.getMessage() + ", moving on...")
      }
    }
    
    return ""
  }
}
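
The try-each-service-in-order loop above can also be written without `return`. This is only a sketch of the same idea, not the code we run: `FallbackSketch`, `Service` and `firstTranslation` are made-up names, services are modelled as plain functions, and logging is left out. Note that `Try` only catches non-fatal throwables, which is close to, but not identical to, catching `Exception`.

```scala
import scala.util.Try

object FallbackSketch {
  // Stand-in for TranslationService: (text, fromLang, toLang) => translation
  type Service = (String, String, String) => String

  /** Returns the first non-blank result, or "" if every service fails or comes back blank. */
  def firstTranslation(services: Seq[Service], text: String, from: String, to: String): String =
    services.iterator // lazy: later services are not called once one succeeds
      .map(svc => Try(svc(text, from, to)).getOrElse("")) // a throwing service counts as blank
      .find(tr => tr != null && tr.trim.nonEmpty)
      .getOrElse("")
}
```
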

Consumers will call into the Translator using code similar to:

  //translate Hello, World from English to Chinese
  val tr = new Translator()
  tr("Hello, World", "en", "zh")

The interesting part is of course the actual Microsoft implementation:

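import org.apache.commons.httpclient.{HttpClient, HttpStatus, NameValuePair}
import org.apache.commons.httpclient.methods.{GetMethod, PostMethod}
//Assumption: StringUtils is from Commons Lang 2.x; for 3.x use org.apache.commons.lang3.StringUtils
import org.apache.commons.lang.StringUtils

/**
 * The parent TranslationService just defines def apply(text: String, fromLang: String, toLang: String): String
 */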
class Microsoft(client: HttpClient) extends TranslationService(client) with Log {

  private val tokenUri = "https://datamarket.accesscontrol.windows.net/v2/OAuth2-13"
  private val translateUri = "http://api.microsofttranslator.com/V2/Http.svc/Translate"
  private val encoding = "ASCII"

  private val appKey = Map("client_id" -> "THE NAME OF YOUR APP", "client_secret" -> "YOUR CLIENT SECRET")

  private var token = new MsAccessToken

  def this() = this(new HttpClient())

  /**
   * Ref http://msdn.microsoft.com/en-us/library/ff512421.aspx
   */
  override def apply(text: String, fromLang: String, toLang: String): String = {
    
    /**
     * Always try to re-use an existing token
     */
    val firstTry:Option[String] = try {
      Some(callTranslate(token, text, fromLang, toLang))
    } catch {
      case e: Exception =>
        logger.info("Failed to re-use token, will retry with a new one. " + e.getMessage())
        None
    }
    
    /**
     * If we didn't get it using our old token try try again.
     * 99% of the time we do a bunch in a row and it works first time; occasionally we end up
     * needing a new key.
     * Code in block won't run unless firstTry is None.
     */
    val response = firstTry getOrElse {  
      this.token = requestAccessToken()
      callTranslate(token, text, fromLang, toLang)
    }
    

    //response is similar to: <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">Hallo Welt</string>
    val translation = StringUtils.substringAfter(StringUtils.substringBeforeLast(response, "</string>"), ">")
    translation
  }

  private def callTranslate(tok: MsAccessToken, text: String, fromLang: String, toLang: String) = {
    val get = new GetMethod(translateUri)

    //Thanks MSFT, it's awesome that the language codes are *almost* ISO 639...
    //We need to specify our type of Chinese, http://www.emreakkas.com/internationalization/microsoft-translator-api-languages-list-language-codes-and-names
    val adjustedToLang = if (toLang.equalsIgnoreCase("zh")) "zh-CHS" else toLang

    val queryPairs = Array(
      new NameValuePair("appId", ""),
      new NameValuePair("text", text),
      new NameValuePair("from", fromLang),
      new NameValuePair("to", adjustedToLang))
    get.setQueryString(queryPairs)

    /**
     * http://msdn.microsoft.com/en-us/library/hh454950.aspx
     */
    get.addRequestHeader("Authorization", "Bearer " + tok.access_token)

    //Always release the connection, even when the call fails
    val rawResponse = try {
      val sc = client executeMethod get
      val response = get getResponseBodyAsString ()
      if (sc != HttpStatus.SC_OK) {
        throw new IllegalArgumentException("Error translating; Microsoft translate request '"
          + translateUri + "?" + get.getQueryString()
          + "' failed with unexpected code " + sc + ", response: " + response)
      }
      response
    } finally {
      get.releaseConnection()
    }
    rawResponse
  }

  /**
   * Ref http://msdn.microsoft.com/en-us/library/hh454950.aspx
   */
  def requestAccessToken(): MsAccessToken = {

    val post = new PostMethod(tokenUri)
    post.setParameter("grant_type", "client_credentials")
    post.setParameter("client_id", appKey("client_id"))
    post.setParameter("client_secret", appKey("client_secret"))
    post.setParameter("scope", "http://api.microsofttranslator.com")

    val rawResponse = try {
      val sc = client executeMethod post
      val response = post getResponseBodyAsString ()
      if (sc != HttpStatus.SC_OK) {
        throw new IllegalArgumentException("Error translating; Microsoft access token request failed with unexpected code " + sc + ", response: " + response)
      }
      response
    } finally {
      post.releaseConnection()
    }

    val tok = Json.fromJson[MsAccessToken](rawResponse, classOf[MsAccessToken])

    tok
  }
}
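
One caveat with the substring trick in apply(): it returns the text exactly as it sits in the XML, so an escaped character such as `&amp;` would come back still escaped. Below is a sketch of an alternative that lets the JDK's built-in DOM parser do the unwrapping. `MsResponse.extract` is a hypothetical helper, and it assumes the response is always a single `<string>` element like the one in the comment above.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource

object MsResponse {
  /** Pulls the text out of <string xmlns="...">...</string>, unescaping XML entities. */
  def extract(responseXml: String): String = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new InputSource(new StringReader(responseXml)))
    doc.getDocumentElement.getTextContent
  }
}
```
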

The MsAccessToken is a rich and exciting class:
  /**
   * Ref http://msdn.microsoft.com/en-us/library/hh454950.aspx
   */
  class MsAccessToken(var access_token: String, var token_type: String, var expires_in: String, var scope: String) {
    def this() = this(null, null, null, null)
  }
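
The code above finds out a token has gone stale by failing first and retrying. Since the token response carries `expires_in`, another option is to track the expiry and refresh ahead of time. A rough sketch follows, assuming `expires_in` is a lifetime in seconds (as OAuth 2 defines it). `TokenCacheSketch`, `Token` and `TokenCache` are invented for illustration, and the 30 second safety margin is an arbitrary choice.

```scala
object TokenCacheSketch {
  // Minimal stand-in for MsAccessToken: just the two fields the cache needs
  final case class Token(accessToken: String, expiresInSeconds: Long)

  /** Re-uses a token until shortly before it expires, then fetches a new one.
    * `fetch` and `nowMillis` are injected so the sketch runs without a network. */
  class TokenCache(fetch: () => Token,
                   nowMillis: () => Long = () => System.currentTimeMillis(),
                   safetyMarginMillis: Long = 30000L) {
    private var cached: Option[(Token, Long)] = None // token plus absolute expiry time

    def get(): Token = synchronized {
      val now = nowMillis()
      cached match {
        case Some((tok, expiresAt)) if now < expiresAt - safetyMarginMillis => tok
        case _ =>
          val tok = fetch()
          cached = Some((tok, now + tok.expiresInSeconds * 1000L))
          tok
      }
    }
  }
}
```

In the Microsoft class, `fetch` would wrap requestAccessToken(). The retry in apply() is still worth keeping as a backstop in case the server rejects a token early.
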


A few third-party libraries are in play here. For HTTP we are using Apache HttpClient (the 3.x API, which is where GetMethod and PostMethod live). StringUtils is from Apache Commons Lang. For JSON we are using Google's excellent Gson library, with a simple implementation of 'using' to make working with java.io cleaner:

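import java.io.{OutputStream, OutputStreamWriter}
import java.lang.reflect.Type
import com.google.gson.Gson

object Json {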
 def writeJson(something: Any): String = new Gson().toJson(something) 
 
 def writeJson(something: Any, os: OutputStream):Unit 
  = using(new OutputStreamWriter(os)) { osw => new Gson().toJson(something, osw) }
 
 def fromJson[T](json: String, graphRoot: Type):T 
  = new Gson().fromJson(json, graphRoot) 
}

The using() function is in another class; it looks like this (ref http://whileonefork.blogspot.com/2011/03/c-using-is-loan-pattern-in-scala.html):
 def using[T <: {def close(): Unit}, R](c: T)(action: T => R): R = {
  try {
   action(c)
  } finally {
   if (null != c) c.close
  }
 }
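
The structural type `{def close(): Unit}` is resolved through reflection. If everything you need to close implements java.lang.AutoCloseable (Java 7 and later), a variant bounded by that interface avoids the reflection. This is a swapped-in alternative rather than the version we use, and the `Loan` object is only there to keep the sketch self-contained:

```scala
import java.io.StringWriter

object Loan {
  /** Same loan pattern, bounded by AutoCloseable instead of a structural type. */
  def using[T <: AutoCloseable, R](resource: T)(action: T => R): R =
    try action(resource)
    finally if (resource != null) resource.close()

  // Usage: the writer is closed once the block returns
  val written: String = using(new StringWriter()) { w =>
    w.write("Hallo Welt")
    w.toString
  }
}
```
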

And with that our Scala calling of M$ Translate is complete!
