Class MakeLmBinaryFromGoogle


  • public class MakeLmBinaryFromGoogle
    extends java.lang.Object
    Given a directory in Google n-grams format, builds a binary representation of a stupid-backoff language model and writes it to disk. Language model binaries are significantly smaller and faster to load. Note: running this code on the full Google n-grams corpus can be very slow and memory-intensive. On our machines, it takes about 32GB of memory and 15 hours.
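
    For readers unfamiliar with the term, the following is a minimal sketch of stupid-backoff scoring over raw n-gram counts. It is not this library's implementation: the class StupidBackoffSketch, its methods, and the in-memory HashMap are hypothetical names chosen for illustration, and the backoff factor 0.4 is the value commonly used in the literature, assumed here rather than taken from this class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of stupid-backoff scoring; not part of this library. */
public class StupidBackoffSketch {
    /** Backoff factor; 0.4 is an assumed, commonly used value. */
    static final double ALPHA = 0.4;

    private final Map<List<String>, Long> counts = new HashMap<>();
    private long totalTokens = 0;

    /** Records the raw corpus count of an n-gram of any order. */
    void addCount(long count, String... ngram) {
        Long previous = counts.put(Arrays.asList(ngram), count);
        if (ngram.length == 1) {
            // Unigram counts also define the corpus size used at the last backoff step.
            totalTokens += count - (previous == null ? 0 : previous);
        }
    }

    /**
     * Scores the last word of ngram given the preceding words. The result is a
     * relative-frequency score, not a normalized probability.
     */
    double score(List<String> ngram) {
        double discount = 1.0;
        for (int start = 0; start < ngram.size(); start++) {
            List<String> current = ngram.subList(start, ngram.size());
            long numerator = counts.getOrDefault(current, 0L);
            if (current.size() == 1) {
                // Base case: unigram relative frequency.
                return totalTokens == 0 ? 0.0 : discount * numerator / totalTokens;
            }
            List<String> context = current.subList(0, current.size() - 1);
            long denominator = counts.getOrDefault(context, 0L);
            if (numerator > 0 && denominator > 0) {
                return discount * numerator / denominator;
            }
            // Unseen n-gram: drop the first word and multiply in the backoff factor.
            discount *= ALPHA;
        }
        return 0.0;
    }
}
```

    Because the scores depend only on raw counts and a fixed factor, a model of this kind needs no smoothing parameters beyond the counts themselves.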

    Note that if the input/output files have a .gz suffix, they will be unzipped/zipped as necessary.
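
    The suffix rule above can be pictured with the standard java.util.zip classes. This is only a sketch of the general technique, assuming the decision is made purely on the ".gz" suffix; the class GzSuffixIo and its methods are hypothetical and are not this library's I/O code.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: picks plain or gzip streams based on the file suffix. */
public class GzSuffixIo {
    /** Opens path for reading, decompressing on the fly if it ends in ".gz". */
    static InputStream openInput(Path path) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(path));
        return path.toString().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    /** Opens path for writing, compressing on the fly if it ends in ".gz". */
    static OutputStream openOutput(Path path) throws IOException {
        OutputStream raw = new BufferedOutputStream(Files.newOutputStream(path));
        return path.toString().endsWith(".gz") ? new GZIPOutputStream(raw) : raw;
    }
}
```

    Callers should close the returned stream (for example with try-with-resources) so that the gzip trailer is written.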

    Author:
    adampauls
    • Method Summary

      Modifier and Type Method Description
      static void main(java.lang.String[] argv)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • MakeLmBinaryFromGoogle

        public MakeLmBinaryFromGoogle()
    • Method Detail

      • main

        public static void main(java.lang.String[] argv)