Differences

This shows you the differences between two versions of the page.

@@ Line 1: / Line 1: @@
+====== File name encoding problem in File Compression Method ======
+  * Problem: file names are encoded differently by different compression tools (Zip, Gz, Bz2), which shows up especially when archives move across platforms.
+    * Solution 1: use tar to package the files
+    * Solution 2 (Linux): unzip -O CP936 non_english_name.zip (CP936 means the GBK / GB18030 family of Chinese encodings)
+    * Solution 3 (Java): jar xvf non_english_name.zip
+      * ref: http://www.111cn.net/sys/linux/72590.htm
+  * Problem: when Zip or WinRAR extracts an archive whose file names use a non-English encoding, you sometimes have to change the system locale and reboot before the names are extracted correctly.
+    * Solution 1 (Windows): use the prebuilt, ready-to-use zip and unzip tools from the DotNetZip library, which support an option for the encoding used to encode and decode names: <code>Unzip.exe -cp 936 chinese_name_content.zip</code>
+      * download the library; the tools are in its tool folder: https://dotnetzip.codeplex.com/
+      * ref: http://www.chengxuyuans.com/Ruby/41584.html
+      * Windows code pages: https://en.wikipedia.org/wiki/Windows_code_page
+        * 936 and 1386 for GBK
+        * 932 and 943 for Shift JIS
+    * Solution 2 (cross-platform): use Python (sometimes it works, sometimes it fails with an encoding error): <code>python xZip.py non_english.zip decode_language</code> (for example gbk for Chinese; for decode_language codec names see https://docs.python.org/2/library/codecs.html )
+      * here is the Python code for xZip.py<code python xZip.py>
+# full list of codecs: https://docs.python.org/2/library/codecs.html
+# notes:
+# - command-line arguments arrive in the console's default locale encoding
+# - file paths inside the zip are decoded to unicode with the given codec
+# - printing those unicode paths in a Windows command window may raise an error
+#   when the console's default codec cannot represent the characters
+import zipfile
+import os.path
+import os
+import sys
+class ZFile(object):
+    def __init__(self, filename, mode='r', basedir=''):
+        self.filename = filename
+        self.mode = mode
+        if self.mode in ('w', 'a'):
+            self.zfile = zipfile.ZipFile(filename, self.mode, compression=zipfile.ZIP_DEFLATED)
+        else:
+            self.zfile = zipfile.ZipFile(filename, self.mode)
+        self.basedir = basedir
+        if not self.basedir:
+            self.basedir = os.path.dirname(filename)
+    def addfile(self, path, arcname=None):
+        path = path.replace('//', '/')
+        if not arcname:
+            if path.startswith(self.basedir):
+                arcname = path[len(self.basedir):]
+            else:
+                arcname = ''
+        self.zfile.write(path, arcname)
+    def addfiles(self, paths):
+        for path in paths:
+            if isinstance(path, tuple):
+                self.addfile(*path)
+            else:
+                self.addfile(path)
+    def close(self):
+        self.zfile.close()
+    def extract_to(self, path, decode):
+        for p in self.zfile.namelist():
+            self.extract(p, path, decode)
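+    def extract(self, filename, path, decode):
+        # directory entries end with '/'; only regular files are written
+        if not filename.endswith('/'):
+            f = os.path.join(path, filename.decode(decode))   # e.g. gbk, gb18030, gb2312, utf-8
+            dirname = os.path.dirname(f)
+            if not os.path.exists(dirname):
+                os.makedirs(dirname)
+            # write the entry's bytes; the with-block closes the file handle
+            with open(f, 'wb') as out:
+                out.write(self.zfile.read(filename))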
+def create(zfile, files):
+    z = ZFile(zfile, 'w')
+    z.addfiles(files)
+    z.close()
+def extract(zfile, path, decode):
+    z = ZFile(zfile)
+    z.extract_to(path, decode)
+    z.close()
+if __name__=="__main__":
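+    # decode the archive path with the filesystem encoding so non-ASCII paths also work
+    extract(sys.argv[1].decode(sys.getfilesystemencoding()), u'.', sys.argv[2])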
+</code>
+  * Alternative solution: extract normally, accepting the wrongly encoded names, then fix the names with Python's encode and decode
+  * Side notes:
+    * in the Windows command prompt, chcp changes the active display code page (which affects how file names are shown) [[https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396|code page list]]
+  * additional reading:
+    * http://www.docin.com/p-739332424.html
+    * http://www.cnblogs.com/qq78292959/archive/2013/03/27/2985310.html
+    * https://allencch.wordpress.com/2010/12/06/how-to-extract-zip-file-which-contains-filenames-with-shift_jis-encoding-in-ubuntu/
+    * https://www.mkssoftware.com/docs/man1/unzip.1.asp
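The "extract, then fix the names" idea above can be sketched for Python 3, which is an assumption here since xZip.py targets Python 2. Python 3's zipfile decodes any name that is not flagged as UTF-8 using CP437, so encoding the name back to CP437 recovers the raw bytes, which can then be decoded with the real codec such as gbk. The helper names `repair_name` and `extract_with_encoding` are hypothetical and not part of xZip.py or any library; treat this as a minimal sketch, not a definitive implementation.

```python
import os
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def repair_name(name, encoding, flag_bits=0):
    """Undo zipfile's CP437 guess and decode the raw name bytes with `encoding`."""
    if flag_bits & UTF8_FLAG:
        return name  # zipfile already decoded this name as UTF-8
    try:
        return name.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # keep the name as-is rather than guess


def extract_with_encoding(zip_path, dest, encoding):
    """Extract all files under dest, decoding entry names with `encoding`.

    Returns the list of repaired file names that were written.
    """
    written = []
    dest_root = os.path.abspath(dest)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = repair_name(info.filename, encoding, info.flag_bits)
            target = os.path.abspath(os.path.join(dest_root, name))
            # skip entries whose path would land outside dest
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue
            if name.endswith("/"):
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.open(info) as src, open(target, "wb") as out:
                out.write(src.read())
            written.append(name)
    return written
```

A call such as `extract_with_encoding("non_english.zip", ".", "gbk")` mirrors the xZip.py command line. `repair_name` on its own can also be applied to names that were already extracted with the CP437 guess, although names mangled by a different tool may need a different source codec.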
+====== Common Problems with Compressed Files and Solutions ======
+  * Problem: WinRAR recently released a new version, and an older WinRAR cannot open some archives created by the new one.
+    * Solution: extract the archive with the latest 7-Zip instead: http://www.7-zip.org/
+====== Winrar ======
+  * extract from the command line: <code>unrar x Pack.rar</code>