appwiki:compression

appwiki:compression [2022/11/03 09:33] (current) – [Common Problem on compressed File and Solution] ying
====== File Name Encoding Problems in Compressed Archives ======

  * Problem: different compression tools and formats (zip, gz, bz2, ...) encode file names differently, so non-English file names are often garbled, especially when an archive moves between platforms.
    * Solution 1: use tar to create the archive.
    * Solution 2 (Linux): unzip -O CP936 non_english_name.zip (CP936 is the Windows code page for GBK Chinese, which GB18030 extends)
    * Solution 3 (Java): jar xvf non_english_name.zip
      * ref: http://www.111cn.net/sys/linux/72590.htm
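
Why one archive shows different names in different tools: a zip entry name is stored as raw bytes, and unless the archive marks its names as UTF-8, each extractor chooses a codec on its own. The minimal Python 3 sketch below (decode_candidates is a hypothetical helper) decodes the same name bytes, assumed here to come from a Chinese Windows system using GBK (CP936), with several codecs. A codec that fails returns None; a codec that "succeeds" can still produce a wrong name.

<code python>
def decode_candidates(raw, codec_names=('gbk', 'cp932', 'cp437', 'utf-8')):
    """Decode raw file-name bytes with each codec; failed decodes map to None."""
    result = {}
    for codec in codec_names:
        try:
            result[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            result[codec] = None
    return result


# assumption: the name was written by a GBK (CP936) system
raw_name = '中文.txt'.encode('gbk')
for codec, text in decode_candidates(raw_name).items():
    print(codec, repr(text))
</code>

Only gbk gives back 中文.txt here; utf-8 rejects these bytes, while cp437 and cp932 turn them into different wrong names, which is the garbling the solutions on this page work around.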

  * Problem: when zip or WinRAR extracts an archive whose file names use a non-English encoding, the names sometimes come out right only after changing the system locale and rebooting.
    * Solution 1 (Windows): use the prebuilt zip and unzip command-line tools from the DotNetZip library, which accept a code page option for file names: <code>Unzip.exe -cp 936 chinese_name_content.zip</code>
      * download the library; the tools are in its tools folder: https://dotnetzip.codeplex.com/
      * ref: http://www.chengxuyuans.com/Ruby/41584.html
      * Windows code pages: https://en.wikipedia.org/wiki/Windows_code_page
        * 936 and 1386 for GBK
        * 932 and 943 for Shift JIS
    * Solution 2 (cross-platform): use Python (works in some cases, fails with encoding errors in others): <code>python xZip.py non_english.zip decode_language</code> where decode_language is a Python codec name such as gbk for Chinese (codec list: https://docs.python.org/2/library/codecs.html)
      * Python 3 code for xZip.py:<code python xZip.py>
# full list of codecs: https://docs.python.org/2/library/codecs.html
# notes:
# - the zip path given on the command line is decoded with the system locale encoding
# - entry names without the zip UTF-8 flag (bit 0x800) are decoded by Python as cp437;
#   extract() re-encodes them to cp437 to get the raw bytes back, then decodes
#   those bytes with the codec given on the command line
# - printing the decoded names in a Windows console may still fail when the
#   console code page cannot represent those characters
import os
import sys
import zipfile


class ZFile(object):
    def __init__(self, filename, mode='r', basedir=''):
        self.filename = filename
        self.mode = mode
        if self.mode in ('w', 'a'):
            self.zfile = zipfile.ZipFile(filename, self.mode, compression=zipfile.ZIP_DEFLATED)
        else:
            self.zfile = zipfile.ZipFile(filename, self.mode)
        self.basedir = basedir
        if not self.basedir:
            self.basedir = os.path.dirname(filename)

    def addfile(self, path, arcname=None):
        path = path.replace('//', '/')
        if not arcname:
            if path.startswith(self.basedir):
                # store the path relative to basedir
                arcname = path[len(self.basedir):]
            else:
                # let zipfile use the path itself as the entry name
                arcname = None
        self.zfile.write(path, arcname)

    def addfiles(self, paths):
        # each item is either a path or a (path, arcname) tuple
        for path in paths:
            if isinstance(path, tuple):
                self.addfile(*path)
            else:
                self.addfile(path)

    def close(self):
        self.zfile.close()

    def extract_to(self, path, decode):
        for info in self.zfile.infolist():
            self.extract(info, path, decode)

    def extract(self, info, path, decode):
        name = info.filename
        if not info.flag_bits & 0x800:
            # recover the raw name bytes, then decode them (gbk, gb18030, gb2312, utf-8, ...)
            name = name.encode('cp437').decode(decode)
        if name.endswith('/'):
            return  # directory entry; directories are created below as needed
        f = os.path.join(path, name)
        os.makedirs(os.path.dirname(f), exist_ok=True)
        with open(f, 'wb') as out:
            out.write(self.zfile.read(info))


def create(zfile, files):
    z = ZFile(zfile, 'w')
    z.addfiles(files)
    z.close()


def extract(zfile, path, decode):
    z = ZFile(zfile)
    z.extract_to(path, decode)
    z.close()


if __name__ == "__main__":
    # usage: python xZip.py non_english.zip gbk
    extract(sys.argv[1], '.', sys.argv[2])
</code>
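
Unzip.exe takes a Windows code page number, while xZip.py takes a Python codec name. The minimal Python 3 sketch below (codec_for_code_page is a hypothetical helper) looks up the built-in "cp" plus number alias in Python's codec registry. IBM code pages such as 1386 and 943 may have no Python codec, in which case it returns None.

<code python>
import codecs


def codec_for_code_page(page):
    """Map a Windows code page number to Python's canonical codec name.

    Hypothetical helper: relies on Python's built-in 'cpNNN' aliases and
    returns None when no codec is registered under that name.
    """
    try:
        return codecs.lookup('cp%d' % page).name
    except LookupError:
        return None


for page in (936, 1386, 932, 943):
    print(page, codec_for_code_page(page))
</code>

For example, codec_for_code_page(936) returns 'gbk', the name to pass as decode_language to xZip.py.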

  * Alternative solution: extract the archive normally (with garbled names), then fix the names in Python by re-encoding and decoding them.
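
A minimal Python 3 sketch of this rename approach, assuming the extractor decoded the stored bytes as cp437 while the archive was really GBK; both codecs are parameters, and fix_name and fix_tree are hypothetical helper names. Each name is re-encoded with the wrong codec to recover the original bytes and then decoded with the right one; names that do not round-trip are left alone. Entries are renamed bottom-up so that a directory is renamed only after everything inside it.

<code python>
import os


def fix_name(name, wrong_codec='cp437', right_codec='gbk'):
    """Undo mojibake: recover the original bytes with the codec the extractor
    wrongly used, then decode them with the archive's real codec."""
    try:
        return name.encode(wrong_codec).decode(right_codec)
    except UnicodeError:
        return name  # not garbled this way; keep the name unchanged


def fix_tree(root, wrong_codec='cp437', right_codec='gbk'):
    """Rename garbled files and directories under root (root itself is kept)."""
    # topdown=False yields the deepest directories first, so renaming a
    # directory never invalidates paths that are still to be processed
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for old in filenames + dirnames:
            new = fix_name(old, wrong_codec, right_codec)
            if new != old:
                os.rename(os.path.join(dirpath, old), os.path.join(dirpath, new))
</code>

For example, fix_tree('extracted_dir') would rename a garbled entry back to 中文.txt if the assumed codecs match the archive.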

  * Side notes:
    * in the Windows command prompt, chcp changes the active console code page, i.e. the encoding used to display file names: [[https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396|code page list]]

  * Additional reading:
    * http://www.docin.com/p-739332424.html
    * http://www.cnblogs.com/qq78292959/archive/2013/03/27/2985310.html
    * https://allencch.wordpress.com/2010/12/06/how-to-extract-zip-file-which-contains-filenames-with-shift_jis-encoding-in-ubuntu/
    * https://www.mkssoftware.com/docs/man1/unzip.1.asp

====== Common Problems with Compressed Files and Solutions ======

  * Problem: WinRAR has been updated recently, and an older WinRAR cannot open some archives created by the new version.
    * Solution: extract them with the latest 7-Zip: http://www.7-zip.org/

====== WinRAR ======

  * extract from the command line (x keeps the full paths): <code>unrar x Pack.rar</code>

 +