File operation
IO here generally means file IO. Network IO is called out explicitly as network IO.
cp936 = code page 936, which is Microsoft's name for its GBK encoding.
Early Chinese encodings include Big5 (Traditional Chinese) and GB2312 (Simplified Chinese); today GBK, a superset of GB2312, is the common one.
GBK is a double-byte encoding with a code range of 0-65535. It keeps ASCII codes 0-127 as single bytes, and Chinese characters take two bytes. The meaning of a byte sequence is found by looking it up in the encoding table.
An encoding is what gives meaning to the 0s and 1s on disk.
Unicode is a global character coding system developed by the Unicode Consortium together with the International Organization for Standardization. Its original two-byte encoding wastes storage for text in single-byte Latin alphabets, which led to the Unicode transformation format UTF-8.
UTF-8 (8-bit Unicode Transformation Format) is a variable-length multi-byte encoding. The original design allowed one to six bytes per character; current UTF-8 uses one to four. Most Chinese characters take 3 bytes.
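The byte counts above can be checked directly. This is a minimal sketch that uses the character 啊 as an arbitrary sample; the variable names are only illustrative.

```python
# Encode one Chinese character with both encodings and compare the sizes.
char = "啊"
gbk_bytes = char.encode("gbk")
utf8_bytes = char.encode("utf-8")

print(gbk_bytes.hex(), len(gbk_bytes))    # b0a1 2
print(utf8_bytes.hex(), len(utf8_bytes))  # e5958a 3

# ASCII characters stay one byte in both encodings.
ascii_sizes = (len("A".encode("gbk")), len("A".encode("utf-8")))
print(ascii_sizes)  # (1, 1)

# Python accepts cp936 as another name for the GBK codec.
same_codec = gbk_bytes.decode("cp936") == char
print(same_codec)  # True
```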
Common file IO operations
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
The file system is a subsystem of the operating system. When a file cannot be found, the error comes from the operating system's file system and is reported as an OSError.
open takes a path, makes a system call through the operating system's API, and returns a file object.
The file is located by its path and exposed as a stream object; in text mode its contents are decoded with an encoding as they are read.
The default mode of open is rt: text mode, read-only. If the file does not exist, an OSError is raised.
A file must be closed with f.close() once you are done with it; a stream object should never be left unclosed.
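A small sketch of the two behaviors just described: a missing file raises an OSError subclass, and the default mode is read-only text. The scratch directory and file names are made up for the example.

```python
import os
import tempfile

scratch = tempfile.mkdtemp()                      # hypothetical scratch directory
missing = os.path.join(scratch, "no_such_file.txt")

error_name = None
try:
    f = open(missing)                             # default mode 'rt'
except OSError as exc:
    error_name = type(exc).__name__               # FileNotFoundError
else:
    f.close()

# Create a file, then reopen it without a mode to see the default.
existing = os.path.join(scratch, "exists.txt")
open(existing, "w").close()
f = open(existing)
default_mode = f.mode                             # 'r'
f.close()                                         # always close explicitly
is_closed = f.closed
```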
Access mode: text mode, binary mode
open parameters
file: the file name. If no directory is given, the current working directory is used. Both absolute and relative paths are accepted.
File operations work through a file pointer, which marks the current byte position.
mode: r, w, a, x, b, t, +
r: read-only; writing is not allowed. The pointer starts at 0.
w: write-only; reading is not allowed. An existing file is overwritten (truncated), which is equivalent to creating a new file, and a missing file is created. The pointer starts at 0. Data written with f.write may sit in the buffer and not be visible on disk (for example with cat) until it is flushed; close performs a flush, and so does seek.
x: exclusive creation, never overwrites. If a file with the same name exists, an exception is raised; otherwise a new file is created for writing.
a: write-only, appending at the end. A missing file is created. Useful for writing logs.
t: text mode, a character stream. Data is decoded according to some encoding. The default encoding=None lets Python pick one, but it is still better to specify the encoding yourself.
b: binary mode, a byte stream. It is independent of any encoding, and all operations work in bytes.
+: adds the missing read or write capability to r, w, a, or x.
r+: readable and writable. The pointer starts at 0, so writes overwrite existing content from the beginning.
w+: readable and writable, but the file is truncated on open. Data just written may still be in the buffer, and the pointer sits after it, so a read right after a write returns nothing until you seek back.
t, b, and + cannot be used on their own.
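The mode rules above can be exercised on a scratch file. This is a sketch under the assumption that a temporary directory is writable; the file name modes.txt is made up.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical scratch file

with open(path, "x") as f:        # x: create, refuse to overwrite
    f.write("one\n")

x_refused = False
try:
    open(path, "x")               # the file now exists
except FileExistsError:
    x_refused = True

with open(path, "a") as f:        # a: append at the end
    f.write("two\n")

r_is_read_only = False
with open(path) as f:             # r: read-only
    after_append = f.read()
    try:
        f.write("nope")
    except io.UnsupportedOperation:
        r_is_read_only = True

with open(path, "r+") as f:       # r+: read and write, pointer starts at 0
    f.write("ONE")                # overwrites the first three characters
    f.seek(0)
    after_r_plus = f.read()

with open(path, "w+") as f:       # w+: truncate, then read and write
    f.write("new")
    read_at_end = f.read()        # pointer is already past "new"
    f.seek(0)
    read_from_start = f.read()
```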
File extensions: by default an extension is only associated with certain programs. The system uses the extension to pick a viewer, and reads the information in the file header to decide things such as whether the file can be executed. If you open an MP3 file in a text reader it cannot be displayed properly, and changing the extension does not help: you still see a run of garbled characters.
To avoid garbled text, specify the encoding when creating a file. Windows uses CP936 by default and Linux uses UTF-8, so writing files as UTF-8 explicitly avoids garbling when they are moved to Linux.
buffering: the buffer
A buffer is a region of memory, usually a FIFO queue and occasionally a ring. Data is flushed to disk only when the buffer is full or a threshold is reached.
-1: the default for both binary and text mode. The default buffer size is typically 8192 or 4096 bytes.
0: binary mode only. No buffer: the file behaves like a FIFO and writes go straight to the file.
1: text mode only. Line buffering: content on the same line is held until a newline is written, then written out together.
Greater than 1: sets the buffer size in bytes. This is meant for binary mode; in text mode it has no useful effect.
seek performs a flush, and close also calls flush. Batch processing is the more efficient way to use the computer.
Just remember:
Text mode generally uses the default buffer.
Binary mode works in bytes, and the buffer size can be specified.
In general the default buffer size is a good choice; do not adjust it unless you know exactly why. If you do not understand a parameter, leave it alone.
In everyday programming you can call flush manually when you know the data must reach the disk, but do so sparingly: it goes against the batching the computer is designed around and can degrade performance.
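One way to observe buffering is to compare the file size on disk before and after a flush. The sketch below assumes CPython's standard io implementation and a writable temporary directory; the file name is made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered.bin")  # hypothetical scratch file

f = open(path, "wb")                      # default buffering (-1)
f.write(b"hello")
size_before_flush = os.path.getsize(path) # data is still in the buffer
f.flush()
size_after_flush = os.path.getsize(path)
f.close()

f = open(path, "wb", buffering=0)         # 0: unbuffered, binary mode only
f.write(b"hello")
size_unbuffered = os.path.getsize(path)   # written without a flush
f.close()

f = open(path, "w", buffering=1, newline="\n")  # 1: line buffering, text mode only
f.write("abc")
size_mid_line = os.path.getsize(path)     # no newline yet, nothing written
f.write("\n")
size_after_newline = os.path.getsize(path)
f.close()
```

The sizes are left in variables rather than printed so they can be inspected; on a typical CPython build the small writes stay in memory until a flush, a newline in line-buffered mode, or close.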
encoding: the character encoding, used only in text mode
None: the default, which depends on the operating system: GBK under Windows, UTF-8 under Linux. If you do not specify it, you may get garbled text.
For cross-platform code always specify the encoding; the best habit is to specify it everywhere.
errors: None and strict both mean a ValueError is raised when an encoding error occurs; ignore means such errors are ignored.
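A sketch of what goes wrong when a file is written with one encoding and read with another, and of how the errors parameter changes the outcome. The file name is made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "encoded.txt")  # hypothetical scratch file

with open(path, "w", encoding="gbk") as f:
    f.write("啊")

with open(path, "rb") as f:
    raw = f.read()                         # the two GBK bytes b0 a1

strict_error = None
try:
    with open(path, encoding="utf-8") as f:   # errors=None behaves like 'strict'
        f.read()
except ValueError as exc:                  # UnicodeDecodeError is a ValueError
    strict_error = type(exc).__name__

with open(path, encoding="utf-8", errors="ignore") as f:
    ignored = f.read()                     # undecodable bytes are dropped

with open(path, encoding="gbk") as f:
    correct = f.read()                     # matching encoding round-trips
```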
newline: line-ending handling, used only in text mode.
newline=None: on reading, '\r', '\n', and '\r\n' are all translated to '\n'; on writing, '\n' is translated to the current system's line ending ('\n' on Linux, '\r' on classic Mac OS, '\r\n' on Windows).
newline='': no translation.
Other legal values ('\n', '\r', '\r\n'): lines are split on the specified string when reading. When writing, '\n' is left as-is if newline is '\n', and is replaced by the specified string otherwise.
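The translation rules can be seen by writing mixed line endings as raw bytes and reading them back with different newline settings. A minimal sketch with a made-up file name:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newlines.txt")  # hypothetical scratch file

with open(path, "wb") as f:
    f.write(b"a\rb\nc\r\nd")               # all three line-ending styles

with open(path, encoding="utf-8", newline=None) as f:
    translated = f.read()                  # every ending becomes '\n'

with open(path, encoding="utf-8", newline="") as f:
    untouched = f.read()                   # nothing is translated

with open(path, "w", encoding="utf-8", newline="\r\n") as f:
    f.write("x\ny")                        # '\n' is replaced on the way out

with open(path, "rb") as f:
    written = f.read()
```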
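File descriptor: a limited resource. It is the id the operating system assigns to a file opened for IO. By default 0, 1, and 2 are already in use for standard input, standard output, and standard error.
closefd: True (the default) closes the file descriptor when the file object is closed. False keeps the descriptor open so that it can be used again later; this is only allowed when open was given a file descriptor instead of a file name. fileobj.fileno() returns the file descriptor.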
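A sketch of closefd=False: the descriptor is obtained with os.open, wrapped in a file object, and remains usable after the file object is closed. The file name is made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fd.txt")   # hypothetical scratch file

fd = os.open(path, os.O_WRONLY | os.O_CREAT)        # raw descriptor from the OS
f = open(fd, "w", closefd=False)                    # wrap it without owning it
same_fd = f.fileno() == fd
f.write("first\n")
f.close()                                           # flushes, leaves fd open

os.write(fd, b"second\n")                           # the descriptor still works
os.close(fd)                                        # now release it ourselves

with open(path) as check:
    content = check.read()
```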
read(size=-1)
Reads up to size characters (text mode) or bytes (binary mode), starting at the pointer. The default -1 reads to EOF.
readline: reads one line at a time.
readlines(hint=-1): reads all remaining lines at once and returns them as a list. You can also iterate over the file object directly to process it line by line; this is mostly useful in text mode.
write: writes data.
writelines: writes a list of strings. It adds no newline characters, so supply your own, e.g. writelines(map(lambd…, list)); a '\n' you supply is then translated to the operating system's line ending by default.
f.close(): flushes and closes the file.
f.closed: whether the file is closed (an attribute, not a method). seekable(), readable(), writable(): whether the file can be repositioned, read, and written.
tell(): returns the pointer's current position. In mode r the pointer starts at 0; in mode a it starts at EOF.
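The methods listed above, run in sequence on a small scratch file. The sample lines and file name are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical scratch file
lines = ["alpha", "beta", "gamma"]

with open(path, "w", encoding="utf-8") as f:
    # writelines adds no newlines, so each line carries its own.
    f.writelines(line + "\n" for line in lines)

with open(path, encoding="utf-8") as f:
    start = f.tell()               # mode r starts at 0
    first_two = f.read(2)          # two characters
    rest_of_line = f.readline()    # up to and including the newline
    remaining = f.readlines()      # the other lines as a list
    at_eof = f.read()              # nothing left

with open(path, encoding="utf-8") as f:
    iterated = [line.rstrip("\n") for line in f]   # iterate the file directly

with open(path, "a", encoding="utf-8") as f:
    append_pos = f.tell()          # mode a starts at EOF
    flags = (f.readable(), f.writable(), f.seekable())

closed_after = f.closed            # attribute, not a method call
```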
seek(offset[, whence])
offset is the distance to move, in bytes. whence is 0, 1, or 2 (start, current position, end) and defaults to 0.
In text mode: with whence 0, offset must not be negative; with whence 1, offset can only be 0; with whence 2, offset can only be 0, which moves the pointer to the end.
In binary mode: with whence 0 (the default), offset must not be negative; with whence 1, offset may be positive or negative relative to the current position; with whence 2, offset may be positive or negative relative to EOF. The pointer cannot move left past the start of the file, but it can move right past the end. Binary mode supports byte offsets from any starting point.
In GBK, b0a1 encodes 啊, the first Chinese character of the GB2312 table: one character, two bytes. A byte offset can therefore land in the middle of a character.
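A sketch of the seek rules in both modes. The ten-byte sample content and the file name are made up; the exact exception type for a negative absolute seek is not pinned down here, so both OSError and ValueError are accepted.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek.bin")  # hypothetical scratch file
with open(path, "wb") as f:
    f.write(b"0123456789")

left_limit_hit = False
with open(path, "rb") as f:
    f.seek(-3, 2)                  # 3 bytes before EOF
    tail = f.read()
    f.seek(2)                      # whence 0: from the start
    f.seek(3, 1)                   # whence 1: forward from current
    pos = f.tell()
    f.seek(-4, 1)                  # whence 1: backward from current
    back = f.read(1)
    beyond = f.seek(100)           # past the right edge is allowed
    try:
        f.seek(-1)                 # past the left edge is not
    except (OSError, ValueError):
        left_limit_hit = True

text_relative_refused = False
with open(path, "r") as f:
    end = f.seek(0, 2)             # text mode: whence 2 with offset 0 is fine
    try:
        f.seek(1, 1)               # nonzero relative seek is refused
    except io.UnsupportedOperation:
        text_relative_refused = True
```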
Context management
lsof (list open files) shows all open files. If it is not installed, install it with yum install lsof.
The limit on open files can be viewed with ulimit -a. The default upper limit on file descriptors is typically 1024.
As an experiment, open 1024 files and keep them in a container so that the file objects are recorded; be sure to close them all at the end.
lsof -p 1427 | grep test |wc -l
Keeping the open file objects in a container lets you close them in a batch.
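A scaled-down sketch of that experiment. The helper names open_many and close_all are made up, and the count is kept at 20 so the example stays well under the descriptor limit.

```python
import os
import tempfile


def close_all(handles):
    """Close every file object in the container."""
    for f in handles:
        f.close()


def open_many(directory, count):
    """Open `count` files in `directory` and return the file objects in a list."""
    handles = []
    try:
        for i in range(count):
            handles.append(open(os.path.join(directory, f"test{i}"), "w"))
    except OSError:
        close_all(handles)         # hit the limit: release what we hold
        raise
    return handles


scratch = tempfile.mkdtemp()       # hypothetical scratch directory
files = open_many(scratch, 20)
descriptors = {f.fileno() for f in files}   # each open file has its own descriptor
close_all(files)
all_closed = all(f.closed for f in files)
```

While the files are open, the lsof command in the notes can be pointed at this process's id to count them.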
try
f = open('test', 'w')  # opened for writing so that write() is allowed
try:
    f.write("abc")
finally:
    f.close()  # runs whether or not write() raised
    print("~~~~")
try…finally ensures that the file is closed, but only if the preceding statement f = open('test', 'w') succeeded.
with: a robust way to close files
f2 = open('test2', 'w')  # opened for writing so that write() is allowed
with f2:
    f2.write('ssss')
print('wwwww')
The with statement closes the object when the block finishes, even if the block is left through a return inside a function or the program exits through sys.exit(-1).
A with block does not start a new scope, just like an if block.
with…as
del f2
with open("test2", 'w') as f2:
    f2.write('ssss')
print(f2.closed, '!!!!!')  # f2 is still visible here, and already closed
This can be read as: the object returned by open is given the name f2.
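The claims about with can be checked in a few lines: the file is closed after a return inside the block and after an exception, and names bound inside the block remain visible afterwards. The function write_and_return and the dict kept are made up for the sketch.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test2")  # hypothetical scratch file
kept = {}                                         # holds the file object for inspection


def write_and_return(p):
    with open(p, "w") as f:
        kept["f"] = f
        f.write("ssss")
        return "returned inside with"             # leaves the block early


result = write_and_return(path)
closed_after_return = kept["f"].closed            # closed despite the return

try:
    with open(path) as f2:                        # read-only this time
        inside = "set inside the block"
        f2.write("x")                             # raises: not writable
except OSError:
    pass

closed_after_error = f2.closed                    # closed despite the exception
still_visible = inside                            # no new scope was opened
```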
IO objects such as file objects usually need to be closed or released when they are no longer in use, to free their resources.
Each open for IO obtains a file descriptor. Computer resources are limited, so the operating system imposes limits to protect them from exhaustion; computer resources are shared, not exclusive.
Under normal circumstances, do not raise a resource limit lightly unless you clearly understand the resources involved.