public class BOMInputStream extends InputStream
To use it, create a new BOMInputStream with the InputStream as the constructor parameter. The BOMInputStream will then immediately scan the input for any of the recognized BOM markers. You can then call getCharset() to get the encoding detected by the BOMInputStream; null is returned if no BOM was recognized.
Normally, the bytes of a recognized BOM will not be passed to the normal read() methods, but calling transmitBOM() will tell the BOMInputStream to pass the BOM before the other bytes.
Calling getReader() will automatically create the correct Reader for the InputStream and return it. You should not use this with transmitBOM().
You can also choose to ignore BOM markers at a particular level using the constructor BOMInputStream(InputStream in, int ignoreLevel).
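The workflow described above (scan the start of the input for a marker, report the charset, hand back a suitable Reader) can be illustrated without the class itself. The following is a minimal sketch using only java.io; BomReaderSketch, detect, bomLength, and open are hypothetical names chosen for illustration, and the real BOMInputStream may be implemented differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

// Hypothetical stand-in for illustration; this is not the BOMInputStream source.
final class BomReaderSketch {
    private static final int MAX_BOM = 4; // the longest marker (UTF-32) is 4 bytes

    /** Returns the charset named by a BOM in the first n bytes of head, or null. */
    static String detect(byte[] head, int n) {
        int b0 = n > 0 ? head[0] & 0xFF : -1;
        int b1 = n > 1 ? head[1] & 0xFF : -1;
        int b2 = n > 2 ? head[2] & 0xFF : -1;
        int b3 = n > 3 ? head[3] & 0xFF : -1;
        // UTF-32 is tested first: FF FE 00 00 also starts with the UTF-16LE marker.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UTF-32BE";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UTF-32LE";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        return null;
    }

    /** Number of marker bytes for a detected charset (0 when none was found). */
    static int bomLength(String charset) {
        if (charset == null) return 0;
        if (charset.startsWith("UTF-32")) return 4;
        return charset.equals("UTF-8") ? 3 : 2;
    }

    /** Skips a leading BOM, if any, and returns a Reader for the remaining bytes. */
    static Reader open(InputStream in, String defaultCharset) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, MAX_BOM);
        byte[] head = new byte[MAX_BOM];
        int n = 0;
        while (n < MAX_BOM) {
            int r = pin.read(head, n, MAX_BOM - n);
            if (r < 0) break; // stream is shorter than the longest marker
            n += r;
        }
        String charset = detect(head, n);
        int skip = bomLength(charset);
        // Push back everything that was read beyond the marker itself.
        if (n > skip) pin.unread(head, skip, n - skip);
        return new InputStreamReader(pin, charset != null ? charset : defaultCharset);
    }
}
```

In this sketch the marker bytes are always dropped, which corresponds to the default behaviour described above; passing them through, as transmitBOM() does, would mean pushing back all n bytes instead.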
Modifier and Type | Field and Description |
---|---|
static int | IGNORE_ALL: Ignore UTF-32 and UTF-16 and UTF-8 markers. |
static int | IGNORE_NONE: Don't ignore any BOM marker types. |
static int | IGNORE_UTF32: Ignore UTF-32 markers but handle UTF-16 and UTF-8 markers. |
static int | IGNORE_UTF32_UTF16: Ignore UTF-32 and UTF-16 markers but handle UTF-8 markers. |
static int | IGNORE_UTF32_UTF16_UTF8: Ignore UTF-32 and UTF-16 and UTF-8 markers. |
protected int | ignoreLevel |

Constructor and Description |
---|
BOMInputStream(InputStream in): This is the same as BOMInputStream(InputStream in, int ignoreLevel) with an ignoreLevel value of IGNORE_NONE (0). |
BOMInputStream(InputStream in, int ignoreLevel) |

Modifier and Type | Method and Description |
---|---|
int | available(): This method returns the number of bytes that can be read from this stream before a read can block. |
protected void | checkBOM(): You can override this to check for the BOM using the checkBOM(int[] markerBytes, String set) method. |
protected boolean | checkBOM(int[] markerBytes, String set): This matches the bytes at the start of the input stream with the provided markerBytes. |
void | close(): This method closes the stream. |
String | getCharset(): Get the detected charset, which may be null if no BOM marker was detected. |
Reader | getReader() |
Reader | getReader(String defaultCharset) |
protected int | inBOMBuffer(): Return the number of bytes in the internal BOM buffer. |
int | read(): This method reads an unsigned byte from the input stream and returns it as an int in the range of 0-255. |
int | read(byte[] buff): This method reads bytes from a stream and stores them into a caller supplied buffer. |
int | read(byte[] buff, int offset, int length): This method reads bytes from a stream and stores them into a caller supplied buffer. |
protected void | resetBOM(): This can be called within checkBOM() to move any possibly found BOM bytes back into the buffer and to set the charset value to null. |
void | transmitBOM() |

finalize, hashCode, mark, markSupported, reset, skip
protected int ignoreLevel
public static final int IGNORE_NONE
public static final int IGNORE_UTF32
public static final int IGNORE_UTF32_UTF16
public static final int IGNORE_UTF32_UTF16_UTF8
public static final int IGNORE_ALL
public BOMInputStream(InputStream in) throws IOException
Parameters:
in - the InputStream to scan for the BOM.
Throws:
IOException
public BOMInputStream(InputStream in, int ignoreLevel) throws IOException
Parameters:
in - the InputStream to scan for the BOM.
ignoreLevel - one of the IGNORE_XXX values.
Throws:
IOException
protected int inBOMBuffer()
protected boolean checkBOM(int[] markerBytes, String set)
Parameters:
markerBytes - the marker bytes (as ints) to match.
set - the name of the char set to set if there is a match.
protected void resetBOM()
protected void checkBOM()
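The marker matching that checkBOM(int[] markerBytes, String set) is documented to perform can be pictured as a prefix comparison. The sketch below is an assumption about the general technique, not the class's actual code; MarkerMatchSketch, startsWith, and the marker constants are hypothetical names. One plausible reason for passing the markers as ints is that Java bytes are signed, so values such as 0xEF would otherwise need a cast.

```java
// Hypothetical sketch; the names below are not part of the documented API.
final class MarkerMatchSketch {
    static final int[] UTF8_MARKER = {0xEF, 0xBB, 0xBF};
    static final int[] UTF16_BE_MARKER = {0xFE, 0xFF};

    /** True if the first markerBytes.length bytes of head equal markerBytes. */
    static boolean startsWith(byte[] head, int[] markerBytes) {
        if (head.length < markerBytes.length) return false;
        for (int i = 0; i < markerBytes.length; i++) {
            // Mask to 0-255: a Java byte is signed, so (byte) 0xEF is -17.
            if ((head[i] & 0xFF) != markerBytes[i]) return false;
        }
        return true;
    }
}
```

A subclass overriding checkBOM() would presumably call the two-argument checkBOM once per marker it wants to recognize, in the same way this sketch would call startsWith once per marker constant.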
public String getCharset()
public Reader getReader(String defaultCharset) throws UnsupportedEncodingException
Throws:
UnsupportedEncodingException
public Reader getReader()
public void transmitBOM()
public final int available() throws IOException
Description copied from class: InputStream
This method always returns 0 in this class
Overrides:
available in class InputStream
Throws:
IOException - If an error occurs
public final int read() throws IOException
Description copied from class: InputStream
This method will block until the byte can be read.
Overrides:
read in class InputStream
Throws:
IOException - If an error occurs
public final int read(byte[] buff) throws IOException
Description copied from class: InputStream
This method will block until some data can be read.
This method operates by calling an overloaded read method like so:
read(buff, 0, buff.length)
Overrides:
read in class InputStream
Parameters:
buff - The buffer into which the bytes read will be stored.
Throws:
IOException - If an error occurs.
public final int read(byte[] buff, int offset, int length) throws IOException
Description copied from class: InputStream
This method reads bytes from a stream and stores them into a
caller supplied buffer. It starts storing the data at index offset
into the buffer and attempts to read length bytes. This method can
return before reading the number of bytes requested. The actual
number of bytes read is returned as an int. A -1 is returned to
indicate the end of the stream.
This method will block until some data can be read.
This method operates by calling the single byte read() method
in a loop until the desired number of bytes are read. The read loop
stops short if the end of the stream is encountered or if an IOException
is encountered on any read operation except the first. If the first
attempt to read a byte fails, the IOException is allowed to propagate
upward. Any subsequent IOException is caught and treated identically
to an end of stream condition. Subclasses can (and should if possible)
override this method to provide a more efficient implementation.
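The loop described above can be sketched as follows. This is an illustration of the documented behaviour written against plain java.io, not the class's own source; ReadLoopSketch and readLoop are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper illustrating the documented single-byte read loop.
final class ReadLoopSketch {
    /** Reads up to length bytes into buff starting at offset; returns the count or -1. */
    static int readLoop(InputStream in, byte[] buff, int offset, int length)
            throws IOException {
        if (length == 0) return 0;
        int first = in.read();       // an IOException on the first read propagates
        if (first == -1) return -1;  // end of stream before any byte was read
        buff[offset] = (byte) first;
        int count = 1;
        try {
            while (count < length) {
                int b = in.read();
                if (b == -1) break;  // end of stream: stop short
                buff[offset + count] = (byte) b;
                count++;
            }
        } catch (IOException e) {
            // A failure after the first byte is treated like end of stream.
        }
        return count;
    }
}
```

Because every byte costs one call to read(), this form is simple but slow, which is why the inherited description recommends a more efficient override where one is possible.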
Overrides:
read in class InputStream
Parameters:
buff - The array into which the bytes read should be stored
offset - The offset into the array to start storing bytes
length - The requested number of bytes to read
Throws:
IOException - If an error occurs.
public final void close() throws IOException
Description copied from class: InputStream
This method does nothing in this class, but subclasses may override this method in order to provide additional functionality.
Overrides:
close in class InputStream
Throws:
IOException - If an error occurs, which can only happen in a subclass