Understanding the Substring Method in Java

prachi sharma
Posted on 1st Apr 2024 1:02 AM | 10 min Read | 60 min Implementation
#java #substring #String

Introduction


In Java, a string is an object that represents a sequence of characters. It is a fundamental data type used to store and manipulate text. Strings in Java are instances of the java.lang.String class, which provides various methods for working with strings.


Strings in Java are immutable, meaning that once created, their values cannot be changed. However, you can perform various operations on strings such as concatenation, substring extraction, searching, replacing, and more, each of which results in a new string object. One such operation, and the one we are going to talk about here, is the substring method.


In Java, a substring is a contiguous part of a String, extracted from the original String. Strings are used very frequently in Java programs and can occupy a significant portion of a program's memory. Because of this, their implementation has been improved many times over the years.


How to Use the substring Method:


There are two ways to use the substring method.

Let's say you have a sentence:


Hello I am learning Java.


A) When you specify a starting index and an ending index (the ending index is not included)

public String substring(int beginIndex, int endIndex)


Example:

If you want to get "Hello" from the sentence, you'd specify the starting index as 0 (because indexing in Java starts from 0) and the ending index as 5 (exclusive, meaning the result stops just before the character at the ending index).

So you'd call substring(0, 5), which returns "Hello".


B) When you provide only a starting index

public String substring(int beginIndex)


Example:

If you call substring(6), it will return "I am learning Java." because it starts from index 6 ("I") and goes all the way to the end of the string. (Index 5 is the space after "Hello", and index 7 is the space after "I".)
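
The two forms can be tried out with a small program. This is only a minimal sketch: the class name SubstringBasics and the helper methods slice and tail are made up for illustration, and they simply wrap the two substring overloads. Note that in the example sentence, index 6 is the "I" and index 7 is the space after it.

```java
class SubstringBasics {
    // Hypothetical helper: returns the characters in the range [begin, end).
    static String slice(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    // Hypothetical helper: returns everything from begin to the end of the text.
    static String tail(String text, int begin) {
        return text.substring(begin);
    }

    public static void main(String[] args) {
        String sentence = "Hello I am learning Java.";
        System.out.println(slice(sentence, 0, 5));   // Hello
        System.out.println(tail(sentence, 6));       // I am learning Java.
        System.out.println(slice(sentence, 20, 24)); // Java
    }
}
```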




Internal Workings of Substring Method:


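The implementation of the substring method has changed significantly over the years. This is how it works internally in recent JDK versions (Java 9 and later, where a string is stored in either Latin-1 or UTF-16 form). Note also that the method name is written entirely in lowercase: substring, not subString.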


A) When you specify a starting index and an ending index (the ending index is not included)

public String substring(int beginIndex, int endIndex) {
    int length = length();
    checkBoundsBeginEnd(beginIndex, endIndex, length);
    if (beginIndex == 0 && endIndex == length) {
        return this;
    }
    int subLen = endIndex - beginIndex;
    return isLatin1()
            ? StringLatin1.newString(value, beginIndex, subLen)
            : StringUTF16.newString(value, beginIndex, subLen);
}


B) When you provide only a starting index

public String substring(int beginIndex) {
    return substring(beginIndex, length());
}


Snippet A) above can be explained as follows:


  1. beginIndex == 0: Checks whether the substring starts at the first character of the string.
  2. endIndex == length: Checks whether the substring ends at the end of the string.

If both conditions are true, the requested range covers the entire string. Because strings are immutable, there is no need to create a new object, so the method simply returns the original string (this).


Otherwise, the method computes the length of the substring (subLen) and builds a new string from that part of the internal value array, using StringLatin1 or StringUTF16 depending on how the string is encoded.
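
The "return this" shortcut can be observed from outside by comparing references with ==. The following is a small sketch; the class FullRangeSubstring and the method returnsSameObject are hypothetical names used only for this illustration.

```java
class FullRangeSubstring {
    // Hypothetical helper: true when substring(0, length()) hands back
    // the very same object instead of a new String.
    static boolean returnsSameObject(String s) {
        return s.substring(0, s.length()) == s;
    }

    public static void main(String[] args) {
        String text = "Hello I am learning Java.";
        // The full range is requested, so the original object is returned.
        System.out.println(returnsSameObject(text));      // true
        // A partial range produces a different String object.
        System.out.println(text.substring(0, 5) == text); // false
    }
}
```

Reference comparison with == is used here only to show object identity; to compare the contents of strings in real code, use equals().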



Method for Bounds Checking:

A static method, checkBoundsBeginEnd, checks the specified indices against the length of the string. It throws a StringIndexOutOfBoundsException if the indices are out of bounds.


static void checkBoundsBeginEnd(int begin, int end, int length) {
    if (begin < 0 || begin > end || end > length) {
        throw new StringIndexOutOfBoundsException(
                "begin " + begin + ", end " + end + ", length " + length);
    }
}


The method takes three integer parameters: begin, end, and length (the length of the string). It checks three conditions:


begin < 0: The begin index is negative, which is an out-of-bounds start index.

begin > end: The begin index is greater than the end index, so the range is invalid.

end > length: The end index lies past the end of the string, which is an out-of-bounds end index.


If any of these conditions is true, the method throws a StringIndexOutOfBoundsException whose message contains the values of begin, end, and length.
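
Each of the three conditions can be triggered through the public substring method. Below is a minimal sketch; SubstringBounds and isRejected are hypothetical names introduced for this example, and the helper only reports whether a call was rejected.

```java
class SubstringBounds {
    // Hypothetical helper: true if substring(begin, end) throws
    // a StringIndexOutOfBoundsException for the given string.
    static boolean isRejected(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "Java"; // length is 4
        System.out.println(isRejected(s, -1, 2)); // true: begin < 0
        System.out.println(isRejected(s, 3, 2));  // true: begin > end
        System.out.println(isRejected(s, 0, 5));  // true: end > length
        System.out.println(isRejected(s, 4, 4));  // false: valid, returns an empty string
    }
}
```

The last call shows an edge case that is easy to miss: begin and end may both equal the length of the string, and the result is then an empty string rather than an exception.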




Evolution of Substring Method:


Up to Java 6, the string returned by the 'substring()' method shared the same character array as the original string. The starting position and length of the substring were stored in the new string's own 'offset' and 'count' fields.


Below is the excerpt of the 'substring()' method from Java versions 1 through 6:


public String substring(int beginIndex, int endIndex) {
    return ((beginIndex == 0) && (endIndex == count))
            ? this
            : new String(offset + beginIndex, endIndex - beginIndex, value);
}


If the substring encompassed the entire original string, the original string itself would be returned. Otherwise, a new 'String' object would be created using the following constructor:


String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}



| DataType | Parameter | Description |
|----------|-----------|-------------|
| char[] | value | The character array that holds the characters of the string. This constructor does not copy the array; the new string shares it with the original string. |
| int | offset | The starting position within the value array. It is the index of the first character that belongs to the new string. |
| int | count | The number of characters that belong to the new string, starting from the offset. |



Consequently, the substring and the original string would share the same character array, differing only in the 'offset' and 'count' values defining the specific section of the character array. The developers of the JDK got two benefits from this approach:


1. Reduced memory consumption on the heap.

2. Faster execution of the 'substring' method compared to copying the character array.


However, an important drawback was overlooked:


If the original string was no longer required, the garbage collector would be unable to reclaim its character array because the substring still held a reference to it. For instance, if the original string contained 10,000 characters and the substring only contained ten characters, then approximately 9,990 characters (or nearly 20 KB, considering each `char` occupies two bytes) of heap space would be wasted.

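To fix this, the substring method was changed (starting with Java 7 update 6) so that the new string gets its own copy of just the characters it needs, instead of sharing the original array.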
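
The difference between sharing and copying can be illustrated with a small model. This is not JDK code: SliceModel, sharedSlice, and copiedSlice are hypothetical names, and the class only imitates the 'value', 'offset', and 'count' fields described above. It is a sketch of the idea, assuming a plain char array as storage.

```java
import java.util.Arrays;

// Hypothetical model of a string backed by a char array (not the real java.lang.String).
class SliceModel {
    final char[] value;
    final int offset;
    final int count;

    SliceModel(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Sharing approach: O(1), but keeps the whole original array reachable.
    SliceModel sharedSlice(int begin, int end) {
        return new SliceModel(value, offset + begin, end - begin);
    }

    // Copying approach: cost grows with the slice length, but the slice
    // holds only the characters it needs.
    SliceModel copiedSlice(int begin, int end) {
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new SliceModel(copy, 0, copy.length);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SliceModel original = new SliceModel(new char[10_000], 0, 10_000);
        SliceModel shared = original.sharedSlice(0, 10);
        SliceModel copied = original.copiedSlice(0, 10);
        System.out.println(shared.value.length); // 10000: the big array stays alive
        System.out.println(copied.value.length); // 10: only the needed characters
    }
}
```

In the shared case, the 10-character slice still references the 10,000-character array, which is exactly the situation that prevents the garbage collector from reclaiming it.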


Time Complexity:


The time complexity of the substring() method in Java versions up to and including Java 6 is O(1), or constant time. This is because no characters are copied: the method only creates a new String object that points to the existing character array with a different offset and count. The time taken is therefore the same regardless of the length of the original string or of the substring. In later versions, where the characters are copied, the cost grows with the length of the substring (endIndex - beginIndex).

However, it's worth noting that while the creation of the substring itself is O(1), the memory usage associated with substrings can become a concern in certain scenarios, as discussed earlier. This is due to the fact that substrings share the underlying character array of the original string, potentially leading to increased memory consumption and preventing garbage collection of unused portions of the original string's array.


Use Cases:


1) Processing Text: Extracting a specific word from a sentence.

String sentence = "Java is a powerful programming language.";
String word = sentence.substring(0, 4); // Extract "Java"
System.out.println("Word: " + word);


2) Extraction of Important Data: Extracting a date from a string containing various details.

String details = "Date of Birth: 1990-05-15, Name: John Doe";
// The date starts at index 15 (right after "Date of Birth: ") and is 10 characters long (YYYY-MM-DD)
String dateOfBirth = details.substring(15, 25);
System.out.println("Date of Birth: " + dateOfBirth);
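
Hard-coded indices like these break as soon as the text in front of the date changes. One possible alternative, shown here only as a sketch, is to locate the positions with indexOf and then call substring. The class DateExtractor and the method valueAfter are hypothetical names, and the sketch assumes that each value is terminated by a comma or by the end of the string.

```java
class DateExtractor {
    // Hypothetical helper: returns the text that follows the given label,
    // up to the next comma (or the end of the string), or null if the
    // label is not found.
    static String valueAfter(String details, String label) {
        int labelPos = details.indexOf(label);
        if (labelPos < 0) {
            return null;
        }
        int start = labelPos + label.length();
        int end = details.indexOf(',', start);
        if (end < 0) {
            end = details.length(); // no comma: the value runs to the end
        }
        return details.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String details = "Date of Birth: 1990-05-15, Name: John Doe";
        System.out.println(valueAfter(details, "Date of Birth:")); // 1990-05-15
        System.out.println(valueAfter(details, "Name:"));          // John Doe
    }
}
```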


3) String Modification: Removing or replacing part of a string.

String sentence = "Thiss is an example of string modification.";
// Keep "This" (indices 0 to 3), skip the extra 's' at index 4, and append the rest
String correctedSentence = sentence.substring(0, 4) + sentence.substring(5);
System.out.println("Corrected Sentence: " + correctedSentence);


4) Pattern Matching: Checking if a string follows a specific pattern.


String pattern = "123456";
// substring(0) returns the whole string; use a larger index to check only the tail of the string
if (pattern.substring(0).matches("[0-9]+")) {
    System.out.println("Pattern matches: " + pattern);
} else {
    System.out.println("Pattern does not match: " + pattern);
}


5) Validation: Checking if a piece of text follows a certain structure.


String email = "Techelliptica.education@gmail.com";
// No commas inside the character class: a comma there would be accepted as a literal character
if (email.substring(0).matches("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")) {
    System.out.println("Email is valid: " + email);
} else {
    System.out.println("Email is invalid: " + email);
}


