Java multi-threading for text file processing [duplicate]
This question already has an answer here:
How can I pass a parameter to a Java Thread?
18 answers
I have a Java program that iterates through each text file in a directory, builds a word index (each word mapped to the pages it appears on), and writes the result for each file into an output directory. I would like to convert it to use multi-threading, starting a new thread for each file. I am pretty new to Java and completely new to multithreading in Java. The program is invoked as:
java Index inputFolder outputFolder pageLength
Here is my working code without multi-threading:
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class Index {
    public static void main(String[] args) {
        long startTime = System.nanoTime();
        PrintStream stdout = System.out;
        try {
            File folder = new File(args[0]);
            File[] files = folder.listFiles();
            for (File file : files) {
                // strip the extension to build the output file name
                String name = file.getName();
                int pos = name.lastIndexOf(".");
                if (pos > 0) {
                    name = name.substring(0, pos);
                }
                Scanner sc = new Scanner(file);
                Map<String, String> wordCount = new TreeMap<String, String>();
                int count = 0;
                while (sc.hasNext()) {
                    String word = sc.next();
                    word = word.trim().toLowerCase();
                    int len = word.length();
                    count = count + len;
                    // page number = characters read so far / pageLength, rounded up
                    int pageNumber = (int) Math.ceil(count / Float.valueOf(args[2]));
                    if (!wordCount.containsKey(word))
                        wordCount.put(word, Integer.toString(pageNumber));
                    else
                        wordCount.put(word, wordCount.get(word) + ", " + Integer.toString(pageNumber));
                }
                // show results
                sc.close();
                PrintStream outputFile = new PrintStream(args[1] + "/" + name + "_output.txt");
                System.setOut(outputFile);
                for (String word : wordCount.keySet())
                    System.out.println(word + " " + wordCount.get(word));
            }
        }
        catch (IOException e) {
            System.out.println("Unable to read from file.");
        }
        long endTime = System.nanoTime();
        long totalTime = endTime - startTime;
        System.setOut(stdout);
        System.out.println(totalTime / 1000000);
    }
}
To reiterate, I would like to adapt this so that each file iteration starts a new thread.
java multithreading
marked as duplicate by Andreas
Nov 19 '18 at 17:38
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
So in your research into how to code with multiple threads, you couldn't find a single example of how to start threads or use thread pools? --- idownvotedbecau.se/noresearch
– Andreas
Nov 19 '18 at 17:20
I did, but they are all too simple (print name of thread, etc... ) for me to figure out how to apply multithreading to the problem at hand. I know I need a class that implements Runnable, and a public void run() for the processing, but I'm stumped how to connect it all together so that it works in this context.
– Daniel Gizzi
Nov 19 '18 at 17:28
asked Nov 19 '18 at 17:14 by Daniel Gizzi
1 Answer
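If you're using Java 8 or later, you could use the Streams API.
.parallelStream() processes the elements of a collection in parallel. It does not start one thread per element: by default the work is spread over a shared pool of worker threads (the common ForkJoinPool), so the degree of parallelism depends on the number of available processors.
You'll need a List (or another Collection) to call it:
List<File> files = new ArrayList<>(); // initialization
// populate list here
files.parallelStream()
     .forEach(x -> {
         // logic goes here
     });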
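A fuller sketch of how the per-file logic from the question might be dropped into that forEach. The class name ParallelIndex and the helpers indexFile, writeIndex and processAll are made up for illustration. The main assumption is that each file writes through its own PrintStream rather than through System.setOut, since System.out is shared by every thread and redirecting it from several threads at once would mix up the outputs.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class ParallelIndex {

    // Builds the word -> pages index for one file. All state is local,
    // so concurrent calls for different files do not interfere.
    static Map<String, String> indexFile(File file, float pageLength) throws IOException {
        Map<String, String> pages = new TreeMap<>();
        int count = 0;
        try (Scanner sc = new Scanner(file)) {
            while (sc.hasNext()) {
                String word = sc.next().toLowerCase();
                count += word.length();
                String page = Integer.toString((int) Math.ceil(count / pageLength));
                // first occurrence stores the page, later ones append to the list
                pages.merge(word, page, (old, p) -> old + ", " + p);
            }
        }
        return pages;
    }

    // Writes one "word pages" line per entry to a dedicated stream for this file.
    static void writeIndex(Map<String, String> pages, File out) throws IOException {
        try (PrintStream ps = new PrintStream(out)) {
            pages.forEach((word, p) -> ps.println(word + " " + p));
        }
    }

    static void processAll(File inputFolder, File outputFolder, float pageLength) {
        File[] found = inputFolder.listFiles(File::isFile);
        if (found == null) {
            throw new IllegalArgumentException("Not a directory: " + inputFolder);
        }
        List<File> files = Arrays.asList(found);
        files.parallelStream().forEach(file -> {
            String name = file.getName();
            int pos = name.lastIndexOf('.');
            if (pos > 0) {
                name = name.substring(0, pos);
            }
            try {
                writeIndex(indexFile(file, pageLength), new File(outputFolder, name + "_output.txt"));
            } catch (IOException e) {
                // lambdas passed to forEach cannot throw checked exceptions
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        processAll(new File(args[0]), new File(args[1]), Float.parseFloat(args[2]));
        System.out.println((System.nanoTime() - start) / 1_000_000);
    }
}
```

It would be run the same way as the original, e.g. java ParallelIndex inputFolder outputFolder pageLength. Whether it is actually faster depends on how many files there are and how much of the time is spent waiting on the disk.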
Example Repl.it
Documentation about parallelism
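If you do want literally one thread per file, as the question asks, the usual pattern is a Runnable that receives its parameters through the constructor (which is what the linked duplicate covers). A minimal sketch, assuming the same word-index logic as in the question; ThreadedIndex, IndexTask and processAll are hypothetical names, and each task writes to its own PrintStream instead of calling System.setOut:

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class ThreadedIndex {

    // One task per file; the parameters arrive through the constructor.
    static class IndexTask implements Runnable {
        private final File input;
        private final File output;
        private final float pageLength;

        IndexTask(File input, File output, float pageLength) {
            this.input = input;
            this.output = output;
            this.pageLength = pageLength;
        }

        @Override
        public void run() {
            Map<String, String> pages = new TreeMap<>();
            int count = 0;
            try (Scanner sc = new Scanner(input);
                 PrintStream out = new PrintStream(output)) {
                while (sc.hasNext()) {
                    String word = sc.next().toLowerCase();
                    count += word.length();
                    String page = Integer.toString((int) Math.ceil(count / pageLength));
                    pages.merge(word, page, (old, p) -> old + ", " + p);
                }
                pages.forEach((word, p) -> out.println(word + " " + p));
            } catch (IOException e) {
                System.err.println("Unable to process " + input + ": " + e.getMessage());
            }
        }
    }

    static void processAll(File inputFolder, File outputFolder, float pageLength)
            throws InterruptedException {
        File[] files = inputFolder.listFiles(File::isFile);
        if (files == null) {
            throw new IllegalArgumentException("Not a directory: " + inputFolder);
        }
        List<Thread> threads = new ArrayList<>();
        for (File file : files) {
            String name = file.getName();
            int pos = name.lastIndexOf('.');
            if (pos > 0) {
                name = name.substring(0, pos);
            }
            File output = new File(outputFolder, name + "_output.txt");
            Thread t = new Thread(new IndexTask(file, output, pageLength));
            threads.add(t);
            t.start(); // start(), not run(): run() would execute on the calling thread
        }
        // wait for every worker so the elapsed time covers all files
        for (Thread t : threads) {
            t.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        processAll(new File(args[0]), new File(args[1]), Float.parseFloat(args[2]));
        System.out.println((System.nanoTime() - start) / 1_000_000);
    }
}
```

With a large directory this starts as many threads as there are files, which is one reason a pool (parallelStream or an ExecutorService) is often the more practical choice.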
thanks, this is much simpler than what I was originally trying to do
– Daniel Gizzi
Nov 19 '18 at 18:27
answered Nov 19 '18 at 17:28 by Cheloide