fs.default.name specifies the URI of the default filesystem (for HDFS, the namenode's hdfs:// address)
fs.checkpoint.dir specifies the directory in which the secondary namenode stores filesystem metadata during a checkpoint operation
fs.trash.interval specifies the number of minutes a deleted file remains in .Trash before it is permanently deleted
topology.script.file.name specifies the absolute path of the script that maps nodes to racks, making the cluster rack-aware
hadoop.log.dir The directory in which log data should be written. This should be the same path as specified in HADOOP_LOG_DIR in the hadoop-env.sh file.
io.file.buffer.size (core-site.xml) specifies the general-purpose buffer size, in bytes, used for file read/write I/O and network I/O
dfs.block.size specifies the default block size, in bytes, for files stored on HDFS
dfs.name.dir specifies a comma-separated list of directories in which the namenode stores its metadata
dfs.data.dir specifies a comma-separated list of directories in which datanodes store HDFS block data
dfs.datanode.du.reserved specifies the disk space, in bytes per volume, reserved for non-HDFS use
dfs.namenode.handler.count specifies the number of worker threads the namenode uses to process RPC requests from clients and from other cluster daemons
dfs.datanode.failed.volumes.tolerated specifies the number of disks that are permitted to fail before the entire datanode is considered failed
dfs.hosts names a file listing the hosts (datanodes) that are allowed to communicate with the namenode
dfs.hosts.exclude names a file listing hosts to exclude; used to decommission datanodes or to block a host from communicating with the namenode
dfs.permissions.supergroup specifies the group whose members have privileges equivalent to the superuser
dfs.balance.bandwidthPerSec specifies the maximum bandwidth, in bytes per second, that each datanode may use for block rebalancing
mapred.job.tracker specifies the jobtracker hostname and port
mapred.local.dir specifies the local-disk directories in which MapReduce jobs store their intermediate output
mapred.child.java.opts specifies JVM options for child task processes, such as the initial and maximum heap size
mapred.child.ulimit specifies a limit, in kilobytes, on how much virtual memory a task process may consume before it is terminated
mapred.tasktracker.map.tasks.maximum specifies the maximum number of map tasks that a worker node can run in parallel
mapred.tasktracker.reduce.tasks.maximum specifies the maximum number of reduce tasks that a worker node can run in parallel
io.sort.mb specifies the size, in megabytes, of the circular buffer that holds the intermediate key-value pairs emitted by the mapper
io.sort.factor specifies the number of files/streams to merge at once
mapred.compress.map.output specifies (true/false) whether the data emitted by the mappers is compressed
mapred.map.output.compression.codec specifies the codec used to compress the intermediate (map output) data
mapred.output.compression.type specifies RECORD- or BLOCK-level compression for job output written as SequenceFiles
mapred.job.tracker.handler.count specifies the number of worker threads in the pool the jobtracker maintains to handle RPC requests
mapred.jobtracker.taskScheduler specifies the Java class name of the scheduler plugin used by the jobtracker
mapred.reduce.parallel.copies specifies the number of map outputs each reduce task copies in parallel during the shuffle phase
mapred.reduce.tasks specifies the default number of reduce tasks per job
tasktracker.http.threads specifies the number of threads available to handle HTTP requests concurrently
mapred.reduce.slowstart.completed.maps specifies the fraction of map tasks that must complete before reducers begin to be allocated
mapred.acls.enabled specifies (true/false) whether access control lists are enabled; ACLs must be enabled globally here before they can be used
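
The properties above are set as name/value pairs inside a <configuration> element in core-site.xml, hdfs-site.xml or mapred-site.xml. As a rough sketch of that layout, the Python below renders a dictionary of properties into the XML form and reads it back. The function names render_site_xml and read_site_xml, the hostname and every value shown are illustrative assumptions, not recommended settings.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render {name: value} as the text of a Hadoop *-site.xml file."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # ET.indent exists from Python 3.9; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


def read_site_xml(xml_text):
    """Parse *-site.xml text back into a {name: value} dict of strings."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


# Hypothetical core-site.xml settings; hostname and values are placeholders.
core_site = {
    "fs.default.name": "hdfs://namenode.example.com:8020",
    "fs.trash.interval": 1440,      # minutes
    "io.file.buffer.size": 65536,   # bytes
}

xml_text = render_site_xml(core_site)
print(xml_text)
```

The same helper could be fed the dfs.* properties for hdfs-site.xml or the mapred.* properties for mapred-site.xml; values always come back from read_site_xml as strings, which is how Hadoop stores them in the file.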