gawk (interface to Gawk): syntax and examples (Unix & Linux commands)

The gawk command is the interface to Gawk, a powerful pattern-matching and processing language based on the AWK language.

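Working with text files often involves repetitive tasks. You might want to extract certain lines and discard the rest, or make changes wherever certain patterns appear while leaving the rest of the file alone. Writing single-use programs for these tasks in languages such as C, C++, or Java is time-consuming and inconvenient. Using awk often makes such jobs easier. The awk utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs.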

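The GNU implementation of awk is called gawk; if you invoke it with the proper options or environment variables (see Options), it is fully compatible with the POSIX specification of the awk language and with the Unix version of awk maintained by Brian Kernighan.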

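Using awk (or gawk) allows you to:

Manage small, personal databases
Generate reports
Validate data
Produce indexes and perform other document preparation tasks
Experiment with algorithms that you can later adapt to other computer languages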

In addition, gawk provides facilities that make it easy to:

Extract bits and pieces of data for processing
Sort data
Perform simple network communications

gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...
pgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
pgawk [ POSIX or GNU style options ] [ -- ] program-text file ...
dgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...

Option Format

gawk options may be either traditional POSIX-style one letter options, or GNU-style long options. POSIX options start with a single "-", while long options start with "--". Long options are provided for both GNU-specific features and for POSIX-mandated features.

gawk-specific options are typically used in long-option form. Arguments to long options are either joined with the option by an = sign, with no intervening spaces, or they may be provided in the next command-line argument. Long options may be abbreviated, as long as the abbreviation remains unique.

Additionally, each long option has a corresponding short option, so that the option's functionality may be used from within #! executable scripts.

Options

-f program-file, --file program-file	Read the AWK program source from the file program-file, instead of from the first command line argument. Multiple -f (or --file) options may be used.
-F fs, --field-separator fs	Use fs for the input field separator (the value of the FS predefined variable).
-v var=val, --assign var=val	Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the BEGIN block of an AWK program.
-b, --characters-as-bytes	Treat all input data as single-byte characters. In other words, don't pay any attention to the locale information when attempting to process strings as multibyte characters. The --posix option overrides this option.
-c, --traditional	Run in compatibility mode. In compatibility mode, gawk behaves identically to UNIX awk; none of the GNU-specific extensions are recognized. See GNU EXTENSIONS, below, for more information.
-C, --copyright	Print the short version of the GNU Copyright information message on the standard output and exit successfully.
-d[file], --dump-variables[=file]	Print a sorted list of global variables, their types and final values to file. If no file is provided, gawk uses a file named awkvars.out in the current directory. Having a list of all the global variables is a good way to look for typographical errors in your programs. You would also use this option if you have a large program with a lot of functions, and you want to be sure that your functions don't inadvertently use global variables that you meant to be local. This is a particularly easy mistake to make with simple variable names like i, j, and so on.
-e program-text, --source program-text	Use program-text as AWK program source code. This option allows the easy intermixing of library functions (used via the -f and --file options) with source code entered on the command line. It is intended primarily for medium to large AWK programs used in shell scripts.
-E file, --exec file	Similar to -f; however, this option is the last one processed. This should be used with #! scripts, particularly for CGI applications, to avoid passing in options or source code (!!!) on the command line from a URL. This option disables command-line variable assignments.
-g, --gen-pot	Scan and parse the AWK program, and generate a GNU .pot (Portable Object Template) format file on standard output with entries for all localizable strings in the program. The program itself is not executed. See the GNU gettext distribution for more information on .pot files.
-h, --help	Print a relatively short summary of the available options on the standard output. Per the GNU Coding Standards, these options cause an immediate, successful exit.
-L [value], --lint[=value]	Provide warnings about constructs that are dubious or non-portable to other AWK implementations. With an optional argument of fatal, lint warnings become fatal errors. This may be drastic, but its use will certainly encourage the development of cleaner AWK programs. With an optional argument of invalid, only warnings about things that are actually invalid are issued. Note: This is not fully implemented yet.
-n, --non-decimal-data	Recognize octal and hexadecimal values in input data. Use this option with great caution!
-N, --use-lc-numeric	This forces gawk to use the locale's decimal point character when parsing input data. Although the POSIX standard requires this behavior, and gawk does so when --posix is in effect, the default is to follow traditional behavior and use a period as the decimal point, even in locales where the period is not the decimal point character. This option overrides the default behavior, without the full draconian strictness of the --posix option.
-O, --optimize	Enable optimizations upon the internal representation of the program. Currently, this includes just simple constant-folding. The gawk maintainer hopes to add additional optimizations over time.
-p[prof_file], --profile[=prof_file]	Send profiling data to prof_file. The default is awkprof.out. When run with gawk, the profile is just a "pretty printed" version of the program. When run with pgawk, the profile contains execution counts of each statement in the program in the left margin and function call counts for each user-defined function.
-P, --posix	This turns on compatibility mode, with the following additional restrictions: \x escape sequences are not recognized; only space and tab act as field separators when FS is set to a single space, newline does not; you cannot continue lines after ? and :; the synonym func for the keyword function is not recognized; the operators ** and **= cannot be used in place of ^ and ^=; the fflush() function is not available.
-r, --re-interval	Enable the use of interval expressions in regular expression matching. Interval expressions were not traditionally available in the AWK language. The POSIX standard added them, to make awk and egrep consistent with each other. They are enabled by default, but this option remains for use with --traditional.
-R, --command file	Dgawk only. Read stored debugger commands from file.
-S, --sandbox	Runs gawk in sandbox mode, disabling the system() function, input redirection with getline, output redirection with print and printf, and loading dynamic extensions. Command execution (through pipelines) is also disabled. This effectively blocks a script from accessing local resources (except for the files specified on the command line).
-t, --lint-old	Provide warnings about constructs that are not portable to the original version of Unix awk.
-V, --version	Print version information for this particular copy of gawk on the standard output. This is useful mainly for knowing if the current copy of gawk on your system is up to date with respect to whatever the Free Software Foundation is distributing. This is also useful when reporting bugs. Per the GNU Coding Standards, these options cause an immediate, successful exit.
--	Signal the end of options. This is useful to allow further arguments to the AWK program itself to start with a "-". This provides consistency with the argument parsing convention used by most other POSIX programs.
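
As a small illustration of the -F and -v options from the table above, the following sketch splits colon-separated records and filters them against a threshold passed in from the shell. The sample data and the variable name min are made up for the example, and only POSIX awk features are used, so the awk command may be gawk or another awk.

```shell
# Hypothetical sample data in the form name:score
data='alice:72
bob:45
carol:90'

# -F: sets the input field separator FS to a colon.
# -v min=60 assigns the variable min before the program starts.
passed=$(printf '%s\n' "$data" | awk -F: -v min=60 '$2 >= min { print $1 }')

# Show the names whose score is at least min.
printf '%s\n' "$passed"
```

With gawk, the same options can also be spelled --field-separator=: and --assign min=60.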

In compatibility mode, any other options are flagged as invalid, but are otherwise ignored. In normal operation, as long as program text has been supplied, unknown options are passed on to the AWK program in the ARGV array for processing. This is particularly useful for running AWK programs via the "#!" executable interpreter mechanism.

AWK Program Execution

An AWK program consists of a sequence of pattern-action statements and optional function definitions.

@include "filename"
pattern { action statements }
function name(parameter list) { statements }

gawk first reads the program source from the program-file(s) if specified, from arguments to --source, or from the first non-option argument on the command line. The -f and --source options may be used multiple times on the command line. gawk reads the program text as if all the program files and command line source texts had been concatenated. This is useful for building libraries of AWK functions, without having to include them in each new AWK program that uses them. It also provides the ability to mix library functions with command line programs.
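
The concatenation behavior described above can be sketched with two -f files: one holding a reusable function and one holding the main rule. The file names lib.awk and main.awk and the function twice are hypothetical; only POSIX awk features are used, so the awk command may be gawk or another awk.

```shell
# Work in a throwaway directory.
dir=$(mktemp -d)

# lib.awk: a small "library" containing one function.
cat > "$dir/lib.awk" <<'EOF'
function twice(n) { return 2 * n }
EOF

# main.awk: the main program, which calls the library function.
cat > "$dir/main.awk" <<'EOF'
{ print twice($1) }
EOF

# Both files are read as if they had been concatenated into one program.
result=$(printf '%s\n' 3 10 | awk -f "$dir/lib.awk" -f "$dir/main.awk")

printf '%s\n' "$result"
rm -r "$dir"
```

With gawk, the main rule could instead be given on the command line with -e (--source), keeping only the library in a file.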

Also, lines beginning with @include may be used to include other source files into your program, making library use even easier.

The environment variable AWKPATH specifies a search path to use when finding source files named with the -f option. If this variable does not exist, the default path is ".:/usr/local/share/awk". The actual directory may vary, depending upon how gawk was built and installed. If a file name given to the -f option contains a "/" character, no path search is performed.

gawk executes AWK programs in the following order. First, all variable assignments specified via the -v option are performed. Next, gawk compiles the program into an internal form. Then, gawk executes the code in the BEGIN block(s) (if any), and then proceeds to read each file named in the ARGV array (up to ARGV[ARGC-1]). If there are no files named on the command line, gawk reads the standard input.
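
The ordering above can be observed with a short sketch: the -v assignment is already visible in BEGIN, and with no file operands the main rule reads standard input. The variable name greeting is invented for the example; the awk command may be gawk or any POSIX awk.

```shell
# -v assignments happen first, so BEGIN can already see greeting.
# No files are named, so records come from standard input.
out=$(printf 'one\ntwo\n' | awk -v greeting=hello '
    BEGIN { print "BEGIN sees: " greeting }
          { print "record " NR ": " $0 }
')

printf '%s\n' "$out"
```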

If a file name on the command line has the form var=val it is treated as a variable assignment. The variable var will be assigned the value val. This happens after any BEGIN block(s) have been run. Command line variable assignment is most useful for dynamically assigning values to the variables AWK uses to control how input is broken into fields and records. It is also useful for controlling state if multiple passes are needed over a single data file.
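
The multiple-pass use mentioned above might look like the following sketch, which reads the same file twice: once to compute a total and once to print each value's share of it. The file name nums.txt and the variable pass are hypothetical; the awk command may be gawk or any POSIX awk.

```shell
dir=$(mktemp -d)
printf '%s\n' 10 30 60 > "$dir/nums.txt"

# pass=1 is assigned just before the first read of the file,
# pass=2 just before the second. BEGIN runs before either assignment,
# so pass is still empty there.
out=$(awk '
    BEGIN     { print "in BEGIN pass is [" pass "]" }
    pass == 1 { total += $1 }
    pass == 2 { print $1, 100 * $1 / total "%" }
' pass=1 "$dir/nums.txt" pass=2 "$dir/nums.txt")

printf '%s\n' "$out"
rm -r "$dir"
```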

If the value of a particular element of ARGV is empty (""), gawk skips over it.
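
One way to see this rule in action is to blank out an ARGV element from a BEGIN block, as in this sketch. The file names a.txt and b.txt are hypothetical; the awk command may be gawk or any POSIX awk.

```shell
dir=$(mktemp -d)
printf 'from a\n' > "$dir/a.txt"
printf 'from b\n' > "$dir/b.txt"

# BEGIN empties ARGV[1], so a.txt is skipped and only b.txt is read.
out=$(awk 'BEGIN { ARGV[1] = "" } { print }' "$dir/a.txt" "$dir/b.txt")

printf '%s\n' "$out"
rm -r "$dir"
```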

For each input file, if a BEGINFILE rule exists, gawk executes the associated code before processing the contents of the file. Similarly, gawk executes the code associated with ENDFILE after processing the file.

For each record in the input, gawk tests to see if it matches any pattern in the AWK program. For each pattern that the record matches, the associated action is executed. The patterns are tested in the order they occur in the program.
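
To illustrate, the sketch below uses two rules whose patterns can both match the same record; each matching rule fires, in program order. The input values are arbitrary; the awk command may be gawk or any POSIX awk.

```shell
# 5 matches neither pattern, 15 matches the first, 25 matches both.
out=$(printf '%s\n' 5 15 25 | awk '
    $1 > 10 { print $1 " is greater than 10" }
    $1 > 20 { print $1 " is greater than 20" }
')

printf '%s\n' "$out"
```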

Finally, after all the input is exhausted, gawk executes the code in the END block(s) (if any).

Official gawk User's Guide

If you want to learn more about this incredibly powerful language, check out the GNU gawk User Guide.

gawk '{ num_fields = num_fields + NF }
END { print num_fields }'

Print the total number of fields in all input lines.

gawk 'length($0) > 80'

Prints every line longer than 80 characters. The sole rule has a relational expression as its pattern, and has no action (so the default action, printing the record, is used).

ls -l files | awk '{ x += $5 } ; END { print "total bytes: " x }'

Prints the total number of bytes used by files. This assumes the usual ls -l layout, in which the file size is the fifth field.
